externally stored columns.
innodb-zip.test: Correct the test case. Without the fixes, the test
would fail, because the BLOB would be prepended with a 768-byte prefix
of the data.
row_upd_index_replace_new_col_vals_index_pos(),
row_upd_index_replace_new_col_vals(): Use only one "heap"
parameter that must be non-NULL. When fetching externally
stored columns, use upd_field_t::orig_len.
upd_get_field_by_field_no(): New accessor function, for retrieving
an field from an update vector by field_no.
row_upd_index_replace_new_col_val(): New function, for replacing the
value from an update vector. This used to be duplicated code in
row_upd_index_replace_new_col_vals_index_pos() and
row_upd_index_replace_new_col_vals().
variable innodb_file_format. Implement file format version stamping of
*.ibd files and SYS_TABLES.TYPE.
This change breaks introduces an incompatible change for for
compressed tables. We can do this, as we have not released yet.
innodb-zip.test: Add tests for stricter KEY_BLOCK_SIZE and ROW_FORMAT
checks.
DICT_TF_COMPRESSED_MASK, DICT_TF_COMPRESSED_SHIFT: Replace with
DICT_TF_ZSSIZE_MASK, DICT_TF_ZSSIZE_SHIFT.
DICT_TF_FORMAT_MASK, DICT_TF_FORMAT_SHIFT, DICT_TF_FORMAT_51,
DICT_TF_FORMAT_ZIP: File format version, stored in table->flags,
in the .ibd file header, and in SYS_TABLES.TYPE.
dict_create_sys_tables_tuple(): Write the table flags to SYS_TABLES.TYPE
if the format is at least DICT_TF_FORMAT_ZIP. For old formats
(DICT_TF_FORMAT_51), write DICT_TABLE_ORDINARY as the table type.
DB_TABLE_ZIP_NO_IBD: Remove the error code. The error handling is done
in ha_innodb.cc; as a failsafe measure, dict_build_table_def_step() will
silently clear the compression and format flags instead of returning this
error.
dict_mem_table_create(): Assert that no extra bits are set in the flags.
dict_sys_tables_get_zip_size(): Rename to dict_sys_tables_get_flags().
Check all flag bits, and return ULINT_UNDEFINED if the combination is
unsupported.
dict_boot(): Document the SYS_TABLES columns N_COLS and TYPE.
dict_table_get_format(), dict_table_set_format(),
dict_table_flags_to_zip_size(): New accessors to table->flags.
dtuple_convert_big_rec(): Introduce the auxiliary variables
local_len, local_prefix_len. Store a 768-byte prefix locally
if the file format is less than DICT_TF_FORMAT_ZIP.
dtuple_convert_back_big_rec(): Restore the columns.
srv_file_format: New variable: innodb_file_format.
fil_create_new_single_table_tablespace(): Replace the parameter zip_size
with table->flags.
fil_open_single_table_tablespace(): Replace the parameter zip_size_in_k
with table->flags. Check the flags.
fil_space_struct, fil_space_create(), fil_op_write_log():
Replace zip_size with flags.
fil_node_open_file(): Note a TODO item for InnoDB Hot Backup.
Check that the tablespace flags match.
fil_space_get_zip_size(): Rename to fil_space_get_flags(). Add a
wrapper for fil_space_get_zip_size().
fsp_header_get_flags(): New function.
fsp_header_init_fields(): Replace zip_size with flags.
FSP_SPACE_FLAGS: New name for the tablespace flags. This field used
to be called FSP_PAGE_ZIP_SIZE, or FSP_LOWEST_NO_WRITE. It has always
been written as 0 in MySQL/InnoDB versions 4.1 to 5.1.
MLOG_ZIP_FILE_CREATE: Rename to MLOG_FILE_CREATE2. Add a 32-bit
parameter for the tablespace flags.
ha_innobase::create(): Check the table attributes ROW_FORMAT and
KEY_BLOCK_SIZE. Issue errors if they are inappropriate, or warnings
if the inherited attributes (in ALTER TABLE) will be ignored.
PAGE_ZIP_MIN_SIZE_SHIFT: New constant: the 2-logarithm of PAGE_ZIP_MIN_SIZE.
lock_get_table(), locks_row_eq_lock(), buf_page_get_mutex(): Add return
after ut_error. On Windows, ut_error is not declared as "noreturn".
Add explicit type casts when assigning ulint to byte to get rid of
"possible loss of precision" warnings.
struct i_s_table_cache_struct: Declare rows_used, rows_allocd as ulint
instead of ullint. 32 bits should be enough.
fill_innodb_trx_from_cache(), i_s_zip_fill_low(): Cast 64-bit unsigned
integers to longlong when calling Field::store(longlong, bool is_unsigned).
Otherwise, the compiler would implicitly convert them to double and
invoke Field::store(double) instead.
recv_truncate_group(), recv_copy_group(), recv_calc_lsn_on_data_add():
Cast ib_uint64_t expressions to ulint to get rid of "possible loss of
precision" warnings. (There should not be any loss of precision in
these cases.)
log_close(), log_checkpoint_margin(): Declare some variables as ib_uint64_t
instead of ulint, so that there won't be any potential loss of precision.
mach_write_ull(): Cast the second argument of mach_write_to_4() to ulint.
OS_FILE_FROM_FD(): Cast the return value of _get_osfhandle() to HANDLE.
row_merge_dict_table_get_index(): Cast the parameter of mem_free() to (void*)
in order to get rid of the bogus MSVC warning C4090, which has been reported
as MSVC bug 101661:
<http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=101661>
row_mysql_read_blob_ref(): To get rid of a bogus MSVC warning C4090,
drop a const qualifier.
suggested by Heikki, because it breaks row_vers_impl_x_locked_off_kernel();
see Mantis issue #10.
However, now that Heikki's fix has been removed, the code may break elsewhere
when it tries to dereference half-freed or completely freed externally
stored columns.
about undo_rec possibly being uninitialized. When trx_undo_get_undo_rec()
leaves undo_rec uninitialized, both functions will return DB_MISSING_HISTORY
without dereferencing undo_rec.
This closes Mantis issue #7.
in r2276. Now the following symbols will be exported when InnoDB is built
as a dynamic plugin:
* the virtual method pointer table of class ha_innodb
* the three variables that MySQL will reference when linking at runtime:
_mysql_plugin_declarations_
_mysql_plugin_interface_version_
_mysql_sizeof_struct_st_plugin_
Furthermore, the following symbols are weak globals, to allow us to access
the built-in InnoDB in the mysqld executable, in case it contains a statically
linked InnoDB:
builtin_innobase_plugin
innodb_hton_ptr
symbols. Use it for all definitions of non-static variables and functions.
lexyy.c, make_flex.sh: Declare yylex as UNIV_INTERN, not static. It is
referenced from pars0grm.c.
Actually, according to
nm .libs/ha_innodb.so|grep -w '[ABCE-TVXYZ]'
the following symbols are still global:
* The vtable for class ha_innodb
* pars0grm.c: The function yyparse() and the variables yychar, yylval, yynerrs
The required changes to the Bison-generated file pars0grm.c will be addressed
in a separate commit, which will add a script similar to make_flex.sh.
The class ha_innodb is renamed from class ha_innobase by a #define. Thus,
there will be no clash with the builtin InnoDB. However, there will be some
overhead for invoking virtual methods of class ha_innodb. Ideas for making
the vtable hidden are welcome. -fvisibility=hidden is not available in GCC 3.
btr_cur_pessimistic_update(): Note why the externally stored columns
of a record on a latched page cannot have been purged.
trx_undo_get_undo_rec(): Clarify that the stack of versions is locked
all the way down to the purge view.
trx_undo_prev_version_build(): Set *old_vers = NULL also when the record
could have been purged already. Add some clarifying comments.
creating indexes. Lock the user table inside the user transaction.
enum trx_dict_op: Remove TRX_OP_INDEX_MAY_WAIT.
ha_innobase::add_index(): Lock the user tables within prebuilt->trx.
Commit the data dictionary transaction before creating indexes.
ha_innobase::final_drop_index(): Lock the user table within prebuilt->trx.
to the undo log, also store the original length of the column, so that the
changes will be correctly undone in transaction rollback or when fetching
previous versions of the row.
innodb-zip.test: New file, for tests of the compression.
upd_field_t: Add orig_len, the original length of new_val.
btr_push_update_extern_fields(): Restore the original prefix of the column.
Add the parameter heap where memory will be allocated if necessary.
trx_undo_rec_get_col_val(): Add the output parameter orig_len.
trx_undo_page_report_modify_ext(): New function: Write an externally
stored column to the undo log. This is only called from
trx_undo_page_report_modify(), and this is the only caller of
trx_undo_page_fetch_ext().
trx_undo_update_rec_get_update(): Read the original length of the column
prefix to upd_field->orig_len.
row_build(), row_upd_index_replace_new_col_vals_index_pos(), and
row_upd_index_replace_new_col_vals() are safe.
btr_cur_optimistic_update(), btr_cur_pessimistic_update(): Note that
the B-tree page of the clustered index record is latched in mtr.
trx_undo_prev_version_build(): Add const qualifiers to index_rec
and rec. Note that the page of index_rec is latched in index_mtr.
row_vers_impl_x_locked_off_kernel(), row_vers_old_has_index_entry():
Note that the stack of versions is locked by mtr and thus it is
safe to call row_build().
Change the format of TRX_IDs in INFORMATION_SCHEMA tables from DEC to
HEX.
The current TRX_IDs are hard to remember and track down: 426355, 428466,
428566, etc.
In HEX:
* there are less "digits", the strings are shorter;
* since there are 16 instead of 10 "digits", the chance of having
repeating ones are smaller.
The above look like 68173, 689B2, 68A16 in HEX.
Discussed with: Ken
Approved by: Heikki (via IM)
btr_copy_blob_prefix(), btr_copy_externally_stored_field_prefix_low():
Document the return value as "number of bytes written", not "bytes written".
trx_undo_page_fetch_ext(): Explain the assertion ut_a(ext_len).
row_build_index_entry(): Explain the assertion ut_a(!ext).
Bugfix: Lock the MySQL mutex LOCK_thread_count before accessing
trx->mysql_query_str to avoid race conditions where MySQL sets it to
NULL after we have checked that it is not NULL and before we access it.
Approved by: Marko
enough prefixes of externally stored columns, so that purge will not have
to dereference any BLOB pointers, which may be invalid. This will not be
necessary for logging inserts, because inserts are no-ops in purge, and
the record will remain locked during transaction rollback.
TODO: in dict_build_table_def_step() or dict_build_index_def_step(),
prevent the creation of tables with too many columns for which a
prefix index is defined. This is because there is a size limit of undo
log records, and for each prefix-indexed column, the log must store
REC_MAX_INDEX_COL_LEN + BTR_EXTERN_FIELD_REF_SIZE bytes.
trx_undo_page_report_insert(): Assert that the index is clustered.
trx_undo_page_fetch_ext(): New function, for fetching the BLOB prefix
in trx_undo_page_report_modify().
trx_undo_page_report_modify(): Write long enough prefixes of the externally
stored columns to the undo log.
trx_undo_rec_get_partial_row(): Remove the parameter "ext". Assert that
the undo log contains long enough prefixes of the externally stored columns.
purge_node_t: Remove the field "ext".
* Change terminology:
wait lock -> requested lock
waited lock -> blocking lock
new: requesting transaction (the trx what owns the requested lock)
new: blocking transaction (the trx that owns the blocking lock)
* Add transaction ids to INFORMATION_SCHEMA.INNODB_LOCK_WAITS. This is
somewhat redundant because transaction ids can be found in INNODB_LOCKS
(which can be joined with INNODB_LOCK_WAITS) but would help users to
write shorter joins (one table less) in some cases where they want to
find which transaction is blocking which.
Suggested by: Ken
Approved by: Heikki
Only add indexed BLOBs to row_ext.
trx_undo_rec_get_partial_row(): Move the BLOB fetching to row_ext_create().
row_build(): Pass only those BLOBs to row_ext_create() that are referenced by
ordering columns of some indexes, similar to trx_undo_rec_get_partial_row().
row_ext_create(): Add the parameter "tuple". Move the implementation
from row0ext.ic to row0ext.c.
row_ext_lookup_ith(), row_ext_lookup(): Return a const pointer. Remove
the parameters "field" and "f_len". Make the row_ext_t* parameter const.
row_ext_t: Remove the field zip_size.
field_ref_zero[]: Declare in btr0types.h instead of btr0cur.h.
row_ext_lookup_low(): Rename to row_ext_cache_fill() and change the
signature.
only for those externally stored columns that occur in the ordering columns
of indexes. Prefetch the prefixes of those columns, because the clustered
index record and the BLOBs may have been deleted by the time when the
purge thread needs to read the BLOB prefixes.
row_ext_create(): Add the debug assertion ut_ad(ut_is_2pow(zip_size)).
column prefix of an externally stored column.
row_upd_ext_fetch(): New function.
row_upd_index_replace_new_col_vals(),
row_upd_index_replace_new_col_vals_index_pos(): Fetch prefixes of
externally stored columns when they are needed for column prefix
indexes. For memory allocation, add the parameter ext_heap. Avoid
repeating the inner loop after finding a matching upd_field->field_no.
set the "external storage" flag. When parsing the undo log, do not
misinterpret a SQL NULL column for externally stored.
These bugs were spotted by Heikki and Sunny.
trx_undo_page_report_modify(): Set the UNIV_EXTERN_STORAGE_FIELD flag
when needed.
trx_undo_rec_get_partial_row(): Check for len == UNIV_SQL_NULL.
Implement a limit on the memory used by the INNODB_TRX, INNODB_LOCKS and
INNODB_LOCK_WAITS tables. The maximum allowed memory is defined with the
macro TRX_I_S_MEM_LIMIT.
Approved by: Marko (via IM)
Add the query in information_schema.innodb_trx.trx_query. Add it even
though it is available in information_schema.processlist.info to make
inconsistencies between those two tables obvious.
It is rather confusting to see a transaction shown in innodb_trx and
innodb_locks that holds a lock on one table and the corresponding query
in processlist executing INSERT on another table. We do not want users
to contact us asking to explain that. It is caused by the fact that the
data for innodb_* tables and processlist is fetched at different time.
Approved by: Marko
Introduce a generic soultion to the common problem that MySQL do not add
functions needed by us in a reasonable time.
Start with a function that retrieves THD::thread_id, this is needed for
the information_schema.innodb_trx.mysql_thread_id column.
Approved by: Marko
recovered transactions from new ones. Until r1594, they were distinguished
by trx->sess == NULL.
trx_t: Add the bitfield is_recovered.
trx_lists_init_at_db_start(): Set trx->is_recovered.
trx_create(): Initialize trx->is_recovered = 0.
trx_print(): Display information about trx->is_recovered.
trx_rollback_or_clean_all_without_sess(): Skip new transactions.
Protect all accesses of trx_sys->trx_list with kernel_mutex.
trx_roll_crash_recv_trx, trx_roll_max_undo_no, trx_roll_progress_printed_pct:
Made these variables static.
Add innodb_locks.lock_data column and some relevant tests.
For record locks this column represents the ordering fields of the
locked row in a human readable, SQL-valid, format.
Approved by: Marko
enum trx_dict_op: dictionary operation modes
trx_get_dict_operation(), trx_set_dict_operation(): Accessors for
trx->dict_operation.
lock_table_enqueue_waiting(), lock_rec_enqueue_waiting(): Do not complain
about lock waits if the dictionary mode is TRX_DICT_OP_INDEX_MAY_WAIT.
row_merge_lock_table(): Remove the work-around for avoiding the warning
in lock_table_enqueue_waiting().
trx_undo_mark_as_dict_operation(): Do not write trx->table_id to the
undo log unless the dict_operation is TRX_DICT_OP_TABLE.
ha_innobase::add_index(): Set the dict_operation mode initially to
TRX_DICT_OP_INDEX_MAY_WAIT, then lock the table exclusively, and set the
mode to TRX_DICT_OP_INDEX, and optionally to TRX_DICT_OP_TABLE when
creating a temporary table.
fix the bugs introduced in r1591.
row_rec_to_index_entry_low(): Clear "n_ext". Do not allow it to be NULL.
Add const qualifier to dict_index_t*.
row_rec_to_index_entry(): Add the parameters "offsets" and "n_ext".
btr_cur_optimistic_update(): Add an assertion that there are no externally
stored columns. Remove the unreachable call to btr_cur_unmark_extern_fields()
and the preceding unnecessary call to rec_get_offsets().
btr_push_update_extern_fields(): Remove the parameters index, offsets.
Only report the additional externally stored columns of the update vector.
row_build(), trx_undo_rec_get_partial_row(): Flag externally stored columns
also with dfield_set_ext().
rec_copy_prefix_to_dtuple(): Assert that there are no externally stored
columns in the prefix.
row_build_row_ref(): Note and assert that the index is a secondary index,
and assert that there are no externally stored columns.
row_build_row_ref_fast(): Assert that there are no externally stored columns.
rec_offs_get_n_alloc(): Expose the function.
row_build_row_ref_in_tuple(): Assert that there are no externally stored
columns in a record of a secondary index.
row_build_row_ref_from_row(): Assert that there are no externally stored
columns.
row_upd_check_references_constraints(): Add the parameter offsets, to
avoid a redundant call to rec_get_offsets().
row_upd_del_mark_clust_rec(): Add the parameter offsets. Remove
duplicated code.
row_ins_index_entry_set_vals(): Copy the external storage flag.
sel_pop_prefetched_row(): Assert that there are no externally stored
columns.
row_scan_and_check_index(): Copy offsets to a temporary heap across
the invocation of row_rec_to_index_entry().
and use it for flagging externally stored columns in the data tuple.
The data tuple contains the same columns as the clustered index record,
but in a different order. This error was introduced in r1591.
TODO: the assertion ut_ad(!dfield_is_ext()) may fail in
btr_cur_pessimistic_update().
offsets_[] arrays, as suggested by Vasil.
rec_offs_set_n_alloc(): Declare as a public function. Assert that
n_alloc > REC_OFFS_HEADER_SIZE.
rec_offs_get_n_alloc(): Assert that n_alloc > REC_OFFS_HEADER_SIZE.
Copy any data (currently table name and table index) that may be
destroyed after releasing the kernel mutex into internal cache's
storage.
This is done in efficient manner using ha_storage type and a given
string is copied only once into the cache's storage. Later additions of
the same string use the already stored string, thus allocating memory
only once per unique string.
Approved by: Marko
Use the newly introduced mem_alloc2() to use the memory that has been
allocated in addition to the requested memory. This is done in order to
avoid wasting memory.
Do not calculate the sizes and offsets of the chunks in advance in
table_cache_init() because it is unknown how much bytes will actually
be allocated by mem_alloc2(). Rather calculate these on the run: after
each chunk is allocated set its size and the offset of the next chunk.
Similar patch approved by: Marko
innodb_lock_waits. See
https://svn.innodb.com/innobase/InformationSchema/TransactionsAndLocks
for design notes.
Things that need to be resolved before this goes live:
* MySQL must add thd_get_thread_id() function to their code
http://bugs.mysql.com/30930
* Allocate memory from mem_heap instead of using mem_alloc()
* Copy table name and index name into the cache because they may be
freed later which will result in referencing freed memory
Approved by: Marko