We introduce a NO_ROLLBACK flag for InnoDB tables. This flag only works
for tables that have a single index. Apart from undo logging, this flag
will also prevent locking and the assignment of DB_ROW_ID or DB_TRX_ID,
and imply READ UNCOMMITTED isolation. It is assumed that the SQL layer
is guaranteeing mutual exclusion.
After the initial insert of the single record during CREATE SEQUENCE,
InnoDB will be updating the single record in-place. This is crash-safe
thanks to the redo log. (That is, after a crash after CREATE SEQUENCE
was committed, the effect of sequence operations will be observable
fully or not at all.)
When it comes to the durability of the updates of SEQUENCE in
InnoDB, there is a clear analogy to MDEV-6076 Persistent AUTO_INCREMENT.
The updates would be made persistent by the InnoDB redo log flush
at transaction commit or rollback (or XA PREPARE), provided that
innodb_log_flush_at_trx_commit=1.
Similar to AUTO_INCREMENT, it is possible that the update of a SEQUENCE
in a middle of transaction becomes durable before the COMMIT/ROLLBACK of
the transaction, in case the InnoDB redo log is being flushed as a result
of the a commit or rollback of some other transaction, or as a result of
a redo log checkpoint that can be initiated at any time by operations that
are writing redo log.
dict_table_t::no_rollback(): Check if the table does not support rollback.
BTR_NO_ROLLBACK: Logging and locking flags for no_rollback() tables.
DICT_TF_BITS: Add the NO_ROLLBACK flag.
row_ins_step(): Assign 0 to DB_ROW_ID and DB_TRX_ID, and skip
any locking for no-rollback tables. There will be only a single row
in no-rollback tables (or there must be a proper PRIMARY KEY).
row_search_mvcc(): Execute the READ UNCOMMITTED code path for
no-rollback tables.
ha_innobase::external_lock(), ha_innobase::store_lock():
Block CREATE/DROP SEQUENCE in innodb_read_only mode.
This probably has no effect for CREATE SEQUENCE, because already
ha_innobase::create() should have been called (and refused)
before external_lock() or store_lock() is called.
ha_innobase::store_lock(): For CREATE SEQUENCE, do not acquire any
InnoDB locks, even though TL_WRITE is being requested. (This is just
a performance optimization.)
innobase_copy_frm_flags_from_create_info(), row_drop_table_for_mysql():
Disable persistent statistics for no_rollback tables.
Working features:
CREATE OR REPLACE [TEMPORARY] SEQUENCE [IF NOT EXISTS] name
[ INCREMENT [ BY | = ] increment ]
[ MINVALUE [=] minvalue | NO MINVALUE ]
[ MAXVALUE [=] maxvalue | NO MAXVALUE ]
[ START [ WITH | = ] start ] [ CACHE [=] cache ] [ [ NO ] CYCLE ]
ENGINE=xxx COMMENT=".."
SELECT NEXT VALUE FOR sequence_name;
SELECT NEXTVAL(sequence_name);
SELECT PREVIOUS VALUE FOR sequence_name;
SELECT LASTVAL(sequence_name);
SHOW CREATE SEQUENCE sequence_name;
SHOW CREATE TABLE sequence_name;
CREATE TABLE sequence-structure ... SEQUENCE=1
ALTER TABLE sequence RENAME TO sequence2;
RENAME TABLE sequence TO sequence2;
DROP [TEMPORARY] SEQUENCE [IF EXISTS] sequence_names
Missing features
- SETVAL(value,sequence_name), to be used with replication.
- Check replication, including checking that sequence tables are marked
not transactional.
- Check that a commit happens for NEXT VALUE that changes table data (may
already work)
- ALTER SEQUENCE. ANSI SQL version of setval.
- Share identical sequence entries to not add things twice to table list.
- testing insert/delete/update/truncate/load data
- Run and fix Alibaba sequence tests (part of mysql-test/suite/sql_sequence)
- Write documentation for NEXT VALUE / PREVIOUS_VALUE
- NEXTVAL in DEFAULT
- Ensure that NEXTVAL in DEFAULT uses database from base table
- Two NEXTVAL for same row should give same answer.
- Oracle syntax sequence_table.nextval, without any FOR or FROM.
- Sequence tables are treated as 'not read constant tables' by SELECT; Would
be better if we would have a separate list for sequence tables so that
select doesn't know about them, except if refereed to with FROM.
Other things done:
- Improved output for safemalloc backtrack
- frm_type_enum changed to Table_type
- Removed lex->is_view and replaced with lex->table_type. This allows
use to more easy check if item is view, sequence or table.
- Added table flag HA_CAN_TABLES_WITHOUT_ROLLBACK, needed for handlers
that want's to support sequences
- Added handler calls:
- engine_name(), to simplify getting engine name for partition and sequences
- update_first_row(), to be able to do efficient sequence implementations.
- Made binlog_log_row() global to be able to call it from ha_sequence.cc
- Added handler variable: row_already_logged, to be able to flag that the
changed row is already logging to replication log.
- Added CF_DB_CHANGE and CF_SCHEMA_CHANGE flags to simplify
deny_updates_if_read_only_option()
- Added sp_add_cfetch() to avoid new conflicts in sql_yacc.yy
- Moved code for add_table_options() out from sql_show.cc::show_create_table()
- Added String::append_longlong() and used it in sql_show.cc to simplify code.
- Added extra option to dd_frm_type() and ha_table_exists to indicate if
the table is a sequence. Needed by DROP SQUENCE to not drop a table.
use CMAKE_CXX_STANDARD to set C++11 flags with CMake 3.1+ (apples flags are somehow different from standard clang)
port htonbe16/32/64 macros for rocksdb
use reinterpret_cast<size_t> to cast macOS's pthread_t (pointer type) to size_t , for rocksdb
ha_innobase::defragment_table(): Skip corrupted indexes and
FULLTEXT INDEX. In InnoDB, FULLTEXT INDEX is implemented with
auxiliary tables. We will not defragment them on OPTIMIZE TABLE.
buf_dblwr_create(): Remove a bogus check for the buffer pool size.
Theoretically, there is no problem if the doublewrite buffer is
larger than the buffer pool. It could only cause trouble on crash
recovery, and on recovery the doublewrite buffer is read to a buffer
that is allocated outside of the buffer pool. Moreover, this check
was only performed when the database was initialized for the first
time.
On a normal startup, buf_dblwr_init() would not enforce any
rule on the innodb_buffer_pool_size.
Furthermore, in case of an error, commit the mini-transaction in order
to avoid an assertion failure on shutdown. Yes, this will leave the
doublewrite buffer in a corrupted stage, but the doublewrite buffer
should only be initialized when the data files are being initialized
from the scratch in the first place.
Fixes compile error that highlights problem:
/source/storage/innobase/fil/fil0crypt.cc: In function 'void fil_crypt_rotate_page(const key_state_t*, rotate_thread_t*)':
/source/storage/innobase/fil/fil0crypt.cc:1770:15: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
if (space == TRX_SYS_SPACE && offset == TRX_SYS_PAGE_NO) {
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
Also, some MDEV-11738/MDEV-11581 post-push fixes.
In MariaDB 10.1, there is no fil_space_t::is_being_truncated field,
and the predicates fil_space_t::stop_new_ops and fil_space_t::is_stopping()
are interchangeable. I requested the fil_space_t::is_stopping() to be added
in the review, but some added checks for fil_space_t::stop_new_ops were
not replaced with calls to fil_space_t::is_stopping().
buf_page_decrypt_after_read(): In this low-level I/O operation, we must
look up the tablespace if it exists, even though future I/O operations
have been blocked on it due to a pending DDL operation, such as DROP TABLE
or TRUNCATE TABLE or other table-rebuilding operations (ALTER, OPTIMIZE).
Pass a parameter to fil_space_acquire_low() telling that we are performing
a low-level I/O operation and the fil_space_t::is_stopping() status should
be ignored.
remove hard-coded paths (that assumed we're in a source tree)
remove various shell/perl/awk/whatsnot scripts, use mysqltest and perl
remove numerous --exec /some/unix/tool commands, use mysqltest and perl
namely, restart_mysqld_with_option.inc and kill_and_restart_mysqld.inc -
use restart_mysqld.inc instead.
Also remove innodb_wl6501_crash_stripped.inc that wasn't used anywhere.
InnoDB divides the allocation of undo logs into rollback segments.
The DB_ROLL_PTR system column of clustered indexes can address up to
128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only
created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin
for MySQL 5.1, all 128 rollback segments were created.
MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs.
On upgrade, unless a slow shutdown (innodb_fast_shutdown=0)
was performed on the old server instance, these rollback segments
could be in use by transactions that are in XA PREPARE state or
transactions that were left behind by a server kill followed by a
normal shutdown immediately after restart.
Persistent tables cannot refer to temporary undo logs or vice versa.
Therefore, we should keep two distinct sets of rollback segments:
one for persistent tables and another for temporary tables. In this way,
all 128 rollback segments will be available for both types of tables,
which could improve performance. Also, MariaDB 10.2 will remain more
compatible than MySQL 5.7 with data files from earlier versions of
MySQL or MariaDB.
trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary
rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will
be solely for persistent undo logs.
srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS.
srv_available_undo_logs: Change the type to ulong.
trx_rseg_get_on_id(): Remove. Instead, let the callers refer to
trx_sys directly.
trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters.
These functions only deal with persistent undo logs.
trx_temp_rseg_create(): New function, to create all temporary rollback
segments at server startup.
trx_rseg_t::is_persistent(): Determine if the rollback segment is for
persistent tables.
trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on
context (such as table handle) whether the DB_ROLL_PTR is referring to
a persistent undo log.
trx_sys_create_rsegs(): Remove all parameters, which were always passed
as global variables. Instead, modify the global variables directly.
enum trx_rseg_type_t: Remove.
trx_t::get_temp_rseg(): A method to ensure that a temporary
rollback segment has been assigned for the transaction.
trx_t::assign_temp_rseg(): Replaces trx_assign_rseg().
trx_purge_free_segment(), trx_purge_truncate_rseg_history():
Remove the redundant variable noredo=false.
Temporary undo logs are discarded immediately at transaction commit
or rollback, not lazily by purge.
trx_purge_mark_undo_for_truncate(): Remove references to the
temporary rollback segments.
trx_purge_mark_undo_for_truncate(): Remove a check for temporary
rollback segments. Only the dedicated persistent undo log tablespaces
can be truncated.
trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the
parameter is_temp.
trx_rseg_mem_restore(): Split from trx_rseg_mem_create().
Initialize the undo log and the rollback segment from the file
data structures.
trx_sysf_get_n_rseg_slots(): Renamed from
trx_sysf_used_slots_for_redo_rseg(). Count the persistent
rollback segment headers that have been initialized.
trx_sys_close(): Also free trx_sys->temp_rsegs[].
get_next_redo_rseg(): Merged to trx_assign_rseg_low().
trx_assign_rseg_low(): Remove the parameters and access the
global variables directly. Revert to simple round-robin, now that
the whole trx_sys->rseg_array[] is for persistent undo log again.
get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg().
srv_undo_tablespaces_init(): Remove some parameters and use the
global variables directly. Clarify some error messages.
Adjust the test innodb.log_file. Apparently, before these changes,
InnoDB somehow ignored missing dedicated undo tablespace files that
are pointed by the TRX_SYS header page, possibly losing part of
essential transaction system state.