Problem:
========
The test now fails with the following trace:
CURRENT_TEST: rpl.rpl_parallel_temptable
--- /mariadb/10.4/mysql-test/suite/rpl/r/rpl_parallel_temptable.result
+++ /mariadb/10.4/mysql-test/suite/rpl/r/rpl_parallel_temptable.reject
@@ -194,7 +194,6 @@
30 conservative
31 conservative
32 optimistic
-33 optimistic
Analysis:
=========
The part of the test that fails with a result content mismatch is given below.
CREATE TEMPORARY TABLE t4 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t4 VALUES (32);
INSERT INTO t4 VALUES (33);
INSERT INTO t1 SELECT a, "optimistic" FROM t4;
slave_parallel_mode=optimistic
The test script expects INSERT ... SELECT to read both rows, 32 and 33, and to
populate table 't1' with them. But this expectation fails occasionally.
All three INSERT statements are handed over to three different slave parallel
workers. Temporary tables are not safe for parallel replication. They were
designed to be visible to one thread only, so they have no table locking. Thus
there is no protection against, for example, two conflicting transactions
committing in parallel.
So, when using parallel replication, anything that uses temporary tables is
serialized with everything before it by a "wait_for_prior_commit" function
call. This ensures that each such transaction is executed sequentially.
But there exists a code path in which the above wait doesn't happen. Because of
this, at times the INSERT from SELECT doesn't wait for the INSERT (33) to
complete; it finishes executing and enters the commit stage. Hence only row 32
is found in those cases, resulting in the test failure.
The wait needs to be added within the "open_temporary_table" call. The code
within "open_temporary_table" looks like this.
Each thread tries to open a temporary table in 3 different ways:
case 1: Find a temporary table which is already in use, by using
        find_temporary_table(tl) && wait_for_prior_commit()
case 2: If the above failed, look for a temporary table which is marked as
        free for reuse. This internally calls "wait_for_prior_commit()" if a
        table is found.
        find_and_use_tmp_table(tl, &table)
case 3: If neither of the above found a table, open a new table handle from
        the table share.
        if (!table && (share= find_tmp_table_share(tl)))
        { table= open_temporary_table(share, tl->get_table_name(), true); }
At present the "wait_for_prior_commit" happens only in case 1 & 2.
Fix:
====
On the slave, add a call to "wait_for_prior_commit" for case 3.
The above wait on the slave will solve the issue. A more detailed fix would be
to mark temporary tables as not safe for parallel execution on the master side.
To do that, on the master side, always leave the Gtid_log_event-specific flag
FL_TRANSACTIONAL unset, so that they are not scheduled in parallel.
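The three lookup paths described in the analysis, with the wait added to the
third one, can be pictured with a small toy model. This is only a sketch under
stated assumptions: ToyTable, ToyShare and ToyWorker are hypothetical stand-ins
and not the server's types, and wait_for_prior_commit() merely counts calls
instead of blocking on earlier transactions.

```cpp
#include <cassert>
#include <list>
#include <string>

// Toy model only: hypothetical stand-ins, not server code.
struct ToyTable { bool in_use = false; };

struct ToyShare {
  std::string name;
  std::list<ToyTable> tables;   // std::list keeps element addresses stable
};

struct ToyWorker {
  std::list<ToyShare> shares;
  int prior_commit_waits = 0;   // how often we serialized behind earlier commits

  // Stand-in for the real wait; here it only counts invocations.
  void wait_for_prior_commit() { ++prior_commit_waits; }

  ToyShare *find_share(const std::string &name) {
    for (ToyShare &s : shares)
      if (s.name == name) return &s;
    return nullptr;
  }

  ToyTable *open_temporary_table(const std::string &name) {
    ToyShare *share = find_share(name);
    if (!share) return nullptr;            // no such temporary table

    // Case 1: a handle that is already in use.
    for (ToyTable &t : share->tables)
      if (t.in_use) { wait_for_prior_commit(); return &t; }

    // Case 2: a handle that is free for reuse.
    for (ToyTable &t : share->tables)
      if (!t.in_use) { t.in_use = true; wait_for_prior_commit(); return &t; }

    // Case 3: no handle yet, so open a new one from the share.
    share->tables.emplace_back();
    ToyTable &t = share->tables.back();
    t.in_use = true;
    wait_for_prior_commit();               // the wait that was missing here
    return &t;
  }
};
```

In this model every successful open goes through exactly one wait, whichever
of the three paths it takes.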
truncating a temporary table
TRUNCATE expects only one TABLE instance (which is used by TRUNCATE
itself) to be open. However this requirement wasn't enforced after
"MDEV-5535: Cannot reopen temporary table".
Fixed by closing unused table instances before performing TRUNCATE.
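As a rough illustration of the idea (not the actual patch), closing every
instance except the one TRUNCATE itself uses could look like the sketch below.
TruncTable, TruncShare and close_unused_instances() are hypothetical names, and
the sketch assumes that every handle other than TRUNCATE's own counts as
unused.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-ins for a table handle and the share that owns it.
struct TruncTable { int id; };
struct TruncShare { std::list<TruncTable> tables; };

// Close every handle except the one TRUNCATE is using.
// Returns the number of handles that were closed.
inline std::size_t close_unused_instances(TruncShare &share,
                                          const TruncTable *keep) {
  const std::size_t before = share.tables.size();
  share.tables.remove_if(
      [keep](const TruncTable &t) { return &t != keep; });
  return before - share.tables.size();
}
```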
Do not register intermediate tables created by inplace ALTER TABLE in
THD::temporary_tables.
Regular ALTER TABLE doesn't create .frm for temporary and discoverable
tables anymore. For inplace ALTER TABLE moved .frm creation to
create_table_for_inplace_alter().
Removed open_in_engine argument of create_and_open_tmp_table() and
open_temporary_table(): it became unused after this patch.
Part of MDEV-17805 - Remove InnoDB cache for temporary tables.
CREATE TEMPORARY TABLE locks SE plugin 6 times. 5 of these locks are
released by the end of the statement. And only 1 acquired by
init_from_binary_frm_image() / plugin_lock() remains.
The lock removed in this patch was clearly redundant.
Part of MDEV-17805 - Remove InnoDB cache for temporary tables.
This was caused by a combination of factors:
* MyISAM/Aria temporary tables historically never saved the state
to disk (MYI/MAI), because the state never needed to persist
* certain ALTER TABLE operations modify the original TABLE structure
and if they fail, the original table has to be reopened to
revert all changes (m_needs_reopen=1)
As a result, when ALTER fails and a MyISAM/Aria temp table gets reopened,
it reads the stale state from the disk.
As a fix, MyISAM/Aria tables now *always* write the state to disk
on close, *unless* HA_EXTRA_PREPARE_FOR_DROP was done first. And
the server now always does HA_EXTRA_PREPARE_FOR_DROP before dropping
a temporary table.
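The rule can be modelled in a few lines. The sketch below is a toy, not engine
code: ToyTempTable is a hypothetical type, a row counter stands in for the
MYI/MAI state, and prepare_for_drop() stands in for HA_EXTRA_PREPARE_FOR_DROP.

```cpp
#include <cassert>

// Toy model of a temporary table whose state lives both in memory and on disk.
struct ToyTempTable {
  int rows_on_disk = 0;        // state as last written to the index file
  int rows_in_memory = 0;      // state the open handle works with
  bool prepared_for_drop = false;

  // Opening always starts from whatever is on disk.
  void open() { rows_in_memory = rows_on_disk; prepared_for_drop = false; }

  void insert_row() { ++rows_in_memory; }

  // Stand-in for HA_EXTRA_PREPARE_FOR_DROP: the state no longer matters.
  void prepare_for_drop() { prepared_for_drop = true; }

  // Always persist the state on close, unless the table is about to be dropped.
  void close() {
    if (!prepared_for_drop) rows_on_disk = rows_in_memory;
  }
};
```

With this rule, a close followed by a reopen (as happens after a failed ALTER)
sees the current state, while a drop skips the pointless write.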
THD::close_temporary_tables(): Revert the change.
ha_innobase::delete_table(): Move the work-around inside
a debug assertion, and check thd_kill_level() instead of thd_killed(),
because the latter would not hold for KILL_CONNECTION.
THD::close_temporary_tables(): Assign lex->sql_command so that
the debug assertion will not fail in ha_innobase::delete_table().
Alternatively, we could ensure that thd_killed() holds inside
ha_innobase::delete_table().
There should be no impact for the non-debug build. The thd_sql_command()
inside ha_innobase::delete_table() only affects the treatment of
persistent FOREIGN KEY metadata. There is no persistent metadata
nor foreign key constraints for temporary tables.
No test case was added, because the failure is nondeterministic.
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. Change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider, sphinx and connect for now.
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because processlist_db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed unused version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
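A minimal sketch of the calling convention these changes move towards: names
travel as a pointer plus a size_t length, are passed by const pointer, and a
"printable" accessor never returns NULL. ToyCString, qualified_name() and
printable_db() are hypothetical names for illustration only; the sketch does no
identifier escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a constant counted string.
struct ToyCString {
  const char *str;
  std::size_t length;   // size_t rather than uint or ulong
};

// Build `db`.`table` from counted strings; the inputs are never strlen()'ed.
inline std::string qualified_name(const ToyCString *db,
                                  const ToyCString *table) {
  std::string out;
  out.reserve(db->length + table->length + 5);
  out.append("`").append(db->str, db->length).append("`.`")
     .append(table->str, table->length).append("`");
  return out;
}

// Never return NULL: fall back to "" so the result can be printed directly.
inline const char *printable_db(const ToyCString &db) {
  return db.str ? db.str : "";
}
```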
Other changes done to get this to work:
- Added 'internal_tables' to TABLE object to list which sequence tables
are needed to use the table.
- Mark any expression using DEFAULT() with LEX->default_used.
This is needed when deciding if we should open internal sequence
tables when a table is opened (we don't need to open sequence tables
if the main table is only used with SELECT).
- Create_and_open_temporary_table() can now also open all internal
sequence tables.
- Added option MYSQL_LOCK_USE_MALLOC to mysql_lock_tables()
to force memory allocation to be used with malloc instead of
memroot.
- Added flag to MYSQL_LOCK to remember if allocation was done with
malloc or memroot (makes code simpler and safer).
- init_one_table_for_prelocking() now takes an argument for which lock to
use, instead of whether it is a routine or something else.
- Renamed prelocking placeholders to make them more understandable as
they are now used in more code.
- Changed the test in check_lock_and_start_stmt() for whether the found table
has correct locks. The old test didn't work for tables that have lock
TL_WRITE_ALLOW_WRITE, which is what sequence tables are using.
- Added VCOL_NOT_VIRTUAL option to ensure that sequence functions can't
be used with virtual columns
- More sequence tests
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)
- Also fix printf-format warnings
Make the above mentioned warnings fatal.
- fix pthread_join on Windows to set return value.
- Added TABLE_SHARE->not_usable_by_query_cache
- Moved TABLE->no_replicate to TABLE_SHARE->no_replicate as it's same for
all TABLE instances
- Renamed TABLE_SHARE->cached_row_logging_check to can_do_row_logging
- Added sql/mariadb.h file that should be included first by files in sql
directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables
that must be done before my_global.h is included)
- Removed a lot of include my_global.h from include files
- Removed include's of some files that my_global.h automatically includes
- Removed duplicated include's of my_sys.h
- Replaced include my_config.h with my_global.h
Working features:
CREATE OR REPLACE [TEMPORARY] SEQUENCE [IF NOT EXISTS] name
[ INCREMENT [ BY | = ] increment ]
[ MINVALUE [=] minvalue | NO MINVALUE ]
[ MAXVALUE [=] maxvalue | NO MAXVALUE ]
[ START [ WITH | = ] start ] [ CACHE [=] cache ] [ [ NO ] CYCLE ]
ENGINE=xxx COMMENT=".."
SELECT NEXT VALUE FOR sequence_name;
SELECT NEXTVAL(sequence_name);
SELECT PREVIOUS VALUE FOR sequence_name;
SELECT LASTVAL(sequence_name);
SHOW CREATE SEQUENCE sequence_name;
SHOW CREATE TABLE sequence_name;
CREATE TABLE sequence-structure ... SEQUENCE=1
ALTER TABLE sequence RENAME TO sequence2;
RENAME TABLE sequence TO sequence2;
DROP [TEMPORARY] SEQUENCE [IF EXISTS] sequence_names
Missing features
- SETVAL(value,sequence_name), to be used with replication.
- Check replication, including checking that sequence tables are marked
not transactional.
- Check that a commit happens for NEXT VALUE that changes table data (may
already work)
- ALTER SEQUENCE. ANSI SQL version of setval.
- Share identical sequence entries to not add things twice to table list.
- testing insert/delete/update/truncate/load data
- Run and fix Alibaba sequence tests (part of mysql-test/suite/sql_sequence)
- Write documentation for NEXT VALUE / PREVIOUS_VALUE
- NEXTVAL in DEFAULT
- Ensure that NEXTVAL in DEFAULT uses database from base table
- Two NEXTVAL for same row should give same answer.
- Oracle syntax sequence_table.nextval, without any FOR or FROM.
- Sequence tables are treated as 'not read constant tables' by SELECT. It would
be better if we had a separate list for sequence tables so that
SELECT doesn't know about them, except if referred to with FROM.
Other things done:
- Improved output for safemalloc backtrack
- frm_type_enum changed to Table_type
- Removed lex->is_view and replaced with lex->table_type. This allows
us to more easily check if an item is a view, sequence or table.
- Added table flag HA_CAN_TABLES_WITHOUT_ROLLBACK, needed for handlers
that want to support sequences
- Added handler calls:
- engine_name(), to simplify getting engine name for partition and sequences
- update_first_row(), to be able to do efficient sequence implementations.
- Made binlog_log_row() global to be able to call it from ha_sequence.cc
- Added handler variable: row_already_logged, to be able to flag that the
changed row has already been logged to the replication log.
- Added CF_DB_CHANGE and CF_SCHEMA_CHANGE flags to simplify
deny_updates_if_read_only_option()
- Added sp_add_cfetch() to avoid new conflicts in sql_yacc.yy
- Moved code for add_table_options() out from sql_show.cc::show_create_table()
- Added String::append_longlong() and used it in sql_show.cc to simplify code.
- Added extra option to dd_frm_type() and ha_table_exists to indicate if
the table is a sequence. Needed by DROP SEQUENCE to not drop a table.
A temporary table being created by an outer statement
should not be visible to an inner statement. And if the
inner statement creates a table with the same name,
the whole statement should fail with
ER_TABLE_EXISTS_ERROR.
Implemented by temporarily de-linking the TABLE_SHARE
being created by outer statement so that it remains
hidden to the inner statement.
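One way to picture the temporary de-linking is a scope guard that unlinks the
share on entry and links it back on exit. This is a sketch under assumptions:
TmpShare, TmpSession and HideShareGuard are hypothetical names, and whether the
actual patch uses a guard object or explicit unlink/relink calls is not stated
here.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-ins for a temporary table share and its owning session.
struct TmpShare { std::string name; };

struct TmpSession {
  std::list<TmpShare *> shares;   // temporary table shares visible to lookups

  TmpShare *find(const std::string &name) const {
    for (TmpShare *s : shares)
      if (s->name == name) return s;
    return nullptr;
  }
};

// Scope guard: de-link the share while alive, re-link it on destruction.
class HideShareGuard {
public:
  HideShareGuard(TmpSession &session, TmpShare *share)
      : session_(session), share_(share) {
    const std::size_t before = session_.shares.size();
    session_.shares.remove(share_);                        // de-link
    was_linked_ = session_.shares.size() != before;
  }
  ~HideShareGuard() {
    if (was_linked_) session_.shares.push_front(share_);   // re-link
  }
  HideShareGuard(const HideShareGuard &) = delete;
  HideShareGuard &operator=(const HideShareGuard &) = delete;

private:
  TmpSession &session_;
  TmpShare *share_;
  bool was_linked_ = false;
};
```

While the guard is alive, a lookup by name from the inner statement does not
find the outer statement's share; once the guard goes out of scope the share
is visible again.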
mysqld maintains a list of TABLE objects for all temporary
tables created within a session in THD. Here each table is
represented by a TABLE object.
A query referencing a particular temporary table more
than once, however, failed with ER_CANT_REOPEN_TABLE error
because a TABLE_SHARE was allocated together with the TABLE,
so temporary tables always had only one TABLE per TABLE_SHARE.
This patch lifts this restriction by separating TABLE and
TABLE_SHARE objects and storing TABLE_SHAREs for temporary
tables in a list in THD, and TABLEs in a list within their
respective TABLE_SHAREs.
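The resulting ownership layout can be sketched as follows. The types
(TableDef, TableHandle, SessionTables) are hypothetical simplifications: the
session keeps a list of shares, each share keeps a list of its open handles,
and opening the same temporary table twice yields two handles on one share.

```cpp
#include <cassert>
#include <list>
#include <string>

struct TableDef;

// One open instance of a table; points back at its shared definition.
struct TableHandle { TableDef *share; };

// Shared definition of a temporary table; owns the handles opened on it.
struct TableDef {
  std::string name;
  std::list<TableHandle> tables;
};

// Per-session container of temporary table definitions.
struct SessionTables {
  std::list<TableDef> shares;

  // Open another handle on an existing temporary table, or return nullptr.
  TableHandle *open(const std::string &name) {
    for (TableDef &s : shares)
      if (s.name == name) {
        s.tables.push_back(TableHandle{&s});
        return &s.tables.back();
      }
    return nullptr;
  }
};
```

A self-join on a temporary table is the typical case that needs two handles
on the same share.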