Added the capability to create a trigger associated with several trigger
events. To support this, the syntax of the CREATE TRIGGER statement was
extended to accept the structure { event [ OR ... ] } for the `trigger_event`
clause. Since one trigger can now handle several events, a way is needed to
determine what kind of event is being handled while the trigger executes.
For this purpose, this patch adds support for the clauses INSERTING,
UPDATING and DELETING. These clauses can be used inside a trigger body
to detect which trigger action is currently being processed, using the
following boilerplate:
IF INSERTING THEN ...
ELSIF UPDATING THEN ...
ELSIF DELETING THEN ...
If one of the clauses INSERTING, UPDATING, DELETING specified in
a trigger's body does not match any of the trigger's event types, the error
ER_INCOMPATIBLE_EVENT_FLAG is emitted.
With this patch, one Trigger object can be associated with several
trigger events. This means that the array Table_triggers_list::triggers can
contain several pointers to the same Trigger object in the array members
corresponding to different events. Moreover, supporting several trigger events
for the same trigger requires that the data members `next` and `action_order`
of the Trigger class be converted to arrays, to store the related information
per trigger event.
The ability to specify the same trigger for different event types makes it
necessary to handle, on execution of a multi-event trigger, the invalid cases
where the OLD or NEW qualifier does not match the event type the trigger is
currently run for. The OLD qualifier should produce NULL for the INSERT event,
whereas the NEW qualifier should produce NULL for the DELETE event.
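A minimal sketch of the new syntax and clauses (table and column names are
illustrative; a client delimiter change is assumed for the compound body):

  CREATE TABLE t1 (a INT);
  CREATE TABLE audit_log (msg TEXT);

  CREATE TRIGGER t1_audit
    BEFORE INSERT OR UPDATE OR DELETE ON t1
    FOR EACH ROW
  BEGIN
    IF INSERTING THEN
      INSERT INTO audit_log VALUES (CONCAT('insert: ', NEW.a));
    ELSIF UPDATING THEN
      INSERT INTO audit_log VALUES (CONCAT('update: ', OLD.a, ' -> ', NEW.a));
    ELSIF DELETING THEN
      -- NEW.a is NULL for the DELETE event, OLD.a is NULL for the INSERT event
      INSERT INTO audit_log VALUES (CONCAT('delete: ', OLD.a));
    END IF;
  END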
Atomic CREATE OR REPLACE keeps the old table intact if the
command fails or the server crashes. That is done by renaming the
original table to a temporary name as a backup, and restoring it if the
CREATE fails. When the command is complete and logged, the backup
table is deleted.
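A sketch of the user-visible effect (illustrative tables; a duplicate-key
error is just one convenient way to make the statement fail):

  CREATE TABLE t1 (a INT) ENGINE=InnoDB;
  INSERT INTO t1 VALUES (1);

  -- The SELECT part produces a duplicate key, so the CREATE fails.
  -- With atomic CREATE OR REPLACE the original t1 and its data survive,
  -- instead of being dropped up front as before.
  CREATE OR REPLACE TABLE t1 (b INT PRIMARY KEY)
    SELECT 1 AS b UNION ALL SELECT 1;

  SELECT * FROM t1;   -- still returns the original row (1)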
Atomic replace algorithm
Two DDL chains are used for CREATE OR REPLACE:
ddl_log_state_create (C) and ddl_log_state_rm (D).
1. (C) Log rename of ORIG to TMP table (Rename TMP to original).
2. Rename original to TMP.
3. (C) Log CREATE_TABLE_ACTION of ORIG (drops ORIG);
4. Do everything with ORIG (like insert data)
5. (D) Log drop of TMP
6. Write query to binlog (this marks (C) to be closed in
case of failure)
7. Execute drop of TMP through (D)
8. Close (C) and (D)
If there is a failure before 6) we revert the changes in (C)
Chain (D) is only executed if 6) succeeded ((C) is closed on
crash recovery).
Foreign key errors will be found at the 1) stage.
Additional notes
- CREATE TABLE without REPLACE, and temporary tables, are not affected
  by this commit.
  set @@drop_before_create_or_replace=1 can be used to
  get the old behaviour where existing tables are dropped
  by CREATE OR REPLACE.
- CREATE TABLE is reverted if binlogging the query fails.
- Engines having the HTON_EXPENSIVE_RENAME flag set are not affected by
  this commit. Conflicting tables in such engines will still be
  dropped by CREATE OR REPLACE.
- Replication execution is not affected by this commit.
- Replication will first drop the conflicting table and then
  create the new one.
- CREATE TABLE .. SELECT XID usage is fixed and now there is no need
to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see comments in
do_postlock()). XID is now correctly updated so it disables
DDL_LOG_DROP_TABLE_ACTION. Note that binary log is flushed at the
final stage when the table is ready. So if we have XID in the
binary log we don't need to drop the table.
- Three variations of CREATE OR REPLACE handled:
1. CREATE OR REPLACE TABLE t1 (..);
2. CREATE OR REPLACE TABLE t1 LIKE t2;
3. CREATE OR REPLACE TABLE t1 SELECT ..;
- Test case uses 6 combinations for engines (aria, aria_notrans,
myisam, ib, lock_tables, expensive_rename) and 2 combinations for
binlog types (row, stmt). Combinations help to check differences
between the results. Error failures are tested for the above three
variations.
- expensive_rename tests CREATE OR REPLACE without atomic
replace. The effect should be the same as with the old behaviour
before this commit.
- Triggers mechanism is unaffected by this change. This is tested in
create_replace.test.
- LOCK TABLES is affected. Lock restoration must be done after the new
  table is created or TMP is renamed back to ORIG.
- Moved ddl_log_complete() from send_eof() to finalize_ddl(). This
checkpoint was not executed before for normal CREATE TABLE but is
executed now.
- CREATE TABLE is now also rolled back if writing to the binary
  log fails. See rpl_gtid_strict.test.
backup ddl log changes
- In case of a successful CREATE OR REPLACE we only log
  the CREATE event, not the DROP TABLE event of the old table.
ddl_log.cc changes
ddl_log_execute_action() now properly returns error conditions.
ddl_log_disable_entry() added to allow one to disable one entry.
The entry on disk is still reserved until ddl_log_complete() is
executed.
On XID usage
As with all other atomic DDL operations, the XID is used to avoid
inconsistency between master and slave in the case of a crash after the
binary log is written and before ddl_log_state_create is closed. On
recovery, XIDs are taken from the binary log and the corresponding DDL log
events get disabled. That is done by
ddl_log_close_binlogged_events().
On linking two chains together
Chains are executed in ascending order of the entry_pos of their execute
entries. But the entry_pos assignment order is undefined: it may assign a
bigger number to the first chain and a smaller number to the
second chain. The execution order in that case will be reversed: the
second chain will be executed first.
To avoid that we link one chain to another. While the base chain
(ddl_log_state_create) is active, the secondary chain
(ddl_log_state_rm) is not executed. That is: of two linked chains, only
one can be executed.
The interface ddl_log_link_chains() was defined in "MDEV-22166
ddl_log_write_execute_entry() extension".
Atomic info parameters in HA_CREATE_INFO
Many functions in CREATE TABLE pass the same parameters. These
parameters are part of the table creation info and should be in
HA_CREATE_INFO (or whatever). Passing parameters via a single
structure makes adding new data and
refactoring much easier.
InnoDB changes
Added ha_innobase::can_be_renamed_to_backup() to check if
a table with foreign keys can be renamed.
Aria changes:
- Fixed an issue in the Aria engine with CREATE + locked tables
  where data was not properly committed in some cases
  when a crash happened.
Known issues:
- InnoDB tables with foreign key definitions are not fully supported
with atomic create and replace:
- ha_innobase::can_be_renamed_to_backup() can detect some cases
where InnoDB does not support renaming table with foreign key
constraints. In this case MariaDB will drop the old table before
creating the new one.
The detected cases are:
- The new and the old table use the same foreign key constraint
  name.
- The old table has self-referencing constraints.
- If the old and the new table use the same name for a constraint, the
  creation of the new table will fail. The original table will be
  restored in this case.
- The above issues will be fixed in a future commit.
- CREATE OR REPLACE TEMPORARY table is not fully atomic. Any conflicting
  table will always be dropped before creating a new one. (Old behaviour.)
MDEV-25292 needs both C_ALTER_TABLE and
select_field_count in one call. Semantically, creation mode and field
count are two different things; packing creation mode as negative
constants and field count as a positive value into one parameter was
a lazy hack to avoid adding a second parameter.
select_count does not make sense without alter_info->create_list, so
the natural way is to hold it in Alter_info too. select_count is now
stored in the member select_field_count.
Merged and updated by: Monty
The purpose of this commit is to ensure that creation of and changes to
temporary tables are properly and predictably logged to the binary
log. It also fixes some bugs where ROW logging was used in MIXED mode
when STATEMENT would be a better (and expected) choice.
In this comment STATEMENT stands for logging to binary log in
STATEMENT format, MIXED stands for MIXED binlog format and ROW for ROW
binlog format.
New rules for logging of temporary tables
- CREATE of temporary tables is now by default binlogged only if the
  STATEMENT binlog format is used. If it is binlogged, 1 is stored in
  TABLE_SHARE->table_creation_was_logged. The user can change this
  behavior by setting create_temporary_table_binlog_formats to
  MIXED,STATEMENT, in which case the create is logged in statement
  format also in MIXED mode (as before); see the example after this list.
- Changes to temporary tables are binlogged if and only if
  the CREATE was logged. The logging happens under STATEMENT or MIXED.
  If binlog_format=ROW, temporary table changes are not binlogged. A
  temporary table that is changed under ROW is marked as 'not up to
  date in binlog' and no future row changes are logged. Any future
  statement that uses this temporary table will force row logging of
  the other tables it touches.
- DROP TEMPORARY is binlogged only if the CREATE was binlogged.
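A sketch of the new default versus the pre-patch behaviour under MIXED mode
(whether the variable is set per session or globally is an assumption here):

  SET SESSION binlog_format = MIXED;

  -- Default (create_temporary_table_binlog_formats=STATEMENT):
  -- the CREATE below is not written to the binary log under MIXED.
  CREATE TEMPORARY TABLE tmp1 (a INT);

  -- Restore the pre-patch behaviour: also log CREATE TEMPORARY under MIXED,
  -- in statement format.
  SET SESSION create_temporary_table_binlog_formats = 'MIXED,STATEMENT';
  CREATE TEMPORARY TABLE tmp2 (a INT);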
Changes done:
- Row logging is forced for any statement using temporary tables that
  are not up to date in the binary log.
  (Before, row logging was forced if the user had any temporary table.)
- If there are any changes to a temporary table that are not binlogged,
  the table is marked as not up to date.
- TABLE_SHARE->table_creation_was_logged has a new definition for
  temporary tables:
  0 Table creation was not logged to the binary log.
  1 Table creation was logged to the binary log and the table is up to date.
  2 Table creation was logged to the binary log but some changes were
    not logged to the binary log.
  A table is considered not up to date in the binary log if the value is 0 or 2.
- If a multi-table-update or multi-table-delete fails then
all updated temporary tables are marked as not up to date.
- Enforce row logging if the query is using temporary tables
  that are not up to date.
  Before, row logging was enforced if the user had any
  temporary tables.
- When dropping temporary tables, use IF EXISTS. This ensures
  that the slave will not stop if it has crashed and lost its
  temporary tables.
- Removed the comment and version from the DROP /*!4000 TEMPORARY.. statement
  generated when a connection that has open temporary tables closes. Added
  'generated by server' at the end of the DROP.
Bugs fixed:
- When using temporary tables with commands that forced row-based logging,
  like INSERT INTO temporary_table VALUES (UUID()), this was never
  logged, which caused the temporary table to become inconsistent on
  master and slave.
- The binlog format used is now clearly defined. It now depends only
  on the current binlog_format and the tables used.
  Before, it depended on whether the user had ANY temporary tables and on
  the state of 'current_stmt_binlog_format' set by previous queries.
  This also caused temporary tables to be logged to the binary log in
  some cases.
- CREATE TABLE t1 LIKE not_logged_temporary_table caused replication
to stop.
- Renames of not-binlogged temporary tables were written to the binary log,
  which caused replication to stop.
Changes in behavior:
- By default create_temporary_table_binlog_formats=STATEMENT, which
means that CREATE TEMPORARY is not logged to binary log under MIXED
binary logging. This can be changed by setting
create_temporary_table_binlog_formats to MIXED,STATEMENT.
- Using temporary tables that were not logged to the binary log will
  cause any query using them to update other tables to be logged in
  ROW format. Before, all queries were logged in ROW format if the user had
  any temporary tables, even if they were not used by the query.
- Generated DROP TEMPORARY TABLE is now always using IF EXISTS and
has a "generated by server" comment in the binary log.
The consequence of the above is that manipulating a lot of rows
through temporary tables will by default be slower in mixed mode.
For example:
BEGIN;
CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM
large_table1 JOIN large_table2 ON ...;
INSERT INTO other_table SELECT b, c FROM tmp WHERE a <100;
DROP TEMPORARY TABLE tmp;
COMMIT;
By default this will create a huge entry in the binary log, compared
to just a few hundred bytes in statement mode. However, the change in
this commit makes usage of temporary tables more reliable and
predictable and is thus worth it. Using statement mode or
create_temporary_table_binlog_formats can be used to avoid this issue.
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
Disable the assert.
Also, use the same check for check_that_all_fields_are_given_values()
as is used in not_null_fields_have_null_values(), to avoid
issuing the same warning twice.
The implementation of this task adds the ability to raise a signal with
SQLSTATE '02TRG' from a BEFORE INSERT/UPDATE/DELETE trigger and treats
this signal as an indicator meaning 'throw away the current row'
while processing the INSERT/UPDATE/DELETE statement. The signal with
SQLSTATE '02TRG' has special meaning only when it is raised inside
BEFORE triggers; for AFTER triggers this value of SQLSTATE isn't treated
in any special way. In accordance with the SQL standard, the SQLSTATE class '02'
means NO DATA, and sql_errno for this class is set to the value
ER_SIGNAL_NOT_FOUND by the current implementation of the MariaDB server.
The implementation of this task assigns the value ER_SIGNAL_SKIP_ROW_FROM_TRIGGER
to sql_errno in the Diagnostics_area when the signal is raised from a trigger
and SQLSTATE has the value '02TRG'.
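A minimal sketch (illustrative table; a client delimiter change is assumed
for the compound body): a BEFORE INSERT trigger that silently discards rows
with a negative value.

  CREATE TABLE t1 (a INT);

  CREATE TRIGGER t1_bi BEFORE INSERT ON t1 FOR EACH ROW
  BEGIN
    IF NEW.a < 0 THEN
      SIGNAL SQLSTATE '02TRG';   -- throw away this row, the statement continues
    END IF;
  END

  INSERT INTO t1 VALUES (1), (-1), (2);
  SELECT * FROM t1;              -- only 1 and 2 are stored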
To catch a signal with SQLSTATE '02TRG' and handle it in a special way, the methods
Table_triggers_list::process_triggers
select_insert::store_values
select_create::store_values
Rows_log_event::process_triggers
and the overloaded function
fill_record_n_invoke_before_triggers
were extended with an extra out parameter that returns a flag telling whether
to skip the current values being processed by the INSERT/UPDATE/DELETE
statement. This extra parameter is passed as nullptr in case of an AFTER trigger;
for a BEFORE trigger it points to a variable that stores a marker of
whether to skip the current record or store it by calling write_record().
The problem was that in case of INSERT DELAYED thd->query() is
freed before we call trans_rollback, where WSREP_DEBUG
could access thd->query() in wsrep_thd_query().
The fix is to reset thd->query() to NULL in the delayed_insert
destructor after it is freed. There is already
a null guard in wsrep_thd_query().
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
It's incorrect to zero out table->triggers->extra_null_bitmap
before a statement, because if an insert uses an explicit field list
and omits a field that has no default value, the field should
implicitly get NULL. So extra_null_bitmap should have 1s for all
fields that have no defaults.
* create extra_null_bitmap_init and initialize it as above
* copy extra_null_bitmap_init to extra_null_bitmap for inserts
* still zero out extra_null_bitmap for updates/deletes where
  all fields definitely have a value
* make not_null_fields_have_null_values() send
  ER_NO_DEFAULT_FOR_FIELD for fields with no default and no value,
  otherwise creation of a trigger with an empty body would change the
  error message
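A sketch of the case the last point protects (illustrative table; strict
sql_mode assumed):

  CREATE TABLE t1 (a INT NOT NULL, b INT NOT NULL);
  CREATE TRIGGER t1_bi BEFORE INSERT ON t1 FOR EACH ROW SET NEW.a = NEW.a;

  -- b is omitted from the field list and has no default: this must still
  -- raise ER_NO_DEFAULT_FOR_FIELD, even though the presence of the trigger
  -- routes the insert through extra_null_bitmap.
  INSERT INTO t1 (a) VALUES (1);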
system versioned table
For a versioned table, REPLACE first tries to insert a row; if it gets a
duplicate key error and the optimization is possible, it does UPDATE +
INSERT history. If the optimization is not possible, it takes the normal
branch of UPDATE to history and repeats the INSERT cycle.
The failure was in the normal branch, when we tried the UPDATE to history
but such a history row already existed from previous cycles. There is no
such failure in the optimized branch because vers_insert_history_row()
already ignores duplicates.
The fix ignores duplicate errors for the UPDATE to history and does a
DELETE instead.
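A rough reproduction sketch (illustrative schema; whether this exact sequence
hits the original failure is an assumption, it only shows the shape of the
scenario):

  CREATE TABLE t1 (a INT PRIMARY KEY, b INT) WITH SYSTEM VERSIONING;
  INSERT INTO t1 VALUES (1, 10);
  -- Each REPLACE on the same key goes through the duplicate-key path:
  -- UPDATE to history + INSERT, accumulating history rows for key 1.
  REPLACE INTO t1 VALUES (1, 20);
  REPLACE INTO t1 VALUES (1, 30);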
DELAYED with virtual columns
The segfault was caused by two different copies of the same Field instance
in a prepared delayed insert. One copy was made by
Delayed_insert::get_local_table() (see make_new_field()). That copy
went through parse_vcol_defs() and received a new vcol_info->expr.
Another copy was made by copy_keys_from_share() by this code:
/*
We are using only a prefix of the column as a key:
Create a new field for the key part that matches the index
*/
field= key_part->field=field->make_new_field(root, outparam, 0);
field->field_length= key_part->length;
So key_part and table got different objects for the same field, and the
crash happened because key_part->field->vcol_info->expr was NULL.
The fix adds update_keypart_vcol_info() to update vcol_info->expr in
key_part->field.
Cleanup: memdup_vcol() is now a static inline function instead of a macro,
and OOM is checked.
The problems were that:
1) Resources were freed asymmetrically: during normal execution in
   send_eof(), and in case of error in the destructor.
2) The destructor was not called for result objects in the case of SPs.
   (So if the last SP execution ended with an error, resources were not
   freed on reinit before execution (cleanup() is called before the next
   execution), and the destructor was also not called due to the lack of
   a delete call for the object.)
Result cleanup() was renamed to reset_for_next_ps_execution() to better
reflect its function.
All result methods were revised and freeing of resources was made symmetric.
The destructor of the result object is now called for SPs.
Added the previously skipped invalidation in case of an error in insert.
Removed the misleading naming of reset(thd) (it could be confused with
reset()).
This problem occurred for statements like `INSERT INTO t1 SELECT 1`,
which do not have tables in the SELECT part. In such scenarios
SELECT_LEX::insert_tables was not properly set at `setup_tables()`,
and this led to either incorrect execution or a crash.
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
This bug has the same nature as the issues
MDEV-34718: Trigger doesn't work correctly with bulk update
MDEV-24411: Trigger doesn't work correctly with bulk insert
To fix the issue covering all use cases, the code that temporarily resets
thd->bulk_param to nullptr before invoking triggers and restores its
original value after a trigger finishes executing is moved to the method
Table_triggers_list::process_triggers
which is ultimately invoked for any kind of trigger.
Similarly to "ALTER TABLE fixes for high-level indexes", don't enable bulk
insert when issuing CREATE ... SELECT into a table containing a vector
index. InnoDB can't handle the situation when bulk insert is enabled for
one table but disabled for another. We can't do bulk insert on a vector
index as it currently does table updates.
MDEV-33407 Parser support for vector indexes
The syntax is
create table t1 (... vector index (v) ...);
limitations:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported
MDEV-33404 Engine-independent indexes: subtable method
added support for so-called "high-level indexes"; they are not visible
to the storage engine and are implemented on the SQL level. For every such
index in a table, say, t1, the server implicitly creates a second
table named like t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, is not accessible directly,
doesn't go into the table cache, and needs no MDLs.
MDEV-33406 basic optimizer support for k-NN searches
for a query like SELECT ... ORDER BY func() the optimizer will use
item_func->part_of_sortkey() to decide which keys can be used
to resolve the ORDER BY.
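A sketch combining the items above (the distance function name is a
placeholder, not something defined by this text):

  -- MDEV-33407: one vector index per table, on a NOT NULL binary string
  CREATE TABLE t1 (
    id INT PRIMARY KEY,
    v  BLOB NOT NULL,
    VECTOR INDEX (v)
  );

  -- MDEV-33406: a k-NN style query; the optimizer asks
  -- item_func->part_of_sortkey() whether the vector index can resolve ORDER BY
  SELECT id FROM t1 ORDER BY some_distance_func(v, x'00112233') LIMIT 10;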
let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.
In particular, internal tmp tables created in the slave thread
are perfectly thread-specific.
MDEV-27277 added warnings about truncation during sorting for SELECTs,
but not for DML operations. However, UPDATEs and DELETEs may also
perform sorting and thus produce warnings. This commit fixes that.
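A sketch of the kind of statement that now warns (assuming the warning is
driven by max_sort_length, as it already was for SELECT):

  SET SESSION max_sort_length = 8;
  CREATE TABLE t1 (a TEXT);
  INSERT INTO t1 VALUES (REPEAT('a', 100)), (REPEAT('b', 100));

  -- The DELETE sorts by a and truncates the sort keys to max_sort_length,
  -- which now produces the same truncation warning a SELECT would.
  DELETE FROM t1 ORDER BY a LIMIT 1;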
The problem was that when using clang + asan, we do not get a correct value
for the thread stack, as some local variables are not allocated on the
normal stack.
It looks like, for example, clang 18.1.3, when compiling with
-O2 -fsanitize=address, puts local variables and things allocated by
alloca() in areas other than the stack.
The following code shows the issue
Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
(connect=0x5080000027b8,
put_in_cache=<optimized out>) at sql/sql_connect.cc:1399
THD *thd;
1399 thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0
The address of thd is 24M away from the stack pointer
(gdb) info reg
...
rsp 0x7fffef4e7bc0 0x7fffef4e7bc0
...
r13 0x7fffedee7060 140737185214560
r13 is pointing to the address of the thd, probably some kind of
"local stack" used by the sanitizer.
I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects were stored in a local heap,
not on the stack.
To solve this issue in a portable way, I have added two functions:
my_get_stack_pointer() returns the address of the current stack pointer.
The code uses asm instructions for intel 32/64 bit, powerpc,
arm 32/64 bit and sparc 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we use _AddressOfReturnAddress().
As a fallback for other compilers/architectures we use the address of a
local variable.
my_get_stack_bounds() returns the base address and size of the stack,
using pthread_attr_getstack() or NtCurrentTeb(), with a fallback to
using the address of a local variable and a user-provided stack size.
Server changes are:
- Moving the setting of thread_stack to THD::store_globals() using
  my_get_stack_bounds().
- Removing the setting of thd->thread_stack, except in functions that
  allocate a lot on the stack before calling store_globals(). When
  using estimates for the stack start, we reduce stack_size by
  MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
  before calling store_globals().
I also added a unittest, stack_allocation-t, to verify the new code.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Field_blob::store() has special code for the GROUP_CONCAT temporary table
(to store blob values in Blob_mem_storage - this prevents them
from being freed/overwritten when the next row is read).
Field_geom and Field_blob_compressed inherit from Field_blob but they
have their own ::store() methods without this special Blob_mem_storage
support.
Considering that non-grouping CONCAT() of such fields converts
them to plain BLOB, let's do the same for GROUP_CONCAT. To do it,
Item_func_group_concat::setup will signal that it's creating
a temporary table for GROUP_CONCAT, and the Field_blob::make_new_field()
override will create a base Field_blob when under group concat.
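A sketch of the affected case (illustrative schema):

  CREATE TABLE t1 (
    id INT,
    g  GEOMETRY,           -- Field_geom
    b  BLOB COMPRESSED     -- Field_blob_compressed
  );

  -- GROUP_CONCAT materializes these columns in its temporary table;
  -- they are now treated there as plain BLOB, the same conversion that
  -- a non-grouping CONCAT() already applies.
  SELECT id, GROUP_CONCAT(g), GROUP_CONCAT(b) FROM t1 GROUP BY id;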
A hash index is a vcol-based wrapper (MDEV-371). row_end is added to the
unique index, so when row_end is updated the unique hash index must be
recalculated via vcol_update_fields(). DELETE did not update virtual
fields, so DELETE HISTORY was getting a wrong hash value.
The fix calls update_virtual_fields() in vers_update_end(), so in every
case where row_end is updated the virtual fields are updated as well.
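A sketch of the scenario (illustrative schema): a long unique key is
implemented as a vcol-based hash index, on a system-versioned table.

  CREATE TABLE t1 (
    a VARCHAR(2000),
    UNIQUE KEY (a) USING HASH    -- vcol-based hash index; row_end is part of it
  ) WITH SYSTEM VERSIONING;

  INSERT INTO t1 VALUES ('x');
  DELETE FROM t1;                -- creates a history row (row_end is updated)
  DELETE HISTORY FROM t1;        -- needs the hash recomputed for the new row_end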
work consistently on replication
Row-based replication does not execute CREATE .. SELECT but instead
CREATE TABLE. CREATE .. SELECT creates implicit system fields in an
unusual place: in between the declared fields and the select fields. That
was done because the select_field_pos logic requires select fields to go
last in create_list.
So CREATE .. SELECT on the master and CREATE TABLE on the slave create the
system fields at different positions, and replication gets a field mismatch.
To fix this we've changed CREATE .. SELECT to create implicit system
fields in the usual place at the end, and updated select_field_pos to
handle this case.