String
If the buffer is not enough, the String class does realloc() and then
frees the realloced buffer at String destruction.
The proper way would be to implement a MEM_ROOT-based StringMemRoot with
overridden virtual real_alloc(), realloc_raw() and free_buffer(). But
the problem is that the String class is not always properly constructed
as a C++ object (e.g. table->alias), so there is no vtable for virtual
calls.
The patch does the local work with a dynamic buffer and then copies it
to the table mem_root on return.
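A minimal sketch of that pattern, assuming the standard MEM_ROOT helper
strmake_root(); append_alias_parts() is a hypothetical step that may
grow the local buffer:
  StringBuffer<64> buf;            // local, self-managed buffer
  append_alias_parts(&buf);        // hypothetical; may realloc() internally
  const char *persistent=
    strmake_root(&table->mem_root, buf.ptr(), buf.length());
  // buf's dynamic buffer is freed on scope exit; the copy outlives it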
While the ALTER thread tries to notify the SELECT thread about a lock
conflict, it accesses that thread's TABLE object
(THD::notify_shared_lock()) and lock data
(mysql_lock_abort_for_thread()). As part of accessing the lock data it
calls ha_partition::store_lock(), which iterates over all partitions
and calls their store_lock().
The problem is that SELECT opened 2 read partitions, but
ha_partition::store_lock() tries to access all partitions as indicated
by m_tot_parts, which is 4. So the last 2 partitions, m_file[2] and
m_file[3], are uninitialized and store_lock() accesses uninitialized
data.
The code in ha_partition::store_lock() wrongly uses all partitions
specifically for the case of mysql_lock_abort_for_thread(); this is
accompanied by the comment:
  /*
    This can be called from get_lock_data() in mysql_lock_abort_for_thread(),
    even when thd != table->in_use. In that case don't use partition pruning,
    but use all partitions instead to avoid using another threads structures.
  */
  if (thd != table->in_use)
  {
    for (i= 0; i < m_tot_parts; i++)
      to= m_file[i]->store_lock(thd, to, lock_type);
  }
The comment's explanation, "to avoid using another threads structures",
does not really say why this change was needed.
The change was originally introduced by:
commit 9b7cccaf319
Author: Mattias Jonsson <mattias.jonsson@oracle.com>
Date: Wed May 30 00:14:39 2012 +0200
WL#4443:
final code change for dlenevs review.
- Don't use pruning in lock_count().
- Don't use pruning in store_lock() if not owning thd.
- Renamed is_fields_used_in_trigger to
is_fields_updated_in_trigger() and check if they
may be updated.
- moved out mark_fields_used(TRG_EVENT_UPDATE)
from mark_columns_needed_for_update().
And reverted the changed call order. And call
mark_fields_used(TRG_EVENT_UPDATE) instead.
which also fails to explain the rationale of the change. The original
idea of WL#4443 is to reduce locking, and this change does not serve
that goal.
So reverting this change restores the original behaviour of using only
partitions marked for use and fixes the invalid access to
uninitialized data.
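A sketch of the restored pruning behaviour, using the server's bitmap
iteration helpers; the exact bitmap member (lock_partitions) is
assumed:
  for (i= bitmap_get_first_set(&m_part_info->lock_partitions);
       i < m_tot_parts;
       i= bitmap_get_next_set(&m_part_info->lock_partitions, i))
    to= m_file[i]->store_lock(thd, to, lock_type);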
used as key fields
Disable compression for tmp fields. Key fields for a view are created
later from tmp fields, so we cannot disable compression just for the
key fields, because the failure would land on the original tmp field.
forever, cannot be killed
Virtual_column_info::fix_and_check_expr() first does fix_expr(), which
finds all fields in the query tables, and then
check_vcol_func_processor(), which prohibits SELECT expressions. So
before we get the error prohibiting SELECT in a vcol expression, we
must satisfy fix_expr() with all the opened tables. The patch achieves
this by iterating query_tables from the parsed vcol expression and
assigning opened tables from the main parser context (they are already
preopened if the parser sees some SELECT expressions).
But the problem is that we cannot use these TABLE objects fully for the
vcol expression; at least the comment about MERGE tables states so:
  /* MERGE tables need to access parent and child TABLE_LISTs. */
  DBUG_ASSERT(tables->table->pos_in_table_list == tables);
Therefore, after the vcol check is done, we should revert the
TABLE_LISTs back to the original TABLE-less state.
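A hedged sketch of that assign-then-revert dance; vcol_lex and
find_preopened_table() are hypothetical names:
  for (TABLE_LIST *tl= vcol_lex->query_tables; tl; tl= tl->next_global)
    tl->table= find_preopened_table(thd, tl);  // hypothetical lookup
  /* ... run fix_and_check_expr() on the vcol expression ... */
  for (TABLE_LIST *tl= vcol_lex->query_tables; tl; tl= tl->next_global)
    tl->table= NULL;                           // back to TABLE-less state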
As CREATE OR REPLACE first drops the original table (at least prior to
MDEV-25292), we can use the above hack from the currently opening
table. That is possible only after the bitmaps are initialized, so we
move their initialization to a slightly earlier stage, before vcol
parsing. But partitioning depends on uninitialized bitmaps, so we
temporarily revert some of them to make partitioning initialization
happy.
Note that plain CREATE TABLE just fails this case in the parser with
NO_SUCH_TABLE; CREATE OR REPLACE doesn't fail in the parser, as the old
table still exists.
Now to the hang: mysql_rm_table_no_locks() does TDC_RT_REMOVE_ALL,
which waits until the share is closed. The table is normally open only
as an OPEN_STUB; this is what the parser does for CREATE TABLE. But for
SELECT the table is opened not as a stub. If it is the same table name,
we anyway have two TABLE_LIST objects: stub and not stub. So for the
"not stub" one TDC_RT_REMOVE_ALL sees a nonzero open count and decides
to wait until it is closed. And of course it hangs, because that table
was opened in the same thread. Now we force-close such TABLE objects
before mysql_rm_table_no_locks().
And third, the condition for sequences was wrong: we have to check the
TABLE_LIST::sequence property to make sure we are processing a
sequence.
Although the `my_thread_id` type is 64 bits, the binlog format spec
limits it to 32 bits in practice. (See also: MDEV-35706)
The writable SQL variable `pseudo_thread_id` didn't respect this though
and had a range of `ULONGLONG_MAX` (that is, `UINT64_MAX` in C/C++).
It consequently accepted larger values silently, but only the lower
32 bits of them got binlogged; this could lead to inconsistency.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
It's incorrect to zero out table->triggers->extra_null_bitmap
before a statement, because if an INSERT uses an explicit field list
and omits a field that has no default value, the field should
get NULL implicitly. So extra_null_bitmap should have 1s for all
fields that have no defaults:
* create extra_null_bitmap_init and initialize it as above
* copy extra_null_bitmap_init to extra_null_bitmap for inserts (see
  the sketch after this list)
* still zero out extra_null_bitmap for updates/deletes where
  all fields definitely have a value
* make not_null_fields_have_null_values() send
  ER_NO_DEFAULT_FOR_FIELD for fields with no default and no value;
  otherwise creation of a trigger with an empty body would change the
  error message
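A minimal sketch of the per-statement handling described above; event
and bitmap_size are assumed variables:
  if (event == TRG_EVENT_INSERT)
    memcpy(extra_null_bitmap, extra_null_bitmap_init, bitmap_size);
  else
    bzero(extra_null_bitmap, bitmap_size);  /* updates/deletes */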
Item::print_for_table_def() uses QT_TO_SYSTEM_CHARSET to print
the DEFAULT expression into the FRM file during CREATE TABLE.
Therefore, the expression is encoded in utf8 in the FRM.
get_field_default_value() erroneously used field->charset() to
print the DEFAULT expression at SHOW CREATE TABLE time.
Fix get_field_default_value() to use &my_charset_utf8mb4_general_ci
instead.
This makes DEFAULT work in the same way as:
- virtual column expressions:
    if (field->vcol_info)
    {
      StringBuffer<MAX_FIELD_WIDTH> str(&my_charset_utf8mb4_general_ci);
      field->vcol_info->print(&str);
- check constraint expressions:
    if (field->check_constraint)
    {
      StringBuffer<MAX_FIELD_WIDTH> str(&my_charset_utf8mb4_general_ci);
      field->check_constraint->print(&str);
Additional cleanup:
Fixed system_charset_info to &my_charset_utf8mb4_general_ci in a few
places to make non-BMP characters work in DEFAULT, virtual column, and
check constraint expressions.
A part of the test, which checks that a frame of a recursive SP takes
the same amount of stack, relies on the fact that 3 calls to that
SP are executed in the same OS thread. This is not guaranteed with the
threadpool (not with the Windows native threadpool, anyway).
Fix the test to execute the SP calls in the same thread, as a
semicolon-separated batched command.
(Variant 2: use stack for buffers)
my_malloc_size_cb_func() has a call to thd->alloc() to produce an error
message. thd->alloc() calls alloc_root(), so one can end up with this
stack trace:
  alloc_root()
  THD::alloc()
  my_malloc_size_cb_func()
  my_malloc()
  alloc_root()
where alloc_root() calls itself. This is a problem, as alloc_root() is
not re-entrant.
Fixed this by switching my_malloc_size_cb_func() to use space on the stack
instead.
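A hedged sketch of the approach; the message text and the reporting
path are assumed:
  char buf[1024];            /* stack space, no alloc_root() involved */
  my_snprintf(buf, sizeof(buf), "Memory limit exceeded for thread %llu",
              (ulonglong) thd->thread_id);
  /* pass buf to the error reporting path instead of a thd->alloc'ed copy */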
In case of error last_pos points to null_element and there are no
other children. tree_search_next() walks the children from last_pos
down to the leaves (null_element), ignoring the case where the topmost
parent in the search state is the leaf itself.
Quick read record uses a different handler (H1) for finding records. It
cannot use the ha_delete_row() handler (H2) as that is in a different
search mode: inited == INDEX for H1, inited == RND for H2. So the read
handler H1 uses the index while the write handler H2 uses random
access.
For going to the next record in H1 there is the info->last_pos
optimization for stepping the index via tree_search_next(). This
optimization can work with deleted rows only if the delete is conducted
in the same handler; there is:
  int hp_rb_delete_key(HP_INFO *info, register HP_KEYDEF *keyinfo,
                       const uchar *record, uchar *recpos, int flag)
  {
    ...
    if (flag)
      info->last_pos= NULL; /* For heap_rnext/heap_rprev */
But this cannot work for a different handler. So last_pos in H1 after a
delete in H2 contains a stale info->parents array, and last_pos points
into those parents. In the specific test case, last_pos' parent is an
already-freed node and tree_search_next() steps into it.
The fix invalidates the local copies of info->parents and
info->last_pos based on key_version. Record deletion increments
share->key_version in H2, so in H1 we know the tree might have changed.
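A minimal sketch of that invalidation; the placement of the version
fields is assumed:
  if (info->key_version != info->s->key_version)
  {
    info->last_pos= NULL;                     /* parents[] may be stale */
    info->key_version= info->s->key_version;  /* resync with the share */
  }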
Another good measure would be to use H1 for the delete. But that is a
bigger refactoring than just a bug fix.
mysql_compare_tables() failed because the prepared tmp_create_info had
no generated fields for long hash indexes. For the comparison we should
skip these fields in the original table by:
1. skipping INVISIBLE_SYSTEM fields;
2. getting key_info from table->s instead of table, as TABLE_SHARE
contains the unwrapped key_info.
system versioned table
For a versioned table, REPLACE first tries to insert a row; if it gets
a duplicate key error and the optimization is possible, it does UPDATE
+ INSERT history. If the optimization is not possible, it goes down the
normal branch doing UPDATE to history and repeats the cycle with
INSERT.
The failure was in the normal branch, when we tried UPDATE to history
but such history already existed from previous cycles. There are no
such failures in the optimized branch because
vers_insert_history_row() already ignores duplicates.
The fix ignores duplicate errors for the UPDATE to history and does
DELETE instead.
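A hedged sketch of the fallback; vers_update_history_row() is a
hypothetical helper name:
  error= vers_update_history_row(table);        /* hypothetical */
  if (error == HA_ERR_FOUND_DUPP_KEY)
    error= table->file->ha_delete_row(table->record[1]);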
`limit >= trx_id' failed in purge_node_t::skip
For fast alter partition, ALTER lost the hash fields in the frm field
count. mysql_prepare_create_table() did not call add_hash_field()
because the logic of ALTER-ing field types implies automatic
promotion/demotion to/from hash index. So we don't pass the hash
algorithm to mysql_prepare_create_table() and let it decide itself, but
it cannot decide correctly for fast alter partition.
So now mysql_prepare_alter_table() is a bit more sophisticated about
what to pass in the algorithm. If no fields were changed, it forces
mysql_prepare_create_table() to re-add the hash fields by setting
HA_KEY_ALG_HASH.
The problem with the original logic is that mysql_prepare_alter_table()
does not care 100% about the hash property, so the decision is blurred
between mysql_prepare_alter_table() and mysql_prepare_create_table().
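A sketch of the forcing step; the condition name and the exact member
holding the algorithm are assumed:
  if (!fields_changed)                  /* hypothetical condition */
    key->algorithm= HA_KEY_ALG_HASH;    /* make create re-add hash fields */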
MDEV-28127 added is_equal(), which compared vcol expressions
literally. But the same vcol expression from another table is not
literally equal because of the different table name.
We implement another comparison method, is_identical(), which tolerates
a different table name in the vcol comparison. If a field item points
to table_A and the compared field item points to table_B, such items
are treated as equal in the (table_A, table_B) comparison. This is done
by cloning table_B's expression and renaming any table_B entries to
table_A in it.
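A hedged sketch of the idea; rename_table_refs() is a hypothetical
helper:
  Item *b_clone= b_expr->build_clone(thd);   /* clone table_B's expr */
  rename_table_refs(b_clone, table_B, table_A);     /* hypothetical */
  bool identical= a_expr->eq(b_clone, true); /* now compare literally */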
DELAYED with virtual columns
The segfault was caused by two different copies of the same Field
instance in a prepared delayed insert. One was made by
Delayed_insert::get_local_table() (see make_new_field()). That copy
went through parse_vcol_defs() and received a new vcol_info->expr.
The other one was made by copy_keys_from_share() by this code:
  /*
    We are using only a prefix of the column as a key:
    Create a new field for the key part that matches the index
  */
  field= key_part->field= field->make_new_field(root, outparam, 0);
  field->field_length= key_part->length;
So key_part and table got different objects of the same field, and the
crash happened because key_part->field->vcol_info->expr was NULL.
The fix adds update_keypart_vcol_info(), which updates vcol_info->expr
in key_part->field.
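A minimal sketch of what such an update could look like; loop bounds
and member names are assumed:
  for (uint k= 0; k < table->s->keys; k++)
  {
    KEY_PART_INFO *kp= table->key_info[k].key_part;
    for (uint p= 0; p < table->key_info[k].user_defined_key_parts;
         p++, kp++)
    {
      Field *kf= kp->field;
      Field *tf= table->field[kf->field_index];
      if (kf != tf && kf->vcol_info && tf->vcol_info)
        kf->vcol_info->expr= tf->vcol_info->expr;  /* share parsed expr */
    }
  }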
Cleanup: memdup_vcol() is now a static inline function instead of a
macro, plus an OOM check.
- Needless engaged_ removed;
- SCOPE_VALUE, SCOPE_SET, SCOPE_CLEAR macros for neater declarations;
- IF_CLASS / IF_NOT_CLASS SFINAE checkers to pass an argument by value
  or by reference;
- inline keyword;
- a couple of refactorings of the temporary free_list.
Example:
  {
    auto _= make_scope_value(var, tmp_value);
  }
make_scope_value(): a function which returns a RAII object that
temporarily changes the value of a variable.
detail::Scope_value: the actual implementation of that RAII class.
It shouldn't be used directly! That's why it's inside the namespace
detail.
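A fuller usage sketch of the semantics described above:
  int var= 1;
  {
    auto _= make_scope_value(var, 42);
    // var == 42 anywhere inside this scope
  }
  // var is restored to 1 on scope exit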
* Innobase `os0file.cc`: use `PRIu64` over `llu`
  * These came after I prepared #3485.
* MyISAM `mi_check.c`: in the impossible block length warning
  * I missed this one in #3485 (and #3360 too?).
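For reference, the portable form looks like this (the function and
value are illustrative):
  #include <inttypes.h>
  #include <stdio.h>
  static void warn_block_length(uint64_t block_length)
  {
    printf("impossible block length %" PRIu64 "\n", block_length);
  }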
The fix in MDEV-34825/#3484/dff354e7df2f originally just took the
FreeBSD-carried patch of inline asm, as it didn't have the
__ppc_get_timebase function. What clang does have is
__builtin_ppc_get_timebase, which was replaced in the same commit; that
fix was taken from Alpine.
To reduce complexity, we only need one working function rather than an
equivalent asm implementation.
Noted by Marko, thanks!
InnoDB transactions may be reused after being committed:
- when taken from the transaction pool
- during a DDL operation execution
In this case the wsrep flag on the trx object is cleared, which may
cause wrong execution logic afterwards (wsrep-related hooks are not
run).
Make trx->wsrep be initialized from the THD object only once, at InnoDB
transaction start, and don't change it throughout the transaction's
lifetime. The flag is reset at commit time as before.
Unconditionally set wsrep=OFF for THD objects that represent InnoDB
background threads.
Make Wsrep_schema::store_view() operate in its own transaction.
Fix streaming replication transactions' fragment rollback to not switch
the THD->wsrep value during the transaction's execution
(use THD->wsrep_ignore_table as a workaround).
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Row-injection updates don’t correctly set the historical partition
for tables with system versioning and system_time partitions. This
results in inconsistencies between the master and slave when
replicating transactions that target such tables (i.e. the primary
server would correctly distribute archived rows amongst its
partitions, whereas the replica would have all archived rows in a
single partition). The function
partition_info::vers_set_hist_part(THD*) is used to set the
partition; however, its initial check for
vers_require_hist_part(THD*) returns false, bypassing the rest of
the function (which sets up the partition to use). This is because
the actual check uses the LEX sql_command (via
LEX::vers_history_generating()) to determine if the command is valid
to generate history. Row injections don’t have sql_commands though.
This patch provides a fix which extends the check in
vers_history_generating() to additionally allow row injections to be
history generating (via the function LEX::is_stmt_row_injection()).
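A sketch of the extended predicate; the original checks are abbreviated
into a hypothetical helper:
  bool LEX::vers_history_generating() const
  {
    return is_stmt_row_injection() ||        /* new: replication row events */
           sql_command_generates_history();  /* hypothetical: original checks */
  }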
Special thanks to Jan Lindstrom <jan.lindstrom@galeracluster.com>
for his work in reproducing the bug, and providing an initial test
case.
Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Aleksey Midenkov <midenok@mariadb.com>
The problem was caused by MDEV-31413 commit 277968aa, where the
mysql.gtid_slave_pos table was replicated by Galera. However, as not
all nodes in a Galera cluster are replica nodes, rows were not deleted
from the table.
This fix corrects that so the mysql.gtid_slave_pos table is not
replicated by Galera. Instead, when a Galera node receives a GTID event
and wsrep_gtid_mode=1, the event is stored to the mysql.gtid_slave_pos
table.
Added test case galera_2primary_replica for 2 async primaries
replicating to a Galera cluster.
Added test case galera_circular_replication where an async primary
replicates to a Galera cluster and one of the Galera cluster nodes is
the master of an async replica.
Modified test case galera_restart_replica to monitor GTID positions and
rows in mysql.gtid_pos_table.
This is the test case from commit 46aaf328ce
(MDEV-35830) to cover backup_undo_trunc() in the regression tests of
earlier major versions of mariadb-backup.
value.type_handler()->result_type()'
failed in virtual bool Item_param::get_date(THD*, MYSQL_TIME*, date_mode_t)
This is a cleanup for MDEV-25593. When binding from NULL, IGNORE or DEFAULT,
value.type_handler should be set to &type_handler_null,
to satisfy the DBUG_ASSERT in Item_param::get_date().
The problems were that:
1) resources were freed asymmetrically: in send_eof() on normal
execution, but in the destructor in case of error;
2) the destructor was not called for result objects in case of an SP
(so if the last SP execution ended with an error, resources were not
freed on reinit before execution (cleanup() is called before the next
execution) and the destructor was also not called due to the lack of a
delete call for the object).
Result cleanup() renamed to reset_for_next_ps_execution() to better
reflect its function.
All result methods revised and the freeing of resources made symmetric.
The destructor of the result object is now called for SPs.
Added the previously skipped invalidation in case of error in insert.
Removed the misleading naming of reset(thd) (it could be confused with
reset()).
TABLE::use_all_columns() turns on all bits of read_set, which is
interpreted by InnoDB as a request to read all columns. Without doing
so before calling init_read_record(), InnoDB will not retrieve any
columns if the mysql.servers table has been altered to use InnoDB as
the engine, and any foreign servers stored in the table are "lost".
storage/maria/ma_open.c:352:7: runtime error: call to function debug_sync(THD*, char const*, unsigned long)
through pointer to incorrect function type 'void (*)(void *, const char *, unsigned long)'
The THD argument is a void *. Because the MyISAM sources are .c files,
the function prototype is mismatched.
As Marko pointed out, MYSQL_THD is declared as void * in C.
Thanks to Jimmy Hú for noting that in C struct THD is equivalent to
the C++ class THD. The C NULL was also different from the C++ nullptr.
Corrected the definitions of MYSQL_THD and DEBUG_SYNC_C to be C and C++
compatible.
Fix a regression introduced by commit 9588526, which attempted to
address MDEV-27037. With the regression, mariadb-binlog cannot process
multiple log files when --stop-datetime is specified.
The change is to keep recording the timestamp of the last processed
event, and after all log files are processed, if the last recorded
timestamp has not reached the specified --stop-datetime, emit a
warning. This applies when processing local log files as well as log
files from remote servers.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, is contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
Co-authored-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
The mismatch occurs in the function calls because in sql/sql_udf.h the
types of "error" and "is_null" are unsigned char rather than char.
This is corrected for the UDF functions:
* spider_direct_sql
* spider_direct_bg_sql
* spider_flush_table_mon_cache
* spider_copy_tables
* spider_ping_table
Reviewer: Yuchen Pei
There were too many C casts rather than C++ casts here.
Started a partial conversion covering changes scoped within single
files.
Suggested by Brandon Nesterenko
In plugins, use the correct resolver for ULONG and ULONGLONG
types.
InnoDB has a UINT type, as evidenced by "Unknown variable type code
0x182 in plugin 'InnoDB'.", so the implementation for UNSIGNED INT was
added.
Any InnoDB mtr test that changes lock_wait_timeout,
spider_param_force_commit, or some MyRocks unsigned thread server
variables can verify that this change is correct.
Reviewer: Brandon Nesterenko
through pointer to incorrect function type.
The argument is void* rather than char* and was missing
system_status_var * as an argument.
This shows up with UBSAN testing under clang.
Reviewer: Brandon Nesterenko
We cannot have an assert in Warning_info::push_warning(),
because the SQL command SIGNAL can set an absolutely arbitrary
message, even an empty one or one ending with '\n'.
Move the assert into push_warning() and my_message_sql().
Followup for 9508a44c37.
The issue is caused by a logic error in the
Item_sum::get_tmp_table_item() method: it resets the arguments of the
item to point to the result fields during the
change_ref_to_tmp_fields() call. However, Item_sum arguments must not
be modified.
It is enough for Item_sum objects to call the ancestor's
implementation, Item::get_tmp_table_item().
This fix is in accordance with MySQL commit 2e3dc09087c24798c90e05163ed3d931f6b93db3
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>