opt_calc_index_goodness(): Correct an inaccurate condition.
We can very well use a clustered index of a table that is subject
to online rebuild. But we must not choose an index that has not been
committed (it is a secondary index that was not fully created)
or that is corrupted or not a normal B-tree index.
opt_search_plan_for_table(): Remove some redundant code, now that
opt_calc_index_goodness() checks against corrupted indexes.
The test case allows this code to be exercised. The main observation
in the following:
./mtr --rr innodb.stats_persistent
rr replay var/log/mysqld.1.rr/latest-trace
should be that when opt_search_plan_for_table() is being invoked by
dict_stats_update_persistent() on the being-altered statistics table
in the 2nd call after ha_innobase::inplace_alter_table(),
and the fix in opt_calc_index_goodness() is absent,
it would choose the code path if (n_fields == 0), that is, a full
table scan, instead of searching for the record. The GDB commands to
execute in "rr replay" would be as follows:
break ha_innobase::inplace_alter_table
continue
break opt_search_plan_for_table
continue
continue
next
next
…
Reviewed by: Vladislav Lesin
os_innodb_umask was of the incorrect type resulting in warnings
in clang-19. The correct type is mode_t.
As os_innodb_umask was set during innnodb_init from my_umask,
corrected the type there along with its companion my_umask_dir.
Because of this, the defaults mask values in innodb never
had an effect.
The resulting change allow found signed differences in
my_create{,_nosymlink}, open_nosymlinks:
mysys/my_create.c:47:20: error: operand of ?: changes signedness from ‘int’ to ‘mode_t’ {aka ‘unsigned int’} due to unsignedness of other operand [-Werror=sign-compare]
47 | CreateFlags ? CreateFlags : my_umask);
Ref: clang-19 warnings:
[55/123] Building CXX object storage/innobase/CMakeFiles/innobase.dir/os/os0file.cc.o
storage/innobase/os/os0file.cc:1075:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
1075 | file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
| ~~~~ ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1249:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
1249 | file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
| ~~~~ ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1381:45: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
1381 | file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
| ~~~~ ^~~~~~~~~~~~~~~
We periodically observe assertion failures in the mtr tests,
specifically in the /storage/innobase/row/row0ins.cc file,
following a WSREP error. The error message is: 'WSREP: record
locking is disabled in this thread, but the table being modified
is not mysql/wsrep_streaming_log: mysql/innodb_table_stats.'"
This issue seems to occur because, upon opening the table,
innodb_stats_auto_recalc may trigger, which Galera does not
anticipate. This commit should fix this bug.
- Replace statement fails with duplicate key error when multiple
unique key conflict happens. Reason is that Server expects the
InnoDB engine to store the confliciting keys in ascending order.
But the InnoDB doesn't store the conflicting keys
in ascending order.
Fix:
===
- Enable HA_DUPLICATE_KEY_NOT_IN_ORDER for InnoDB storage engine
only when unique index order is different in .frm and innodb dictionary.
Problem:
========
- dict_stats_table_clone_create() does not initialize the
flag stats_error_printed in either dict_table_t or dict_index_t.
Because dict_stats_save_index_stat() is operating on a copy
of a dict_index_t object, it appears that
dict_index_t::stats_error_printed will always be false
for actual metadata objects, and uninitialized in
dict_stats_save_index_stat().
Solution:
=========
dict_stats_table_clone_create(): Assign stats_error_printed
for table and index while copying the statistics
When MariaDB Server is run in a container under
Windows Subsystem for Linux, the fstat(2) system calls that InnoDB
invokes in os_file_set_size() or os_file_get_size() are causing a
failure in case the file had been renamed in the past while the file
handle was open. This affects at least ALTER TABLE and OPTIMIZE TABLE.
os_file_get_size(): Invoke lseek(2) instead of fstat(2). We do not mind
if the file pointer is moving to the end of the file, because InnoDB
exclusively invokes positioned reads and writes, or in some rare cases,
appends to an existing file.
os_file_set_size(): Invoke os_file_get_size() instead of fstat(2).
Define the POSIX and Windows versions separately. Formerly, the
Windows version was called os_file_change_size_win32().
fil_node_t::read_page0(): Use os_file_get_size() to determine the
size, and do not crash on error.
fil_node_t::read_metadata(): Remove the non-Windows stat* parameter
and always invoke fstat(2) outside Windows, but do tolerate errors.
Because fstat(2) is more likely to fail than lseek(2), and this is
not time critical code, we can afford the extra lseek(2) system call.
Reviewed by: Vladislav Vaintroub
- InnoDB fulltext rebuilds the FTS COMMON table while adding the
new fulltext index. This can be optimized by avoiding rebuilding
the FTS COMMON table in case of FTS COMMON TABLE already exists.
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Remove workaround for MDEV-13941, it served for 5 years,and all affected
pre-release 10.2 installation should have been already fixed in between.
Apparently Innodb is using is_sparse parameter in os_file_set_size()
inconsistently, and it passes is_sparse=false now during first file
extension. With MDEV-13941 workaround in place, it would unsparse
the file, which is makes compression not to work at all anymore.
Problem:
=======
- Redundant table fails to insert into the table after
instant drop blob column. Instant drop column only marking
the column as hidden and consecutive insert statement tries
to insert NULL value for the dropped BLOB column and returns
the fixed length of the blob type as 65535. This lead to
row size too large error.
Fix:
====
For redundant table, if the non-fixed dropped column can be null
then set the length of the field type as 0.
- InnoDB fails to set the index information or index number
for the spatial index error HA_ERR_NULL_IN_SPATIAL.
row_build_spatial_index_key(): Initialize the tmp_mbr array completely.
check_if_supported_inplace_alter(): Fix the spelling mistake of alter
row_purge_reset_trx_id(): Reserve large enough offsets for accomodating
the maximum width PRIMARY KEY followed by DB_TRX_ID,DB_ROLL_PTR.
Reviewed by: Thirunarayanan Balathandayuthapani
Don't allow the referencing key column from NULL TO NOT NULL
when
1) Foreign key constraint type is ON UPDATE SET NULL
2) Foreign key constraint type is ON DELETE SET NULL
3) Foreign key constraint type is UPDATE CASCADE and referenced
column declared as NULL
Don't allow the referenced key column from NOT NULL to NULL
when foreign key constraint type is UPDATE CASCADE
and referencing key columns doesn't allow NULL values
get_foreign_key_info(): InnoDB sends the information about
nullability of the foreign key fields and referenced key fields.
fk_check_column_changes(): Enforce the above rules for COPY
algorithm
innobase_check_foreign_drop_col(): Checks whether the dropped
column exists in existing foreign key relation
innobase_check_foreign_low() : Enforce the above rules for
INPLACE algorithm
dict_foreign_t::check_fk_constraint_valid(): This is used
by CREATE TABLE statement to check nullability for foreign
key relation.
The method was declared to return an unsigned integer, but it is
really a boolean (and used as such by all callers).
A secondary change is the addition of "const" and "noexcept" to this
method.
In ha_mroonga.cpp, I also added "inline" to the two helper methods of
referenced_by_foreign_key(). This allows the compiler to flatten the
method.
GCC 12.2.0 could issue -Wnonnull for an unreachable call to
strlen(new_path). Let us prevent that by replacing the condition
(type == FILE_RENAME) with the equivalent (new_path).
This should also optimize the generated code, because the life time
of the parameter "type" will be reduced.
In the bug report MDEV-32817 it occurred that the function
row_mysql_get_table_status() is outputting a fil_space_t*
as if it were a numeric tablespace identifier.
ib_push_warning(): Remove. Let us invoke push_warning_printf() directly.
innodb_decryption_failed(): Report a decryption failure and set the
dict_table_t::file_unreadable flag. This code was being duplicated in
very many places. We return the constant value DB_DECRYPTION_FAILED
in order to avoid code duplication in the callers and to allow tail calls.
innodb_fk_error(): Report a FOREIGN KEY error.
dict_foreign_def_get(), dict_foreign_def_get_fields(): Remove.
This code was being used in dict_create_add_foreign_to_dictionary()
in an apparently uncovered code path. That ib_push_warning() call
would pass the integer i+1 instead of a pointer to NUL terminated
string ("%s"), and therefore the call should have resulted in a crash.
dict_print_info_on_foreign_key_in_create_format(),
innobase_quote_identifier(): Add const qualifiers.
row_mysql_get_table_error(): Replaces row_mysql_get_table_status().
Display no message on DB_CORRUPTION; it should be properly reported at
the SQL layer anyway.
recv_recovery_from_checkpoint_start(): Abort startup due to log
corruption if we were unable to parse the entire log between
the latest log checkpoint and the corresponding FILE_CHECKPOINT record.
Also, reduce some code bloat related to log output and log_sys.mutex.
Reviewed by: Debarun Banerjee
int wsrep_thd_append_key(THD*, const wsrep_key*, int, Wsrep_service_key_type)
CREATE TABLE [SELECT|REPLACE SELECT] is CTAS and idea was that
we force ROW format. However, it was not correctly enforced
and keys were appended before wsrep transaction was started.
At THD::decide_logging_format we should force used stmt binlog
format to ROW in CTAS case and produce a warning if used
binlog format was not ROW.
At ha_innodb::update_row we should not append keys similarly
as in ha_innodb::write_row if sql_command is SQLCOM_CREATE_TABLE.
Improved error logging on ::write_row, ::update_row and ::delete_row
if wsrep key append fails.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Reason:
======
- InnoDB fails to load the instant alter table metadata from
clustered index while loading the table definition.
The reason is that InnoDB metadata blob has the column length
exceeds maximum fixed length column size.
Fix:
===
- InnoDB should treat the long fixed length column as variable
length fields that needs external storage while initializing
the field map for instant alter operation
Problem:
========
- After the commit ada1074bb1 (MDEV-14398)
fil_crypt_set_encrypt_tables() iterates through all tablespaces to
fill the default_encrypt tables list. This was a trigger to
encrypt or decrypt when key rotation age is set to 0. But import
tablespace does call fil_crypt_set_encrypt_tables() unnecessarily.
The motivation for the call is to signal the encryption threads.
Fix:
====
ha_innobase::discard_or_import_tablespace: Remove the
fil_crypt_set_encrypt_tables() and add the import tablespace
to the default encrypt list if necessary
- commit 85db534731 (MDEV-33400)
retains the instantness in the table definition after discard
tablespace. So there is no need to assign n_core_null_bytes
during instant table preparation unless they are not
initialized.
We need to work around deficiencies of Valgrind, and apparently
the previous work-around attempts
(such as d247d64988) do not work
anymore, definitely not on recent clang-based compilers.
MemorySanitizer should be fine; unfortunately we set HAVE_valgrind for it
as well.
- During XA PREPARE, InnoDB releases the non-exclusive locks.
But it fails to remove the non-exclusive table lock from the
transaction table locks. In the mean time, main thread evicts
the table from the LRU cache. While rollbacking the XA transaction,
InnoDB iterates through the table locks to check whether it
holds lock on any system tables and wrongly assumes the
evicted table as system table since the table id is 0
Fix:
===
During XA PREPARE, remove the table locks of the transaction while
releasing the non-exclusive locks.
Problem:
========
- During shutdown, InnoDB tries to free the asynchronous
I/O slots and hangs. The reason is that InnoDB disables
asynchronous I/O before waiting for pending
asynchronous I/O to finish.
buf_load(): InnoDB aborts the buffer pool load due to
user requested shutdown and doesn't wait for the asynchronous
read to get completed. This could lead to debug assertion
in buf_flush_buffer_pool() during shutdown
Fix:
===
os_aio_free(): Should wait all read_slots and write_slots
to finish before disabling the aio.
buf_load(): Should wait for pending read request to complete
even though it was aborted.
- Column stat_value and sample_size in mysql.innodb_index_stats
table is declared as BIGINT UNSIGNED without any check constraint.
user manually updates the value of stat_value and sample_size
to zero. InnoDB aborts the server while reading the statistics
information because InnoDB expects at least one leaf
page to exist for the index.
- To fix this issue, InnoDB should interpret the value of
stat_n_leaf_pages, stat_index_size in innodb_index_stats
stat_clustered_index_size, stat_sum_of_other_index_sizes
in innodb_table_stats as valid one even though user
mentioned it as 0.
DML transactions on FK-child tables also get table locks
on FK-parent tables. If there is a DML transaction holding
such a lock, and a TOI transaction starts, the latter
BF-aborts the former and puts itself into a waiting state.
If at this moment another DML transaction on FK-child table
starts, it doesn't check that the transaction waiting on
a parent table lock is TOI, and it erroneously BF-aborts
the waiting TOI transaction.
The fix: don't roll back high-priority transaction waiting
on a lock in InnoDB, instead roll back an incoming DML
transaction.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
During read only mode, InnoDB doesn't allow checkpoint to happen.
So InnoDB should throw the warning when InnoDB tries to
force the checkpoint when innodb_read_only = 1 or
innodb_force_recovery = 6.
Valgrind looks as the assertions as examining uninitalized values.
As the assertions are tested in other Debug builds we know
it isn't all invalid.
Account for Valgrind by removing the assertion under
the WITH_VALGRIND=1 compulation.
ha_innobase::info_low(): For HA_STATUS_VARIABLE without
HA_STATUS_VARIABLE_EXTRA, let us avoid unnecessary and costly updates
of the data_free statistics, which are only needed for SHOW TABLE STATUS.
This optimization had been enabled in
commit 247ecb7597 but not utilized until now.
- InnoDB tries to write FILE_CHECKPOINT marker during
early recovery when log file size is insufficient.
While updating the log checkpoint at the end of the recovery,
InnoDB must already have written out all pending changes
to the persistent files. To complete the checkpoint, InnoDB
has to write some log records for the checkpoint and to
update the checkpoint header. If the server gets killed
before updating the checkpoint header then it would lead
the logfile to be unrecoverable.
- This patch avoids FILE_CHECKPOINT marker during early
recovery and narrows down the window of opportunity to
make the log file unrecoverable.
- During recovery, InnoDB may fail to shrink the undo tablespaces
when there are no pages to recover while applying the redo log.
This issue exists only when innodb_undo_truncate is enabled.
trx_lists_init_at_db_start() could've applied the redo logs
for undo tablespace page0.
On an UBSAN clang-15 build, if running with UBSAN option
halt_on_error=1 (the issue doesn't show up without it),
MTR fails during mysqld --bootstrap with UBSAN error:
call to function io_callback(tpool::aiocb*) through pointer to incorrect function type 'void (*)(void *)'
This patch corrects the parameter type of io_callback
to match its expected type defined by callback_func,
i.e. (void*).
Reviewed By:
============
<TODO>
In cmake -DWITH_UBSAN=ON builds with clang but not with GCC,
-fsanitize=undefined will flag several runtime errors on
function pointer mismatch related to the lock-free hash table LF_HASH.
Let us use matching function signatures and remove function pointer
casts in order to avoid potential bugs due to undefined behaviour.
These errors could be caught at compilation time by
-Wcast-function-type-strict, which is available starting with clang-16,
but not available in any version of GCC as of now. The old GCC flag
-Wcast-function-type is enabled as part of -Wextra, but it specifically
does not catch these errors.
Reviewed by: Vladislav Vaintroub
number of non-user tablespace.
fil_space_t::try_to_close(): Don't try to close
the tablespace which is acquired by the caller of
the function
Added the suppression message in open_files_limit test case
number of non-user tablespace.
- InnoDB only closes the user tablespace when the number of open
files exceeds innodb_open_files limit. In that case, InnoDB should
make sure that innodb_open_files value should be greater
than number of undo tablespace, system and temporary tablespace files.
Problem:
=======
- This commit is a merge of mysql commit 129ee47ef994652081a11ee9040c0488e5275b14.
InnoDB FTS can be in inconsistent state when sync operation
terminates the server before committing the operation. This
could lead to incorrect synced doc id and incorrect query results.
Solution:
========
- During sync commit operation, InnoDB should pass
the sync transaction to update the max doc id
in the config table.
fts_read_synced_doc_id() : This function is used
to read only synced doc id from the config table.
The shared counter template ib_counter_t uses the function
my_timer_cycles() as a source of pseudo-random numbers to pick a shard.
On some platforms, my_timer_cycles() could return the constant value 0.
get_rnd_value(): Remove.
my_pseudo_random(): Implement as an alias of my_timer_cycles() or
a wrapper for pthread_self().
Reviewed by: Vladislav Vaintroub
- InnoDB page compression works only on COMPACT or DYNAMIC row
format tables. So InnoDB should throw error when alter table
tries to enable PAGE_COMPRESSED for redundant table.
Correct the second parameter for strxnmov to prevent potential buffer
overflows. The second parameter must be one less than the size of the
input buffer to avoid writing past the end of the buffer.
While the second parameter is usually correct, there are exceptions
that need fixing.
This commit addresses the issue within frm_file_exists() and other
affected places.