Some code was duplicated near the start of the function,
only for InnoDB, not XtraDB. This was noticed by
comparing the InnoDB code between MariaDB and MySQL.
ha_innobase::open(): Always ignore problems with FOREIGN KEY constraints
(pass DICT_ERR_IGNORE_FK_NOKEY), no matter whether foreign_key_checks
is enabled. Instead, we must report errors when enforcing the FOREIGN KEY
constraints. As a result of ignoring these errors, the tables will be
loaded with dict_foreign_t objects whose foreign_index or referenced_index
will be NULL.
Also, pass DICT_ERR_IGNORE_FK_NOKEY instead of DICT_ERR_IGNORE_NONE
to dict_table_open_on_id_low() in many other cases. Notably, on
CREATE TABLE and ALTER TABLE, we will keep validating the FOREIGN KEY
constraints as before.
dict_table_open_on_name(): If no other flags than
DICT_ERR_IGNORE_FK_NOKEY are set, refuse access to unreadable tables.
Some encryption tests rely on this code path.
For the DML code path, we used to have the problem that when
one of the indexes was missing in dict_foreign_t, we would ignore
the FOREIGN KEY constraint altogether. The following changes
address that.
row_ins_check_foreign_constraints(): Add the parameter pk.
For the primary key, consider also foreign key constraints for which
foreign->foreign_index=NULL (no underlying index is available).
row_ins_check_foreign_constraint(): Report errors also for !check_ref.
Remove a redundant check for srv_read_only_mode.
row_ins_foreign_report_add_err(): Tolerate foreign->foreign_index=NULL.
fkerr_t: Errors for the foreign key checks. Replaces ulint,
which used #define constants that looked like dberr_t literals.
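A minimal sketch of what such an enum buys over #define'd ulint literals
(the enumerator names here are illustrative, not necessarily the ones in
the patch):

    // Illustrative only: a dedicated type for foreign key check results,
    // so the compiler rejects accidental mixing with generic dberr_t codes.
    enum class fkerr_t {
        none,              // constraint satisfied
        no_index,          // foreign->foreign_index is NULL
        no_referenced_row, // referenced (parent) row missing
        row_is_referenced  // child rows prevent delete/update of parent
    };

With #define'd ulint values, passing a foreign key error where a dberr_t is
expected compiles silently; with a distinct type it does not.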
wsrep_dict_foreign_find_index(): Remove. Use
dict_foreign_find_index() instead, with default parameters.
dict_foreign_push_index_error(): Do not add redundant quotes
around quoted table names.
heap_scan() sets info->next_block to either an integer multiple of
share->block.records_in_block or the total number of
records in the table. So when this total number of records changes,
info->next_block needs to be recalculated to take it into account.
This is a different fix for "Fixes a problem with heap when scanning and insert rows at the same time"
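A rough model of the recalculation described above, with simplified names
(the real HEAP code operates on the share and info structures directly);
this is a sketch of the invariant, not the actual patch:

    #include <cstdint>

    // Sketch: next_block must be either a whole multiple of
    // records_in_block past the scan position, or the total record count.
    // Recompute it whenever the total number of records may have changed.
    // Assumes records_in_block > 0.
    void fix_next_block(uint64_t pos, uint64_t records_in_block,
                        uint64_t total_records, uint64_t *next_block) {
        *next_block = pos - pos % records_in_block + records_in_block;
        if (*next_block > total_records)
            *next_block = total_records;   // clamp to the new total
    }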
===================
- Modify tracing to use htrc to be compatible with old versions
when this code is used to make an EOM module.
modified: storage/connect/restget.cpp
modified: storage/connect/tabrest.cpp
- Path apparently not needed for the cpprest lib on Linux
modified: storage/connect/CMakeLists.txt
Datafile::find_space_id(): Fix a regression that was introduced
in c0f47a4a58 for MDEV-12026.
Because the function buf_page_is_corrupted() now determines
the physical page size from the fsp_flags, our buffer size must
agree with the fsp_flags.
buf_page_is_corrupted(): Use the correct accessor
fil_space_t::zip_size() for converting the tablespace flags.
ROW_FORMAT=COMPRESSED files never use innodb_checksum_algorithm=full_crc32.
In addition to files and Mongo collections, JSON as well as XML and CSV data can be retrieved
from the net as answers from REST queries. Because it uses an external package (cpprestsdk)
this is currently available only to MariaDB servers compiled from source.
-- Add compile flags needed on Windows /MD or /MDd (debug)
-- Also include some changes needed on Linux
modified: storage/connect/CMakeLists.txt
- Add the xtrc tracing function
modified: storage/connect/global.h
modified: storage/connect/plugutil.cpp
- Modify tracing to use xtrc and fix some typos
modified: storage/connect/array.cpp
modified: storage/connect/block.h
modified: storage/connect/restget.cpp
- Fix compilation error when ZIP is not supported
modified: storage/connect/ha_connect.cc
modified: storage/connect/tabfmt.cpp
- Add some tracing + typo fixes
modified: storage/connect/mycat.cc
modified: storage/connect/tabjson.cpp
- Add conditional code based on MARIADB
This is to be able to use the same code in CONNECT and EOM modules
modified: storage/connect/osutil.h
modified: storage/connect/tabrest.cpp
- Replace PlugSetPath by some concat (crashed on Fedora) + typo fix
modified: storage/connect/reldef.cpp
- Try to fix test failures
modified: zlib/CMakeLists.txt
========
During ibd file creation, InnoDB flushes the page0 without crypt
information. During recovery, InnoDB encounters encrypted page read
before initialising the crypt data of the tablespace. So it leads to
corruption of the page and does not allow InnoDB to start.
Solution:
=========
Write crypt_data information in page0 while creating the .ibd file.
During recovery, crypt_data will be initialised while processing
MLOG_FILE_NAME redo log record.
It looks like the merge of MySQL 5.7.9 to MariaDB 10.2.2 conflicted with
earlier changes that were made in MDEV-8588.
row_search_mvcc(): If the page is corrupted, avoid invoking
btr_cur_store_position(). The caller should not try to fetch
the next record after a hard error.
No memory access violated the bounds of fake_extra_buf[],
but GCC does not like the fact that the pointer fake_extra
ends up pointing before the array.
Allocate a dummy element at the start of fake_extra_buf[]
in order to silence the warning.
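The underlying C/C++ rule is that even forming a pointer before the start
of an array is undefined behaviour; a sketch of the workaround (the size
here is a hypothetical stand-in):

    enum { N_EXTRA = 6 };                 // stand-in for the real size

    unsigned char fake_extra_buf[1 + N_EXTRA]; // element [0] is the dummy
    unsigned char *fake_extra = fake_extra_buf + 1;
    // fake_extra - 1 still points into fake_extra_buf, so the computed
    // pointer never precedes the array and GCC stays quiet.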
ins_node_create() does not initialize all members of que_common_t, so
zero-init them with mem_heap_zalloc().
Handle out-of-memory correctly.
Init insert_node->common.parent to fulfill the contract of thr usage.
Free insert_node subtree at row_update_vers_insert() exit.
When using field_conv(), which is called in case of a field1=field2 copy in
fill_records(), the full varstring was copied, including uninitialized bytes.
This caused valgrind to complain about usage of uninitialized bytes when
using Aria static length records.
Fixed by not using memcpy when copying varstrings and instead copying only
the real bytes.
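A sketch of the idea, assuming a 2-byte little-endian length prefix (the
actual prefix width depends on the field definition):

    #include <cstring>
    #include <cstdint>

    // Copy only the length prefix plus the used payload bytes, so the
    // uninitialized tail of the source buffer is never touched.
    void copy_varstring(unsigned char *to, const unsigned char *from) {
        const uint32_t len = uint32_t(from[0]) | (uint32_t(from[1]) << 8);
        std::memcpy(to, from, 2 + len);   // not the full maximum length
    }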
If there are multiple row versions in InnoDB, reading one row from PK
may have O(N) complexity and reading from secondary keys may have
O(N^2) complexity.
The problem occurs when there are many pending versions of the same
row, meaning that the primary key is the same, but a secondary key is
different. The slowdown occurs when the secondary index is
traversed. This patch creates a helper class for the function
row_sel_get_clust_rec_for_mysql() which can remember and re-use
cached_clust_rec & cached_old_vers so that rec_get_offsets() does not
need to be called over and over for the clustered record.
Corrections by Kevin Lewis <kevin.lewis@oracle.com>
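The shape of such a helper, reduced to its essentials (types and names are
stand-ins; the real class lives alongside row_sel_get_clust_rec_for_mysql()):

    // Remember the last clustered record and its computed offsets, so that
    // rec_get_offsets()-style work is done once per distinct record instead
    // of once per row version.
    template <typename Rec, typename Offsets>
    class ClustRecCache {
        const Rec *cached_rec = nullptr;
        Offsets cached_offsets{};
    public:
        template <typename ComputeFn>
        const Offsets &offsets_for(const Rec *rec, ComputeFn compute) {
            if (rec != cached_rec) {       // cache miss: recompute once
                cached_offsets = compute(rec);
                cached_rec = rec;
            }
            return cached_offsets;         // cache hit: O(1)
        }
    };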
MDEV-20341 Unstable innodb.innodb_bug14704286
Removed a test that tested the ability to interrupt a long query which
is not long anymore.
Problem:
========
Checksum for the encrypted temporary tablespace is not stored in the page
for full crc32 format.
Solution:
========
Made the temporary tablespace use the full_crc32 format irrespective of the
encryption parameter.
buf_tmp_page_encrypt(), buf_tmp_page_decrypt(): Both follow the full_crc32
format.
Fixed the following issues:
- Call info with HA_STATUS_CONST to ensure that (key_info->rec_per_key)
contains latest data
- Don't access rec_per_key if key_info->algorithm == HA_KEY_ALG_LONG_HASH,
as in this case rec_per_key points to uninitialized data (see the sketch
after this list)
- Cleaned up code to avoid some extra 'if' and to make things more readable
- Updated test cases that used 'old' rec_per_key values
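The guard pattern described above, with stand-in types (HA_KEY_ALG_LONG_HASH
is the real constant; everything else here is illustrative):

    enum key_alg { KEY_ALG_BTREE, KEY_ALG_LONG_HASH };
    struct key_meta { key_alg algorithm; const unsigned *rec_per_key; };

    // Never dereference rec_per_key for long-hash keys: for them it points
    // to uninitialized data.
    unsigned long records_per_key(const key_meta &k, unsigned part,
                                  unsigned long fallback) {
        if (k.algorithm == KEY_ALG_LONG_HASH)
            return fallback;
        return k.rec_per_key[part];
    }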
- Include the valgrind suppressions from the FB upstream
- Use HAVE_Valgrind, not HAVE_Purify (like the rest of MariaDB code does)
The call to DisownData() is now actually disabled under Valgrind
Starting with commit 210855ce5d,
Valgrind became aware that the unused tail of the buffer that
is returned by thd_get_xid() is actually uninitialized.
The problem should exist already in MySQL 5.0. I was able to
repeat it on MariaDB Server 5.5 with some additional instrumentation.
InnoDB is allocating 128+4+4 bytes for the XID and the lengths of
its components, even when the XID is shorter than 64+64 bytes.
In MariaDB Server 10.3, while running the test main.xa_binlog,
in the xid_t::set() that is called by sql_yacc.yy, the 128-byte data
buffer was uninitialized according to Valgrind, and only the first bytes
were initialized. When the xid_t::data was copied to
thd.transaction.xid_state.xid.data, the entire target buffer happened to
be considered initialized. With MariaDB Server 10.4, since
the said commit, Valgrind will correctly detect the tail of the buffer
as uninitialized.
The impact of this bug is as follows:
(1) InnoDB will write unnecessarily much redo log for XA PREPARE.
(2) InnoDB will write garbage bytes to the redo log and undo log pages.
(3) The garbage should be 'harmless', because on recovery, only the
actual payload of the XID will be used, based on the written length.
trx_rseg_write_wsrep_checkpoint(), trx_undo_write_xid(): Write only
the actually used length of xid->data to the data page, and
zero out the rest of the buffer by mlog_memset().
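A sketch of the write discipline (XIDDATASIZE = 128 comes from the XA
specification; the struct and function here are stand-ins, and the real
code zero-fills through the redo log with mlog_memset() rather than a
plain memset()):

    #include <cstring>
    #include <cstddef>

    enum { XIDDATASIZE = 128 };  // fixed XID payload size per the XA spec

    struct xid_like { long gtrid_length, bqual_length; char data[XIDDATASIZE]; };

    void write_xid(unsigned char *dst, const xid_like &xid) {
        const std::size_t used = std::size_t(xid.gtrid_length + xid.bqual_length);
        std::memcpy(dst, xid.data, used);               // initialized bytes only
        std::memset(dst + used, 0, XIDDATASIZE - used); // deterministic tail
    }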
MDEV-17614 flags INSERT…ON DUPLICATE KEY UPDATE unsafe for statement-based
replication when there are multiple unique indexes. This correctly fixes
something whose attempted fix in MySQL 5.7
in mysql/mysql-server@c93b0d9a97
caused lock conflicts. That change was reverted in MySQL 5.7.26
in mysql/mysql-server@066b6fdd43
(with a substantial amount of other changes).
In MDEV-17073 we already disabled the unfortunate MySQL change when
statement-based replication was not being used. Now, thanks to MDEV-17614,
we can actually remove the change altogether.
This reverts commit 8a346f31b9 (MDEV-17073)
and mysql/mysql-server@c93b0d9a97 while
keeping the test cases.
- mysqltest didn't free read_command_buf
- wait_for_slave_param did write different things to the log if valgrind
was used.
- Table open cache should not write the initial variable value as it
can depend on the configuration or if valgrind is used
- A variable in GetResult was used uninitialized
- Include the valgrind suppressions from the FB upstream
- Use HAVE_Valgrind, not HAVE_Purify (like the rest of MariaDB code does)
The call to DisownData() is now actually disabled under Valgrind
... produces "bytes lost" warnings
When rocksdb_validate_update_cf_options() returns an error,
the update won't happen.
Free the copy of the string in this case.
MDEV-17717
Assertion `!table->pos_in_locked_tables' failed in tc_release_table on
flushing RocksDB table under SERIALIZABLE
MDEV-17998
Deadlock and eventual Assertion `!table->pos_in_locked_tables' failed
in tc_release_table on KILL_TIMEOUT
MDEV-19591
Assertion `!table->pos_in_locked_tables' failed in tc_release_table upon
altering table into S3 under lock.
The problem was that thd->open_tables->pos_in_locked_tables was not reset
when alter table failed to reopen a locked table.
- pcretest.c could use macro with side effect
- maria_chk could access freed memory
- Initialized some variables that could be accessed uninitialized
- Fixed compiler warning in my_atomic-t.c
The general reason why the InnoDB redo log file is limited to 512G is that
log_block_convert_lsn_to_no() returns a value limited to 1G. But there is no
need to have unique log block numbers in a log group. The fix removes the 512G
limit and limits the log group size to
(uint32_t maximum value) * (minimum page size), which, in turn, can be
removed if fil_io() is no longer used for InnoDB redo log I/O.
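A hedged paraphrase of where the old limit came from (the exact InnoDB
formula may differ slightly in offsets):

    #include <cstdint>

    constexpr uint64_t LOG_BLOCK_SIZE = 512;  // OS_FILE_LOG_BLOCK_SIZE

    // The 0x3FFFFFFF mask keeps only 2^30 distinct block numbers, and
    // 2^30 blocks * 512 bytes = 512 GiB, which is where the old redo log
    // size limit came from.
    uint32_t block_no_for(uint64_t lsn) {
        return uint32_t((lsn / LOG_BLOCK_SIZE) & 0x3FFFFFFFU);
    }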
- The commit ab6dd77408 wrongly sets the
condition inside innobase_srv_conc_enter_innodb(). The problem is that
InnoDB makes the thread sleep indefinitely if it is a replication
slave thread.
Thanks to Sujatha Sivakumar for contributing the replication test case.
Non-owning reference to elements.
Use it as a function argument instead of a pointer+size pair or instead of
const std::vector<T>.
Do not use it for strings!
More info is here http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
Or just google it.
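A minimal sketch of the calling convention this enables (the in-tree
implementation is more complete; this just shows the usage pattern):

    #include <cstddef>
    #include <vector>

    // Minimal non-owning view over a contiguous sequence of elements.
    template <typename T>
    struct span {
        T *ptr;
        std::size_t len;
        span(T *p, std::size_t n) : ptr(p), len(n) {}
        span(std::vector<T> &v) : ptr(v.data()), len(v.size()) {}
        T *begin() const { return ptr; }
        T *end() const { return ptr + len; }
        std::size_t size() const { return len; }
    };

    // Instead of f(const int*, size_t) or f(const std::vector<int>&):
    long sum(span<int> s) {
        long total = 0;
        for (int v : s) total += v;
        return total;
    }
    // Usage: std::vector<int> v{1, 2, 3}; sum(v);
    //        int buf[4] = {1, 2, 3, 4};   sum(span<int>(buf, 4));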
A combination of:
* lots of include'd test files where each has "--source
include/have_rocksdb.inc"
* for each such occurrence, MTR adds testsuite's arguments into server
arguments
* which hits some limit on the length of argv array on Windows, causing
the server to get garbage data in the last argument.
Work around this by commenting out one of the totally redundant
"source include/have_rocksdb.inc" lines.
- Fix the LooseScan code to support storage engines that return
HA_ERR_END_OF_FILE if the index scan goes out of provided range
bounds
- Add a DBUG_EXECUTE_IF("force_group_by",...) to allow a test to
force a LooseScan
- Adjust the rocksdb.group_min_max test not to use features not present
in MariaDB 10.2 (e.g. optimizer_trace; in MariaDB 10.4 it is present
but does not meet the assumptions that the test makes about it)
- Adjust the test result file:
= MariaDB doesn't support "Enhanced Loose Scan" that FB/MySQL has
= MariaDB has different cost calculations.
In addition to files and Mongo collections, JSON as well as XML and CSV data can be retrieved
from the net as answers from REST queries. Because it uses an external package (cpprestsdk)
this is currently available only to MariaDB servers compiled from source.
-- Add the REST support when Microsoft Casablanca package (cpprestsdk) is installed.
-- Also include some changes specific to MariaDB 10.3.
modified: storage/connect/CMakeLists.txt
-- Add conditional REST support
-- Added string options HTTP and URI.
-- Added the internal table type TAB_REST.
modified: storage/connect/ha_connect.cc
modified: storage/connect/mycat.cc
modified: storage/connect/mycat.h
modified: storage/connect/plgdbsem.h
-- Fix MDEV-19648 Variable connect_conv_size doesn't change
-- Change the wrong block parameter from 8169 to 1.
-- Also change the connect_conv_size default value to 1024.
modified: storage/connect/ha_connect.cc
-- Avoid possible buffer overflow
-- In particular by the function ShowValue.
modified: storage/connect/tabdos.cpp
modified: storage/connect/tabfmt.cpp
modified: storage/connect/value.cpp
modified: storage/connect/value.h
-- Add some casts to avoid some compiler warnings
modified: storage/connect/filamdbf.cpp
-- Fix some C++ errors
modified: storage/connect/javaconn.cpp
modified: storage/connect/jmgoconn.cpp
modified: storage/connect/plugutil.cpp
-- Miscellaneous typo fixes and warning-suppressing changes
modified: storage/connect/connect.cpp
modified: storage/connect/connect.h
modified: storage/connect/filamvct.cpp
modified: storage/connect/inihandl.cpp
modified: storage/connect/jsonudf.cpp
modified: storage/connect/libdoc.cpp
modified: storage/connect/tabjson.cpp
modified: storage/connect/tabtbl.cpp
modified: storage/connect/tabxml.cpp
modified: storage/connect/user_connect.cc
modified: storage/connect/user_connect.h
-- Update failing test results and disable failing tests
modified: storage/connect/mysql-test/connect/disabled.def
modified: storage/connect/mysql-test/connect/r/dir.result
modified: storage/connect/mysql-test/connect/r/grant.result
modified: storage/connect/mysql-test/connect/r/jdbc.result
modified: storage/connect/mysql-test/connect/r/jdbc_postgresql.result
modified: storage/connect/mysql-test/connect/r/xml.result
modified: storage/connect/mysql-test/connect/r/xml2.result
modified: storage/connect/mysql-test/connect/r/xml2_mult.result
modified: storage/connect/mysql-test/connect/r/xml_mult.result
-- Add an option
modified: storage/connect/mysql-test/connect/t/grant.test
* Made make_versioned_*() proxies inline;
* Renamed truncate_history to delete_history
Part of:
MDEV-19814 Server crash in row_upd_del_mark_clust_rec or Assertion
`update->n_fields < ulint(table->n_cols + table->n_v_cols)' failed in
upd_node_t::make_versioned_helper
This removes the test combination
rocksdb_rpl.mdev12179 'innodb,row,row-write-committed-slave-gtid-optimized'
for which the server failed to start due to the invalid parameter
slave_gtid_info=optimized.
This was broken in 5173e396ff
fts_sync(): Remove the constant parameter has_dict=false.
fts_sync_table(): Remove the constant parameter has_dict=false,
and the redundant parameter unlock_cache = !wait.
Make wait=true the default parameter.
PROBLEM
-------
Index defined on a virtual column whose base column was in a fk
relation was not getting updated. This is because while getting
the updated field information from the update vector of the parent
table we were comparing the column number of the base column (for
virtual column) in child table with the associated column number
in the parent table. There was a mismatch in this column number
because of which this update field information was skipped and
subsequently index was not getting updated.
FIX
The function pointer ut_timer() was only used by the
InnoDB defragmenting thread. Let InnoDB use a single monotonic
high-precision timer, my_interval_timer() [in nanoseconds],
occasionally wrapped by microsecond_interval_timer().
srv_defragment_interval: Change from "timer" units to nanoseconds.
This concludes the InnoDB time function cleanup that was
motivated by MDEV-14154. Only ut_time_ms() will remain for now,
wrapping my_interval_timer().
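A simplified model of the two timers, assuming a POSIX monotonic clock
(my_interval_timer() in the server is the real, portable implementation):

    #include <cstdint>
    #include <time.h>

    // Monotonic nanosecond counter: immune to NTP/system clock adjustments.
    uint64_t interval_timer_ns() {
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return uint64_t(ts.tv_sec) * 1000000000ULL + uint64_t(ts.tv_nsec);
    }

    // microsecond_interval_timer()-style wrapper used by the callers.
    uint64_t interval_timer_us() { return interval_timer_ns() / 1000; }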
The FTS optimizer thread made a false assumption that time(NULL)
is monotonic. The system clock can be adjusted to the past,
for example if the hardware clock was drifting to the future,
and it was adjusted by NTP.
fts_slot_t::interval_time: Replace with the constant
FTS_OPTIMIZE_INTERVAL_IN_SECS.
fts_slot_t::last_run, fts_slot_t::completed: Clarify the
documentation.
fts_optimize_get_time_limit(): Remove a type cast, and
add a FIXME comment about domain mismatch.
fts_optimize_compact(), fts_optimize_words(): Limit the time
also when the current time has been moved to the past.
fts_optimize_table_bk(): Check for wrap-around.
fts_optimize_how_many(): Check for wrap-around, and remove the
failing assertions.
fts_is_sync_needed(): Remove a redundant call to time(NULL).
lock_t::requested_time: Document what the field is used for.
lock_t::wait_time: Document that the field is only used for
diagnostics and may be garbage if the system time is being adjusted.
srv_slot_t::suspend_time: Document that this is duplicating
trx_lock_t::wait_started.
lock_table_print(), lock_rec_print(): Declare in static scope.
Add a parameter for the current time.
lock_deadlock_check_and_resolve(), lock_deadlock_lock_print(),
lock_deadlock_joining_trx_print():
Add a parameter for the current time.
srv_slot_t::suspend_time, os_aio_slot_t::reservation_time,
sync_cell_t::reservation_time: Explain what could happen
if the system time is being adjusted.
fts_sync_t::start_time: Document that the field is mostly unused.
Replace ut_usectime() with my_interval_timer(),
which is equivalent, but monotonically counting nanoseconds
instead of counting the microseconds of real time.
os_event_wait_time_low(): Use my_hrtime() instead of ut_usectime().
FIXME: Set a clock attribute on the condition variable that allows
a monotonic clock to be chosen as the time base, so that the wait
is immune to adjustments of the system clock.
1) Whenever the purge thread tries to remove a secondary virtual index
entry, it acquires a metadata lock on the table and releases
dict_operation_lock. After that, it retries the secondary index
deletion if the MDL was acquired successfully.
2) Inside row_vers_old_has_index_entry(), rename the safe_to_purge
goto label to unsafe_to_purge, so that it is more appropriate to
return true when it is unsafe to purge.
3) Previously, row_vers_old_has_index_entry() returned false if InnoDB
fetched the MDL on the table for the first time. This check (two cases)
should be done only in the purge thread. In row_purge_poss_sec(), InnoDB
again checks whether the MDL was fetched for the first time. If it was,
InnoDB retries the secondary index deletion logic. In that case,
InnoDB has to clean up the memory used inside row_vers_old_has_index_entry()
and should not care about the return value.
This is motivated by PS-5221 in
percona/percona-server@2817c561fc
The coarser-precision ut_time() will still refer to the
system clock, meaning that bad things can happen if the
real time clock is adjusted backwards.
Valgrind started supporting the CRC32 instruction with version
3.6.1, released in 2011. Thus, remove the fallback to the software
implementation when running under Valgrind.
There is one directly applicable change to InnoDB:
commit 739f5239f1 in the
5.5 branch will be merged before the next MariaDB releases.
Another potentially applicable change will be tracked
separately as MDEV-20126.
Thus, here we only update the InnoDB version number and do
not change anything else.
Problem: Clients running different values for auto_increment_increment
and doing concurrent inserts lead to a "Duplicate key error" in one of them.
Analysis:
When the auto_increment_increment value is reduced in a session,
InnoDB uses the last auto_increment_increment value
to recalculate the autoinc value.
In case some other session has inserted a value
with a different auto_increment_increment, InnoDB recalculates
autoinc values based on the current session's previous auto_increment_increment
instead of considering the auto_increment_increment used for the last insert
across all sessions.
Fix:
revert 7acdf29cb4
a.k.a. 7c12a9e5c3
as it is causing the bug.
Reviewed By:
Bin <bin.x.su@oracle.com>
Kevin <kevin.lewis@oracle.com>
RB#21777
Note: In MariaDB Server, earlier changes in
ae5bc05988
for MDEV-533 require that the original test in
mysql/mysql-server@1ccd472d63
be adjusted for MariaDB.
Also, ef47b62551 (MDEV-8827)
had to be reverted after the upstream fix had been backported.
Problem:
=======
Autoincrement gives duplicate values because of the following reasons.
(1) In the InnoDB handler function, the current autoincrement value is not
changed based on a newly set auto_increment_increment or auto_increment_offset
variable.
(2) The handler function does the rounding logic and changes the current
autoincrement value, and InnoDB is not aware of the change in the current
autoincrement value.
Solution:
========
To fix problem (1), InnoDB now always respects the auto_increment_increment
and auto_increment_offset values when computing the current autoincrement value.
By fixing problem (2), the handler layer won't change the current
autoincrement value.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
RB: 13748
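The rounding rule both autoinc fixes above revolve around can be written as
simple arithmetic (a simplified sketch, not the exact InnoDB routine): the
next value is the smallest member of the progression offset, offset+step,
offset+2*step, ... that is greater than the current value, computed with
the current session's own step.

    #include <cstdint>

    // Simplified next-value computation for auto_increment_increment (step)
    // and auto_increment_offset (offset). Assumes step > 0.
    uint64_t next_autoinc(uint64_t current, uint64_t step, uint64_t offset) {
        if (current < offset)
            return offset;
        return offset + ((current - offset) / step + 1) * step;
    }

    // Example: current = 5 (written by a session with step = 1).
    // A session with step = 10, offset = 1 must get
    // next_autoinc(5, 10, 1) == 11, using its own step rather than the
    // step of whichever session happened to insert last.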
This is a regression due to MDEV-16515 that affects some versions in
the MariaDB 10.1 server series starting with 10.1.35, and possibly
all versions starting with 10.2.17, 10.3.8, and 10.4.0.
The idea of MDEV-16515 is to allow DROP TABLE to be interrupted,
in case it was stuck due to some concurrent activity. We already
made some cases of internal DROP TABLE immune to kill in MDEV-18237,
MDEV-16647, MDEV-17470. We must include the cleanup of
CREATE TABLE...SELECT in the list of such internal DROP TABLE.
ha_innobase::delete_table(): Pass create_failed=true if the current
SQL statement is CREATE, so that the table will be dropped.
row_drop_table_for_mysql(): If create_failed=true, do not allow
the operation to be interrupted.
This is a race between DELETE and INSERT (or any other two operations accessing the table).
What should happen in good case:
1. ALTER TABLE is issued. vc_templ->default_rec is initialized with temporary share's default_fields
2. temporary share is freed, but datadict is still there, with garbage in vc_templ->default_rec
3. DELETE is issued. It is the first statement after ALTER TABLE finished.
4. ha_innobase::open() is called, ib_table->get_ref_count() should be one
5. we reinitialize vc_templ, so no garbage anymore
What actually happens:
3. DELETE is issued.
4. ha_innobase::open() is called and ib_table->get_ref_count() is 1
5. INSERT (or SELECT etc.) is issued in parallel
6. ha_innobase::open() is called and ib_table->get_ref_count() is 1
7. we check ib_table->get_ref_count() and it is 2 in both threads when we want to reinitialize vc_templ
8. garbage is there
Fix:
* Do not store pointers to SHARE memory in table dict, copy it instead.
* But then we don't need to refresh it each time when refcount=1.
btr_push_update_extern_fields(): Add a parameter for the original number
of fields in the record before btr_cur_trim(). Assume that this function
will only be called for the clustered index, which is the only index
that can contain off-page columns.
trx_undo_prev_version_build(), btr_cur_pessimistic_update():
Only invoke btr_push_update_extern_fields() for the clustered index.
Fix this patch (two csets before):
Disable rocksdb.shutdown test
It was introduced by this patch in fb/mysql-5.6:
Author: Yoshinori Matsunobu <yoshinori@fb.com>
Date: Mon Jun 10 14:09:28 2019 -0700
Extending SHUTDOWN query to support read_only/aborting
Summary:
This diff extends SHUTDOWN query to support the following
features.
- Aborting with any specified exit code (range is 0..255).
If nothing is specified or 0 is given, it does default clean
shutdown. If 1+ is given, exits with the given error code
immediately. This is helpful to shutting down instance
even if it is stuck somewhere.
MariaDB doesn't support the SHUTDOWN statement or have any other way
to exit the server process.
Use RocksDB debug sync points to introduce a sync delay. This allows
commits to get grouped even when the datadir is on a ramdisk.
For some unclear reason the effect is visible on write_prepared
but not write_committed, so run the test only with write_prepared.
Problem:
=======
Checksum fields can have a value of zero. In that case, InnoDB falsely
considers that the page should be all zeroes. This leads to wrong detection
of page corruption.
Solution:
========
Remove the condition that assumes that if the checksum fields are zero,
the page should be all zeroes.
which point to the table being altered
Problem:
========
InnoDB failed to change the column name present in the foreign key cache
for instant ADD COLUMN. This leads to a column name mismatch on a
subsequent rename of the column.
Solution:
=========
Evict the foreign key information from cache and load the foreign
key information again for instant operation.
Problem:
=======
During online ALTER, the fts tokenization thread uses the new table's page size
to read the externally stored page from the old table. If the ALTER changes
the page size, the ALTER TABLE fails.
Solution:
=========
The fts tokenization thread should use the old table's page size to read the
externally stored page from the old table.
Problem:
========
There is a possibility that there can be more concurrent DMLs while the
alter table thread is waiting to upgrade to MDL_EXCLUSIVE before the commit phase.
In commit phase, InnoDB acquires dict_operation_lock and it already holds MDL_EXCLUSIVE
on the table. After that, InnoDB applies the concurrent DML logs in commit phase.
This could lead to blocking of the following things:
1) DML on the particular table (due to MDL_EXCLUSIVE on the table)
2) InnoDB DDLs (due to dict_operation_lock)
3) Purge thread, stats thread, the master thread (due to dict_operation_lock)
Fix:
====
Apply the concurrent DML logs in commit phase but before acquiring
dict_operation_lock. It makes sure that (2) and (3) can't be
blocked for a long time.
Basic idea of the patch: disallow creating tables which allow creating
rows that are too big to insert. In other words, if a user created a table, the
user should never see an error like 'cannot insert row as it is too big for the
current page size'.
SET innodb_strict_mode=OFF; will still allow creating very long tables, and
only a warning will be issued.
dict_table_t::get_overflow_field_local_len(): this function returns the maximum
local field length for overflow fields for every file and row format.
innobase_check_column_length(): rename to too_big_key_part_length()
and reuse it in a different part of the code.
create_table_info_t::prepare_create_table(): add a check for the maximum allowed
key part length to keep ALGORITHM=COPY behavior similar to ALGORITHM=INPLACE
behavior. The affected test is innodb.strict_mode.
Rename dict_index_too_big_for_tree() to
dict_index_t::rec_potentially_too_big(): copy overflow-related size computation
from dtuple_convert_big_rec(). A lot of tests were changed because of that.
I wonder whether users will complain about it?
Test innodb.max_record_size tests dict_index_t::rec_potentially_too_big()
for different row formats and page sizes.
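The resulting decision at CREATE/ALTER time reduces to the following gate
(a sketch; every name here other than rec_potentially_too_big is
hypothetical):

    #include <cstdio>

    // Refuse the DDL in strict mode; otherwise accept it with a warning.
    bool check_row_size(bool rec_potentially_too_big, bool strict_mode) {
        if (!rec_potentially_too_big)
            return true;
        if (strict_mode) {
            std::fprintf(stderr, "error: row size too large for page size\n");
            return false;     // CREATE/ALTER TABLE fails
        }
        std::fprintf(stderr, "warning: inserting long rows may fail\n");
        return true;          // allowed, as with innodb_strict_mode=OFF
    }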
MDEV-19486 and one more similar bug appeared because the handler::write_row()
interface allows the storage engine to modify the buffer. But callers are not
ready for that, so bugs are possible in the future.
handler::write_row(), handler::ha_write_row(): make the argument const.
Use on every virtual function override.
ha_innobase: mark a final
ha_innobase::bas_ext(): remove as unused
ha_innobase::get_cascade_foreign_key_table_list: remove as unused
ha_innobase::end_stmt(): merge into ha_innobase::reset()
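Taken together, the resulting declarations look roughly like this (heavily
simplified; the real handler class has many more members):

    typedef unsigned char uchar;

    class handler {
    public:
        virtual ~handler() = default;
        // const: the engine may no longer modify the caller's row buffer.
        virtual int write_row(const uchar *buf) { (void)buf; return 0; }
        virtual int reset() { return 0; }
    };

    class ha_innobase final : public handler {   // final: no subclasses
    public:
        int write_row(const uchar *buf) override { (void)buf; return 0; }
        int reset() override { return 0; }       // absorbs old end_stmt()
    };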
According to the code, it was a Windows-specific "simulated AIO"
workaround. Simulated AIO is not supported on Windows anymore.
Thus, remove the dead code.
Many InnoDB internal variables and counters were only exposed
in an unstructured fashion via SHOW ENGINE INNODB STATUS.
Expose more variables via SHOW STATUS. Many of these were
exported in XtraDB.
Also, introduce SHOW_SIZE_T and use the proper size for
exporting the InnoDB variables.
Remove some unnecessary indirection via export_vars, and
bind some variables directly.
dict_sys_t::rough_size(): Replaces dict_sys_get_size()
and includes the hash table sizes.
This is based on a contribution by Tony Liu from ServiceNow.
Shorten some VARCHAR attributes to a more reasonable length.
INNODB_METRICS: Rename the column STATUS to ENABLED, and make it Boolean.
Replace with INT(1) many Boolean attributes that were declared as VARCHAR
containing 'NO','YES','disabled','enabled','Uninitialized','Initialized'.
Replace some VARCHAR attributes with ENUM.
Replace some BIGINT with INT when 32 bits are sufficient.
Remove INNODB_SYS_TABLESPACES.SPACE_TYPE. The type of a tablespace
can be derived from the tablespace ID. A fixed number is used for
the system tablespace and the temporary tablespace. All other tablespaces
are single-table or single-partition tablespaces.
i_s_locks_row_t::lock_type, lock_get_type_str(): Remove.
This is a redundant field. Table and record locks can be
distinguished by whether i_s_locks_row_t::lock_index is NULL.
fill_trx_row(): Do not unnecessarily copy the constant strings that
trx->op_info is pointing to.
i_s_locks_row_t::lock_mode: Replace string with integer.
lock_get_mode_str(), lock_get_trx_id(), lock_get_trx(): Remove.
field_store_ulint(): Remove.
In row_ins_foreign_check_on_constraint(), the clustered index record is being
passed to wsrep_append_foreign_key() after releasing the latch. If the record
has been changed by another thread in the meantime, it could lead to a crash
when wsrep_rec_get_foreign_key() tries to access the record.
row_ins_foreign_check_on_constraint(): Use cascade->pcur->old_rec instead of
clust_rec.
row_ins_check_foreign_constraint(): Add missing error printout.
The test innodb.leaf_page_corrupted_during_recovery
fails on buildbot with
Warning 1406 Data too long for column 'line' at row 10
line
len 16384; hex ...
because of the page dump that InnoDB generates for a corrupted page.
Since this test is using debug instrumentation, we will solve the
issue by disabling page dumps in debug builds altogether. Users of
debug builds will likely know how to extract page dumps by other means.
Page dump output could sometimes be useful when diagnosing problems
that users are facing. Hence we will keep the page dump output in
non-debug (release) builds.