The problem was originally stated in
http://bugs.mysql.com/bug.php?id=82212
The size of an base64-encoded Rows_log_event exceeds its
vanilla byte representation in 4/3 times.
When a binlogged event size is about 1GB mysqlbinlog generates
a BINLOG query that can't be send out due to its size.
It is fixed with fragmenting the BINLOG argument C-string into
(approximate) halves when the base64 encoded event is over 1GB size.
The mysqlbinlog in such case puts out
SET @binlog_fragment_0='base64-encoded-fragment_0';
SET @binlog_fragment_1='base64-encoded-fragment_1';
BINLOG @binlog_fragment_0, @binlog_fragment_1;
to represent a big BINLOG.
For prompt memory release BINLOG handler is made to reset the BINLOG argument
user variables in the middle of processing, as if @binlog_fragment_{0,1} = NULL
is assigned.
Notice the 2 fragments are enough, though the client and server still may
need to tweak their @@max_allowed_packet to satisfy to the fragment
size (which they would have to do anyway with greater number of
fragments, should that be desired).
On the lower level the following changes are made:
Log_event::print_base64()
remains to call encoder and store the encoded data into a cache but
now *without* doing any formatting. The latter is left for time
when the cache is copied to an output file (e.g mysqlbinlog output).
No formatting behavior is also reflected by the change in the meaning
of the last argument which specifies whether to cache the encoded data.
Rows_log_event::print_helper()
is made to invoke a specialized fragmented cache-to-file copying function
which is
copy_cache_to_file_wrapped()
that takes care of fragmenting also optionally wraps encoded
strings (fragments) into SQL stanzas.
my_b_copy_to_file()
is refactored to into my_b_copy_all_to_file(). The former function
is generalized
to accepts more a limit argument to constraint the copying and does
not reinitialize anymore the cache into reading mode.
The limit does not do any effect on the fully read cache.
always logged properly with binlog_row_image=MINIMAL
There are two issues fixed in this commit.
The first is an observation of a multi-table UPDATE binlogged
in row-format in binlog_row_image=MINIMAL mode. While the UPDATE aims
at a table with an ON-UPDATE attribute its binlog after-image misses
to record also installed default value.
The reason for that turns out missed marking of default-capable fields
in TABLE::write_set.
This is fixed to mark such fields similarly to 10.2's MDEV-10134 patch (db7edfed17)
that introduced it. The marking follows up 93d1e5ce0b841bed's idea
to exploit TABLE:rpl_write_set introduced there though,
and thus does not mess (in 10.1) with the actual MDEV-10134 agenda.
The patch makes formerly arg-less TABLE::mark_default_fields_for_write()
to accept an argument which would be TABLE:rpl_write_set.
The 2nd issue is extra columns in in binlog_row_image=MINIMAL before-image
while merely a packed primary key is enough. The test main.mysqlbinlog_row_minimal
always had a wrong result recorded.
This is fixed to invoke a function that intended for read_set
possible filtering and which is called (supposed to) in all type of MDL, UPDATE
including; the test results have gotten corrected.
At *merging* from 10.1->10.2 the 1st "main" part of the patch is unnecessary
since the bug is not observed in 10.2, so only hunks from
sql/sql_class.cc
are required.
ASAN noticed a freed memory access during EXECUTE in this script:
PREPARE stmt FROM "SELECT 'x' ORDER BY NAME_CONST( 'f', 'foo' )";
EXECUTE stmt;
In case of a PREPARE statement, all Items, including Item_name_const,
are created on Prepared_statement::main_mem_root.
Item_name_const::fix_fields() did not take this into account
and could allocate the value of Item::name on a wrong memory root,
in this code:
if (is_autogenerated_name)
{
set_name(thd, item_name->c_ptr(), (uint) item_name->length(),
system_charset_info);
}
When fix_fields() is called in the reported SQL script, THD's arena already
points to THD::main_mem_root rather than to Prepared_statement::main_mem_root,
so Item::name was allocated on THD::main_mem_root.
Then, at the end of the dispatch_command() for the PREPARE statement,
THD::main_mem_root got cleared. So during EXECUTE, Item::name
pointed to an already freed memory.
This patch changes the code to set the implicit name for Item_name_const
at the constructor time rather than at fix_fields time. This guarantees
that Item_name_const and its Item::name always reside on the same memory root.
Note, this change makes the code for Item_name_const symmetric with other
constant-alike items that set their default implicit names at the constructor
call time rather than at fix_fields() time:
- Item_string
- Item_int
- Item_real
- Item_decimal
- Item_null
- Item_param
Problem:
========
Server fails to notify the engine by not setting the ADD_PK_INDEX and
DROP_PK_INDEX When there is a
i) Change in candidate for primary key.
ii) New candidate for primary key.
Fix:
====
Server sets the ADD_PK_INDEX and DROP_PK_INDEX while doing alter for the
above problematic case.
32 bit int
Row-based slave applier could not parse correctly the table id when
the value exceeded the max of 32 bit unsigned int.
The reason turns out in that the being parsed value placeholder
was sized as 4 bytes.
The type is fixed to ulonglong.
Additionally the patch works around Rows_log_event::m_table_id 4 bytes
size on 32 bits platforms. In case of last_table_id value overflows
the 4 byte max, there won't be the zero value for m_table_id generated
and the first wrapped-around value is one, this is thanks to excluding
UINT_MAX32 + 1 from TABLE_SHARE::table_map_id.
dict_sys_get_size(): Replace the time-consuming loop with
a crude estimate that can be computed without holding any mutex.
Even before dict_sys->size was removed in MDEV-13325,
not all memory allocations by the InnoDB data dictionary cache
were being accounted for. One example is foreign key constraints.
Another example is virtual column metadata, starting with 10.2.
on startup innodb is checking whether files "ib_logfileN"
(for N from 1 to 100) exist, and whether they're readable.
A non-existent file aborted the scan.
A directory instead of a file made InnoDB to fail.
Now it treats "directory exists" as "file doesn't exist".
When InnoDB is invoking posix_fallocate() to extend data files, it
was missing a call to fsync() to update the file system metadata.
If file system recovery is needed, the file size could be incorrect.
When the setting innodb_flush_method=O_DIRECT_NO_FSYNC
that was introduced in MariaDB 10.0.11 (and MySQL 5.6) is enabled,
InnoDB would wrongly skip fsync() after extending files.
Furthermore, the merge commit d8b45b0c00
inadvertently removed XtraDB error checking for posix_fallocate()
which this fix is restoring.
fil_flush(): Add the parameter bool metadata=false to request that
fil_buffering_disabled() be ignored.
fil_extend_space_to_desired_size(): Invoke fil_flush() with the
extra parameter. After successful posix_fallocate(), invoke
os_file_flush(). Note: The bookkeeping for fil_flush() would not be
updated the posix_fallocate() code path, so the "redundant"
fil_flush() should be a no-op.
In the function QUICK_RANGE_SELECT::init_ror_merged_scan we create a seperate handler if the handler in
head->file cannot be reused. The flag free_file tells us if we have a seperate handler or not.
There are cases where you might create a handler and then there might be a failure(running ALTER)
and then we have to revert the handler back to the original one. The code does that
but it does not reset the flag 'free_file' in this case.
Also backported f2c418079d.
row_drop_table_for_mysql(): Fix a regression introduced in MDEV-16515.
Similar to the follow-up fixes MDEV-16647 and MDEV-17470, we must make
the internal tables of FULLTEXT INDEX immune to kills, to avoid noise
and resource leakage on DROP TABLE or ALTER TABLE. (Orphan internal tables
would be dropped at the next InnoDB startup only.)
Problem:
========
MLOG_FILE_WRITE_CRYPT_DATA redo log fails to apply type for
the crypt_data present in the space. While processing the double-write
buffer pages, page fails to decrypt. It leads to warning message.
Fix:
====
Set the type while parsing MLOG_FILE_WRITE_CRYPT_DATA redo log.
If type and length is of invalid type then mark it as corrupted.
If galera.galera_gtid_slave_sst_rsync is repeated more than once it will fail due incorrect GTID position. After stopping SLAVE node reset also GTID_SLAVE_POS variable.
This mutex can be freed when server shuts down (when thread_count goes down to 0)
, but it is still used inside THD::~THD() when Statement_map is destroyed.
The fix is to call Statement_map::reset() at the point where thread_count
is still positive, and avoid locking LOCK_prepared_stmt_count in THD
destructor.
When performing a hash search via HASH_SEARCH we first look at a key of a node
and then at its pointer to the next node in chain. If we have those in one cache
line instead of a two we reduce memory reads.
I found dict_table_t, fil_space_t and buf_page_t suitable for such improvement.
During database recovery, a transaction with wsrep XID is
recovered from InnoDB in prepared state. However, when the
transaction is looked up with trx_get_trx_by_xid() in
innobase_commit_by_xid(), trx->xid gets cleared in
trx_get_trx_by_xid_low() and commit time serialization history
write does not update the wsrep XID in trx sys header for
that recovered trx. As a result the transaction gets
committed during recovery but the wsrep position does not
get updated appropriately.
As a fix, we preserve trx->xid for Galera over transaction
commit in recovery phase.
Fix authored by: Teemu Ollakka (GaleraCluster) and Marko Mäkelä.
modified: mysql-test/suite/galera/disabled.def
modified: mysql-test/suite/galera/r/galera_gcache_recover_full_gcache.result
modified: mysql-test/suite/galera/r/galera_gcache_recover_manytrx.result
modified: mysql-test/suite/galera/t/galera_gcache_recover_full_gcache.test
modified: mysql-test/suite/galera/t/galera_gcache_recover_manytrx.test
modified: storage/innobase/trx/trx0trx.cc
modified: storage/xtradb/trx/trx0trx.cc
When we have a nested subquery then a subquery that was a dependent subquery
may change to an independent one when we optimizer the inner subqueries.
This is handled st_select_lex::optimize_unflattened_subqueries.
Currently a subquery that was changed to independent from dependent after optimization
phase incorrectly shows dependent in the output of Explain, this happens because we
don't update used_tables for the WHERE clause, ON clause, etc after the optimization phase.
If an encrypted table is created during backup, then
mariabackup --backup could wrongly fail.
This caused a failure of the test mariabackup.huge_lsn once on buildbot.
This is due to the way how InnoDB creates .ibd files. It would first
write a dummy page 0 with no encryption information. Due to this,
xb_fil_cur_open() could wrongly interpret that the table is not encrypted.
Subsequently, page_is_corrupted() would compare the computed page
checksum to the wrong checksum. (There are both "before" and "after"
checksums for encrypted pages.)
To work around this problem, we introduce a Boolean option
--backup-encrypted that is enabled by default. With this option,
Mariabackup will assume that a nonzero key_version implies that the
page is encrypted. We need this option in order to be able to copy
encrypted tables from MariaDB 10.1 or 10.2, because unencrypted pages
that were originally created before MySQL 5.1.48 could contain nonzero
garbage in the fields that were repurposed for encryption.
Later, MDEV-18128 would clean up the way how .ibd files are created,
to remove the need for this option.
page_is_corrupted(): Add missing const qualifiers, and do not check
space->crypt_data unless --skip-backup-encrypted has been specified.
xb_fil_cur_read(): After a failed page read, output a page dump.
This is a regression after MDEV-13671.
The bug is related to key part prefix lengths wich are stored in SYS_FIELDS.
Storage format is not obvious and was handled incorrectly which led to data
dictionary corruption.
SYS_FIELDS.POS actually contains prefix length too in case if any key part
has prefix length.
innobase_rename_column_try(): fixed prefixes handling
Tests for prefixed indexes added too.
Closes#1063