Problem:
=======
P1) Conditional jump or move depends on uninitialised value(s)
sql_ex_info::init(char const*, char const*, bool) (log_event.cc:3083)
code: All the following variables are not initialized.
----
return ((cached_new_format != -1) ? cached_new_format :
(cached_new_format=(field_term_len > 1 || enclosed_len > 1 ||
line_term_len > 1 || line_start_len > 1 || escaped_len > 1)));
P2) Conditional jump or move depends on uninitialised value(s)
Rows_log_event::Rows_log_event(char const*, unsigned
int, Format_description_log_event const*) (log_event.cc:9571)
Code: Uninitialized values is reported for 'var_header_len' variable.
----
if (var_header_len < 2 || event_len < static_cast<unsigned
int>(var_header_len + (post_start - buf)))
P3) Conditional jump or move depends on uninitialised value(s)
Table_map_log_event::pack_info(Protocol*) (log_event.cc:11553)
code:'m_table_id' is uninitialized.
----
void Table_map_log_event::pack_info(Protocol *protocol)
...
size_t bytes= my_snprintf(buf, sizeof(buf), "table_id: %lu (%s.%s)",
m_table_id, m_dbnam, m_tblnam);
Fix:
===
P1 - Fix)
Initialize cached_new_format,field_term_len, enclosed_len, line_term_len,
line_start_len, escaped_len members in default constructor.
P2 - Fix)
"var_header_len" is initialized by reading the event buffer. In case of an
invalid event the buffer will contain invalid data. Hence added a check to
validate the event data. If event_len is smaller than valid header length
return immediately.
P3 - Fix)
'm_table_id' within Table_map_log_event is initialized by reading data from
the event buffer. Use 'VALIDATE_BYTES_READ' macro to validate the current
state of the buffer. If it is invalid return immediately.
This PR contains a mtr test for reproducing a failure with replicating create table as select statement (CTAS) through asynchronous mariadb replication to mariadb galera cluster.
The problem happens when CTAS replication contains both create table statement followed by row events for populating the table. In such situation, the galera node operating as mariadb replication slave, will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and row events. This will lead all other nodes to fail for duplicate table create attempt, and crash due to this failure.
PR contains also a fix, which identifies the situation when CTAS has been replicated, and makes further scan in async replication stream to see if there are following row events. The slave node will replicate either single TOI in case the CTAS table is empty, or if CTAS table contains rows, then single bundled write set with create table and row events is replicated to galera cluster.
This fix should keep master server's GTID's for CTAS replication in sync with GTID's in galera cluster.
The problem was originally stated in
http://bugs.mysql.com/bug.php?id=82212
The size of an base64-encoded Rows_log_event exceeds its
vanilla byte representation in 4/3 times.
When a binlogged event size is about 1GB mysqlbinlog generates
a BINLOG query that can't be send out due to its size.
It is fixed with fragmenting the BINLOG argument C-string into
(approximate) halves when the base64 encoded event is over 1GB size.
The mysqlbinlog in such case puts out
SET @binlog_fragment_0='base64-encoded-fragment_0';
SET @binlog_fragment_1='base64-encoded-fragment_1';
BINLOG @binlog_fragment_0, @binlog_fragment_1;
to represent a big BINLOG.
For prompt memory release BINLOG handler is made to reset the BINLOG argument
user variables in the middle of processing, as if @binlog_fragment_{0,1} = NULL
is assigned.
Notice the 2 fragments are enough, though the client and server still may
need to tweak their @@max_allowed_packet to satisfy to the fragment
size (which they would have to do anyway with greater number of
fragments, should that be desired).
On the lower level the following changes are made:
Log_event::print_base64()
remains to call encoder and store the encoded data into a cache but
now *without* doing any formatting. The latter is left for time
when the cache is copied to an output file (e.g mysqlbinlog output).
No formatting behavior is also reflected by the change in the meaning
of the last argument which specifies whether to cache the encoded data.
Rows_log_event::print_helper()
is made to invoke a specialized fragmented cache-to-file copying function
which is
copy_cache_to_file_wrapped()
that takes care of fragmenting also optionally wraps encoded
strings (fragments) into SQL stanzas.
my_b_copy_to_file()
is refactored to into my_b_copy_all_to_file(). The former function
is generalized
to accepts more a limit argument to constraint the copying and does
not reinitialize anymore the cache into reading mode.
The limit does not do any effect on the fully read cache.
Methods:
- Item_user_var_as_out_param::print_for_load()
- sql_exchange::escaped_given(void)
Parameters:
- sql_exchange in write_execute_load_query_log_event()
- sql_exchange in mysql_load()
- sql_exchange in Load_log_event::Load_log_event()
Also, removing cast to "char*" in a few places in
Load_log_event::Load_log_event()
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
This preserves const str for constant strings
Other things
- A few variables where changed from LEX_STRING to LEX_CSTRING
- Incident_log_event::Incident_log_event and record_incident where
changed to take LEX_CSTRING* as an argument instead of LEX_STRING
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because process list_db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
Main problem was that no log-event print function checked for disk
full error on the IO_CACHE.
All changes in this patch only affects mysqlbinlog, not the server!
- Changed all log-event print functions to return 1 on error
- Fixed memory usage when not using --flashback.
- Added printing of number of rows in row events. Can be disabled with
--print-row-count=0
- Print annotated rows when using mysqlbinlog --short-form
- Fixed that mysqlbinlog --debug works
- Fixed create_drop_binlog.test test failure
- Reorganized fields in PRINT_EVENT_INFO to be according to size to
optimize storage
- Don't change print_row_event_position or print_row_counts if set by user
- Remove some testing of argument to my_free is 0
- base64-output=never is now supported and works in all context
- Updated help information for --base64-output and --short-form
- print_row_count is now on by default. Reset automatically if --short-form
is used
- Removed obsolote warning for mysql 5.6.0
- More DBUG_PRINT for mysqltest.cc
- my_b_write_byte() now checks for flush failures. This fixed a memory
overrun on disk full
- my_b_printf() now returns 1 on failure, 0 on ok. This simplifies code
and no old code was using the old return value of my_b_printf().
- my_b_Write_backtick_quote() now returns 1 on failure and 0 on ok
- Fixed some error conditions in log printing that was not previously
handled.
- Slave_rows_error_report() can now handle longlong positions
- Write_on_release_cache() rewritten so that we can detect errors
on flush. Not depending on automatic release anymore.
- Changed types for Pos and End_log_pos to 64 bit in SHOW BINLOG EVENTS
- Fixed that copy_event_cache_to_string_and_reinit() works with strings
longer than 4G (Changed to use LEX_STRING instead of String)
- Restricted binlog_rows_event_max_size to UINT32_MAX-1 as String's are
anyway restricted to UINT32_MAX
- Fixed bug in rpl_binlog_state::write_to_iocache() which hide write
failures (duplicate variable name)
- Fixed bug in String::append if original string was not allocated
- Stop mysqlbinlog output at once if there is an error.
- Before printing error message, flush result file. This ensures that
the error message is printed last. (Easier to find)
Problem
-------
For one-statement contains multiple row events, Flashback didn't reverse the
sequence of row events inside one-statement.
Solution
--------
Using a new array 'events_in_stmt' to store the row events of one-statement,
when parsed the last one event, then print from the last one to the first one.
In the same time, fixed another bug, without -vv will not insert the table_map
into print_event_info->m_table_map, then change_to_flashback_event() will not
execute because of Table_map_log_event is empty.
The XtraDB storage engine was already replaced by InnoDB
and disabled in MariaDB Server 10.2. Let us remove it altogether
to avoid dragging dead code around.
Replace some references to XtraDB with references to InnoDB.
rpl_get_position_info(): Remove.
Remove the mysql-test-run --suite=percona, because it only contains
tests specific to XtraDB, many of which were disabled already in
earlier MariaDB versions.
- Old sequence code forced row based replication for any statements that
refered to a sequence table. What is new is that row based replication
is now sequence aware:
- NEXT VALUE is now generating a short row based event with only
next_value and round being replicated.
- Short row based events are now on the slave updated as trough
SET_VALUE(sequence_name)
- Full row based events are on the slave updated with a full insert,
which is practically same as ALTER SEQUENCE.
- INSERT on a SEQUENCE table does now a EXCLUSIVE LOCK to ensure that
it is logged in binary log before any following NEXT VALUE calls.
- Enable all sequence tests and fixed found bugs
- ALTER SEQUENCE doesn't anymore allow changes that makes the next_value
outside of allowed range
- SEQUENCE changes are done with TL_WRITE_ALLOW_WRITE. Because of this
one can generate a statement for MyISAM with both
TL_WRITE_CONCURRENT_INSERT and TL_WRITE_ALLOW_WRITE. To fix a warning
I had to add an extra test in thr_lock.c for this.
- Removed UPDATE of SEQUENCE (no need to support this as we
have ALTER SEQUENCE, which takes the EXCLUSIVE lock properly.
- Removed DBUG_ASSERT() in MDL_context::upgrade_shared_lock. This was
removed upstream in MySQL 5.6 in 72f823de453.
- Simplified test in decided_logging_format() by using sql_command_flags()
- Fix that we log DROP SEQUENCE correctly.
- Fixed that Aria works with SEQUENCE
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by parser are now const strings, less chance of
accidently changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)
Changes:
- This caused some ABI changes
- lex_string_set now uses LEX_CSTRING
- Some fucntions are now taking const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
code
- Added item_empty_name and item_used_name to be able to distingush between
items that was given an empty name and items that was not given a name
This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*' is not anymore same as table_name.*
- removed not used function Item::rename()
- Added comparision of item->name_length before some calls to
my_strcasecmp() to speed up comparison
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
set when it shouldn't
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
give the error.
TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
(as part of lower_case_table_names)
Working features:
CREATE OR REPLACE [TEMPORARY] SEQUENCE [IF NOT EXISTS] name
[ INCREMENT [ BY | = ] increment ]
[ MINVALUE [=] minvalue | NO MINVALUE ]
[ MAXVALUE [=] maxvalue | NO MAXVALUE ]
[ START [ WITH | = ] start ] [ CACHE [=] cache ] [ [ NO ] CYCLE ]
ENGINE=xxx COMMENT=".."
SELECT NEXT VALUE FOR sequence_name;
SELECT NEXTVAL(sequence_name);
SELECT PREVIOUS VALUE FOR sequence_name;
SELECT LASTVAL(sequence_name);
SHOW CREATE SEQUENCE sequence_name;
SHOW CREATE TABLE sequence_name;
CREATE TABLE sequence-structure ... SEQUENCE=1
ALTER TABLE sequence RENAME TO sequence2;
RENAME TABLE sequence TO sequence2;
DROP [TEMPORARY] SEQUENCE [IF EXISTS] sequence_names
Missing features
- SETVAL(value,sequence_name), to be used with replication.
- Check replication, including checking that sequence tables are marked
not transactional.
- Check that a commit happens for NEXT VALUE that changes table data (may
already work)
- ALTER SEQUENCE. ANSI SQL version of setval.
- Share identical sequence entries to not add things twice to table list.
- testing insert/delete/update/truncate/load data
- Run and fix Alibaba sequence tests (part of mysql-test/suite/sql_sequence)
- Write documentation for NEXT VALUE / PREVIOUS_VALUE
- NEXTVAL in DEFAULT
- Ensure that NEXTVAL in DEFAULT uses database from base table
- Two NEXTVAL for same row should give same answer.
- Oracle syntax sequence_table.nextval, without any FOR or FROM.
- Sequence tables are treated as 'not read constant tables' by SELECT; Would
be better if we would have a separate list for sequence tables so that
select doesn't know about them, except if refereed to with FROM.
Other things done:
- Improved output for safemalloc backtrack
- frm_type_enum changed to Table_type
- Removed lex->is_view and replaced with lex->table_type. This allows
use to more easy check if item is view, sequence or table.
- Added table flag HA_CAN_TABLES_WITHOUT_ROLLBACK, needed for handlers
that want's to support sequences
- Added handler calls:
- engine_name(), to simplify getting engine name for partition and sequences
- update_first_row(), to be able to do efficient sequence implementations.
- Made binlog_log_row() global to be able to call it from ha_sequence.cc
- Added handler variable: row_already_logged, to be able to flag that the
changed row is already logging to replication log.
- Added CF_DB_CHANGE and CF_SCHEMA_CHANGE flags to simplify
deny_updates_if_read_only_option()
- Added sp_add_cfetch() to avoid new conflicts in sql_yacc.yy
- Moved code for add_table_options() out from sql_show.cc::show_create_table()
- Added String::append_longlong() and used it in sql_show.cc to simplify code.
- Added extra option to dd_frm_type() and ha_table_exists to indicate if
the table is a sequence. Needed by DROP SQUENCE to not drop a table.
This happens because the master writes a table_map event to the binary log, but no row event.
The slave has a check that there should always be a row event if there was a table_map event, which
causes a crash.
Fixed by remembering in the cache what kind of events are logged
and ignore cached statements which is just a table map event.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
==== Description ====
Flashback can rollback the instances/databases/tables to an old snapshot.
It's implement on Server-Level by full image format binary logs (--binlog-row-image=FULL), so it supports all engines.
Currently, it’s a feature inside mysqlbinlog tool (with --flashback arguments).
Because the flashback binlog events will store in the memory, you should check if there is enough memory in your machine.
==== New Arguments to mysqlbinlog ====
--flashback (-B)
It will let mysqlbinlog to work on FLASHBACK mode.
==== New Arguments to mysqld ====
--flashback
Setup the server to use flashback. This enables binary log in row mode
and will enable extra logging for DDL's needed by flashback feature
==== Example ====
I have a table "t" in database "test", we can compare the output with "--flashback" and without.
#client/mysqlbinlog /data/mysqldata_10.0/binlog/mysql-bin.000001 -vv -d test -T t --start-datetime="2013-03-27 14:54:00" > /tmp/1.sql
#client/mysqlbinlog /data/mysqldata_10.0/binlog/mysql-bin.000001 -vv -d test -T t --start-datetime="2013-03-27 14:54:00" -B > /tmp/2.sql
Then, importing the output flashback file (/tmp/2.log), it can flashback your database/table to the special time (--start-datetime).
And if you know the exact postion, "--start-postion" is also works, mysqlbinlog will output the flashback logs that can flashback to "--start-postion" position.
==== Implement ====
1. As we know, if binlog_format is ROW (binlog-row-image=FULL in 10.1 and later), all columns value are store in the row event, so we can get the data before mis-operation.
2. Just do following things:
2.1 Change Event Type, INSERT->DELETE, DELETE->INSERT.
For example:
INSERT INTO t VALUES (...) ---> DELETE FROM t WHERE ...
DELETE FROM t ... ---> INSERT INTO t VALUES (...)
2.2 For Update_Event, swapping the SET part and WHERE part.
For example:
UPDATE t SET cols1 = vals1 WHERE cols2 = vals2
--->
UPDATE t SET cols2 = vals2 WHERE cols1 = vals1
2.3 For Multi-Rows Event, reverse the rows sequence, from the last row to the first row.
For example:
DELETE FROM t WHERE id=1; DELETE FROM t WHERE id=2; ...; DELETE FROM t WHERE id=n;
--->
DELETE FROM t WHERE id=n; ...; DELETE FROM t WHERE id=2; DELETE FROM t WHERE id=1;
2.4 Output those events from the last one to the first one which mis-operation happened.
For example: