Problem:
========
If a primary is shutdown during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.
Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.
Notes:
1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, If a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
When doing condition pushdown from HAVING into WHERE,
Item_equal::create_pushable_equalities() calls
item->set_extraction_flag(IMMUTABLE_FL) for constant items.
Then, Item::cleanup_excluding_immutables_processor() checks for this flag
to see if it should call item->cleanup() or leave the item as-is.
The failure happens when a constant item has a non-constant one inside it,
like:
(tbl.col=0 AND impossible_cond)
item->walk(cleanup_excluding_immutables_processor) works in a bottom-up
way so it
1. will call Item_func_eq(tbl.col=0)->cleanup()
2. will not call Item_cond_and->cleanup (as the AND is constant)
This creates an item tree where a fixed Item has an un-fixed Item inside
it which eventually causes an assertion failure.
Fixed by introducing this rule: instead of just calling
item->set_extraction_flag(IMMUTABLE_FL);
we call Item::walk() to set the flag for all sub-items of the item.
Implicit system-versioned table does not contain system fields in SHOW
CREATE. Therefore after mysqldump recovery such table has system
fields in the last place in frm image. The original table meanwhile
does not guarantee these system fields on last place because adding
new fields via ALTER TABLE places them last. Thus the order of fields
may be different between master and slave, so row-based replication
may fail.
To fix this on ALTER TABLE we now place system-invisible fields always
last in frm image. If the table was created via old revision and has
an incorrect order of fields it can be fixed via any copy operation of
ALTER TABLE, f.ex.:
ALTER TABLE t1 FORCE;
To check the order of fields in frm file one can use hexdump:
hexdump -C t1.frm
Note, the replication fails only when all 3 conditions are met:
1. row-based or mixed mode replication;
2. table has new fields added via ALTER TABLE;
3. table was rebuilt on some, but not all nodes via mysqldump image.
Otherwise it will operate properly even with incorrect order of
fields.
vers_info->hist_part retained stale value after ROLLBACK. The
algorithm in vers_set_hist_part() continued iteration from that value.
The simplest solution is to process partitions each time from start
for LIMIT in vers_set_hist_part().
When single-row subquery fails with "Subquery reutrns more than 1 row"
error, it will raise an error and return NULL.
On the other hand, Item_singlerow_subselect sets item->maybe_null=0
for table-less subqueries like "(SELECT not_null_value)" (*)
This discrepancy (item with maybe_null=0 returning NULL) causes the
code in Type_handler_decimal_result::make_sort_key_part() to crash.
Fixed this by allowing inference (*) only when the subquery is NOT a
UNION.
The --skip-write-binlog message was confusing that it only had
an effect if the galera was enabled. There are uses beyond galera
so we apply SET SESSION SQL_LOG_BIN=0 as implied by the option
without being conditional on the wsrep status.
We also with --skip-write-binlog actually check the session @@WSREP_ON
variable rather than the global server variable.
Since 10.6, the wsrep_mode could replicate Aria and MyISAM, in which
case no change to innodb and back is needed.
By removing the conditions, we can use LOCK TABLES in a general case
improving the load speed of Aria (MDEV-23326), regardless of the
skip-write-binlog flag. The only case where we don't use LOCK TABLES is
when we are replicating via Innodb, because wsrep_on=1 and wsrep_mode
doesn't contain REPLICATE_ARIA{,MYISAM}. This uses an Innodb transaction
instead. When replicating via InnoDB we change the table engine type
back to what it was originally.
By removing the \d and other syntax that requires parsing by
the mariadb client, we can use the generated SQL more generally, like
in the embedded server.
We also save and restore the SQL_LOG_BIN and WSREP_ON session server
variables so this can be included in the same session as other data
without taking into changes in state.
Remove wsrep.mysql_tzinfo_to_sql_symlink{,_skip} tests as they offered
no additional coverage beyond main.mysql_tzinfo_to_sql_symlink (no
server testing was done).
Add galera.mariadb_tzinfo_to_sql to actually test the replication
of tzinfo data through galera.
The conditional executable comment around /*M!100602 ...
START TRANSACTION .. LOCK TABLES.. */ is so that we can provide tzinfo
files (MDEV-27113, MDBF-389) and in the case that a user uses it on a
pre-10.6 server version it will still work. Both START TRANSACTION and
LOCK TABLES are not supported in prepared statements in MariaDB versions
earlier than 10.6.2.
Reviewed by Brandon Nesterenko
- Simplified Chinese translation added
- Character encoding is gdk
-- gdk covers more characters
-- gdk includes both Simplified and Traditional
-- best option I think, may need to work along with other locale
settings
- Other cleanup
-- Within each error, messages are sorted according to language code
-- More consistent formatting (8 spaces proceeding each translation)
-- jps removed as duplicate of jpn translation
This should be a good starting point. More refinement is appreciated,
and needed down the road.
English "containt" (sic) spelling fixes on ER_FK_NO_INDEX_{CHILD,PARENT}
resulting in mtr test case adjustments.
Edited/reviewed by Daniel Black
Analysis: When --old option is used, the corresponding --old-mode variables
are set but warning is not given.
Fix: Use sql_print_warning() to give warning.
Analysis: There are 2 server variables- "old_mode" and "old". "old" is no
longer needed as "old_mode" has replaced it (however still used in some places
in the code). "old_mode" and "old" has same purpose- emulate behavior from
previous MariaDB versions. So they can be merged to avoid confusion.
Fix: Deprecate "old" variable and create another mode for @@old_mode to mimic
behavior of previous "old" variable. Create specific modes for specifix task
that --old sql variable was doing earlier and use the new modes instead.
New Feature:
============
Extend mariadb-binlog command-line tool to allow for filtering
events using GTID domain and server ids. The functionality mimics
that of a replica server’s DO_DOMAIN_IDS, IGNORE_DOMAIN_IDS, and
IGNORE_SERVER_IDS from CHANGE MASTER TO. For completeness, this
patch additionally adds the option --do-server-ids as an alias for
--server-id, which now accepts a list of server ids instead of a
single one.
Example usage:
mariadb-binlog --do-domain-ids=2,3,4 --do-server-ids=1,3
master-bin.000001
Functional Notes:
1. --do-domain-ids cannot be combined with --ignore-domain-ids
2. --do-server-ids cannot be combined with --ignore-server-ids
3. A domain id filter can be combined with a server id filter
4. When any new filter options are combined with the
--gtid-strict-mode option, events from excluded domains/servers are
not validated.
5. Domain/server id filters can be combined with GTID ranges (i.e.
specifications of --start-position and --stop-position). However,
because the --stop-position option implicitly undertakes filtering
to only output events within its range of domains, when combined
with --do-domain-ids or --ignore-domain-ids, output will consist of
the intersection between the filters. Specifically, with
--do-domain-ids and --stop-position, only events with domain ids
present in both argument lists will be output. Conversely, with
--ignore-domain-ids and --stop-position, only events with domain ids
present in the --stop-position and absent from the
--ignore-domain-ids options will be output.
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
because CONTEXT_ANALYSIS_ONLY_VCOL_EXPR can be used only for,
exactly, context analysys. Items fixed that way cannot be evaluated.
But vcols are going to be evaluated, so they have to be fixed properly,
for evaluation.
column generated using date_format() and if()
vcol_info->expr is allocated on expr_arena at parsing stage. Since
expr item is allocated on expr_arena all its containee items must be
allocated on expr_arena too. Otherwise fix_session_expr() will
encounter prematurely freed item.
When table is reopened from cache vcol_info contains stale
expression. We refresh expression via TABLE::vcol_fix_exprs() but
first we must prepare a proper context (Vcol_expr_context) which meets
some requirements:
1. As noted above expr update must be done on expr_arena as there may
be new items created. It was a bug in fix_session_expr_for_read() and
was just not reproduced because of no second refix. Now refix is done
for more cases so it does reproduce. Tests affected: vcol.binlog
2. Also name resolution context must be narrowed to the single table.
Tested by: vcol.update main.default vcol.vcol_syntax gcol.gcol_bugfixes
3. sql_mode must be clean and not fail expr update.
sql_mode such as MODE_NO_BACKSLASH_ESCAPES, MODE_NO_ZERO_IN_DATE, etc
must not affect vcol expression update. If the table was created
successfully any further evaluation must not fail. Tests affected:
main.func_like
Reviewed by: Sergei Golubchik <serg@mariadb.org>
1. moved fix_vcol_exprs() call to open_table()
mysql_alter_table() doesn't do lock_tables() so it cannot win from
fix_vcol_exprs() from there. Tests affected: main.default_session
2. Vanilla cleanups and comments.
UNION ALL queries are a subject of optimization introduced in MDEV-334
when creation of a temporary table is skipped.
While there is a check for this optimization in Explain_union::print_explain()
there was no such in Explain_union::print_explain_json(). This resulted in
printing irrelevant data like:
"union_result": {
"table_name": "<union2,3>",
"access_type": "ALL",
"r_loops": 0,
"r_rows": null
in case when creation of the temporary table was actually optimized out.
This commits adds a check whether the temporary table was actually created
during the UNION ALL processing and eliminates printing of the irrelevant data.
When fixing vcols, fix_fields might call convert_const_to_int().
And that will try to read the field value (from record[0]).
Mark the table as having no data to prevent that, because record[0]
is not initialized yet.
the bug was that in_vector array in Item_func_in was allocated in the
statement arena, not in the table->expr_arena.
revert part of the 5acd391e8b. Instead, change the arena correctly
in fix_all_session_vcol_exprs().
Remove TABLE_ARENA, that was introduced in 5acd391e8b to force
item tree changes to be rolled back (because they were allocated in the
wrong arena and didn't persist. now they do)
Range can be thought about in similar manner as wildcard (*) where
more than one elements are processed. To implement range notation, extended
json parser to parse the 'to' keyword and added JSON_PATH_ARRAY_RANGE for
path type. If there is 'to' keyword then use JSON_PATH_ARRAY range for
path type along with existing type.
This new integer to store the end index of range is n_item_end.
When there is 'to' keyword, store the integer in n_item_end else store in
n_item.
* Item_default_value::fix_fields creates a copy of its argument's field.
* Field::default_value is changed when its expression is prepared in
unpack_vcol_info_from_frm()
This means we must unpack any vcol expression that includes DEFAULT(x)
strictly after unpacking x->default_value.
To avoid building and solving this dependency graph on every table open,
we update Item_default_value::field->default_value after all vcols
are unpacked and fixed.
This patch can be viewed as combination of two parts:
1) Enabling '-' in the path so that the parser does not give out a warning.
2) Setting the negative index to a correct value and returning the
appropriate value.
1) To enable using the negative index in the path:
To make the parser not return warning when negative index is used in path
'-' needs to be allowed in json path characters. P_NEG is added
to enable this and is made recognizable by setting the 45th index of
json_path_chr_map[] to P_NEG (instead of previous P_ETC)
because 45 corresponds to '-' in unicode.
When the path is being parsed and '-' is encountered, the parser should
recognize it as parsing '-' sign, so a new json state PS_NEG is required.
When the state is PS_NEG, it means that a negative integer is
going to be parsed so set is_negative_index of current step to 1 and
n_item is set accordingly when integer is encountered after '-'.
Next proceed with parsing rest of the path and get the correct path.
Next thing is parsing the json and returning correct value.
2) Setting the negative index to a correct value and returning the value:
While parsing json if we encounter array and the path step for the array
is a negative index (n_item < 0), then we can count the number of elements
in the array and set n_item to correct corresponding value. This is done in
json_skip_array_and_count.
on_table_fill_finished() should always be done at the end of open()
even if result is not Select_materialize but (for example)
Select_fetch_into_spvars.
on_table_fill_finished() should always be done at the end of open()
even if result is not Select_materialize but (for example)
Select_fetch_into_spvars.
This commit fixes problems with parsing ipv6 addresses given via
the wsrep_sst_receive_address and wsrep_node_address options.
Also, this commit removes extra lines in the configuration files
in the mtr test suites for Galera related to these parameters.
don't initialize error_log_handler_list in set_handlers()
* error_log_handler_list is initialized to LOG_FILE early, in init_base()
* set_handlers always reinitializes it to LOG_FILE, so it's pointless
* after init_base() concurrent threads start using sql_log_warning,
so following set_handlers() shouldn't modify error_log_handler_list
without some protection
`m_status == DA_ERROR' failed on SELECT after setting tmp_disk_table_size.
Analysis: Mismatch in number of warnings between "194 warnings" vs
"64 rows in set" is because of max_error_count variable which has default
value of 64.
About the corrupted tables, the error that occurs because of insufficient
tmp_disk_table_size variable is not reported correctly and we continue to
execute the statement. But because the previous error (about table being
full)is not reported correctly, this error moves up the stack and is
wrongly reported as parsing error later on while parsing frm file of one
of the information schema table. This parsing error gives corrupted table
error.
As for the innodb error, it occurs even when tmp_disk_table_size is not
insufficient is default but the internal error handler takes care of it
and the error doesn't show. But when tmp_disk_table_size is insufficient,
the fatal error which wasn't reported correctly moves up the stack so
internal error handler is not called. So it shows errors.
Fix: Report the error correctly.
This is a temporary fix for 10.2.
This problem was permanently fixed in 10.9 under terms of MDEV-27743.
This patch should propagate up to 10.8 then null-merged to 10.9.
This crash happens on a combination of multiple conditions:
- There is a thead#1 running an "ANALYZE FORMAT=JSON" query for a
"SELECT .. FROM INFORMATION_SCHEMA.COLUMNS WHERE .. "
- The WHERE clause contains a stored function call, say f1().
- The WHERE clause is built in the way so that the function f1()
is never actually called, e.g.
WHERE .. AND (TRUE OR f1()=expr)
- The database contains multiple VIEWs that have the function f1() call,
e.g. in their <select list>
- The WHERE clause is built in the way so that these VIEWs match
the condition.
- There is a parallel thread#2 running. It creates or drops or recreates
some other stored routine, say f2(), which is not used in the ANALYZE query.
It effectively invalidates the stored routine cache for thread#1
without locking.
Note, it is important that f2() is NOT used by ANALYZE query.
Otherwise, thread#2 would be locked until the ANALYZE query
finishes.
When all of the above conditions are met, the following happens:
1. thread#1 starts the ANALYZE query. It notices a call for the stored function
f1() in the WHERE condition. The function f1() gets parsed and cached
to the SP cache. Its address also gets assigned to Item_func_sp::m_sp.
2. thread#1 starts iterating through all tables that
match the WHERE condition to find the information about their columns.
3. thread#1 processes columns of the VIEW v1.
It notices a call for f1() in the VIEW v1 definition.
But f1() is already cached in the step#1 and it is up to date.
So nothing happens with the SP cache.
4. thread#2 re-creates f2() in a non-locking mode.
It effectively invalidates the SP cache in thread#1.
5. thread#1 processes columns of the VIEW v2.
It notices a call for f1() in the VIEW v2 definition.
It also notices that the cached version of f1() is not up to date.
It frees the old definition of f1(), parses it again, and puts a
new version of f1() to the SP cache.
6. thread#1 finishes processing rows and generates the JSON output.
When printing the "attached_condition" value, it calls
Item_func_sp::print() for f1(). But this Item_func_sp links
to the old (freed) version of f1().
The above scenario demonstrates that Item_func_sp::m_sp can point to an
alredy freed instance when Item_func_sp::func_name() is called,
so accessing to Item_sp::m_sp->m_handler is not safe.
This patch rewrites the code to use Item_func_sp::m_handler instead,
which is always reliable.
Note, this patch is only a cleanup for MDEV-28166 to quickly fix the regression.
It fixes MDEV-28267. But it does not fix the core problem:
The code behind I_S does not take into account that the SP
cache can be updated while evaluating rows of the COLUMNS table.
This is a corner case and it never happens with any other tables.
I_S.COLUMNS is very special.
Another example of the core problem is reported in MDEV-25243.
The code accesses to Item_sp::m_sp->m_chistics of an
already freed m_sp, again. It will be addressed separately.