MDEV-5217: Incorrect MyISAM event execution order causing incorrect parallel replication
In parallel replication, if transactions A,B group-commit together on the
master, we can execute them in parallel on a replication slave. But then, if
transaction C follows on the master, on the slave, we need to be sure that
both A and B have completed before starting on C to be sure to avoid
conflicts.
The necessary wait is implemented such that B waits for A to commit before it
commits itself (thus preserving commit order). And C waits for B to commit
before it itself can start executing. This way C does not start until both A
and B have completed.
The wait for B's commit on A happens inside the commit processing. However, in
the case of MyISAM with no binlog enabled on the slave, it appears that no
commit processing takes place (since MyISAM is non-transactional), and thus
the wait of B for A was not done. This allowed C to start before A, which can
lead to conflicts and incorrect replication.
Fixed by doing an extra wait for A at the end of B before signalling C.
MDEV-5217: Unlock of de-allocated mutex
There was a race in the code for wait_for_commit::wakeup().
Since the waiter does a dirty read of the waiting_for_commit
flag, it was possible for the waiter to complete and deallocate
the wait_for_commit object while the waitee was still running
inside wakeup(). This would cause the waitee to access invalid
memory.
Fixed by putting an extra lock/unlock in the destructor for
wait_for_commit, to ensure that waitee has finished with the
object before it is deallocated.
MDEV-5217: Incorrect event pos update leading to corruption of reading of events from relay log
The rli->event_relay_log_pos was sometimes undated incorrectly when using
parallel replication, especially around relay log rotates. This could cause
the SQL thread to seek into an invalid position in the relay log, resulting in
errors about invalid events or even random corruption in some cases.
MDEV-5217: Last_sql_error lost in parallel replication.
For some reason, the query execution code in log_event.cc call
rli->clear_error for each event (part of clear_all_errors()).
This causes a problem in parallel replication, where the
execution in one worker thread could clear the error set by
another thread, causing the SQL thread to stop but leaving no
error visible in SHOW SLAVE STATUS.
There seems to be no reason to clear the global error code
in Relay_log_info for each event execution, from code review
and from running the test suite. So remove this clearing of
the error code to make things work also in the parallel case.
MDEV-5217: SQL thread hangs during stop if error occurs in the middle of an event group
Normally, when we stop the slave SQL thread in parallel replication, we want
the worker threads to continue processing events until the end of the current
event group. But if we stop due to an error that prevents further events from
being queued, such as an error reading the relay log, no more events can be
queued for the workers, so they have to abort even if they are in the middle
of an event group. There was a bug that we would deadlock, the workers
waiting for more events to be queued for the event group, the SQL thread
stopped and waiting for the workers to complete their current event group
before exiting.
Fixed by now signalling from the SQL thread to all workers when it is about
to exit, and cleaning up in all workers when so signalled.
This patch fixes one of multiple problems reported in MDEV-5217.
- thread_pool_size command line option upper limit increased to 100 000
(same as for max_connections)
- thread_pool_size system variable upper limit is maximum of 128 or
the value given at command line
- thread groups are now allocated dynamically
Different limit for command line option and system variable was done to
avoid additional mutex for all_groups and threadpool_max_size.
extend table names discovery (ha_discover_table_names() and Discovered_table_list) to return
or optionally filter out temporary tables ("#sql..."). SHOW commands and I_S tables
typically want temp table filtered out, while DROP DATABASE wants to see them too.
additonally, remove the supression for the warning "Invalid (old?) table or database name"
from mtr, and add it to .test files as needed (we need to test that this warning
does *not* happen in drop.test)
Updated --help text to declare --slave-parallel-threads as an alpha feature
mysql-test/r/mysqld--help.result:
Updated --help text
sql/slave.cc:
Added missing trans_retries++ that caused rpl_deadlock_innodb.test to fail.
This is safe as this part is never run in parallel.
sql/sql_base.cc:
Fixed temporary table handling (part of merge)
sql/sys_vars.cc:
Updated --help text to declare --slave-parallel-threads as an alpha feature
The merge is still missing a few hunks related to temporary tables and
InnoDB log file size. The associated code did not seem to exist in
10.0, so the merge of that needs more work. Until this is fixed, there
are a number of test failures as a result.
In parallel replication, there are two kinds of events which are
executed in different ways.
Normal events that are part of event groups/transactions are executed
asynchroneously by being queued for a worker thread.
Other events like format description and rotate and such are executed
directly in the driver SQL thread.
If the direct execution of the other events were to update the old-style
position, then the position gets updated too far ahead, before the normal
events that have been queued for a worker thread have been executed. So
this patch adds some special cases to prevent such position updates ahead
of time, and instead queues dummy events for the worker threads, so that
they will at an appropriate time do the position updates instead.
(Also fix a race in a test case that happened to trigger while running
tests for this patch).
Fix a couple of issues in MDEV-4506, Parallel replication:
- Missing mysql_cond_signal(), which could cause hangs.
- Fix incorrect update of old-style replication position.
- Change assertion to error handling (can trigger on manipulated/
corrupt binlog).
There was 2 problems:
1) coping/moving of the same type (usually casting) as sizeof() (solved in different ways depends on the cause);
2) using 'const' in SSL_CTX::getVerifyCallback() which return object (not reference) and so copy of the object will be created and 'const' has no sens.
In parallel replication, when the IO thread switches relay log,
the SQL thread re-opens the current relaylog and seeks to the
current position. There was a race that would cause it to
sometimes seek to the wrong position, causing corruption and
crash.
Analysis:
st_select_lex_unit::prepare() computes can_skip_order_by as TRUE.
As a result join->prepare() gets called with order == NULL, and
doesn't do name resolution for the inner ORDER clause. Due to this
the prepare phase doesn't detect that the query references non-exiting
function and field.
Later join->optimize() calls update_used_tables() for a non-resolved
Item_field, which understandably has no Field object. This call results
in a crash.
Solution:
Resolve unnecessary ORDER BY clauses to detect if they reference non-exising
objects. Then remove such clauses from the JOIN object.
MDEV-5189: Error handling in parallel replication.
Fix error handling in parallel worker threads when a query fails:
- Report the error to the error log.
- Return the error back, and set rli->abort_slave.
- Stop executing more events after the error.
GRANT ... IDENTIFIED BY [ PASSWORD ] ...
GRANT ... IDENTIFIED VIA ... [ USING ... ]
GRANT ... REQUIRE ...
GRANT ... MAX_xxx ...
SET PASSWORD FOR ... = ...
Two problems were fixed:
1. When not in GTID mode (master_use_gtid=no), then we must not apply events
in different domains in parallel (in non-GTID mode we are not capable of
restarting at different points in different domains).
2. When transactions B and C group commit together, but after and separate
from A, we can apply B and C in parallel, but both B and C must not start
until A has committed. Fix sub_id to be globally increasing (not just
per-domain increasing) so that this wait (which is based on sub_id) can be
done correctly.
Do not update relay-log.info and master.info on disk after every event
when using GTID mode:
- relay-log.info and master.info are not crash-safe, and are not used
when slave restarts in GTID mode (slave connects with GTID position
instead and immediately rewrites the file with the new, correct
information found).
- When using GTID and parallel replication, the position in
relay-log.info is misleading at best and simply wrong at worst.
- When using parallel replication, the fact that every single
transaction needs to do a write() syscall to the same file is
likely to become a serious bottleneck.
The files are still written at normal slave stop.
In non-GTID mode, the files are written as normal (this is needed to
be able to restart after slave crash, even if such restart is then not
crash-safe, no change).
- Backport MySQL's fix: do set ha_partition::m_pkey_is_clustered for ha_partition
objects created with handler->clone() call.
- Also, include a testcase.
Fix some more parts of old-style position updates.
Now we save in rgi some coordinates for master log and relay log, so
that in do_update_pos() we can use the right set of coordinates with
the right events.
The Rotate_log_event::do_update_pos() is fixed in the parallel case
to not directly update relay-log.info (as Rotate event runs directly
in the driver SQL thread, ahead of actual event execution). Instead,
group_master_log_file is updated as part of do_update_pos() in each
event execution.
In the parallel case, position updates happen in parallel without
any ordering, but taking care that position is not updated backwards.
Since position update happens only after event execution this leads
to the right result.
Also fix an access-after-free introduced in an earlier commit.
Only allow NONE instead of a role name in SET ROLE.
Don't allow PUBLIC as a role name anywhere (to be fixed later)
Fix db_access calculations on SET ROLE
Reduce the size of role_grants and parent_grantee per-user/role arrays.
Fix the wording and specify the correct sqlstate for ER_INVALID_ROLE
functions for traversing the role graph in either direction.
merging of global, database, table, column, routine privileges.
debug status variables for counting number of privilege merges.
tests.
mainly to avoid the pattern of
* get username/hostname/rolename
* optionally find the corresponding ACL_USER and ACL_ROLE
* allocate memory, concatenate username/hostname/rolename
* call a function passing only this memory as an argument
** use concatenated username/etc to find ACL_USER and ACL_ROLE again
** do something
* free the object
Also to undo push_dynamic we use pop_dynamic now,
not a linear search/scan through the dynamic array.
as a bonus, role@ is now an invalid way to refer to a role.
sql/sp.cc:
don't split "user@host" string in db_load_routine, because the caller needs to
generate it from user and host. instead pass user and host directly into db_load_routine
sql/sql_parse.cc:
1. REVOKE ALL doesn't need invoker.
2. make sp_process_definer() reusable
sql/sql_trigger.cc:
don't duplicate the code from sp_process_definer(), reuse it
sql/sql_view.cc:
don't duplicate the code from sp_process_definer(), reuse it
than an empty host '' is the same as any-host wildcard '%'.
Replace '' with '%' in the parser (for GRANT ... foo@'') and when loading grant tables.
Side effect: one cannot have foo@'' and foo@'%' both at the same time
(but one can have foo@'%' and foo@'%%')
because parser might modify the lex->user (e.g. set lex->user-password).
switch to use LEX_STRING current_user string, and also change other similar constants
to be LEX_STRING's for consistency.
mysql-test/r/acl_roles_show_grants.result:
one can do SHOW GRANTS for himself
mysql-test/t/acl_roles_set_role-table-column-priv.test:
correct error message
mysql-test/t/acl_roles_show_grants.test:
one can SHOW GRANTS for himself
sql/sql_acl.cc:
bugfixing:
* don't assign with && - it can shortcut and the second assignment won't be executed
* correct the test in check_grant_all_columns() - want_access should not be modified
*
sql/sql_cmd.h.OTHER:
add new commands at the end
sql/sql_db.cc:
don't call acl_get() if all privileges are already satisfied
(crashes when run with --skip-grants, because acl data stuctures aren't initialized)
sql/sql_parse.cc:
* test for current_user in get_current_user()
* map explicitly specified user@host to current_user
The output is not completely correct due to recursive role grants not
being completly implemented. However, this will help with testing the
implementation of set role with recursive grants.
The function now performs a DEPTH FIRST SEARCH on the role graph.
At various key points: on_start, on_open, on_cycle, on_finish,
the function calls one of the corresponding functions passed as parameters.
The bug was caused by not renaming the role if it was previously
modified by the handle_grant_struct(ROLE_ACL,...) call.
The same function used find_acl_role and would search for the already
renamed role when it handled ROLES_MAPPINGS_HASH. This caused it to not rename
the role/user correctly.
Also fixed issue with drop role not clearing internal memory entry
for that role. The issue was due to a condition introduced in handle_grant_data
Updated testsuite to also check the possible error conditions.
Also fixed a bug regarding the HASH iteration. It previously got
the stop condition from a different hashtable and this caused errors
when the hash sizes were different.
It will only hold _valid_ entries for as long as it held in memory. Any change
regarding acl_users or acl_roles in memory should update the structure
immediately. This is why the rebuild_roles_mappings no longer removes invalid
entries.
In order to keep things consistent with the existing code, the following jobs
are assigned to each function:
The role of rebuild_roles_mappings is to recreate the links between users and
roles. Any other updates are to be done in the functions:
handle_grant_*
This change prepares the code for the next step, which is cascading updates.
Changed ACL_USER.user from char * to LEX_STRING.
Refactored every section that made use of ACL_USER.user as a char*.
This was done so as to be able to quickly check the hash_key of the acl_user.
BNL and BNLH joins pre-filter the records from a joined table via JOIN_TAB::cache_select->cond.
There is no need to re-evaluate the same conditions via JOIN_TAB::select_cond. This patch removes
the duplicated conditions from the top-level conjuncts of each pushed condition.
The added "Using where" in few EXPLAINs is due to taking into account tab->cache_select->cond
in addition to tab->select_cond in JOIN::save_explain_data_intern.