When the slave processes the master restart format_description event,
parallel replication needs to complete any prior events before processing
the restart event (which closes temporary tables and such stuff).
This happens in wait_for_workers_idle(), however it was not waiting long
enough. The wait was using wait_for_prior_commit(), but at that points table
can still be open. This lead to assertion in this case.
So change wait_for_workers_idle() to wait until all worker threads have
reached finish_event_group(), at which point all tables should have been
closed.
- Changing Comp_creator::create() and create_swap() to return
Item_bool_rowready_func2 instead of Item_bool_func2, as they
can never return neither Item_func_like nor Item_func_xor
- Changing the first argument of Comp_create::create() and create_swap()
from THD to MEM_ROOT, so the method implementations can now reside in
item_cmpfunc.h instead of item_cmpfunc.cc and thus make the code slightly
easier to read.
Make sure that when we publish the crypt_data we access the
memory cache of the tablespace crypt_data. Make sure that
crypt_data is stored whenever it is really needed.
All this is not yet enough in my opinion because:
sql/encryption.cc has DBUG_ASSERT(scheme->type == 1) i.e.
crypt_data->type == CRYPT_SCHEME_1
However, for InnoDB point of view we have global crypt_data
for every tablespace. When we change variables on crypt_data
we take mutex. However, when we use crypt_data for
encryption/decryption we use pointer to this global
structure and no mutex to protect against changes on
crypt_data.
Tablespace encryption starts in fil_crypt_start_encrypting_space
from crypt_data that has crypt_data->type = CRYPT_SCHEME_UNENCRYPTED
and later we write page 0 CRYPT_SCHEME_1 and finally whe publish
that to memory cache.
Part 1: first 2 cases of valgrind complain. context_analysis_only can be used on non-started LEX (opening tables)
obviouse fixes in DBUG and is_lex_started assignment.
When detecting a statement that can make use of ha_delete_all_rows(),
we refrained from running the statement when being presented
with the analyze or explain prefix.
* Extract it into the "encryption_scheme" service.
* Make these engines to use the service, remove duplicate code.
* Change MY_AES_xxx error codes, to return them safely
from encryption_scheme_encrypt/decrypt without conflicting
with ENCRYPTION_SCHEME_KEY_INVALID error
Step#5: changing the function remove_eq_conds() into a virtual method in Item.
It removes 6 virtual calls for Item_func::type(), and adds only 2
virtual calls for Item***::remove_eq_conds().
Avoid calling current_thd from thd_kill_level(). This reduces number of
pthread_getspecific() calls from 776 to 354.
Also thd_kill_level(NULL) is not permitted anymore: this saves one condition.
Added THD argument to select_result and all derivative classes.
This reduces number of pthread_getspecific calls from 796 to 776 per OLTP RO
transaction.
sql_alloc() has additional costs compared to direct mem_root allocation:
- function call: it is defined in a separate translation unit and can't be
inlined
- it needs to call pthread_getspecific() to get THD::mem_root
It is called dozens of times implicitely at least by:
- List<>::push_back()
- List<>::push_front()
- new (for Sql_alloc derived classes)
- sql_memdup()
Replaced lots of implicit sql_alloc() calls with direct mem_root allocation,
passing through THD pointer whenever it is needed.
Number of sql_alloc() calls reduced 345 -> 41 per OLTP RO transaction.
pthread_getspecific() overhead dropped 0.76 -> 0.59
sql_alloc() overhed dropped 0.25 -> 0.06
delete_dynamic() was called 9-11x per OLTP RO query + 3x per BEGIN/COMMIT.
3 calls were performed by LEX_MASTER_INFO. Added condition to call those only
for CHANGE MASTER.
1 call was performed by lock_table_names()/Hash_set/my_hash_free(). Hash_set was
supposed to be used for DDL and LOCK TABLES to gather database names, while it
was initialized/freed for DML too. In fact Hash_set didn't do any useful job
here. Hash_set was removed and MDL requests are now added directly to the list.
The rest 5-7 calls are done by optimizer, mostly by Explain_query and friends.
Since dynamic arrays are used in most cases, they can hardly be optimized.
my_hash_free() overhead dropped 0.02 -> out of radar.
delete_dynamic() overhead dropped 0.12 -> 0.04.
PROFILING::start_new_query() optimizations:
- no need to check "current": added assertion instead
- "enabled" now means "is enabled currently" instead of "was enabled at query
start". Old meaning was useless, new meaning echoes OPTION_PROFILING so
that start_new_query() can be defined in sql_profile.h.
- remnants of start_new_query() moved to sql_profile.h so it can be inlined
PROFILING::start_new_query() overhead dropped 0.08% -> out of radar.
PROFILING::set_query_source() optimizations:
- no need to check "enabled": !enabled && current is impossible
- remnants of set_query_source() moved to sql_profile.h so it can be inlined
PROFILING::set_query_source() overhead dropped 0.02% -> out of radar.
PROFILING::finish_current_query() optimizations:
- moved "current" check out to sql_profile.h so it can be inlined
PROFILING::finish_current_query() overhead dropped 0.10% -> out of radar.
THD::enter_stage() optimizations:
- stage backup code moved to THD::backup_stage(), saves one condition
- moved check for "new_stage" out to callers that actually need it
- remnants of enter_stage() moved to sql_class.h so it can be inlined
THD::enter_stage() overhead dropped 0.48% -> 0.07%.
PROFILING::status_change() optimizations:
- "status_arg" is now checked by QUERY_PROFILE::new_status()
- no need to check "enabled": !enabled && current is impossible
- remnants of status_change() moved to sql_profile.h so it can be inlined
PROFILING::status_change() overhead dropped 0.1% -> out of radar.
The bug was that open_tables() returned error in case of
thd->killed() without properly calling thd->send_kill_message()
to set the correct error. This was fixed already in get_schema_column_record,
so the code was just copied to get_geometry_column_record.
In optimistic parallel replication, it is not safe to try to run a following
transaction in parallel with a DDL statement, and there is code to prevent
this.
However, the code was missing the case where the DDL is the very first event
after slave start. In this case, following transactions could run in
parallel with the DDL, which can cause the slave to hang or even corrupt
slave in unlucky cases.
table
Performance schema discovery fails if connection has no active database set.
This happened due to restriction in SQL parser: table name with no database name
is ambiguous in such case.
Fixed by temporary substitution of default database with being discovered table
database.
XA COMMIT/ROLLBACK of XA transaction owned by different thread may access
freed memory if that thread disconnects at the same time.
Also concurrent XA COMMIT/ROLLBACK of recovered XA transaction were not
serialized properly.
as Item_func_spatial_mbr_rel needs nothing from Item_bool_func2.
- Renaming Item_func_spacial_rel (the class that implements precise spacial
relations) to Item_func_spatial_precise_rel
- Adding a new abstract class Item_func_spatial_rel as a common parent
for Item_func_spatial_precise_rel and Item_func_spatial_mbr_rel.
Step #3: Splitting the function check_equality() into a method in Item.
Implementing Item::check_equality() and Item_func_eq::check_equality().
Implement Item_func_eq::build_equal_items() in addition to
Item_func::build_equal_items() and moving the call for check_equality()
from Item_func::build_equal_items() to Item_func_eq::build_equal_items().
Step 2c:
After discussion with Igor, it appeared that Item_field and Item_ref
could not appear in this context in the old function build_equal_item_for_cond:
else if (cond->type() == Item::FUNC_ITEM ||
cond->real_item()->type() == Item::FIELD_ITEM)
The part of the condition checking for Item_field::FIELD_ITEM was a dead code.
- Moving implementation of Item_ident_or_func_or_sum::build_equal_items()
to Item_func::build_equal_items()
- Restoring deriving of Item_ident and Item_sum_or_func from Item_result_field.
Removing Item_ident_or_func_or_sum.
C++03 seem to require default constructor for const objects even though they're
fully initialized. There is defect report for this:
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#253
Apparently some compilers (e.g. gcc) addressed this defect report, while some
(e.g. clang) did not.
Added empty default constructor to MDL_scoped_lock and MDL_object_lock classes.
Also replaced lf_hash_search_using_hash_value() with lf_hash_search(). The
latter will call mdl_key->hash_value() anyway.
MDEV-7950 Item_func::type() takes 0.26% in OLTP RO (Step#2)
- Item_ref was doing unnecessary extra job after the "MDEV-7950 Step#2" patch.
Fallback to Item::build_equal_items() if real_type() is not FIELD_ITEM.
Note, Item_ref::build_equal_items() will most likely be further simplified.
Waiting for Sanja and Igor to check a possibly dead code.
- Safety: Adding Item_sum::build_equal_items() with ASSERT, as Item_sum
should never appear in build_equal_items() context.
Step#2:
1. Removes the function build_equal_items_for_cond() and
introduces a new method Item::build_equal_items() instead,
with specific implementations in the following Items:
Item (the default implementation)
Item_ident_or_func_or_sum
Item_cond
Item_cond_and
2. Adds a new abstract class Item_ident_or_func_or_sum,
a common parent for Item_ident and Item_func_or_sum,
as they have exactly the same build_equal_items().
3. Renames Item_cond_and::cond_equal to Item_cond_and::m_cond_equal,
to avoid confusion between the member and local variables named
"cond_equal".
The slave SQL thread was clearing serial_rgi->thd before deleting
serial_rgi, which could cause access to NULL THD.
The clearing was introduced in commit
2e100cc5a4 and is just plain wrong. So revert
that part (single line) of that commit.
Thanks to Daniel Black for bug analysis and test case.
There was a rare race, where a deadlock error might not be correctly
handled, causing the slave to stop with something like this in the error
log:
150423 14:04:10 [ERROR] Slave SQL: Connection was killed, Gtid 0-1-2, Internal MariaDB error code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [Warning] Slave: Connection was killed Error_code: 1927
150423 14:04:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001 position 1234
The problem was incorrect error handling. When a deadlock is detected, it
causes a KILL CONNECTION on the offending thread. This error is then later
converted to a deadlock error, and the transaction is retried.
However, the deadlock error was not cleared at the start of the retry, nor
was the lingering kill signal. So it was possible to get another deadlock
kill early during retry. If this happened with particular thread
scheduling/timing, it was possible that the new KILL CONNECTION error was
masked by the earlier deadlock error, so that the second kill was not
properly converted into a deadlock error and retry.
This patch adds code that clears the old error and killed flag before
starting the retry. It also adds code to handle a deadlock kill caught in a
couple of places where it was not handled before.
This was a regression from the patch for MDEV-7668.
A test was incorrect, so the slave would not properly handle re-using
temporary tables, which lead to replication failure in this case.