This patch introduces support for the system variable eq_range_index_dive_limit
that existed in MySQL starting from 5.6. The variable sets a limit for
index dives into equality ranges. Index dives are performed by optimizer
to estimate the number of rows in range scans. Index dives usually provide
good estimate but they are pretty expensive. To estimate the number of rows
in equality ranges statistical data on indexes can be employed. Its usage gives
not so good estimates but it's cheap. So if the number of equality dives
required by an index scan exceeds the set limit no dives for equality
ranges are performed by the optimizer for this index.
As the new system variable is introduced in a stable version the default
value for it is set to a special value meaning there is no limit for the number
of index dives performed by the optimizer.
The patch partially uses the MySQL code for WL 5957
'Statistics-based Range optimization for many ranges'.
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
The logic and the implementation scheme are similar with the
MDEV-9197 Pushdown conditions into non-mergeable views/derived tables
How the push down is made on the example:
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y) from t2 group by x);
-->
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y)
from t2
where x>3
group by x
having max(y)>10);
The implementation scheme:
1. Search for the condition cond that depends only on the fields
from the left part of the IN subquery (left_part)
2. Find fields F_group in the select of the right part of the
IN subquery (right_part) that are used in the GROUP BY
3. Extract from the cond condition cond_where that depends only on the
fields from the left_part that stay at the same places in the left_part
(have the same indexes) as the F_group fields in the projection of the
right_part
4. Transform cond_where so it can be pushed into the WHERE clause of the
right_part and delete cond_where from the cond
5. Transform cond so it can be pushed into the HAVING clause of the right_part
The optimization is made in the
Item_in_subselect::pushdown_cond_for_in_subquery() and is controlled by the
variable condition_pushdown_for_subquery.
New test file in_subq_cond_pushdown.test is created.
There are also some changes made for setup_jtbm_semi_joins().
Now it is decomposed into the 2 procedures: setup_degenerate_jtbm_semi_joins()
that is called before optimize_cond() for cond and setup_jtbm_semi_joins()
that is called after optimize_cond().
New setup_jtbm_semi_joins() is made in the way so that the result of its work is
the same as if it was called before optimize_cond().
The code that is common for pushdown into materialized derived and into materialized
IN subqueries is factored out into pushdown_cond_for_derived(),
Item_in_subselect::pushdown_cond_for_in_subquery() and
st_select_lex::pushdown_cond_into_where_clause().
Introduced new alter algorithm type called NOCOPY & INSTANT for
inplace alter operation.
NOCOPY - Algorithm refuses any alter operation that would
rebuild the clustered index. It is a subset of INPLACE algorithm.
INSTANT - Algorithm allow any alter operation that would
modify only meta data. It is a subset of NOCOPY algorithm.
Introduce new variable called alter_algorithm. The values are
DEFAULT(0), COPY(1), INPLACE(2), NOCOPY(3), INSTANT(4)
Message to deprecate old_alter_table variable and make it alias
for alter_algorithm variable.
alter_algorithm variable for slave is always set to default.
The upper 1M limit for max_prepared_stmt_count was set over 10 years
ago. It doesn't suite current hardware and a sysbench oltp_read_write
test with 512 threads will hit this limit.
MDEV--15609 engines/funcs.crash_manytables_number crashes with error 24
(too many open files)
MDEV-10286 Adjustment of table_open_cache according to system limits
does not work when open-files-limit option is provided
Fixed by adjusting tc_size downwards if there is not enough file
descriptors to use.
Other changes:
- Ensure that there is 30 (was 10) extra file descriptors for other usage
- Decrease TABLE_OPEN_CACHE_MIN to 200 as it's better to have a smaller
table cache than getting error 24
- Increase minimum of max_connections and table_open_cache from 1 to 10
as 1 is not usable for any real application, only for testing.
Standard compatible behavior for UPDATE: all assignments in SET
are executed "simultaneously", not left-to-right. And `SET a=b,b=a`
will swap the values.
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
and the system_versioning_transaction_registry variable.
The user enables transaction registry by specifying BIGINT for
row_start/row_end columns.
check mysql.transaction_registry structure on the first open,
not on startup. Avoid warnings unless transaction_registry
is actually used.
Many related changes.
Note that AS OF condition must always be pushed down to physical tables,
it cannot be applied to a derived or a view. Thus:
* no versioning for internal temporary tables, they can never store
historical data.
* remove special versioning code from mysql_derived_prepare and
remove ER_VERS_DERIVED_PROHIBITED - derived can have no historical
data and cannot be prohibited for system versioning related reasons.
* do not expand select list for derived/views with sys vers fields,
derived/views can never have historical data.
* remove special invisiblity rules for sys vers fields, they are no
longer needed after the previous change
* remove system_versioning_hide, it lost the meaning after the
previous change.
* remove ER_VERS_SYSTEM_TIME_CLASH, it's no "clash", the inner
AS OF clause always wins.
* non-versioned fields in a historical query
reword the warning text, downgrade to note, don't
replace values with NULLs
Other things, mainly to get
create_mysqld_error_find_printf_error tool to work:
- Added protection to not include mysqld_error.h twice
- Include "unireg.h" instead of "mysqld_error.h" in server
- Added protection if ER_XX messages are already defined
- Removed wrong calls to my_error(ER_OUTOFMEMORY) as
my_malloc() and my_alloc will do this automatically
- Added missing %s to ER_DUP_QUERY_NAME
- Removed old and wrong calls to my_strerror() when using
MY_ERROR_ON_RENAME (wrong merge)
- Fixed deadlock error message from Galera. Before the extra
information given to ER_LOCK_DEADLOCK was missing because
ER_LOCK_DEADLOCK doesn't provide any extra information.
I kept #ifdef mysqld_error_find_printf_error_used in sql_acl.h
to make it easy to do this kind of check again in the future
* rename in_subquery_conversion_threshold to in_predicate_conversion_threshold
* make it debug-only, hide from users
* change from ulong to uint - same type and range on all architectures
and specifically the ack receiving functionality.
Semisync is turned to be static instead of plugin so its functions
are invoked at the same points as RUN_HOOKS.
The RUN_HOOKS and the observer interface remain to be removed by later
patch.
Todo:
React on killed status by repl_semisync_master.wait_after_sync(). Currently
Repl_semi_sync_master::commit_trx does not check the killed status.
There were few bugfixes found that are present in mysql and its unclear
whether/how they are covered. Those include:
Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS
Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL
HAS BAD PERFORMANCE
Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS
Part of MDEV-13073 AliSQL Optimize performance of semisync
Did the following renames to match other similar variables
key_ss_mutex_LOCK_binlog_ > key_LOCK_bing
key_ss_cond_COND_binlog_send_ -> key_COND_binlog_send
COND_binlog_send_ -> COND_binlog_send
LOCK_binlog_ -> LOCK_binlog
debian/mariadb-server-10.2.install does not install semisync libs.
* The version of tcmalloc is written to the system variable
'version_malloc_library' if tcmalloc is used, similarly to
jemalloc
* Extracted method guess_malloc_library()
* Note: breaking change; since this commit, a plugin that has
worked so far might get rejected due to plugin maturity
* mariabackup is not affected (allows all plugins)
* VERSION file defines SERVER_MATURITY, which defines the
corresponding numeric value as SERVER_MATURITY_LEVEL in
include/mysql_version.h
* The default value for 'plugin_maturity' is SERVER_MATURITY_LEVEL - 1
* Logs a warning if a plugin has maturity lower than
SERVER_MATURITY_LEVEL
* Tests suppress the plugin maturity warning
* Tests use --plugin-maturity=unknown by default so as not to fail
due to the stricter plugin maturity handling
This is 10.1 version where no merge error exists.
wsrep_on_check
New check function. Galera can't be enabled
if innodb-lock-schedule-algorithm=VATS.
innobase_kill_query
In Galera async kill we could own lock mutex.
innobase_init
If Variance-Aware-Transaction-Sheduling Algorithm (VATS) is
used on Galera we refuse to start InnoDB.
Changed innodb-lock-schedule-algorithm as read-only parameter
as it was designed to be.
lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
lock_rec_lock_slow
lock_table_other_has_incompatible
lock_rec_insert_check_and_lock
Change pointer to conflicting lock to normal pointer as this
pointer contents could be changed later.
Problem was a merge error from MySQL wsrep i.e. Galera.
wsrep_on_check
New check function. Galera can't be enabled
if innodb-lock-schedule-algorithm=VATS.
innobase_kill_query
In Galera async kill we could own lock mutex.
innobase_init
If Variance-Aware-Transaction-Sheduling Algorithm (VATS) is
used on Galera we fall back to First-Come-First-Served (FCFS)
with notice to user.
Changed innodb-lock-schedule-algorithm as read-only parameter
as it was designed to be.
lock_reset_lock_and_trx_wait
Use ib::hex() to print out transaction ID.
lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
RecLock::add_to_waitq
lock_rec_lock_slow
lock_table_other_has_incompatible
lock_rec_insert_check_and_lock
lock_prdt_other_has_conflicting
Change pointer to conflicting lock to normal pointer as this
pointer contents could be changed later.
RecLock::create
Conclicting lock pointer is moved to last parameter with
default value NULL. This conflicting transaction could
be selected as victim in Galera if requesting transaction
is BF (brute force) transaction. In this case contents
of conflicting lock pointer will be changed. Use ib::hex() to print
transaction ids.
* Note: breaking change; since this commit, a plugin that has
worked so far might get rejected due to plugin maturity
* mariabackup is not affected (allows all plugins)
* VERSION file defines SERVER_MATURITY, which defines the
corresponding numeric value as SERVER_MATURITY_LEVEL in
include/mysql_version.h
* The default value for 'plugin_maturity' is SERVER_MATURITY_LEVEL - 1
* Logs a warning if a plugin has maturity lower than
SERVER_MATURITY_LEVEL
* Tests suppress the plugin maturity warning
* Tests use --plugin-maturity=unknown by default so as not to fail
due to the stricter plugin maturity handling
This is about adding more options to force slave retries
Two new variables has been added:
slave_transaction_retry_errors
- Tells the slave thread to retry transaction for replication when a
query event returns an error from the provided list. Deadlock and
elapsed lock wait timeout errors are automatically added to this list
slave-transaction-retry-interval
- Interval of the slave SQL thread will retry a transaction
in case it failed with a deadlock or elapsed lock wait
timeout or listed in slave_transaction_retry_errors
Other changes:
- Simplifed code for slave_skip_errors (to be aligned with
slave_transaction_retry_errors)
- Renamed print_slave_skip_errors() to make_slave_skip_errors_printable()
- Remove printing error from init_slave_skip_errors as my_bitmap_init()
will do that if needed.
- Generalize has_temporary_error()
As a result of this merge the code for the following tasks appears in 10.3:
- MDEV-12172 Implement tables specified by table value constructors
- MDEV-12176 Transform [NOT] IN predicate with long list of values INTO
[NOT] IN subquery.