Do not attempt to produce "r_engine_stats" on the temporary (=work) tables.
These tables may be
- re-created during the query execution
- freed during the query execution (This is done e.g. in JOIN::cleanup(),
before we produce ANALYZE FORMAT=JSON output).
- (Also, make save_explain_data() functions not set handler_for_stats
to point to handler objects that do not have handler->handler_stats set.
If the storage engine is not collecting handler_stats, it will not have
them when we're producing ANALYZE FORMAT=JSON output, either).
ANALYZE FORMAT=JSON output now includes table.r_engine_stats which
has the engine statistics. Only non-zero members are printed.
Internally: EXPLAIN data structures Explain_table_acccess and
Explain_update now have handler* handler_for_stats pointer.
It is used to read statistics from handler_for_stats->handler_stats.
The following applies only to 10.9+, backport doesn't use it:
Explain data structures exist after the tables are closed. We avoid
walking invalid pointers using this:
- SQL layer calls Explain_query::notify_tables_are_closed() before
closing tables.
- After that call, printing of JSON output is disabled. Non-JSON output
can be printed but we don't access handler_for_stats when doing that.
The new statistics is enabled by adding the "engine", "innodb" or "full"
option to --log-slow-verbosity
Example output:
# Pages_accessed: 184 Pages_read: 95 Pages_updated: 0 Old_rows_read: 1
# Pages_read_time: 17.0204 Engine_time: 248.1297
Page_read_time is time doing physical reads inside a storage engine.
(Writes cannot be tracked as these are usually done in the background).
Engine_time is the time spent inside the storage engine for the full
duration of the read/write/update calls. It uses the same code as
'analyze statement' for calculating the time spent.
The engine statistics is done with a generic interface that should be
easy for any engine to use. It can also easily be extended to provide
even more statistics.
Currently only InnoDB has counters for Pages_% and Undo_% status.
Engine_time works for all engines.
Implementation details:
class ha_handler_stats holds all engine stats. This class is included
in handler and THD classes.
While a query is running, all statistics is updated in the handler. In
close_thread_tables() the statistics is added to the THD.
handler::handler_stats is a pointer to where statistics should be
collected. This is set to point to handler::active_handler_stats if
stats are requested. If not, it is set to 0.
handler_stats has also an element, 'active' that is 1 if stats are
requested. This is to allow engines to avoid doing any 'if's while
updating the statistics.
Cloned or partition tables have the pointer set to the base table if
status are requested.
There is a small performance impact when using --log-slow-verbosity=engine:
- All engine calls in 'select' will be timed.
- IO calls for InnoDB reads will be timed.
- Incrementation of counters are done on local variables and accesses
are inline, so these should have very little impact.
- Statistics has to be reset for each statement for the THD and each
used handler. This is only 40 bytes, which should be neglectable.
- For partition tables we have to loop over all partitions to update
the handler_status as part of table_init(). Can be optimized in the
future to only do this is log-slow-verbosity changes. For this to work
we have to update handler_status for all opened partitions and
also for all partitions opened in the future.
Other things:
- Added options 'engine' and 'full' to log-slow-verbosity.
- Some of the new files in the test suite comes from Percona server, which
has similar status information.
- buf_page_optimistic_get(): Do not increment any counter, since we are
only validating a pointer, not performing any buf_pool.page_hash lookup.
- Added THD argument to save_explain_data_intern().
- Switched arguments for save_explain_.*_data() to have
always THD first (generates better code as other functions also have THD
first).
EXPLAIN EXTENDED for an UPDATE/DELETE/INSERT/REPLACE statement did not
produce the warning containing the text representation of the query
obtained after the optimization phase. Such warning was produced for
SELECT statements, but not for DML statements.
The patch fixes this defect of EXPLAIN EXTENDED for DML statements.
This is a DELETE only case. Normally this statement doesn't make inserts,
but DELETE ... FOR PORTION changes it. UPDATE and INSERT initializes
autoinc by calling handler::info(HA_STATUS_AUTO). Also myisam and innodb
can lazily initialize it in their update_create_info overrides.
The solution is to initialize autoinc during delete preparation,
if period (DELETE FOR PORTION) is specified.
The initial work has been done by Kento Takeuchi by his PR #2048,
however this commit also holds a few technical modifications by
Nikita Malyavin
- query->intersection fails to get freed if the query exceeds
innodb_ft_result_cache_limit
- errors from init_ftfuncs were not propogated by delete command
This is taken from percona/percona-server@ef2c0bcb9a
not every index-using plan sets bits in table->quick_keys.
QUICK_ROR_INTERSECT_SELECT, for example, doesn't.
Use the fact that select->quick is set instead.
Also allow EXPLAIN to work.
- Handle stored function conditions correctly, with the same logic as with UDFs.
- When running queries on Spider SE, by default, we do not push down WHERE conditions containing usage of UDFs/stored functions to remote data nodes, unless the user demands (by setting spider_use_pushdown_udf).
- Disable direct update/delete when a udf condition is skipped.
The problem is that array binding uses net buffer to read parameters for each
execution while each execiting with RETURNING write in the same buffer.
Solution is to allocate new net buffer to avoid changing buffer we are reading
from.
Both EXPLAIN and EXPLAIN EXTENDED statements produce different results set
in case it is run in normal way and in PS mode for the statements
UPDATE/DELETE with subquery.
The use case below reproduces the issue:
MariaDB [test]> CREATE TABLE t1 (c1 INT KEY) ENGINE=MyISAM;
Query OK, 0 rows affected (0,128 sec)
MariaDB [test]> CREATE TABLE t2 (c2 INT) ENGINE=MyISAM;
Query OK, 0 rows affected (0,023 sec)
MariaDB [test]> CREATE TABLE t3 (c3 INT) ENGINE=MyISAM;
Query OK, 0 rows affected (0,021 sec)
MariaDB [test]> EXPLAIN EXTENDED UPDATE t3 SET c3 =
-> ( SELECT COUNT(d1.c1) FROM ( SELECT a11.c1 FROM t1 AS a11
-> STRAIGHT_JOIN t2 AS a21 ON a21.c2 = a11.c1 JOIN t1 AS a12
-> ON a12.c1 = a11.c1 ) d1 );
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | PRIMARY | t3 | ALL | NULL | NULL | NULL | NULL | 0 | 100.00 | |
| 2 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set (0,002 sec)
MariaDB [test]> PREPARE stmt FROM
-> EXPLAIN EXTENDED UPDATE t3 SET c3 =
-> ( SELECT COUNT(d1.c1) FROM ( SELECT a11.c1 FROM t1 AS a11
-> STRAIGHT_JOIN t2 AS a21 ON a21.c2 = a11.c1 JOIN t1 AS a12
-> ON a12.c1 = a11.c1 ) d1 );
Query OK, 0 rows affected (0,000 sec)
Statement prepared
MariaDB [test]> EXECUTE stmt;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | PRIMARY | t3 | ALL | NULL | NULL | NULL | NULL | 0 | 100.00 | |
| 2 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | no matching row in const table |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set (0,000 sec)
The reason by that different result sets are produced is that on execution
of the statement 'EXECUTE stmt' the flag SELECT_DESCRIBE not set
in the data member SELECT_LEX::options for instances of SELECT_LEX that
correspond to subqueries used in the UPDTAE/DELETE statements.
Initially, these flags were set on parsing the statement
PREPARE stmt FROM "EXPLAIN EXTENDED UPDATE t3 SET ..."
but latter they were reset before starting real execution of
the parsed query during handling the statement 'EXECUTE stmt';
So, to fix the issue the functions mysql_update()/mysql_delete()
have been modified to set the flag SELECT_DESCRIBE forcibly
in the data member SELECT_LEX::options for the primary SELECT_LEX
of the UPDATE/DELETE statement.
The ROWNUM() function is for SELECT mapped to JOIN->accepted_rows, which is
incremented for each accepted rows.
For Filesort, update, insert, delete and load data, we map ROWNUM() to
internal variables incremented when the table is changed.
The connection between the row counter and Item_func_rownum is done
in sql_select.cc::fix_items_after_optimize() and
sql_insert.cc::fix_rownum_pointers()
When ROWNUM() is used anywhere in query, the optimization to ignore ORDER
BY in sub queries are disabled. This was done to get the following common
Oracle query to work:
select * from (select * from t1 order by a desc) as t where rownum() <= 2;
MDEV-3926 "Wrong result with GROUP BY ... WITH ROLLUP" contains a discussion
about this topic.
LIMIT optimization is enabled when in a top level WHERE clause comparing
ROWNUM() with a numerical constant using any of the following expressions:
- ROWNUM() < #
- ROWNUM() <= #
- ROWNUM() = 1
ROWNUM() can be also be the right argument to the comparison function.
LIMIT optimization is done in two cases:
- For the current sub query when the ROWNUM comparison is done on the top
level:
SELECT * from t1 WHERE rownum() <= 2 AND t1.a > 0
- For an inner sub query, when the upper level has only a ROWNUM comparison
in the WHERE clause:
SELECT * from (select * from t1) as t WHERE rownum() <= 2
In Oracle mode, one can also use ROWNUM without parentheses.
Other things:
- Fixed bug where the optimizer tries to optimize away sub queries
with RAND_TABLE_BIT set (non-deterministic queries). Now these
sub queries will not be converted to joins. This bug fix was also
needed to get rownum() working inside subqueries.
- In remove_const() remove setting simple_order to FALSE if ROLLUP is
USED. This code was disable a long time ago because of wrong assignment
in the following code. Instead we set simple_order to false if
RAND_TABLE_BIT was used in the SELECT list. This ensures that
we don't delete ORDER BY if the result set is not deterministic, like
in 'SELECT RAND() AS 'r' FROM t1 ORDER BY r';
- Updated parameters for Sort_param::init_for_filesort() to be able
to provide filesort with information where the number of accepted
rows should be stored
- Reordered fields in class Filesort to optimize storage layout
- Added new error messsage to tell that a function can't be used in HAVING
- Added field 'with_rownum' to THD to mark that ROWNUM() is used in the
query.
Co-author: Oleksandr Byelkin <sanja@mariadb.com>
LIMIT optimization for sub query
Before this patch mergeable derived tables / view used in a multi-table
update / delete were merged before the preparation stage.
When the merge of a derived table / view is performed the on expression
attached to it is fixed and ANDed with the where condition of the select S
containing this derived table / view. It happens after the specification of
the derived table / view has been merged into S. If the ON expression refers
to a non existing field an error is reported and some other mergeable derived
tables / views remain unmerged. It's not a problem if the multi-table
update / delete statement is standalone. Yet if it is used in a stored
procedure the select with incompletely merged derived tables / views may
cause a problem for the second call of the procedure. This does not happen
for select queries using derived tables / views, because in this case their
specifications are merged after the preparation stage at which all ON
expressions are fixed.
This patch makes sure that merging of the derived tables / views used in a
multi-table update / delete statement is performed after the preparation
stage.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Before this patch mergeable derived tables / view used in a multi-table
update / delete were merged before the preparation stage.
When the merge of a derived table / view is performed the on expression
attached to it is fixed and ANDed with the where condition of the select S
containing this derived table / view. It happens after the specification of
the derived table / view has been merged into S. If the ON expression refers
to a non existing field an error is reported and some other mergeable derived
tables / views remain unmerged. It's not a problem if the multi-table
update / delete statement is standalone. Yet if it is used in a stored
procedure the select with incompletely merged derived tables / views may
cause a problem for the second call of the procedure. This does not happen
for select queries using derived tables / views, because in this case their
specifications are merged after the preparation stage at which all ON
expressions are fixed.
This patch makes sure that merging of the derived tables / views used in a
multi-table update / delete statement is performed after the preparation
stage.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
First part of the fix (row0mysql.cc) addresses external columns when adding history
row on referential action. The full data must be retrieved before the
row is inserted.
Second part of the fix (the rest) avoids duplicate primary key error between
the history row generated on referential action and the history row
generated by SQL command. Both command and referential action can
happen on same table since foreign key can be self-reference (parent
and child tables are same). Moreover, the self-reference can refer
multiple rows when the key is non-unique. In such case history is
generated by referential action occured on first row but processed all
rows by a matched key. The second round is when the next row is
processed by a command but history already exists. In such case we
check TRX_ID of existing history row and if it is the same we assume
the above situation and skip adding one more history row or failing
the command.
DELETE...FOR PORTION OF... statement accepts non-constant
FROM...TO clause. This contradicts the documentation and
is inconsistent with the behavior of UPDATE statement.
Thus, we add a validation that checks if a given deletion
period is specified by constant.
Fixed by:
- Make all quick_* variable allocated according to real number keys instead
of MAX_KEY
- Store all the quick* items in separated allocated structure (OPT_RANGE)
- Ensure we don't access any quick* variable without first checking
opt_range_keys.is_set(). Thanks to this, we don't need any
pre-initialization of quick* variables anymore.
Some renames was done to use the new structure:
table->quick_keys -> table->opt_range_keys
table->quick_rows[X] -> table->opt_range[X].rows
table->quick_key_parts[X] -> table->opt_range[X].key_parts
table->quick_costs[X] -> table->opt_range[X].cost
table->quick_index_only_costs[X] -> table->opt_range[X].index_only_cost
table->quick_n_ranges[X] -> table->opt_range[X].ranges
table->quick_condition_rows -> table->opt_range_condition_rows
This patch should both decrease memory needed for TABLE objects
(3528 -> 984 + keyinfo) and increase performance, thanks to less
initializations per query, and more localized memory, thanks to the
opt_range structure.
MDEV-22531 Remove maria::implicit_commit()
MDEV-22607 Assertion `ha_info->ht() != binlog_hton' failed in
MYSQL_BIN_LOG::unlog_xa_prepare
From the handler point of view, Aria now looks like a transactional
engine. One effect of this is that we don't need to call
maria::implicit_commit() anymore.
This change also forces the server to call trans_commit_stmt() after doing
any read or writes to system tables. This work will also make it easier
to later allow users to have system tables in other engines than Aria.
To handle the case that Aria doesn't support rollback, a new
handlerton flag, HTON_NO_ROLLBACK, was added to engines that has
transactions without rollback (for the moment only binlog and Aria).
Other things
- Moved freeing of MARIA_SHARE to a separate function as the MARIA_SHARE
can be still part of a transaction even if the table has closed.
- Changed Aria checkpoint to use the new MARIA_SHARE free function. This
fixes a possible memory leak when using S3 tables
- Changed testing of binlog_hton to instead test for HTON_NO_ROLLBACK
- Removed checking of has_transaction_manager() in handler.cc as we can
assume that as the transaction was started by the engine, it does
support transactions.
- Added new class 'start_new_trans' that can be used to start indepdendent
sub transactions, for example while reading mysql.proc, using help or
status tables etc.
- open_system_tables...() and open_proc_table_for_Read() doesn't anymore
take a Open_tables_backup list. This is now handled by 'start_new_trans'.
- Split thd::has_transactions() to thd::has_transactions() and
thd::has_transactions_and_rollback()
- Added handlerton code to free cached transactions objects.
Needed by InnoDB.
squash! 2ed35999f2a2d84f1c786a21ade5db716b6f1bbc