(Patch from Monty, slightly amended)
Fix rowid filtering optimization in best_access_path():
== Ref access + rowid filtering ==
The cost computations compare #records and the index-only scan cost
(keyread_tmp) to find the per-record advantage of skipping the read of
the full table record.
The computations produce a wrong result when:
- the #records are "clipped down" with s->worst_seeks or
thd->variables.max_seeks_for_key. keyread_tmp is not clipped
this way, so the numbers are not comparable.
- access_factor is negative. This means an index-only read is
cheaper than a non-index-only read.
This patch makes the optimizer not consider Rowid Filtering in
such cases.
The decision is logged in the Optimizer Trace under the
"rowid_filter_skipped" name.
== Range access + rowid filtering ==
When considering whether to use a Rowid Filter with range access, do multiply
keyread_tmp by record_count. That way, it is comparable with the
range access's estimate, which is multiplied by record_count.
mysql_discard_or_import_tablespace(): On successful
ALTER TABLE...DISCARD TABLESPACE, evict the table handle from the
table definition cache, so that ha_innobase::close() will be invoked,
as InnoDB expects. This will avoid an assertion failure
ut_a(table->get_ref_count() == 0) during IMPORT TABLESPACE.
ha_innobase::open(): Do not issue any ER_TABLESPACE_DISCARDED warning.
Member functions for DML will do that.
ha_innobase::truncate(), ha_innobase::check_if_supported_inplace_alter():
Issue ER_TABLESPACE_DISCARDED warnings, to compensate for the removal of
the warning in ha_innobase::open().
row_quiesce_write_indexes(): Only write information about committed
indexes. The ALTER TABLE t NOWAIT ADD INDEX(c) in the nondeterministic
test case will most of the time fail due to a metadata lock (MDL) timeout
and leave behind an uncommitted index.
Reviewed by: Sergei Golubchik
The geometry type requires Type: "Feature", but the feature need
not be first in the JSON structure.
Adjust the code to return an error if the geometry isn't a JSON object,
but continue parsing, searching for Type: "Feature" to trigger
the geometry parsing.
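For illustration, a hedged sketch (hypothetical document) of input that
should now parse even though the "type": "Feature" member comes after the
geometry member; ST_GeomFromGeoJSON is used here only as an example entry point:
SELECT ST_AsText(ST_GeomFromGeoJSON(
  '{"geometry": {"type": "Point", "coordinates": [1, 2]}, "properties": null, "type": "Feature"}'));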
Thanks Derick Magnusen for the bug report.
The bug is caused by a mechanism similar to MDEV-21027.
The function check_insert_or_replace_autoincrement failed to open
all the partitions for REPLACE ... SELECT statements, which resulted
in the assertion failure.
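A hedged sketch of the affected statement shape (hypothetical tables; the
real case involves auto-increment on a partitioned Spider setup):
CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, b INT)
PARTITION BY HASH (a) PARTITIONS 4;
CREATE TABLE t2 (b INT);
INSERT INTO t2 VALUES (1),(2);
REPLACE INTO t1 (b) SELECT b FROM t2;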
Item_func_not_all::print() either uses Item_func::print() or
directly invokes args[0]->print(). Thus the precedence should be
either the one of Item_func or of args[0].
Item_allany_subselect::print() prints args[0], then a comparison op,
then a subquery. That is, the precedence should be the one of
a comparison.
Rename to stress that it is a specific hack for Item_func_nextval
and should not be used for other items.
If a vcol uses Item_func_nextval, a corresponding table for the sequence
should be added to the prelocking list (in that sense NEXTVAL is not
simply a function, but more like a subquery), see add_internal_tables()
in DML_prelocking_strategy::handle_table(). At the moment it is only
implemented for DEFAULT, not for GENERATED ALWAYS AS, thus the
VCOL_NEXTVAL hack.
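For reference, a hedged sketch of the already-handled DEFAULT case
(hypothetical names), where the sequence table must end up in the
prelocking list:
CREATE SEQUENCE s1;
CREATE TABLE t1 (id BIGINT DEFAULT NEXTVAL(s1), a INT);
INSERT INTO t1 (a) VALUES (1); -- s1 is opened via the prelocking list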
select_union_direct::send_data() only sends a record when
the LIMIT ... OFFSET clause of the individual select won't skip it.
Thus, select_union_direct::send_data() should not perform any actions
related to sending a record if the offset of the select hasn't been
reached yet.
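A hedged sketch of the kind of statement affected (hypothetical tables),
where the first select's own OFFSET must still skip rows:
(SELECT a FROM t1 ORDER BY a LIMIT 2 OFFSET 1)
UNION ALL
(SELECT a FROM t2);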
Like in MDEV-16110 we must release items allocated on thd->mem_root by
reopening the table.
MDEV-16290 relocated the MDEV-16110 fix in 10.5 so that it works for
MDEV-28576 as well. 10.3, without MDEV-16290, now duplicates this fix.
The change from MDEV-29465 exposed a flaw in replace_column_table
where again we were not properly updating the column-level bits.
replace_table_table was changed in MDEV-29465 to properly update
grant_table->init_cols, however replace_column_table still only
modified grant_column->rights when the GRANT_COLUMN already existed.
This led to a mismatch between GRANT_COLUMN::init_rights and
GRANT_COLUMN::rights, *if* the GRANT_COLUMN already existed.
As an example:
GRANT SELECT (col1) ...
Here, for col1, GRANT_COLUMN::init_rights and GRANT_COLUMN::rights are
both set to 1 (SELECT) in replace_column_table.
GRANT INSERT (col1) ...
Here, without this patch, GRANT_COLUMN::init_rights is still 1 while
GRANT_COLUMN::rights is 3 (SELECT_PRIV | INSERT_PRIV).
Finally, if before this patch one does:
REVOKE SELECT (col1) ...
replace_table_table will see that init_rights loses bit 1, thus it
considers there are no more rights granted on that particular table.
This prompts the whole GRANT_TABLE to be removed via the first revoke,
when the GRANT_COLUMN corresponding to it should still have init_rights == 2.
By also updating replace_column_table to keep init_rights in sync
properly, the issue is resolved.
Reviewed by <serg@mariadb.com>
Test MDEV-26575 fails when it runs after MDEV-25389. This is because
the latter simulates a failure while an applier thread is
created in `start_wsrep_THD()`. The failure was not handled correctly
and would not clean up the created THD from the global
`server_threads`. A subsequent shutdown would hang and eventually fail
trying to close this THD.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Fix `wsrep_table_accessible_when_detached()` so that commands that
access no tables are rejected while a node is disconnected from a
cluster.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Prevent wsrep files from being installed if WITH_WSREP=OFF.
Reviewed by Daniel Black
Additionally excluded #include wsrep files and galera* files
along with galera/wsrep tests.
mysql-test/include/have_wsrep.inc remains, as it is used by
a few isolated tests.
Co-authored-by: Chris Ross <cross2@cisco.com>
Fix the regression introduced in
dfb41fddf6.
In the restructure of mysql_rm_table_no_locks the early condition
of !frm_error that enabled non_tmp_table_deleted, and hence the
query cache invalidation, was removed.
The query_cache_invalidate1(thd, dbnorm) called after
mysql_rm_table_no_locks depends on the query cache removal
(for unexamined reasons).
Under DROP DATABASE, in mysql_rm_table_no_locks, dont_log_query
is true, preventing the late setting of non_tmp_table_deleted
(which retained one of its purposes, the replication deletion
of temporary tables, but not query cache invalidation).
The non_temp_tables_count however can still be used to invalidate
the query cache.
This is a DELETE-only case. Normally this statement doesn't make inserts,
but DELETE ... FOR PORTION changes that. UPDATE and INSERT initialize
autoinc by calling handler::info(HA_STATUS_AUTO). Also MyISAM and InnoDB
can lazily initialize it in their update_create_info overrides.
The solution is to initialize autoinc during delete preparation,
if period (DELETE FOR PORTION) is specified.
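A hedged sketch of the statement shape that now triggers autoinc
initialization (hypothetical table with an application-time period):
CREATE TABLE t1 (
  id INT AUTO_INCREMENT PRIMARY KEY,
  s DATE, e DATE,
  PERIOD FOR p (s, e)
);
DELETE FROM t1 FOR PORTION OF p FROM '2000-01-01' TO '2001-01-01';
-- rows only partially covered by the portion are re-inserted,
-- hence the insert path and the need for an initialized autoinc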
The initial work has been done by Kento Takeuchi in his PR #2048;
however, this commit also contains a few technical modifications by
Nikita Malyavin.
Virtual column values are updated in the handler by reading calls,
like ha_index_next, etc. This was missing for ha_ft_read.
handler::ha_ft_read: add table->update_virtual_fields() call
This patch adds the correct setting of the "--tls-version" and
"--ssl-verify-server-cert" options in the client-side utilities
such as mysqltest, mysqlcheck and mysqlslap, as well as the correct
setting of the "--ssl-crl" option when executing queries on the
slave side, and also the correct option codes in the "sslopts-logopts.h"
file (in the latter case, incorrect values are not a problem right
now, but may cause subtle test failures in the future, if the option
handling code changes).
This patch adds the correct setting of the "--ssl-verify-server-cert"
option in the client-side utilities such as mysqlcheck and mysqlslap,
as well as the correct setting of the "--ssl-crl" option when executing
queries on the slave side, and also adds the correct option codes in
the "sslopts-logopts.h" file (in the latter case, incorrect values
are not a problem right now, but may cause subtle test failures in
the future, if the option handling code changes).
Because of the default warning level, aborted unauthenticated
connections end up in the error log. These errors frequently occur
in production environments because cancelled connections occur
all the time when web pages are shut down.
Rather than flood our users' error logs with these ordinary
messages, let's push them down to the warning level shown only at
log-warnings=4.
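For reference, a hedged example of how these notes can still be seen after
the change (log_warnings is the existing server variable; 4 is the level
referenced above):
SET GLOBAL log_warnings = 4;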
Concept approved by Monty.
Fixing a few problems revealed by UBSAN in type_float.test
- multiplication overflow in dtoa.c
- uninitialized Field::geom_type (and Field::srid as well)
- Wrong call-back function types used in combination with SHOW_FUNC.
Changes in the mysql_show_var_func data type definition, made by the
following commits, were not properly addressed all around the code:
b4ff64568c18feb62fee0ee879ff8a
Adding a helper SHOW_FUNC_ENTRY() function and replacing
all mysql_show_var_func declarations using SHOW_FUNC
with SHOW_FUNC_ENTRY, to catch mysql_show_var_func mismatches
at compilation time in the future.
The issue is that record_should_be_deleted() returns true in
mysql_delete() even if the sub-select with join gets an error from the
storage engine when a DELETE FROM ... WHERE ... IN (SELECT ...) statement
is executed.
The same is true for mysql_update(), where select->skip_record() returns
true even if the sub-select with join gets an error from the storage engine.
In the test case, if the sub-select is chosen as the deadlock victim, the
whole transaction is rolled back during sub-select execution, but
mysql_delete()/mysql_update() continues transaction execution and invokes
table->delete_row(), as record_should_be_deleted() wrongly returns true
in mysql_delete(), and table->update_row(), as select->skip_record(thd)
wrongly returns 1 for mysql_update().
record_should_be_deleted() wrongly returns true because thd->is_error()
returns false in SQL_SELECT::skip_record() invoked from
record_should_be_deleted().
It's supposed that THD error should be set in rr_handle_error() called
from rr_sequential() during sub-select JOIN::exec_inner() execution.
But rr_handle_error() does not set THD error because
READ_RECORD::print_error is not set in JOIN_TAB::read_record.
READ_RECORD::print_error should be initialized in
init_read_record()/init_read_record_idx(). But make_join_readinfo() does
not invoke init_read_record()/init_read_record_idx() for
JOIN_TAB::read_record.
The fix is to set JOIN_TAB::read_record.print_error in
make_join_readinfo(), i.e. in the same place where
JOIN_TAB::read_record.table is set.
Reviewed by Sergey Petrunya.
Problem:
We are able to insert a duplicate value into the table because cmp_binary_offset
is not able to differentiate between NULL and an empty string. So
check_duplicate_long_entry_key is never called and we don't check for
duplicates.
Solution:
Added an if condition with is_null() on the field, which can differentiate
between NULL and an empty string.
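A hedged sketch of the kind of scenario involved (hypothetical table with
a long unique key; the exact repro may differ):
CREATE TABLE t1 (a BLOB UNIQUE);
INSERT INTO t1 VALUES (NULL); -- NULLs never conflict
INSERT INTO t1 VALUES ('');
INSERT INTO t1 VALUES (''); -- must now fail with a duplicate-key error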
When assigning the cached item to the Item_cache for the first time,
make sure to use Item_cache::setup(), not Item_cache::store().
Because the former copies the metadata (and allocates memory, in case
of Item_cache_row), and Item_cache::decimal must be set for
comparisons to work correctly.
Deallocation of TABLE_LIST::dt_handler and TABLE_LIST::pushdown_derived
was performed in multiple places in the code. This not only made the code
more difficult to maintain but also led to memory leaks and
ASAN heap-use-after-free errors.
This commit moves deallocation of TABLE_LIST::dt_handler and
TABLE_LIST::pushdown_derived to a single point - JOIN::cleanup().
Per the my_set_max_open_files() call 3 lines earlier, we attempt
to set the nofile (number of open files) rlimit to max_open_files.
We should use this value in the warning, because wanted_files may not
be that number.
When a range rowid filter was used with an index ref access the cost of
accessing the index entries for the records rejected by the filter was not
taken into account. For a ref access by an index with big average number
of records per key this led to poor execution plans if selectivity of the
used filter was high.
The patch resolves this problem. It also introduces a minor optimization
that skips look-ups into a filter that turns out to be empty.
With this patch the output of ANALYZE stmt reports the number of look-ups
into used rowid filters.
The patch also back-ports from 10.5 the code that properly sets the field
TABLE::file::table for opened temporary tables.
The test cases that were supposed to use rowid filters have been adjusted
in order to use similar execution plans after this fix.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
The ALTER related code cannot do at the same time both:
- modify partitions
- change column data types
Explicitly changing a column data type together with a partition change is
prohibited by the parser, so this is not allowed and returns a syntax error:
ALTER TABLE t MODIFY ts BIGINT, DROP PARTITION p1;
This fix additionally disables implicit data type upgrade
(e.g. from "MariaDB 5.3 TIME" to "MySQL 5.6 TIME", or the other way
around according to the current mysql56_temporal_format) in case of
an ALTER modifying partitions, e.g.:
ALTER TABLE t DROP PARTITION p1;
In such commands now only the partition change happens, while
the data types stay unchanged.
One can additionally run:
ALTER TABLE t FORCE;
either before or after the ALTER modifying partitions to
upgrade data types according to mysql56_temporal_format.
Abort startup, if SSL setup fails.
Also, for the server, always check that the certificate matches the private key
(even if ssl_cert is not set, OpenSSL will try to use the default one).
The scenario: the only query of the XA transaction, on a non-transactional
table, errors out:
XA BEGIN 'x';
--error ER_DUP_ENTRY
INSERT INTO t1 VALUES (1),(1);
XA END 'x';
XA PREPARE 'x';
The binlogging pattern is correctly started as expected with
the errored-out Query or its ROW format events, but there is
no empty XA_prepare_log_event group.
The following
XA COMMIT 'x';
therefore should not be logged either, but it is.
The bug is fixed by properly maintaining a read-write binlog hton
property and using it to enforce correct binlogging decisions.
Specifically, in the bug description case XA COMMIT won't be binlogged,
both when given in the same connection and externally after disconnect.
The same continues to apply to an empty XA transaction that does not change
any data in any of the transactional engines involved.
Read the version of the view share when we read the definition to prevent
simultaneous access to a view table SHARE (and so its MEM_ROOT)
from different threads.
OpenSSL handles memory management using **OPENSSL_xxx** API[^1]. For
allocation, there is `OPENSSL_malloc`. To free it, `OPENSSL_free` should
be called.
We've been lucky that OpenSSL's (and wolfSSL's) implementation allowed the
usage of `free` for memory cleanup. However, other OpenSSL forks, such
as AWS-LC[^2], are not this forgiving. It will cause a server crash.
Test case `openssl_1` provides good coverage for this issue. If a user
is created using:
`grant select on test.* to user1@localhost require SUBJECT "...";`
user1 will crash the instance during connection under AWS-LC.
There have been numerous OpenSSL forks[^3]. Due to FIPS[^4] and other
related regulatory requirements, MariaDB will be built using them. This
fix will increase MariaDB's adaptability by using more compliant and
generally accepted API.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
[^1]: https://www.openssl.org/docs/man1.1.1/man3/OPENSSL_malloc.html
[^2]: https://github.com/awslabs/aws-lc
[^3]: https://en.wikipedia.org/wiki/OpenSSL#Forks
[^4]: https://en.wikipedia.org/wiki/FIPS_140-2
st_select_lex::init_query is called in the execution of EXECUTE
IMMEDIATE 'alter table ...'. So reset the initialization at the
same point where we set join= 0.
and also MDEV-25564, MDEV-18157.
Attempt to produce EXPLAIN output caused a crash in
Explain_node::print_explain_for_children. The cause of this was that an
Explain_node (actually a derived) had a link to child select#N, but
there was no query plan present for select#N.
The query plan wasn't present because the subquery was eliminated.
- Either it was a degenerate subquery like "(SELECT 1)" in MDEV-25564.
- Or it was a subquery in a UNION subquery's ORDER BY clause:
col IN (SELECT ... UNION
SELECT ... ORDER BY (SELECT FROM t1))
In such cases, legacy code structure in subquery/union processing code(*)
makes it hard to detect that the subquery was eliminated, so we end up
with EXPLAIN data structures (Explain_node::children) having dangling
links to child subqueries.
Do make the checks and don't follow the dangling links.
(In an ideal world, we should not have these dangling links. But fixing
the code (*) would carry a high risk for the stable versions.)
The bug is that we don't have a lock on the trigger name, so it is
possible for two threads to try to create the same trigger at the same
time and both think that they have succeeded.
The same thing can happen with drop trigger or a combination of create and
drop trigger.
Fixed by adding an MDL lock on the trigger name for the duration of the
create/drop.
This was caused by the short_option_1-master.opt file that had the
option -T12, which means (among other things) to use blocking for
sockets. This was supported up to MariaDB 10.4, but not in 10.5 where
we removed the code that changes blocking sockets to non-blocking in
case of errors.
Fixed by ignoring the TEST_BLOCKING flag and also by not using the -T12
argument in short_option_1.
Other things:
- Added back support for valgrind (the original issue had nothing to
do with valgrind).
- While debugging I noticed that the retry loop in
handle_connections_sockets() was doing a lot of work during shutdown.
Fixed by not doing retries during shutdown.
The population of default values in INSERT SELECT was being
performed twice. With sequences, this resulted in every
second sequence value being used.
With INSERT SELECT we remove the second invocation of
table->update_default_fields(). This was already performed
in store_values(), which invokes fill_record_n_invoke_before_triggers(),
which in turn invoked update_default_fields().
We do need to return an error on duplicate values, so the
::store_values is extended to take the ignore option.
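A hedged sketch of the symptom (hypothetical names): with defaults
populated twice, each inserted row used to consume two sequence values:
CREATE SEQUENCE s1;
CREATE TABLE t1 (id BIGINT DEFAULT NEXTVAL(s1), a INT);
CREATE TABLE t2 (a INT);
INSERT INTO t2 VALUES (1),(2);
INSERT INTO t1 (a) SELECT a FROM t2;
SELECT id, a FROM t1; -- ids should be consecutive; before the fix every
                      -- second sequence value was skipped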
=========== Problem =============
- `show columns` is not working for temporary tables, even though the
`create temporary tables` privilege should be enough (see the sketch below).
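A hedged sketch of the failing scenario (hypothetical user and table):
GRANT CREATE TEMPORARY TABLES ON db1.* TO user1@localhost;
-- as user1:
CREATE TEMPORARY TABLE db1.tmp1 (a INT);
SHOW COLUMNS FROM db1.tmp1; -- used to be denied despite the privilege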
=========== Solution =============
- Append `TMP_TABLE_ACLS` privilege when running `show columns` for temp
tables.
- Additionally, call `check_access()` for the database only once, not for
each field.
=========== Additionally =============
- Update comments for function `check_table_access` arguments
Reviewed by: <vicentiu@mariadb.org>
For some queries that involve tables with different but convertible
character sets for columns taking part in the query, repeatable
execution of such queries in PS mode or as part of a stored routine
would result in server abnormal termination.
For example,
CREATE TABLE t1 (a2 varchar(10));
CREATE TABLE t2 (u1 varchar(10) CHARACTER SET utf8);
CREATE TABLE t3 (u2 varchar(10) CHARACTER SET utf8);
PREPARE stmt FROM
"SELECT t1.* FROM (t1 JOIN t2 ON (t2.u1 = t1.a2))
WHERE (EXISTS (SELECT 1 FROM t3 WHERE t3.u2 = t1.a2))";
EXECUTE stmt;
EXECUTE stmt; <== Running this prepared statement the second time
results in server crash.
The reason for the server crash is that an instance of the class
Item_func_conv_charset, created for conversion of a column
from one character set to another, is allocated on the execution
memory root, but a pointer to this instance is stored in an item
placed on the prepared statement memory root. Below is the call trace to
the place where an instance of the class Item_func_conv_charset
is created.
setup_conds
Item_func::fix_fields
Item_bool_rowready_func2::fix_length_and_dec
Item_func::setup_args_and_comparator
Item_func_or_sum::agg_arg_charsets_for_comparison
Item_func_or_sum::agg_arg_charsets
Item_func_or_sum::agg_item_set_converter
Item::safe_charset_converter
And the following trace shows the place where a pointer to
the instance of the class Item_func_conv_charset is passed
to the class Item_func_eq, which is created on the memory root of
the prepared statement.
Prepared_statement::execute
mysql_execute_command
execute_sqlcom_select
handle_select
mysql_select
JOIN::optimize
JOIN::optimize_inner
convert_join_subqueries_to_semijoins
convert_subq_to_sj
To fix the issue, switch to the Prepared Statement memory root
before calling the method Item_func::setup_args_and_comparator
in order to place any created Items on permanent memory root.
It may seem that such an approach would result in a memory
leak in case the parameter marker '?' is used in the query,
as in the following example:
PREPARE stmt FROM
"SELECT t1.* FROM (t1 JOIN t2 ON (t2.u1 = t1.a2))
WHERE (EXISTS (SELECT 1 FROM t3 WHERE t3.u2 = ?))";
EXECUTE stmt USING convert('A' using latin1);
but it wouldn't, since in such a case any of the parameter markers
is treated as a constant and no subquery-to-semijoin optimization
is performed.
See also commits aa8a31da and 64678c for a Bug #22990029 fix.
In this scenario INSERT chose to check if delete unmarking is available for
a just deleted record. To build an update vector, it needed to calculate
the vcols as well. Since this INSERT was not IGNORE-flagged, recalculation
failed.
Solution: temporarily set abort_on_warning=true while calculating the
column for the delete-unmarked insert.
As of now, InnoDB does not store trx_id for each record in a secondary index.
The idea behind this is the following: let us store only a per-page max_trx_id,
and delete-mark the records when they are deleted/updated.
When a read starts, it remembers the lowest id of the currently active
transactions. InnoDB refers to it as trx->read_view->m_up_limit_id.
See also ReadView::open.
When the page is fetched, its max_trx_id is compared to m_up_limit_id.
If the value is lower, and the secondary index record is not delete-marked,
then this page is just safe to read as is. Otherwise, a clustered index
access could be needed. See the page_get_max_trx_id call in row_search_mvcc,
and the corresponding switch (row_search_idx_cond_check(...)) below.
Virtual columns are required to be updated in case the record was
delete-marked. The motivation behind this is documented in
Row_sel_get_clust_rec_for_mysql::operator() near
row_sel_sec_rec_is_for_clust_rec call.
This was basically a description of why virtual column computation can
normally happen during a SELECT and, generally, during a vcol index access.
Sometimes stats tables are updated by InnoDB. This starts a new
transaction, and it can happen that it hasn't finished by the moment of
SELECT execution, forcing virtual column recomputation. If the result was
something that normally outputs a warning, like a division by zero, then
it could be output in a racy manner.
The solution is to suppress the warnings when a column is computed
for the described purpose.
An ignore_warnings argument is added to innobase_get_computed_value.
Currently, it is only true for a call from
row_sel_sec_rec_is_for_clust_rec.
MDEV-19243 introduced a regression on Windows.
In the (supposedly rare) case where the environment variable TZ was set,
@@system_time_zone no longer derives from TZ. Instead, it incorrectly
refers to the system default time zone, even though UTC time conversion
takes TZ into account.
The fix is to restore TZ-aware handling (timezone name derives from
tzname), if TZ is set.
Adding debug output for key and keyseg flags at ha_myisam::open() time.
So now there are three points of debug output:
1. In the very end of mysql_prepare_create_table()
2. In ha_myisam::create(), after the table2myisam() call
3. In ha_myisam::open(), after the mi_open() call
mi_create(), which is called between 2 and 3, modifies flags for
some data types, so the output in 2 and 3 is different.
Fix the error message to contain the correct errno. This commit was
tested interactively because mtr will notice if you provide
a wrong wsrep_provider in the config, and you may not change
wsrep_provider dynamically.
If repl.max_ws_size is set too low, a following CREATE TABLE could fail
during commit. In this case wsrep_commit_empty should allow rolling
it back if the provider state is s_aborted.
Furthermore, the original ER_ERROR_DURING_COMMIT does not really tell the
user anything clear. Therefore, this commit adds a new error,
ER_TOO_BIG_WRITESET. This will change the output of some test cases.
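A hedged illustration (the provider option name comes from the text above;
the value is only illustrative):
SET GLOBAL wsrep_provider_options = 'repl.max_ws_size=1024';
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
-- may now fail with ER_TOO_BIG_WRITESET instead of ER_ERROR_DURING_COMMIT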
The problem was that during ALTER TABLE execution the variables were set
to 1 even when wsrep_auto_increment_control is OFF. We should
set them only when wsrep_auto_increment_control is ON.
In the test the user has set WSREP_ON=OFF; this causes streaming replication
recovery to fail, which caused a call to unireg_abort(). However,
this call is not necessary and we can let the transaction fail. Naturally,
if a real user does this, they need to bootstrap their cluster.
10.5 part: test cases and comments.
The code is in the merge commit 74fe1c44aa
When, for example, a table is partitioned by HASH(a) and we rename column
`a' to `b', the partitioning filter stays unchanged: HASH(a). That's the
wrong behavior.
The patch updates the partitioning filter in accordance with the new column
names. That includes the partition/subpartition expression and the
partition/subpartition field list.
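A hedged illustration of the expected behavior after the fix
(hypothetical table):
CREATE TABLE t1 (a INT) PARTITION BY HASH (a) PARTITIONS 2;
ALTER TABLE t1 CHANGE a b INT;
SHOW CREATE TABLE t1; -- the partitioning expression now refers to `b`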