Part#2 (final): rewriting the code to pass the correct enum_sp_aggregate_type
to the sp_head constructor, so sp_head never changes its aggregation type
later on. The grammar has been simplified and defragmented.
This made it possible to check aggregate-specific instructions right after
a routine body has been scanned, by calling the new LEX methods:
sp_body_finalize_{procedure|function|trigger|event}()
Moving some C++ code from *.yy to a few new helper methods in LEX.
This bug, involving Item_cond, is similar to MDEV-16765.
It appears because of a wrong pushdown into the HAVING clause in a case
where this pushdown shouldn't be made at all.
This happens because the function that checks whether an Item_cond can be
pushed always answers that it can.
To fix it, a new method Item_cond::excl_dep_on_table() was added.
slave_list was used to provide data for SHOW SLAVE HOSTS and
Slaves_connected status variable.
Introduced binlog_dump_thread_count which is exposed via Slaves_connected
(replaces slave_list.records).
Store Slave_info on THD and access it by iterating server_threads
(replaces slave_list).
Added:
THD::slave_info
binlog_dump_thread_count
show_slave_hosts_callback()
Removed:
slave_list
SLAVE_LIST_CHUNK
SLAVE_ERRMSG_SIZE
slave_list_key()
slave_info_free()
init_slave_list()
end_slave_list()
all_slave_list_mutexes
init_all_slave_list_mutexes()
key_LOCK_slave_list
LOCK_slave_list
Moved:
SLAVE_INFO -> Slave_info
register_slave() -> THD::register_slave()
unregister_slave() -> THD::unregister_slave()
Also removed redundant end_slave() from close_connections(): it is called
again soon afterwards by clean_up().
Prerequisite for a clean MDEV-18450 solution.
If a splittable materialized derived table / view T is used in an inner nest
of an outer join with an impossible ON condition, then T is marked as a
constant table. Yet the execution plan to build T is still searched for,
even though it is not needed, so this search should be skipped.
With wsrep_gtid_mode=ON, the appropriate commit hooks were not
called in all cases for applied streaming transactions.
As a fix, removed all special handling of the commit order critical
section from Wsrep_high_priority_service and Wsrep_storage_service.
Now the commit order critical section is always entered in ha_commit_trans().
The check for wsrep_run_commit_hook is now done in handler.cc and log.cc.
This makes it explicit that the transaction is an active wsrep
transaction which must go through the commit hooks.
When the chosen execution plan accesses a join table employing a range
rowid filter, a quick select to scan this range has to be built. This
quick select is built by a call of SQL_SELECT::test_quick_select().
At this call the function should evaluate only single index range scans.
To make this possible a new parameter was added to this function.
1. Always drop the merged_for_insert flag on cleanup (there could be errors which prevent a TABLE from being assigned)
2. Make the cleanup of the select parts that were touched more precise
st_select_lex::handle_derived() and mysql_handle_list_of_derived() had
exactly the same implementations.
- Adding a new method LEX::handle_list_of_derived() instead
- Removing public function mysql_handle_list_of_derived()
- Reusing LEX::handle_list_of_derived() in st_select_lex::handle_derived()
* update system versioning fields before generated columns
* don't presume that ha_write_row() means INSERT. It could still be UPDATE
* use the correct handler in check_duplicate_long_entry_key()
close table->update_handler in close_thread_tables().
it's not enough to do it in sql_update.cc only, because
sql_insert.cc can also do updates (REPLACE) and even
sql_delete.cc can (DELETE ... FOR PORTION OF)
when auto-adding a virtual LONG_UNIQUE_HASH_FIELD, fill in
a Virtual_column_info for it, so that fill_alter_inplace_info()
would know we're adding a virtual field (ALTER_ADD_VIRTUAL_COLUMN).
The bug manifested itself when executing a query with a materialized
view/derived/CTE whose specification was a SELECT query that used
another materialized derived table, and an impossible WHERE/HAVING condition
was detected for this SELECT.
As soon as such a condition is detected, the join structures of all
derived tables used in the SELECT are destroyed. So optimization
of the queries specifying these derived tables is impossible. Besides,
it's not needed.
In 10.3 optimization of a materialized derived table is performed before
detection of impossible WHERE/HAVING condition in the embedding SELECT.
Server shutdown code: this fixes a race condition where an active connection
either writes, or will be writing, to the socket after it is closed.
The previous call to socket shutdown() is fully enough to wake up an idle
connection, so close_connection is obsolete and dangerous.
Refactored the wsrep patch to not use LOCK_thread_count and COND_thread_count anymore.
This has partially been replaced by using the old LOCK_wsrep_slave_threads mutex.
For waiting on slave thread count changes, a new COND_wsrep_slave_threads condition variable has been added.
Added the LOCK_wsrep_cluster_config mutex to ensure that a cluster address change cannot happen in parallel.
Protected wsrep_slave_threads variable changes with the LOCK_wsrep_cluster_config mutex.
This avoids concurrent slave thread count changes and cluster joining operations.
Fixes according to Teemu's review
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of MDEV-9519 it was proposed to ban START SLAVE on a Galera node
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control is switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting for the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain their initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control is switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as set by the user. To make this
possible, we need to add "shadow copies" of these variables (which store
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
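As a rough illustration of how these two variables shape the generated
values (a sketch only; the table and the cluster layout are hypothetical,
not taken from the patch):

-- with wsrep_auto_increment_control=ON the node derives these settings from
-- the cluster: auto_increment_increment = cluster size,
-- auto_increment_offset = position of the node in the cluster.
-- For example, on node 2 of a 3-node cluster:
SET SESSION auto_increment_increment = 3;
SET SESSION auto_increment_offset = 2;
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, v INT);
INSERT INTO t1 (v) VALUES (10), (20), (30);
SELECT id FROM t1; -- generates 2, 5, 8 on this node

When the cluster size changes, or wsrep_auto_increment_control is toggled,
these two variables must be recomputed (or restored from the shadow copies)
for the nodes to keep generating non-conflicting values.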
The problem happened because Item_ident_for_show did not implement val_native().
Solution:
- Removing class Item_ident_for_show
- Implementing a new method Protocol::send_list_fields() instead,
which accepts a List<Field> instead of List<Item> as input.
Now no Item creation is done during mysqld_list_fields().
Adding helper methods to make the code easier to reuse:
- Moved a part of Protocol::send_result_set_metadata(),
responsible for sending an individual field metadata,
into a new method Protocol_text::store_field_metadata().
Reusing it in both send_list_fields() and send_result_set_metadata().
- Adding Protocol_text::store_field_metadata()
- Adding Protocol_text::store_field_metadata_for_list_fields()
Note, this patch also automatically fixed another bug:
MDEV-18685 mysql_list_fields() returns DEFAULT 0 instead of DEFAULT NULL for view columns
The reason for this bug was that Item_ident_for_show::val_xxx() and get_date()
did not check field->is_null() before calling field->val_xxx()/get_date().
Now the default value is correctly sent by Protocol_text::store(Field*).
Wsrep-lib is now guaranteed to hold the underlying mutex
which is wrapped in lock object passed to Wsrep_client_service
interrupted() call. The library part will now take care of
checking the wsrep::transaction specific state, so it is
enough to check the thd->killed state for the result.
The InnoDB DeadlockChecker::check_and_resolve() was missing a
call to wsrep_handle_SR_rollback() in the case when the
transaction running deadlock detection was chosen as victim.
Refined wsrep_handle_SR_rollback() to skip store_globals() calls
if the transaction was BF aborting itself.
Made mysql-wsrep-features#165 more deterministic by waiting until
the update is in progress before sending the next update.
EXPLAIN EXTENDED erroneously showed UNION instead of UNION ALL in
the warning if UNION ALL followed INTERSECT or EXCEPT operations.
The bug was in the function st_select_lex_unit::print() that printed
the text of the query used in the warning.
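A hypothetical query shape that exercises this code path (the tables are
illustrative, not taken from the test case):

EXPLAIN EXTENDED
SELECT a FROM t1 INTERSECT SELECT a FROM t2 UNION ALL SELECT a FROM t3;
SHOW WARNINGS; -- the reconstructed query in the warning must read
               -- "... union all select ...", not "... union select ..."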
sql_field->key_length was 0 for blob fields when a field was
being added, but was set to Field_blob::character_octet_length() on
subsequent ALTER TABLEs (when the Field object in the old table
already existed). This means mysql_prepare_create_table() couldn't
reliably detect if the keyseg was a prefix.
This patch implements an engine-independent unique hash index.
Usage: a unique HASH index can be created automatically for a blob/varchar/text column whose key
length > handler->max_key_length(),
or it can be specified explicitly.
Automatic creation:
CREATE TABLE t1 (a blob unique);
Explicit creation:
CREATE TABLE t1 (a int, unique(a) using HASH);
Internal KEY_PART representations:
A long unique key_info will have 2 representations.
(Let's understand this with an example: create table t1(a blob, b blob, unique(a, b));)
1. User-given representation: the key_info->key_part array will be similar to what the user has defined,
so in this example it will have 2 key_parts (a, b).
2. Storage engine representation: in this case there will be only one key_part and it will point to the
HASH_FIELD. This key_part is always placed after the user-defined key_parts.
So: User Given Representation [a] [b] [hash_key_part]
key_info->key_part ----^
Storage Engine Representation [a] [b] [hash_key_part]
key_info->key_part ------------^
table->s->key_info will have the user-given representation, while table->key_info will have the storage engine
representation. The representations can be converted into each other by calling the re/setup_keyinfo_hash functions.
Working:
1. When the user specifies a HASH index or key_length > handler->max_key_length(), mysql_prepare_create_table()
adds one extra vfield (for each long unique key) and key_info->algorithm is set to HA_KEY_ALG_LONG_HASH.
2. In init_from_binary_frm_image the values for the hash key_part are set (like fieldnr, field and flags).
3. In parse_vcol_defs, HASH_FIELD->vcol_info is created. Item_func_hash is used with a list of Item_fields;
when an explicit length is given by the user, LEFT() items are applied to the Item_field values.
4. In ha_write_row/ha_update_row, check_duplicate_long_entry_key() is called, which creates the hash key from
table->record[0] and then calls ha_index_read_map; if a duplicated hash is found, the result is compared
field by field.
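A small usage sketch of the resulting behavior (table and data are hypothetical):

CREATE TABLE t1 (a BLOB, b BLOB, UNIQUE (a, b)); -- key too long for the engine, so HASH is used
SHOW CREATE TABLE t1;                            -- the unique key is reported as USING HASH
INSERT INTO t1 VALUES ('x', 'y');
INSERT INTO t1 VALUES ('x', 'y'); -- rejected with a duplicate-key error after the
                                  -- field-by-field comparison described in step 4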
and, again, *don't use thd->clear_error()*
this fixed main.sp_notembedded failure on various amd64 platforms
(where ER_STACK_OVERRUN_NEED_MORE happens to fire in open_stat_tables()
under Dummy_error_handler)
After FLUSH PRIVILEGES remember if the connection started under
--skip-grant-tables and keep it all-powerful, not a lowly anonymous one.
One could use this connection to reset passwords as needed.
Also fix a crash in SHOW CREATE USER
post-merge changes:
* handle password expiration on old tables like everything else -
make changes in memory, even if they cannot be done on disk
* merge "debug" tests with non-debug tests, they don't use dbug anyway
* only run rpl password expiration in MIXED mode, it doesn't replicate
anything, so no need to repeat it thrice
* restore update_user_table_password() prototype, it should not change
ACL_USER, this is done in acl_user_update()
* don't parse json twice in get_password_lifetime and get_password_expired
* remove LEX_USER::is_changing_password, see if there was any auth instead
* avoid overflow in expiration calculations
* don't initialize Account_options in the constructor, it's bzero-ed later
* don't create ulong sysvars - they're not portable, prefer uint or ulonglong
* misc simplifications
This patch adds support for expiring user passwords.
The following statements are extended:
CREATE USER user@localhost PASSWORD EXPIRE [option]
ALTER USER user@localhost PASSWORD EXPIRE [option]
If no option is specified, the password is expired with immediate
effect. If option is DEFAULT, global policy applies according to
the default_password_lifetime system var (if 0, password never
expires, if N, password expires every N days). If option is NEVER,
the password never expires and if option is INTERVAL N DAY, the
password expires every N days.
The feature also supports the disconnect_on_expired_password system
var and the --connect-expired-password client option.
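Usage sketches based on the description above (the user name is hypothetical):

CREATE USER 'app'@'localhost' PASSWORD EXPIRE;                -- expired immediately
ALTER USER 'app'@'localhost' PASSWORD EXPIRE INTERVAL 30 DAY; -- expires every 30 days
ALTER USER 'app'@'localhost' PASSWORD EXPIRE NEVER;           -- never expires
ALTER USER 'app'@'localhost' PASSWORD EXPIRE DEFAULT;         -- follow the global policy
SET GLOBAL default_password_lifetime = 180;                   -- global policy: every 180 days, 0 = never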
Closes #1166
* inject portion of time updates into mysql_delete main loop
* triggered case emits delete+insert, no updates
* PORTION OF `SYSTEM_TIME` is forbidden
* `DELETE HISTORY .. FOR PORTION OF ...` is forbidden as well
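For illustration, a sketch of the statement shape handled here (table and
period names are hypothetical; an application-time period is assumed to be
declared on the table):

CREATE TABLE coverage (
  id INT,
  s DATE,
  e DATE,
  PERIOD FOR app_time (s, e)
);
-- rows only partially inside the portion get their period bounds adjusted;
-- if triggers are present, delete+insert is emitted instead of updates
DELETE FROM coverage
  FOR PORTION OF app_time FROM '2019-01-01' TO '2019-02-01';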
Apparently DBUG_ASSERT() can co-exist with DBUG_OFF when
-DCMAKE_CXX_FLAGS="-DDBUG_ASSERT_AS_PRINTF".
Removed assertion as it is useless now, since the type is unsigned.
The replayer did not signal replaying waiters. Added
mysql_cond_broadcast() after replaying is over.
Assertion on client error failed after replay attempt failed due
to certification failure. At this point the transaction does not
go through client state, so the client error cannot be overridden.
Assign ER_LOCK_DEADLOCK to thd directly instead.
Use timed cond wait when waiting for replayers to finish and
check if the transaction has been BF aborted during the wait.
Temporarily disable WSREP while executing RESET MASTER. In a situation where 2 nodes are both master/slave, first stop the slave on both and then reset the master.
Enforce stricter causality check with wsrep_sync_wait.
Optimized the code that removed multiple equalities pushed from HAVING
into WHERE. Now this removal is postponed until all multiple equalities
are eliminated in substitute_for_best_equal_field().
The variable controls the amount of sampling ANALYZE TABLE performs.
If ANALYZE TABLE with histogram collection is too slow, one can reduce the
time taken by setting analyze_sample_percentage to a lower percentage of the
total number of rows.
Setting it to 0 will use a formula to compute how many rows to sample:
the number of rows collected is capped to a minimum of 50000 and
increases logarithmically with a coefficient of 4096. The coefficient is
chosen so that we expect an error of less than 3% in our estimations
according to the paper:
"Random Sampling for Histogram Construction: How much is enough?"
– Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998.
The drawback of sampling is that the avg_frequency number is computed
imprecisely and will yield a smaller number than the real one.
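A usage sketch (assuming the variable has session scope; the table is hypothetical):

SET SESSION analyze_sample_percentage = 5; -- sample roughly 5% of the rows
ANALYZE TABLE t1 PERSISTENT FOR ALL;       -- collect EITS statistics/histograms from the sample
SET SESSION analyze_sample_percentage = 0; -- let the server pick the sample size by the formula above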
The add method does not need to provide the row order number. It was
only used to detect if the minimum/maximum value was populated once or not, so
as to force an update for the first encounter of a value.
Remove CMake INSTALL command for COMPONENT DataFiles.
mysql_install_db.exe will calculate default datadir, so that it can be
called without any parameters.
The check for streaming replication logging format in
THD::decide_logging_format() did the check also for DDLs running
in TOI mode. This caused DROP DATABASE to fail if streaming
replication was enabled.
Added a check for the THD wsrep execution mode; the logging format check is
now performed only if the THD is in local processing mode (i.e. not TOI).
Added galera_sr_create_drop test to verify that CREATE/DROP
statements pass even if streaming replication is on.
In the tree bb-10.4-mdev7486:
The crash was caused by a problem similar to the one in MDEV-16765:
Item_cond::excl_dep_on_group_fields_for_having_pushdown() was missing.
If we instantly change the size of a fixed-length field
and treat it as kind-of variable-length, then we will need
conversions between old column values and new ones.
I tried adding such a conversion to row_build(), but then I
noticed that more conversions would be needed, because
old values still appeared in a freshly rebuilt secondary index,
causing a mismatch when trying to search with the correct
longer value that was converted in my provisional fix to row_build().
So, we will revert the essential part of
MDEV-15563: Instant ROW_FORMAT=REDUNDANT column extension
(commit 22feb179ae), but not
remove any tests.
Make sure that the Annotate_rows_log_event is written into the
binlog only for the first fragment of the current statement.
Also avoid flushing the pending rows event when calculating bytes
generated by the transaction.
Added and recorded a test which verifies that the binlog
contains only one Annotate_rows_log_event per statement
with various SR settings. Re-recorded mysql-wsrep-features#136,
which produced different output with the excessive log events
suppressed.
Change the defaults:
-histogram_size=0
+histogram_size=254
-histogram_type=SINGLE_PREC_HB
+histogram_type=DOUBLE_PREC_HB
Adjust the testcases:
- Some have ignorable changes in EXPLAIN outputs and
more counter increments due to EITS table reads.
- Testcases that meaningfully depend on the old defaults
are changed to use the old values.
Fix clang warning: 'this' pointer cannot be null in well-defined C++ code;
pointer may be assumed to always convert to true
The only caller of TABLE::best_range_rowid_filter_for_partial_join()
already seems to be assuming that s->table != NULL.
When the node is a JOINER and binlog is enabled but binlog-index is not set in the configuration, we used a NULL pointer, which caused a segfault.
Fixed by checking for a NULL pointer before using the variable.
Condition can be pushed from the HAVING clause into the WHERE clause
if it depends only on the fields that are used in the GROUP BY list
or depends on the fields that are equal to grouping fields.
Aggregate functions can't be pushed down.
How the pushdown is performed on the example:
SELECT t1.a,MAX(t1.b)
FROM t1
GROUP BY t1.a
HAVING (t1.a>2) AND (MAX(c)>12);
=>
SELECT t1.a,MAX(t1.b)
FROM t1
WHERE (t1.a>2)
GROUP BY t1.a
HAVING (MAX(c)>12);
The implementation scheme:
1. Extract the most restrictive condition cond from the HAVING clause of
the select that depends only on the fields that are used in the GROUP BY
list of the select (directly or indirectly through equalities)
2. Save cond as a condition that can be pushed into the WHERE clause
of the select
3. Remove cond from the HAVING clause if it is possible
The optimization is implemented in the function
st_select_lex::pushdown_from_having_into_where().
New test file having_cond_pushdown.test is created.
Galera versions below 4.x do not generate unique sequence number
for view events. Take this into account when writing the SE checkpoint
to avoid debug assertion in InnoDB.
Field_str::is_equal(): Do not allow instant conversions between
BIT (which is stored big-endian) and integer types (which can
be stored big-endian or little-endian, depending on storage engine).
row_sel_field_store_in_mysql_format_func(): Properly extend
narrower integer and DATA_FIXBINARY values to the current format.
DATA_FIXBINARY was incorrectly padded with 0x20 instead of 0.
1. Renaming Type_handler_json to Type_handler_json_longtext
There will be other JSON handlers soon, e.g. Type_handler_json_varchar.
2. Making the code more symmetric for data types:
- Adding a new virtual method
Type_handler::Column_definition_validate_check_constraint()
- Moving JSON-specific code from sql_yacc.yy to
Type_handler_json_longtext::Column_definition_validate_check_constraint()
3. Adding new files sql_type_json.cc and sql_type_json.h
and moving Type_handler+JSON related code into these files.
Allow ALGORITHM=INSTANT (or avoid touching any data)
when changing the collation, or in some cases, the character set,
of a non-indexed CHAR or VARCHAR column. There is no penalty
for subsequent DDL or DML operations, and compatibility with
older MariaDB versions will be unaffected.
Character sets may be changed when the old encoding is compatible
with the new one. For example, changing from ASCII to anything
ASCII-based, or from 3-byte to 4-byte UTF-8 can sometimes be
performed instantly.
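A sketch of the kind of DDL this enables (table and columns are hypothetical;
whether a given change qualifies still depends on the compatibility rules above):

CREATE TABLE t1 (id INT PRIMARY KEY, c VARCHAR(100) CHARACTER SET utf8) ENGINE=InnoDB;
-- collation change of a non-indexed VARCHAR: no rows need to be touched
ALTER TABLE t1 MODIFY c VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin, ALGORITHM=INSTANT;
-- 3-byte to 4-byte UTF-8 is one of the compatible character set changes
ALTER TABLE t1 MODIFY c VARCHAR(100) CHARACTER SET utf8mb4, ALGORITHM=INSTANT;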
This is joint work with Eugene Kosov.
The test cases as well as ALTER_CONVERT_TO, charsets_are_compatible(),
Type_handler::Charsets_are_compatible() are his work.
The Field_str::is_equal(), Field_varstring::is_equal() and
the InnoDB changes were mostly rewritten by me due to conflicts
with MDEV-15563.
Limitations:
Changes of indexed columns will still require
ALGORITHM=COPY. We should allow ALGORITHM=NOCOPY and allow
the indexes to be rebuilt inside the storage engine,
without copying the entire table.
Instant column size changes (in bytes) are not supported by
all storage engines.
Instant CHAR column changes are only allowed for InnoDB
ROW_FORMAT=REDUNDANT. We could allow this for InnoDB
when the CHAR internally uses a variable-length encoding,
say, when converting from 3-byte UTF-8 to 4-byte UTF-8.
Instant VARCHAR column changes are allowed for InnoDB
ROW_FORMAT=REDUNDANT, and for others only if the size
in bytes does not change from 128..255 bytes to more
than 256 bytes.
Inside InnoDB, this slightly changes the way MDEV-15563
works and fixes the result of the innodb.instant_alter_extend test.
We change the way ALTER_COLUMN_EQUAL_PACK_LENGTH_EXT
is handled. All column extension, type changes and renaming
now go through a common route, except when ctx->is_instant()
is in effect, for example, instant ADD or DROP COLUMN has
been initiated. Only in that case we will go through
innobase_instant_try() and rewrite all column metadata.
get_type(field, prtype, mtype, len): Convert a SQL data type into
InnoDB column metadata.
innobase_rename_column_try(): Remove the update of SYS_COLUMNS.
innobase_rename_or_enlarge_column_try(): New function,
replacing part of innobase_rename_column_try() and all of
innobase_enlarge_column_try(). Also changes column types.
innobase_rename_or_enlarge_columns_cache(): Also change
the column type.
move account options from LEX to Account_options structure
namely, mqh and ssl_*
Also, use LEX_CSTRING for ssl_*/x509_* strings and move
setting of ACL_USER::account_locked where it belongs
Add server support for user account locking.
This patch extends the ALTER/CREATE USER statements for
denying a user's subsequent login attempts:
ALTER USER
user [, user2] ACCOUNT [LOCK | UNLOCK]
CREATE USER
user [, user2] ACCOUNT [LOCK | UNLOCK]
The SHOW CREATE USER statement was updated to display the
locking state of a user.
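Usage sketch (the user name is hypothetical):

ALTER USER 'app'@'localhost' ACCOUNT LOCK;   -- subsequent login attempts are denied
SHOW CREATE USER 'app'@'localhost';          -- the output now includes ACCOUNT LOCK
ALTER USER 'app'@'localhost' ACCOUNT UNLOCK; -- logins are allowed again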
Closes #1006
* Removed all references related to wsrep_thd_pool (which was removed)
* Removed unused declarations in wsrep_schema.h
* The following would result in invalid reads in
  Wsrep_schema::replay_transaction():
```
frag_table->field[4]->val_str(&buf);
Wsrep_schema_impl::end_index_scan(frag_table);
Wsrep_schema_impl::finish_stmt(thd);
ret= wsrep_apply_events(thd, rli, buf.c_ptr_safe(), buf.length());
```
because `buf` was accessed after closing the table. The fix is to
perform storage reads using a different THD.
* In Wsrep_schema::recover_sr_transactions(), cluster_table was opened
  for write, however it is only read here. And frag_table was opened
  for read, whereas write is potentially needed.
  Also, avoid the copy caused by String::c_ptr() to zero-terminate the C
  string; use c_ptr_quick() instead.
Due to inconsistent usage of different cost models to calculate
the cost of ref accesses we have to make the calculation of the
gain promised by usage of a range filter more complex.
The global variable wsrep_debug can now be used to filter wsrep-lib messages based on the debug level provided.
The type of wsrep_debug is now unsigned int, so tests and configuration files were changed accordingly.
When creating a field of type JSON, it will be automatically
converted to TEXT with CHECK (json_valid(`a`)), if there wasn't any
previous check for the column.
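For example (a sketch of the resulting definition):

CREATE TABLE t1 (a JSON);
SHOW CREATE TABLE t1; -- the column comes out as a TEXT-family column with an
                      -- automatic CHECK (json_valid(`a`)), since no other
                      -- check was given for it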
Additional things:
- Added two bug fixes that were found while testing JSON. These bug
  fixes have also been pushed to 10.3 (with a test case), but as they
  were minimal and needed to get this task done and tested, the fixes
  are repeated here.
- CREATE TABLE ... SELECT drops constraints for columns that
  are both in the create and select part.
  - Fixed by copying the constraint in
    Column_definition::redefine_stage1_common()
- If one has both a default expression and check constraint for a
  column, one can get the error "Expression for field `a` is referring
  to uninitialized field `a`".
  - Fixed by ignoring default expressions for the current column when checking
    for CHECK constraints
- Removed some duplicate MYSQL_PLUGIN_IMPORT symbols
This was developed by Aleksey Midenkov based on my design.
In the original InnoDB storage format (that was retroactively named
ROW_FORMAT=REDUNDANT in MySQL 5.0.3), the length of each index field
is stored explicitly.
Because of this, we can and now will allow instant conversion from
VARCHAR to CHAR or VARBINARY to BINARY of equal or greater size,
as well as instant conversion of TINYINT to SMALLINT to MEDIUMINT
to INT to BIGINT (while not changing between signed and unsigned).
Theoretically, we could allow changing from an unsigned integer to
a bigger unsigned integer, as well as changing CHAR to VARCHAR, but
that would require additional metadata and conversions whenever
reading old records.
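A sketch of the conversions this enables (table is hypothetical; the ALTERs are
expected to be accepted as instant only for ROW_FORMAT=REDUNDANT, per the above):

CREATE TABLE t1 (a SMALLINT NOT NULL, b VARCHAR(20)) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
-- widen the integer without changing signedness
ALTER TABLE t1 MODIFY a INT NOT NULL, ALGORITHM=INSTANT;
-- VARCHAR to CHAR of equal or greater size
ALTER TABLE t1 MODIFY b CHAR(20), ALGORITHM=INSTANT;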
Field_str::is_equal(), Field_varstring::is_equal(), Field_num::is_equal():
Return the new result IS_EQUAL_PACK_LENGTH_EXT if the table advertises
HA_EXTENDED_TYPES_CONVERSION capability and we are considering the
above-mentioned conversions.
ALTER_COLUMN_EQUAL_PACK_LENGTH_EXT: A new ALTER TABLE flag, similar
to ALTER_COLUMN_EQUAL_PACK_LENGTH but requiring conversions when
reading the data. The Field::is_equal() result IS_EQUAL_PACK_LENGTH_EXT
will map to this flag.
dtype_get_fixed_size_low(): For BINARY, CHAR and integer columns
in ROW_FORMAT=REDUNDANT, return 0 (variable length) from now on.
dtype_get_sql_null_size(): Keep returning the current size for
BINARY, CHAR and integer columns, so that in ROW_FORMAT=REDUNDANT
it will remain possible to update in place between NULL and NOT NULL
values.
btr_index_rec_validate(): Relax a CHECK TABLE length check for
ROW_FORMAT=REDUNDANT tables.
btr_cur_instant_init_low(): No longer trust fixed_len
for ROW_FORMAT=REDUNDANT tables.
We cannot rely on fixed_len anymore because the record can have shorter
length from before instant extension. Note that importing such tablespace
into earlier MariaDB versions produces ER_TABLE_SCHEMA_MISMATCH when
using a .cfg file.
In the original InnoDB storage format (which was retroactively named
ROW_FORMAT=REDUNDANT in MySQL 5.0.3), the length of each index field
is stored explicitly. Thus, we can and from now on will allow arbitrary
extension of VARBINARY and VARCHAR columns when the table is in
ROW_FORMAT=REDUNDANT.
ha_innobase::open(): Advertise a new HA_EXTENDED_TYPES_CONVERSION
capability for ROW_FORMAT=REDUNDANT tables.
Field_varstring::is_equal(): If the HA_EXTENDED_TYPES_CONVERSION
capability is advertised for the table, return IS_EQUAL_PACK_LENGTH
for any length extension.
For up to 127 bytes length, InnoDB would use 1 byte for length, and
that byte would always be less than 128. If the maximum length is
longer than 255 bytes, InnoDB would use a variable-length encoding
for the length, using 1 byte for lengths up to 127 bytes, and
2 bytes for longer lengths.
Thus, 1-byte lengths are always compatible when the maximum size
changes from less than 128 bytes to anything longer.
Field_varstring::is_equal(): Return IS_EQUAL_PACK_LENGTH also when
converting from VARCHAR less than 128 bytes to any longer VARCHAR.
No need to call list.empty(): first one is called by List constructor,
second one doesn't make sense as the object is destroyed immediately
afterwards.
This task involves the implementation of the optimizer trace.
This feature produces a trace for any SELECT/UPDATE/DELETE statement,
which contains information about decisions taken by the optimizer during
the optimization phase (choice of table access method, various costs,
transformations, etc). This feature would help to tell why some decisions were
taken by the optimizer and why some were rejected.
Trace is session-local, controlled by the @@optimizer_trace variable.
To enable the optimizer trace we need to write:
SET @@optimizer_trace='enabled=on';
To display the trace one can run:
SELECT trace FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
This task also involves:
MDEV-18489: Limit the memory used by the optimizer trace
introduces a switch optimizer_trace_max_mem_size which limits
the memory used by the optimizer trace. This was implemented by
Sergei Petrunia.
If wsrep_load_data_splitting is configured, change streaming replication
parameters internally to match the original behavior, i.e. replicate
on every 10000 rows. After load data is over, restore original
streaming replication settings.
Removed redundant wsrep_tc_log_commit().
The code was rewritten in the same way as the code of
ha_partition::multi_range_read_info_const() had been rewritten
earlier.
The fix allowed spider.partition_mrr to run.
Find the indexes of a table whose key parts participate in the same constraint.
These indexes are called constraint correlated.
New methods: TABLE::find_constraint_correlated_indexes() and the
virtual method check_index_dependence() were added.
For each index its own constraint correlated index map was created,
where all indexes that are constraint correlated with the current one are
marked.
The results of this task are used for MDEV-16188 (Use in-memory
PK filters built from range index scans).
* Donor node will now provide binlog-index argument to wsrep_sst_rsync script if binlog is used.
* Write correct path and binlog file names into joiner binlog-index file
MDEV-17631 select_handler for a full query pushdown
Interfaces + Proof of Concept for federatedx with test cases.
The interfaces have been developed for integration of ColumnStore engine.
ANALYZE and ANALYZE FORMAT=JSON structures are changed in the way that they
show additional information when rowid filter is used:
- r_selectivity_pct - the observed filter selectivity
- r_buffer_size - the size of the rowid filter container buffer
- r_filling_time_ms - how long it took to fill rowid filter container
New class Rowid_filter_tracker was added. This class is needed to collect data
about how rowid filter is executed.
renaming columns in a CHECK constraint during ALTER TABLE
taints the original TABLE and requires m_need_reopen=1.
In this case, though, renaming was redundant, so just don't do it.
remove TABLE_SHARE::error_table_name() and TABLE_SHARE::orig_table_name
(that was allocated in a wrong memroot in this bug).
instead, simply set TABLE_SHARE::table_name correctly.
introduce the syntax
... IDENTIFIED { WITH | VIA }
plugin [ { USING | AS } auth ]
[ OR plugin [ { USING | AS } auth ]
[ OR ... ]]
Server will try auth plugins in the specified order until the first
success. No protocol changes, server uses the existing "switch plugin"
packet.
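A usage sketch with concrete plugins (the plugin choice is illustrative, not
part of the patch):

-- try unix_socket first; fall back to a password when the OS user does not match
CREATE USER 'admin'@'localhost'
  IDENTIFIED VIA unix_socket
           OR mysql_native_password USING PASSWORD('change_me');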
The auth chain is stored in json as
"auth_or":[{"plugin":"xxx","authentication_string":"yyy"},
{},
{"plugin":"foo","authentication_string":"bar"},
...],
"plugin":"aaa", "authentication_string":"bbb"
Note:
* "auth_or" implies that there might be "auth_and" someday;
* one entry in the array is an empty object, meaning to take plugin/auth
from the main json object. This preserves compatibility with
the existing mysql.global_priv table and with the mysql.user view.
This entry is preferably a mysql_native_password plugin for a
non-empty mysql.user.password column.
SET PASSWORD is supported and changes the password for the *first*
plugin in the chain that has a notion of a "password"
Revert the side effect of 7c40996cc8.
Do not convert password hash to its binary representation when a user
entry is loaded. Do it lazily on the first authentication attempt.
As a collateral - force all authentication plugins to follow the
protocol and read_packet at least once before accessing info->username
(username is not available before first client handshake packet is read).
Fix PAM and GSSAPI plugins to behave.
This patch contains a full implementation of the optimization
that allows to use in-memory rowid / primary filters built for range
conditions over indexes. In many cases usage of such filters reduce
the number of disk seeks spent for fetching table rows.
In this implementation the choice of what possible filter to be applied
(if any) is made purely on cost-based considerations.
This implementation re-architected the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c.
Besides, this patch contains a better implementation of the generic
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections of the
implementation of the handler function records_in_range() for MyISAM.
This patch supports the feature for InnoDB and MyISAM.
wsrep_certification_rules: Define as a weak global symbol.
While there are separate _embedded.a for statically
linked storage engine plugins, there is only one ha_innodb.so
which is supposed to work with both values of WITH_WSREP.
The merge from 10.0-galera introduced a reference to a global
variable that is only defined when the server is built WITH_WSREP.
We must define that symbol as weak global, so that when
a dynamically linked InnoDB or XtraDB is used with the embedded
server (which never includes write-set replication patches),
the variable will be read as 0, instead of causing a failure to
load the InnoDB or XtraDB plugin.
No need to lowercase table names on case-sensitive file systems, as the
cache won't contain the 'lowercased' table anyway. And it prevents the
UPPERCASE.frm from being deleted.
Do not try to write ER_SHUTDOWN error message to socket, when it is forcefully closed by the shutdown.
This will avoid the race condition (attempt to write to closed socket, if connection shuts down by itself).
Analysis:
========
Increasing the length of the indexed varchar column is not an instant operation for
innodb.
Fix:
===
- Introduce the new handler flag 'Alter_inplace_info::ALTER_COLUMN_INDEX_LENGTH' to
  indicate that the index length differs due to a change of the column length.
- InnoDB handles the ALTER_COLUMN_INDEX_LENGTH flag as an instant operation.
This is a port of a MySQL fix.
commit 913071c0b16cc03e703308250d795bc381627e37
Author: Nisha Gopalakrishnan <nisha.gopalakrishnan@oracle.com>
Date: Wed May 30 14:54:46 2018 +0530
BUG#26848813: INDEXED COLUMN CAN'T BE CHANGED FROM VARCHAR(15)
TO VARCHAR(40) INSTANTANEOUSLY
The signal handler is now responsible for setting abort_loop and breaking
poll() in the main thread. The rest is handled by the main thread itself.
Removed redundant LOCK_error_log init/destroy wrappers.
Removed redundant unireg_end(): it is trivial and it has only one caller.
Removed unused ready_to_exit from PFS.
Removed kill_in_progress: duplicates abort_loop.
Removed shutdown_in_progress: duplicates abort_loop.
Removed ready_to_exit: was used to make sure main thread waits for
cleanups, which are now done by main thread itself.
Removed SIGNALS_DONT_BREAK_READ, MAYBE_BROKEN_SYSCALL,
kill_broken_server: never defined/used.
Make clean_up() static.
Remove LOCK_status around fill_status.
Also, remove LOCK_status around calc_sum_of_all_status()
Also, rename LOCK_show_status into LOCK_all_status_vars.
This reflects the variable the lock protects.
Modifications (insert/erase) are protected by a write lock;
iteration over the list is protected by a read lock.
This way, threads that iterate over the list (as in SHOW PROCESSLIST,
SHOW GLOBAL STATUS) do not block each other.
In contrast to thread_count, which is decremented by THD destructor,
this one was most probably intended to be decremented after all THD
destructors are done.
THD_count class was added to achieve similar effect with thread_count.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
Implemented and integrated THD_list as a replacement for the global
thread list. It uses own mutex instead of LOCK_thread_count for THD
list protection.
Removed unused first_global_thread() and next_global_thread().
delayed_insert_threads is now protected by LOCK_delayed_insert. Although
this patch doesn't fix very wrong synchronization of this variable.
After this patch there are only 2 legitimate uses of LOCK_thread_count
left, both in mysqld.cc: thread_count and ready_to_exit.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
LOG_INFO::lock was useless. It could've only protect against concurrent
iterators execution, which was already protected by LOCK_thread_count.
Use LOCK_thd_data instead of LOCK_thread_count as a protection against
THD::current_linfo reset.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
Bootstrap in a separate thread was introduced in 746f0b3b7 to work around
the OS/2 small stack size. OS/2 support was discontinued in 2006 and modern
operating systems have a default stack size a few times larger than the
default thread_stack, and it is tunable.
Aim is to reduce usage of LOCK_thread_count and COND_thread_count.
Part of MDEV-15135.
If the TC log did not provide list of XIDs to recover, the
commit by XID was skipped during wsrep recovery if binlog emulation
was on. However, with wsrep we want to commit every prepared transaction
with assigned wsrep XID since the transaction has already been
committed in the cluster.
Added a special condition to always proceed to commit by XID in
xarecover_handlerton() if binlog is off and the recovered transaction
has wsrep XID.
This patch contains the port of the MDEV-18379 patch
for 10.1 branch, but also includes a number of changes
made within MDEV-17835, which are necessary for the
normal operation of tests that use IPv6:
1) Fixed flaws in the galera_3nodes mtr suite control scripts,
because of which they could not work with mariabackup.
2) Fixed numerous bugs in the SST scripts and in the mtr test
files (galera_3nodes mtr suite) that prevented the use of Galera
with IPv6 addresses.
3) Fixed flaws in tests for rsync and mysqldump (for galera_3nodes
mtr tests suite). These tests were not performed successfully
without these fixes.
4) Currently, the three-node mtr suite for Galera (galera_3nodes)
uses a separate IPv6 availability check using the "have_ipv6.inc"
file. This check duplicates a more accurate check at suite.pm
level, which can be used by including the file "check_ipv6.inc".
This patch removes this discrepancy between suites.
5) GAL-501 test in the galera_3nodes suite does not contain the
option "--bind-address=::" which is needed for the test to work
correctly with IPv6 (at least on some systems), since without
it the server will not wait for connections on the IPv6 interface.
https://jira.mariadb.org/browse/MDEV-18379
and partially https://jira.mariadb.org/browse/MDEV-17835
Clear wsrep XID in innobase_rollback_by_xid() for recovered wsrep
transaction in order to avoid resetting XID storage when rolling back
wsrep transaction during recovery.
Sort wsrep XIDs read from the storage engine in ascending order and
verify that the range is continuous during crash recovery. If binlog is off,
commit all recovered transactions for the continuous seqno range. This is safe
because all transactions with a wsrep XID have been certified and must be
committed in the cluster. On the other hand, if binlog is on, respect binlog
as the transaction coordinator in order to avoid missing transactions in binlog
that have been committed into the storage engine.
The problem was originally stated in
http://bugs.mysql.com/bug.php?id=82212
The size of a base64-encoded Rows_log_event exceeds its
vanilla byte representation by about 4/3 times.
When a binlogged event size is about 1GB mysqlbinlog generates
a BINLOG query that can't be sent out due to its size.
It is fixed by fragmenting the BINLOG argument C-string into
(approximate) halves when the base64-encoded event is over 1GB in size.
The mysqlbinlog in such case puts out
SET @binlog_fragment_0='base64-encoded-fragment_0';
SET @binlog_fragment_1='base64-encoded-fragment_1';
BINLOG @binlog_fragment_0, @binlog_fragment_1;
to represent a big BINLOG.
For prompt memory release the BINLOG handler is made to reset the BINLOG argument
user variables in the middle of processing, as if @binlog_fragment_{0,1} = NULL
were assigned.
Notice that the 2 fragments are enough, though the client and server may still
need to tweak their @@max_allowed_packet to accommodate the fragment
size (which they would have to do anyway with a greater number of
fragments, should that be desired).
On the lower level the following changes are made:
Log_event::print_base64()
remains to call the encoder and store the encoded data into a cache, but
now *without* doing any formatting. The latter is left for the time
when the cache is copied to an output file (e.g. mysqlbinlog output).
The no-formatting behavior is also reflected by the change in the meaning
of the last argument, which now specifies whether to cache the encoded data.
Rows_log_event::print_helper()
is made to invoke a specialized fragmented cache-to-file copying function
which is
copy_cache_to_file_wrapped()
that takes care of fragmenting and optionally wraps the encoded
strings (fragments) into SQL stanzas.
my_b_copy_to_file()
is refactored into my_b_copy_all_to_file(). The former function
is generalized to accept a limit argument to constrain the copying and
no longer reinitializes the cache into reading mode.
The limit has no effect on a fully read cache.
always logged properly with binlog_row_image=MINIMAL
There are two issues fixed in this commit.
The first is an observation of a multi-table UPDATE binlogged
in row-format in binlog_row_image=MINIMAL mode. While the UPDATE aims
at a table with an ON-UPDATE attribute its binlog after-image misses
to record also installed default value.
The reason for that turns out missed marking of default-capable fields
in TABLE::write_set.
This is fixed by marking such fields similarly to 10.2's MDEV-10134 patch (db7edfed17)
that introduced it. The marking follows up 93d1e5ce0b841bed's idea
to exploit TABLE::rpl_write_set introduced there, though,
and thus does not mess (in 10.1) with the actual MDEV-10134 agenda.
The patch makes the formerly arg-less TABLE::mark_default_fields_for_write()
accept an argument, which would be TABLE::rpl_write_set.
The 2nd issue is extra columns in the binlog_row_image=MINIMAL before-image
while merely a packed primary key is enough. The test main.mysqlbinlog_row_minimal
always had a wrong result recorded.
This is fixed by invoking a function intended for possible read_set
filtering, which is (supposed to be) called for all types of DML, UPDATE
included; the test results have been corrected.
When *merging* from 10.1 to 10.2 the 1st "main" part of the patch is unnecessary
since the bug is not observed in 10.2, so only hunks from
sql/sql_class.cc
are required.
Call st_select_lex::update_used_tables in JOIN::optimize_unflattened_subqueries
only when we are sure that the join has not been cleaned up.
This can happen in a case when we have a non-merged semi-join and an impossible
WHERE, which leads to the cleanup of the join that has the non-merged semi-join.
ASAN noticed a freed memory access during EXECUTE in this script:
PREPARE stmt FROM "SELECT 'x' ORDER BY NAME_CONST( 'f', 'foo' )";
EXECUTE stmt;
In case of a PREPARE statement, all Items, including Item_name_const,
are created on Prepared_statement::main_mem_root.
Item_name_const::fix_fields() did not take this into account
and could allocate the value of Item::name on a wrong memory root,
in this code:
if (is_autogenerated_name)
{
set_name(thd, item_name->c_ptr(), (uint) item_name->length(),
system_charset_info);
}
When fix_fields() is called in the reported SQL script, THD's arena already
points to THD::main_mem_root rather than to Prepared_statement::main_mem_root,
so Item::name was allocated on THD::main_mem_root.
Then, at the end of the dispatch_command() for the PREPARE statement,
THD::main_mem_root got cleared. So during EXECUTE, Item::name
pointed to an already freed memory.
This patch changes the code to set the implicit name for Item_name_const
at the constructor time rather than at fix_fields time. This guarantees
that Item_name_const and its Item::name always reside on the same memory root.
Note, this change makes the code for Item_name_const symmetric with other
constant-alike items that set their default implicit names at the constructor
call time rather than at fix_fields() time:
- Item_string
- Item_int
- Item_real
- Item_decimal
- Item_null
- Item_param
Problem:
========
The server fails to notify the engine, by not setting ADD_PK_INDEX and
DROP_PK_INDEX, when there is
i) a change in the candidate for the primary key, or
ii) a new candidate for the primary key.
Fix:
====
The server sets ADD_PK_INDEX and DROP_PK_INDEX while doing the ALTER for the
above problematic case.
The row-based slave applier could not correctly parse the table id when
the value exceeded the maximum of a 32-bit unsigned int.
The reason turns out to be that the placeholder for the parsed value
was sized as 4 bytes.
Its type is fixed to be ulonglong.
Additionally the patch works around the 4-byte size of Rows_log_event::m_table_id
on 32-bit platforms. In case the last_table_id value overflows
the 4-byte maximum, the zero value for m_table_id will never be generated
and the first wrapped-around value is one; this is thanks to excluding
UINT_MAX32 + 1 from TABLE_SHARE::table_map_id.
Issue:
------
When a subquery contains UNION the count of the number of
subquery columns is calculated incorrectly. Only the first
query block in the subquery's UNION is considered and an
array indexing goes out-of-bounds, and this is caught by an
assert.
Solution:
---------
Sum up the columns from all query blocks of the query
expression.
Change specific to 5.6/5.5:
---------------------------
The "child" points to the last query block of the UNION
(as opposed to 5.7+ where it points to the first member of
UNION). So "child->master_unit()->first_select()" is used
to reach the first query block of UNION.