This patch implements an engine-independent unique hash index.
Usage:- A unique HASH index can be created automatically for a blob/varchar/text column whose key
length > handler->max_key_length(),
or it can be specified explicitly.
Automatic Creation:-
Create TABLE t1 (a blob unique);
Explicit Creation:-
Create TABLE t1 (a int , unique(a) using HASH);
Internal KEY_PART Representations:-
Long unique key_info will have 2 representations.
(let's understand this with an example: create table t1(a blob, b blob, unique(a, b)); )
1. User Given Representation:- The key_info->key_part array will be similar to what the user has defined.
So in the case of the example it will have 2 key_parts (a, b).
2. Storage Engine Representation:- In this case there will be only one key_part and it will point to
the HASH_FIELD. This key_part will always be after the user defined key_parts.
So:- User Given Representation [a] [b] [hash_key_part]
key_info->key_part ----^
Storage Engine Representation [a] [b] [hash_key_part]
key_info->key_part ------------^
table->s->key_info will have the User Given Representation, while table->key_info will have the Storage
Engine Representation. The representations can be converted into each other by calling the
re/setup_keyinfo_hash functions.
Working:-
1. When the user explicitly specifies a HASH index, or the key length is > handler->max_key_length(),
one extra vfield is added in mysql_prepare_create_table() (one for each long unique key), and
key_info->algorithm is set to HA_KEY_ALG_LONG_HASH.
2. In init_from_binary_frm_image(), values for the hash key_part are set (like fieldnr, field and flags).
3. In parse_vcol_defs, HASH_FIELD->vcol_info is created. Item_func_hash is used with the list of
Item_fields; when an explicit length is given by the user, Item_func_left is used on the Item_field values.
4. In ha_write_row/ha_update_row, check_duplicate_long_entry_key() is called; it creates the hash key from
table->record[0] and then calls ha_index_read_map(). If a duplicate hash is found, the result is compared
field by field.
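For illustration, a minimal sketch of the resulting behavior (values are arbitrary):
  CREATE TABLE t1 (a BLOB UNIQUE);              -- hash index created automatically
  INSERT INTO t1 VALUES (REPEAT('x', 10000));   -- ok
  INSERT INTO t1 VALUES (REPEAT('x', 10000));   -- duplicate hash found via
                                                -- ha_index_read_map, confirmed
                                                -- field by field => duplicate key error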
* inject portion of time updates into mysql_delete main loop
* triggered case emits delete+insert, no updates
* PORTION OF `SYSTEM_TIME` is forbidden
* `DELETE HISTORY .. FOR PORTION OF ...` is forbidden as well
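For illustration, a hedged sketch (table, period and column names are hypothetical):
  CREATE TABLE t1 (a INT, s DATE, e DATE, PERIOD FOR app_time (s, e));
  DELETE FROM t1 FOR PORTION OF app_time
    FROM '2000-01-01' TO '2001-01-01';
  -- rows only partially covered by the period are shrunk via updates in the
  -- mysql_delete main loop; with delete triggers, delete+insert is emitted
  DELETE FROM t1 FOR PORTION OF SYSTEM_TIME
    FROM '2000-01-01' TO '2001-01-01';          -- rejected
  DELETE HISTORY FROM t1 FOR PORTION OF app_time
    FROM '2000-01-01' TO '2001-01-01';          -- rejected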
Due to inconsistent usage of different cost models to calculate
the cost of ref accesses, we have to make the calculation of the
gain promised by using a range filter more complex.
Find the indexes of a table whose parts participate in the same constraint.
These indexes are called constraint correlated.
New methods were added: TABLE::find_constraint_correlated_indexes() and the
virtual method check_index_dependence().
For each index, its own constraint correlated index map is created,
in which all indexes that are constraint correlated with the current one are
marked.
The results of this task are used for MDEV-16188 (Use in-memory
PK filters built from range index scans).
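For illustration, a hypothetical example of constraint correlated indexes:
  CREATE TABLE t1 (a INT, b INT, c INT,
                   CHECK (a < b), KEY k1 (a), KEY k2 (b, c));
  -- parts of k1 (a) and k2 (b) participate in the same CHECK constraint,
  -- so k1 and k2 are constraint correlated and appear in each other's maps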
MDEV-17631 select_handler for a full query pushdown
Interfaces + Proof of Concept for federatedx with test cases.
The interfaces have been developed for integration of ColumnStore engine.
remove TABLE_SHARE::error_table_name() and TABLE_SHARE::orig_table_name
(that was allocated in a wrong memroot in this bug).
instead, simply set TABLE_SHARE::table_name correctly.
This patch contains a full implementation of the optimization
that allows using in-memory rowid / primary key filters built for range
conditions over indexes. In many cases usage of such filters reduces
the number of disk seeks spent for fetching table rows.
In this implementation the choice of which filter to apply
(if any) is made purely on cost-based considerations.
This implementation re-architected the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c.
Besides that, this patch contains a better implementation of the generic
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections to the
implementation of the handler function records_in_range() for MyISAM.
This patch supports the feature for InnoDB and MyISAM.
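A hedged sketch of a query shape that may benefit (hypothetical table and index names):
  CREATE TABLE t1 (a INT, b INT, KEY idx_a (a), KEY idx_b (b)) ENGINE=InnoDB;
  SELECT * FROM t1 WHERE a = 5 AND b BETWEEN 100 AND 200;
  -- if cost-based analysis finds it profitable, a rowid filter is built from
  -- the range scan on idx_b and probed during ref access on idx_a, saving
  -- disk seeks for rows that cannot match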
always logged properly with binlog_row_image=MINIMAL
There are two issues fixed in this commit.
The first is an observation of a multi-table UPDATE binlogged
in row format in binlog_row_image=MINIMAL mode. While the UPDATE aims
at a table with an ON UPDATE attribute, its binlog after-image fails
to record the installed default value.
The reason for that turns out to be missed marking of default-capable fields
in TABLE::write_set.
This is fixed to mark such fields similarly to 10.2's MDEV-10134 patch (db7edfed17)
that introduced it. The marking follows up 93d1e5ce0b841bed's idea
to exploit TABLE::rpl_write_set introduced there,
and thus does not mess (in 10.1) with the actual MDEV-10134 agenda.
The patch makes the formerly arg-less TABLE::mark_default_fields_for_write()
accept an argument, which would be TABLE::rpl_write_set.
The 2nd issue is extra columns in the binlog_row_image=MINIMAL before-image,
while merely a packed primary key is enough. The test main.mysqlbinlog_row_minimal
always had a wrong result recorded.
This is fixed to invoke a function that is intended for possible read_set
filtering and which is called (or is supposed to be) for all types of DML, UPDATE
included; the test results have been corrected.
At *merging* from 10.1->10.2 the 1st "main" part of the patch is unnecessary
since the bug is not observed in 10.2, so only hunks from
sql/sql_class.cc
are required.
The error message was modified.
Then the TABLE_SHARE::error_table_name() implementation was taken from 10.3,
to be used as the name of the table in this message.
Part of MDEV-5336 Implement LOCK FOR BACKUP
- Added new lock types to MDL_BACKUP covering all backup stages, and
a new MDL lock needed for the backup stages.
- Renamed MDL_BACKUP_STMT to MDL_BACKUP_DDL
- flush_tables() takes a new parameter that decides what should be flushed.
- InnoDB, Aria (transactional tables with checksums), Blackhole, Federated
and Federatedx tables are marked to be safe for online backup. We are
using MDL_BACKUP_TRANS_DML instead of MDL_BACKUP_DML locks for these
which allows any DML's to proceed for these tables during the whole
backup process until BACKUP STAGE COMMIT which will block the final
commit.
- Re-numbered enum_table_category to make some tests easier
- Moved TABLE_CATEGORY_INFORMATION to be first CATEGORY of virtual tables
- Don't take MDL locks for non-updatable table categories
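For illustration, the stages as driven from SQL (a sketch; the actual file copy happens externally):
  BACKUP STAGE START;
  BACKUP STAGE FLUSH;
  BACKUP STAGE BLOCK_DDL;
  BACKUP STAGE BLOCK_COMMIT;  -- blocks only the final commit; DML on InnoDB,
                              -- Aria etc. could keep running until this point
                              -- thanks to MDL_BACKUP_TRANS_DML
  BACKUP STAGE END;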
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
Race condition. field->flags were copied from s->field->flags during
field->clone(), early in open_table_from_share(). But s->field->flags
were getting their PART_INDIRECT_KEY_FLAG bit much later in
TABLE::mark_columns_used_by_virtual_fields() and only once per share.
If two threads were executing the code between field->clone()
and mark_columns_used_by_virtual_fields() at the same time, only
one would get PART_INDIRECT_KEY_FLAG bits in field[].
table for purge thread
Problem:
=======
Purge tries to fetch an MDL lock for the whole table even though it opens
only one of the partitions. But the table name length was wrongly set so that it
included the partition name too.
Solution:
========
- The table name length should identify the table name only, not the partition name.
The optimizer erroneously allowed using the join cache when joining a
splittable materialized table while also using the splitting optimization.
As a consequence, in some rare cases the server returned wrong result
sets for queries with materialized derived tables.
This patch allows either using the join cache without the splitting
technique for materialization of a splittable derived table, or splitting
without the join cache when joining such a table. The costs of these
alternatives are compared and the best variant is chosen.
ALTER TABLE locks the table with TL_READ_NO_INSERT, to prevent the
source table modifications while it's being copied. But there's an
indirect way of modifying a table, via cascade FK actions.
After previous commits, an attempt to modify an FK parent table
will cause FK children to be prelocked, so the table-being-altered
cannot be modified by a cascade FK action, because ALTER holds a
lock and prelocking will wait.
But if a new FK is being added by this very ALTER, then the target
table is not locked yet (it's a temporary table). So, we have to
lock FK parents explicitly.
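For illustration (hypothetical tables):
  CREATE TABLE t2 (b INT PRIMARY KEY);
  CREATE TABLE t1 (a INT);
  ALTER TABLE t1 ADD CONSTRAINT fk FOREIGN KEY (a) REFERENCES t2 (b);
  -- t2 is the parent of the FK being added; since the altered copy of t1 is a
  -- temporary table, prelocking would not reach t2, so t2 is locked explicitly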
The problem was that join_columns creation was not finished due to an error (a column
referenced in USING was not found), but the next execution tried to use the join_columns lists.
The solution is to clean up the lists on error. This can consume memory in the statement
MEM_ROOT, but it is an error path, and the error will either be fixed or the
statement/procedure removed/altered.
The previous correction of the patch for mdev-16473 did not work
correctly for the databases whose names started with '*'.
Added a test case with a database named "*".
The bug was that innobase_get_computed_value() trashed record[0] and data
in Field_blob::value
Fixed by using a record on the heap for innobase_get_computed_value()
Reviewer: Marko Mäkelä
This is to mark that a field is indirectly part of a key, which simplifies
checking if we need to have this field up to date to evaluate a key.
For example:
CREATE TABLE t1 (a int, b int as (a) virtual,
c int as (b) virtual, index(c))
would mark a and b with PART_INDIRECT_KEY_FLAG.
c is marked with PART_KEY_FLAG as before.
The logic and the implementation scheme are similar to those of
MDEV-9197 Pushdown conditions into non-mergeable views/derived tables.
How the pushdown is made, on an example:
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y) from t2 group by x);
-->
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y)
from t2
where x>3
group by x
having max(y)>10);
The implementation scheme:
1. Search for the condition cond that depends only on the fields
from the left part of the IN subquery (left_part)
2. Find fields F_group in the select of the right part of the
IN subquery (right_part) that are used in the GROUP BY
3. Extract from the cond condition cond_where that depends only on the
fields from the left_part that stay at the same places in the left_part
(have the same indexes) as the F_group fields in the projection of the
right_part
4. Transform cond_where so it can be pushed into the WHERE clause of the
right_part and delete cond_where from the cond
5. Transform cond so it can be pushed into the HAVING clause of the right_part
The optimization is made in the
Item_in_subselect::pushdown_cond_for_in_subquery() and is controlled by the
variable condition_pushdown_for_subquery.
New test file in_subq_cond_pushdown.test is created.
There are also some changes made to setup_jtbm_semi_joins().
Now it is decomposed into two procedures: setup_degenerate_jtbm_semi_joins(),
which is called before optimize_cond() for cond, and setup_jtbm_semi_joins(),
which is called after optimize_cond().
The new setup_jtbm_semi_joins() is written so that the result of its work is
the same as if it were called before optimize_cond().
The code that is common for pushdown into materialized derived and into materialized
IN subqueries is factored out into pushdown_cond_for_derived(),
Item_in_subselect::pushdown_cond_for_in_subquery() and
st_select_lex::pushdown_cond_into_where_clause().
MDEV-16100 FOR SYSTEM_TIME erroneously resolves string user variables as transaction IDs
Problem:
Vers_history_point::resolve_unit() tested item->result_type() before
item->fix_fields() was called.
- Item_func_get_user_var::result_type() returned REAL_RESULT by default.
This caused MDEV-16100.
- Item_func_sp::result_type() crashed on assert.
This caused MDEV-16094
Changes:
1. Adding item->fix_fields() into Vers_history_point::resolve_unit()
before using data type specific properties of the history point
expression.
2. Adding a new virtual method Type_handler::Vers_history_point_resolve_unit()
3. Implementing type-specific
Type_handler_xxx::Vers_history_point_resolve_unit()
in such a way as to:
a. resolve temporal and general purpose string types to TIMESTAMP
b. resolve BIT and general purpose INT types to TRANSACTION
c. disallow use of non-relevant data type expressions in FOR SYSTEM_TIME
Note, DOUBLE and DECIMAL data types are disallowed intentionally.
- DOUBLE does not have enough precision to hold huge BIGINT UNSIGNED values
- DECIMAL rounds on conversion to INT
Both the lack of precision and the rounding might potentially lead to
very unpredictable results when a wrong transaction ID is chosen.
If one really wants dangerous use of DOUBLE and DECIMAL, explicit CAST
can be used:
FOR SYSTEM_TIME AS OF CAST(double_or_decimal AS UNSIGNED)
QQ: perhaps DECIMAL(N,0) could still be allowed.
4. Adding a new virtual method Item::type_handler_for_system_time(),
to make HEX hybrids and bit literals work as TRANSACTION rather
than TIMESTAMP.
5. sql_yacc.yy: replacing the rule temporal_literal with "TIMESTAMP TEXT_STRING".
Other temporal literals now resolve to TIMESTAMP through the new
Type_handler methods. No special grammar is needed. This removed
a few shift/reduce conflicts.
(TIMESTAMP related conflicts in "history_point:" will be removed separately)
6. Removing the "timestamp_only" parameter from
vers_select_conds_t::resolve_units() and Vers_history_point::resolve_unit().
It was a hint telling that a table did not have any TRANSACTION-aware
system time columns, so it's OK to resolve to TIMESTAMP in case of uncertainty.
In the new implementation it works as follows:
- the decision between TIMESTAMP and TRANSACTION is first made
based on the expression data type only
- then, in case the expression resolved to TRANSACTION, the table
is checked for whether TRANSACTION-aware columns really exist.
This way is safer against possible ALTER TABLE statements changing
ROW START and ROW END columns from "BIGINT UNSIGNED" to "TIMESTAMP(x)"
or the other way around.
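For illustration, how history points of different types resolve now (a sketch; t1 is a hypothetical versioned table):
  SET @s = '2018-01-01 10:00:00';
  SELECT * FROM t1 FOR SYSTEM_TIME AS OF @s;     -- string  -> TIMESTAMP
  SELECT * FROM t1 FOR SYSTEM_TIME AS OF 12345;  -- integer -> TRANSACTION
  SELECT * FROM t1 FOR SYSTEM_TIME AS OF 1.5;    -- DECIMAL -> rejected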
Make sure that SELECT_LEX_UNIT::derived, behaves as documented
(points to the "TABLE_LIST representing this union in the
embedding select"). For recursive CTE this was not necessarily
the case, it could've pointed to the TABLE_LIST inside the CTE,
not in the embedding select.
To fix:
* don't update unit->derived in mysql_derived_prepare(), pass derived
as an argument to st_select_lex_unit::prepare()
* prefer to set unit->derived in TABLE_LIST::init_derived()
to the TABLE_LIST in the embedding select, not to the recursive
reference. Fail if there are many TABLE_LISTs in the embedding
select with conflicting FOR SYSTEM_TIME clauses.
cleanup:
* remove redundant THD* argument from st_select_lex_unit::prepare()
For this case we have a view that is mergeable, but we are not able to merge it in the
parent select because that would exceed the maximum number of tables allowed in the join
list, so we materialize this view.
TABLE_LIST::derived is NULL for such views; it is only set for views which have ALGORITHM=TEMPTABLE.
Fixed by making sure TABLE_LIST::derived is set for views that could not be merged.
Lots of changes:
* calculate the current history partition in ::external_lock(),
not in ::write_row() or ::update_row()
* remove dynamically collected per-partition row_end stats
* no full table scan in open_table_from_share to calculate these
stats, no manual MDL/thr_locks in open_table_from_share
* no shared stats in TABLE_SHARE = no mutexes or condition waits when
calculating current history partition
* always compare timestamps, don't convert them to MYSQL_TIME
(avoid DST ambiguity, and it's faster too)
* correct interval handling, 1 month = 1 month, not 30 * 24 * 3600 seconds
* save/restore first partition start time, and count intervals from there
* only allow dropping the first partitions if INTERVAL is used
* when adding new history partitions, split the data in the last history
partition if it has overflowed
* show partition boundaries in INFORMATION_SCHEMA.PARTITIONS
is not supported
Allowed to use recursive references in derived tables.
As a result usage of recursive references in operands of
INTERSECT / EXCEPT is now supported.
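A self-contained sketch of a recursive reference inside a derived table (hypothetical query):
  WITH RECURSIVE cte AS
  ( SELECT 1 AS a
    UNION
    SELECT d.a + 1 FROM (SELECT a FROM cte) AS d  -- recursive reference in a
    WHERE d.a < 10 )                              -- derived table
  SELECT * FROM cte;
Recursive references may now appear in operands of INTERSECT / EXCEPT the same way.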
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider, sphinx and connect for now.
This will make it easier to see how memory allocation is done when debugging
with either DBUG or gdb.
It will especially help when debugging stored procedures.
The main change is a name argument as the second argument to init_alloc_root()
and init_sql_alloc().
Other things:
- Added DBUG_ENTER/EXIT to some Virtual_tmp_table functions
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because the processlist db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
Now we don't open partitions if it was explicitly specified.
ha_partition::m_opened_partition bitmap added to track
partitions that were actually opened.
and the system_versioning_transaction_registry variable.
The user enables transaction registry by specifying BIGINT for
row_start/row_end columns.
Check the mysql.transaction_registry structure on the first open,
not at startup. Avoid warnings unless the transaction registry
is actually used.
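For illustration (a sketch): the registry comes into play when BIGINT UNSIGNED is chosen for the row start/end columns:
  CREATE TABLE t1 (
    a INT,
    s BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
    e BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (s, e)
  ) WITH SYSTEM VERSIONING;
  -- with TIMESTAMP(6) start/end columns instead, mysql.transaction_registry
  -- is not needed and no warnings about it should appear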
Many related changes.
Note that the AS OF condition must always be pushed down to physical tables;
it cannot be applied to a derived table or a view. Thus:
* no versioning for internal temporary tables, they can never store
historical data.
* remove special versioning code from mysql_derived_prepare and
remove ER_VERS_DERIVED_PROHIBITED - derived can have no historical
data and cannot be prohibited for system versioning related reasons.
* do not expand select list for derived/views with sys vers fields,
derived/views can never have historical data.
* remove special invisibility rules for sys vers fields, they are no
longer needed after the previous change
* remove system_versioning_hide, it lost the meaning after the
previous change.
* remove ER_VERS_SYSTEM_TIME_CLASH, it's no "clash", the inner
AS OF clause always wins.
* non-versioned fields in a historical query:
reword the warning text, downgrade it to a note, don't
replace values with NULLs
trx_undo_page_report_modify(): For SPATIAL INDEX, keep logging
updated off-page columns twice, so that
the minimum bounding rectangle (MBR) will be logged.
Avoiding the redundant logging would require larger changes
to the undo log format.
row_build_index_entry_low(): Handle SPATIAL_UNKNOWN more robustly,
by refusing to purge the record from the spatial index.
We can get this code when processing old undo log from 10.2.10 or
10.2.11 (the releases affected by MDEV-14799, which was a regression
from MDEV-14051).
If a translation table is present when we materialize the derived table, then
change it to point to the materialized table.
Added debug info to see what really happens with which derived table.
Other changes done to get this to work:
- Added 'internal_tables' to the TABLE object to list which sequence tables
are needed to use the table.
- Mark any expression using DEFAULT() with LEX->default_used.
This is needed when deciding if we should open internal sequence
tables when a table is opened (we don't need to open sequence tables
if the main table is only used with SELECT).
- Create_and_open_temporary_table() can now also open all internal
sequence tables.
- Added option MYSQL_LOCK_USE_MALLOC to mysql_lock_tables()
to force memory allocation to be used with malloc instead of
memroot.
- Added flag to MYSQL_LOCK to remember if allocation was done with
malloc or memroot (makes code simpler and safer).
- init_one_table_for_prelocking() now takes an argument for which lock to
use instead of whether it's a routine or something else.
- Renamed prelocking placeholders to make them more understandable as
they are now used in more code.
- Changed the test in check_lock_and_start_stmt() of whether the found table has
correct locks. The old test didn't work for tables that have lock
TL_WRITE_ALLOW_WRITE, which is what sequence tables are using.
- Added VCOL_NOT_VIRTUAL option to ensure that sequence functions can't
be used with virtual columns
- More sequence tests
Merge branch '10.3' into trunk
Both field_visibility and VERS_HIDDEN_FLAG exist independently.
TODO:
VERS_HIDDEN_FLAG should be replaced with SYSTEM_INVISIBLE (or COMPLETELY_INVISIBLE?).
Feature Definition:-
This feature adds invisible column functionality to the server.
There are 4 levels of "invisibility":
1. Not invisible (NOT_INVISIBLE) — normal columns created by the user.
2. A little bit invisible (USER_DEFINED_INVISIBLE) — columns that the
user has marked invisible. They aren't shown in SELECT * and they
don't require values in INSERT INTO table VALUES (...). Otherwise
they behave as normal columns.
3. More invisible (SYSTEM_INVISIBLE) — can be queried explicitly,
otherwise invisible from everything. Think of the ROWID system column.
Because they're invisible from ALTER TABLE and from CREATE TABLE
they cannot be created or dropped; they're created by the system.
A user cannot create a column with the same name as a
SYSTEM_INVISIBLE one.
4. Very invisible (COMPLETELY_INVISIBLE) — as above, but cannot be
queried either. They can only show up in EXPLAIN EXTENDED (might
be possible for a very invisible indexed virtual column), but
otherwise they don't exist for the user. If a user creates a column
with the same name as a COMPLETELY_INVISIBLE one, the
COMPLETELY_INVISIBLE column is renamed again, so it remains completely
invisible to the user.
Invisible Index (HA_INVISIBLE_KEY):-
Creation of invisible columns requires a new type of index which
is visible only to the system. The user can't see/alter/create/delete
this index. If the user creates an index with the same name as an
invisible index, then the invisible index will be renamed.
Syntax Details:-
Only USER_DEFINED_INVISIBLE columns can be created by the user. Such a column
can be created by adding the INVISIBLE suffix after the column definition.
Create table t1( a int invisible, b int);
Rules:-
There are some rules/restrictions related to the use of invisible columns:
1. Not all the columns in a table can be invisible.
Create table t1(a int invisible); \\error
Create table t1(a int invisible, b int invisible); \\error
2. If you want an invisible column to be NOT NULL then you have to supply
a default value for the column.
Create table t1(a int, b int invisible not null); \\error
3. If you create a view / create a table with select *, this won't copy
invisible fields. So the newly created view/table won't have any invisible
columns.
Create table t2 as select * from t1; // t2 won't have t1's invisible column
Create view v1 as select * from t1; // v1 won't have t1's invisible column
4. Invisibility won't be forwarded to the next table in any case of create
table/view as select */(a,b,c) from table.
Create table t2 as select a,b,c from t1; // t2 will have t1's invisible
// column (b), but it won't be invisible in t2
Create view v1 as select a,b,c from t1; // v1 will have t1's invisible
// column (b), but it won't be invisible in v1
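Putting the rules together, a small sketch of the visible behavior:
Create table t1 (a int invisible default 5, b int);
insert into t1 values (10);  // only b is given a value, a gets its default
select * from t1;            // returns only b
select a, b from t1;         // a must be named explicitly to be seen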
Implementation Details:-
Parsing:- INVISIBLE_SYM is added into vcol_attribute (so it's like a unique
suffix). It is also added into keyword_sp_not_data_type so that a table
can have a column named invisible.
Implementation details are given for each modified/created function below.
(Some functions are left out as they are self-explanatory.)
(m = Modified, n = Newly created)
mysql_prepare_create_table(m):- Extra checks for invisible columns were
added. Some DBUG_EXECUTE_IF were also added for test cases.
mysql_prepare_alter_table(m):- Now this will drop all
COMPLETELY_INVISIBLE columns and HA_INVISIBLE_KEY indexes. Further
modifications were made to prevent drop/change/delete of SYSTEM_INVISIBLE
columns.
build_frm_image(m):- Now this allows incorporating the field_visibility
status into the frm image. To remain compatible with old frms,
field_visibility info is only written when any of the fields is
not NOT_INVISIBLE.
extra2_write_additional_field_properties(n):- This writes the field
visibility info into the buffer. We first write EXTRA2_FIELD_FLAGS into
the buffer/frm; then each next char holds the field_visibility of each
field.
init_from_binary_frm_image(m):- Now if we get EXTRA2_FIELD_FLAGS,
we read the next n chars (n = number of fields) and set the
field_visibility. We also increment
thd->status_var.feature_invisible_columns. One important thing to
note: if we find out that a key contains a field whose visibility is
> USER_DEFINED_INVISIBLE, then we declare this key an invisible
key.
sql_show.cc is changed accordingly to make SHOW CREATE TABLE and SHOW KEYS
correct.
mysql_insert(m):- If we find that we are doing an insert of the form
insert into t1 values(1,1); without explicitly specifying
columns, then we check whether we have invisible fields; if yes, we
reset the whole record. Why? Because first we want hidden columns
to get their default/null value. Second, auto_increment has the properties
of no default and no null, which violates invisible-column rule 2, and
because of this it was giving an error. Resetting table->record[0]
eliminates this issue. For more info, put a breakpoint on handler::write_row
and look at the auto_increment value.
fill_record(m):- We continue the loop if we find an invisible column, because
it has already been reset / will get its value if it has a default.
Test cases:- Since we cannot directly add a > USER_DEFINED_INVISIBLE
column, I have used debug_dbug to create it in mysql_prepare_create_table.
Patch Credit:- Serg Golubchik
* again, as in 10.2, NOW is a keyword only if followed by parentheses
* use AS OF CURRENT_TIMESTAMP or AS OF NOW()
* AS OF CURRENT_TIMESTAMP and AS OF NOW() mean AS OF NOW(6),
not AS OF NOW(0), (same behavior as in a DEFAULT clause)
IS DROPPED
ANALYSIS:
=========
It is advised not to tamper with the system tables.
When the primary key is dropped from a system table, certain
operations on the table which try to access the table's key
information may lead to a server exit.
FIX:
====
An appropriate error is now reported in such a case.
Add support for direct update and direct delete requests for spider.
A direct update/delete request handles all qualified rows in a single
operation rather than one row at a time.
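For illustration, statements of this shape (against a hypothetical Spider table t1) can now be executed remotely as one operation:
  UPDATE t1 SET a = a + 1 WHERE b < 10;
  DELETE FROM t1 WHERE b < 10;
Without direct update/delete, each qualified row would be fetched and then updated or deleted through separate handler calls.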
Contains Spiral patches:
006_mariadb-10.2.0.direct_update_rows.diff MDEV-7704
008_mariadb-10.2.0.partition_direct_update.diff MDEV-7706
010_mariadb-10.2.0.direct_update_rows2.diff MDEV-7708
011_mariadb-10.2.0.aggregate.diff MDEV-7709
027_mariadb-10.2.0.force_bulk_update.diff MDEV-7724
061_mariadb-10.2.0.mariadb-10.1.8.diff MDEV-12870
- The differences compared to the original patches:
- Most of the parameters of the new functions are unnecessary. The
unnecessary parameters have been removed.
- Changed bit positions for new handler flags upon consideration of
handler flags not needed by other Spiral patches and handler flags
merged from MySQL.
- Added info_push() (Was originally part of bulk access patch)
- Didn't include code related to handler socket
- Added HA_CAN_DIRECT_UPDATE_AND_DELETE
Original author: Kentoku SHIBA
First reviewer: Jacob Mathew
Second reviewer: Michael Widenius
THD::vers_update_trt, trx_t::vers_update_trt, trx_savept_t::vers_update_trt:
Remove. Instead, determine from trx_t::mod_tables whether versioned
columns were affected by the transaction.
handlerton::prepare_commit_versioned: Replaces vers_get_trt_data.
Return the transaction start ID and also the commit ID, in case
the transaction modified any system-versioned columns (0 if not).
TR_table::store_data(): Remove (merge with update() below).
TR_table::update(): Add the parameters start_id, end_id.
ha_commit_trans(): Remove a condition on SQLCOM_ALTER_TABLE.
If we need something special for ALTER TABLE...ALGORITHM=INPLACE,
that can be done inside InnoDB by modifying trx_t::mod_tables.
innodb_prepare_commit_versioned(): Renamed from innodb_get_trt_data().
Check trx_t::mod_tables to see if any changes to versioned columns
are present.
trx_mod_table_time_t: A pair of logical timestamps, replacing the
undo_no_t in trx_mod_tables_t. Keep track of not only the first
modification to a persistent table in each transaction, but also
the first modification of a versioned column in a table.
dtype_t, dict_col_t: Add the accessor is_any_versioned(), to check
if the type refers to a system-versioned user or system column.
upd_t::affects_versioned(): Check if an update affects a versioned
column.
trx_undo_report_row_operation(): If a versioned column is affected
by the update, invoke trx_mod_table_time_t::set_versioned().
trx_rollback_to_savepoint_low(): If all changes to versioned columns
were rolled back, invoke trx_mod_table_time_t::rollback_versioned(),
so that trx_mod_table_time_t::is_versioned() will no longer hold.
If compiling a non-DBUG binary with
-DDBUG_ASSERT_AS_PRINTF, asserts will be
changed to printf + stack trace (if stack
traces are enabled).
- Changed #ifndef DBUG_OFF to
#ifdef DBUG_ASSERT_EXISTS
for those DBUG_OFF uses that were just there to enable
asserts
- Assert checks that could greatly impact
performance were changed to DBUG_ASSERT_SLOW, which
is not affected by DBUG_ASSERT_AS_PRINTF
- Added one extra option to my_print_stacktrace() to
get more silent in case of stack trace printing as
part of assert.
- Added TABLE_SHARE->not_usable_by_query_cache
- Moved TABLE->no_replicate to TABLE_SHARE->no_replicate as it's same for
all TABLE instances
- Renamed TABLE_SHARE->cached_row_logging_check to can_do_row_logging
- Added sql/mariadb.h file that should be included first by files in sql
directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables
that must be done before my_global.h is included)
- Removed a lot of include my_global.h from include files
- Removed include's of some files that my_global.h automatically includes
- Removed duplicated include's of my_sys.h
- Replaced include my_config.h with my_global.h
"Optimization for equi-joins of derived tables with GROUP BY"
should be considered rather as a 'proof of concept'.
The task itself is targeted at an optimization that employs re-writing
equi-joins with grouping derived tables / views into lateral
derived tables. Here's an example of such transformation:
select t1.a,t.max,t.min
from t1 [left] join
(select a, max(t2.b) max, min(t2.b) min from t2
group by t2.a) as t
on t1.a=t.a;
=>
select t1.a,tl.max,tl.min
from t1 [left] join
lateral (select a, max(t2.b) max, min(t2.b) min from t2
where t1.a=t2.a) as t
on 1=1;
The transformation pushes the equi-join condition t1.a=t.a into the
derived table making it dependent on table t1. It means that for
every row from t1 a new derived table must be filled out. However
the size of any of these derived tables is just a fraction of the
original derived table t. One could say that transformation 'splits'
the rows used for the GROUP BY operation into separate groups
performing aggregation for a group only in the case when there is
a match for the current row of t1.
Apparently the transformation may produce a query with a better
performance only in the case when
- the GROUP BY list refers only to fields returned by the derived table
- there is an index I on one of the tables T used in the FROM list of
the specification of the derived table whose prefix covers the
fields from the proper beginning of the GROUP BY list or
fields that are equal to those fields.
Whether the result of the re-writing can be executed faster depends
on many factors:
- the size of the original derived table
- the size of the table T
- whether the index I is clustering for table T
- whether the index I fully covers the GROUP BY list.
This patch only tries to improve the chosen execution plan using
this transformation. It tries to do it only when the chosen
plan reaches the derived table by a key whose prefix covers
all the fields of the derived table produced by the fields of
the table T from the GROUP BY list.
The code of the patch does not evaluate the cost of the improved
plan. If certain conditions are met, the transformation is applied.
IB: Fixes in the logic of when to do versioned or usual row updates. Now it is
able to do unversioned updates for versioned tables just by disabling
`TABLE_SHARE::versioned` flag.
SQL: DDL tracking for:
* RENAME TABLE, ALTER TABLE .. RENAME TO;
* DROP TABLE;
* data-modifying operations (f.ex. ALTER TABLE .. ADD/DROP COLUMN).
* based on RANGE pruning by COLUMNS (sys_trx_end) condition
* removed DEFAULT; AS OF NOW is always last; current VERSIONING as last non-empty (or first empty)
* Min/Max stats in TABLE_SHARE
* ALTER TABLE ADD PARTITION adds before AS OF NOW partition
* one `AS OF NOW`, multiple `VERSIONING` partitions;
* rotation of `VERSIONING` partitions by record count, time period;
* rotation is multi-threaded;
* conventional subpartitions as bottom level for versioned partitions;
* `DEFAULT` keyword selects first `VERSIONING` partition;
* ALTER TABLE ADD/DROP partition;
* REBUILD PARTITION basic operation.
Syntax extension: TIMESTAMP/TRANSACTION keyword can be used before FROM ... TO, BETWEEN ... AND.
Example:
SELECT * FROM t1 FOR SYSTEM_TIME TIMESTAMP FROM '1-1-1' TO NOW();
Closes #27
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by the parser are now const strings, less chance of
accidentally changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)
Changes:
- This caused some ABI changes
- lex_string_set now uses LEX_CSTRING
- Some functions now take const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
code
- Added item_empty_name and item_used_name to be able to distinguish between
items that were given an empty name and items that were not given a name.
This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*" is no longer the same as table_name.*
- removed not used function Item::rename()
- Added comparison of item->name_length before some calls to
my_strcasecmp() to speed up comparisons
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
set when it shouldn't be
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
give the error.
TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
(as part of lower_case_table_names)
Working features:
CREATE OR REPLACE [TEMPORARY] SEQUENCE [IF NOT EXISTS] name
[ INCREMENT [ BY | = ] increment ]
[ MINVALUE [=] minvalue | NO MINVALUE ]
[ MAXVALUE [=] maxvalue | NO MAXVALUE ]
[ START [ WITH | = ] start ] [ CACHE [=] cache ] [ [ NO ] CYCLE ]
ENGINE=xxx COMMENT=".."
SELECT NEXT VALUE FOR sequence_name;
SELECT NEXTVAL(sequence_name);
SELECT PREVIOUS VALUE FOR sequence_name;
SELECT LASTVAL(sequence_name);
SHOW CREATE SEQUENCE sequence_name;
SHOW CREATE TABLE sequence_name;
CREATE TABLE sequence-structure ... SEQUENCE=1
ALTER TABLE sequence RENAME TO sequence2;
RENAME TABLE sequence TO sequence2;
DROP [TEMPORARY] SEQUENCE [IF EXISTS] sequence_names
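A minimal usage sketch (assuming default MINVALUE/MAXVALUE):
  CREATE SEQUENCE s1 START WITH 100 INCREMENT BY 10;
  SELECT NEXT VALUE FOR s1;      -- 100
  SELECT NEXTVAL(s1);            -- 110
  SELECT PREVIOUS VALUE FOR s1;  -- 110
  DROP SEQUENCE s1;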
Missing features
- SETVAL(value,sequence_name), to be used with replication.
- Check replication, including checking that sequence tables are marked
not transactional.
- Check that a commit happens for NEXT VALUE that changes table data (may
already work)
- ALTER SEQUENCE. ANSI SQL version of setval.
- Share identical sequence entries to not add things twice to table list.
- testing insert/delete/update/truncate/load data
- Run and fix Alibaba sequence tests (part of mysql-test/suite/sql_sequence)
- Write documentation for NEXT VALUE / PREVIOUS_VALUE
- NEXTVAL in DEFAULT
- Ensure that NEXTVAL in DEFAULT uses database from base table
- Two NEXTVAL for same row should give same answer.
- Oracle syntax sequence_table.nextval, without any FOR or FROM.
- Sequence tables are treated as 'not read constant tables' by SELECT; it would
be better if we had a separate list for sequence tables so that
select doesn't know about them, except when referred to with FROM.
Other things done:
- Improved output for safemalloc backtrack
- frm_type_enum changed to Table_type
- Removed lex->is_view and replaced it with lex->table_type. This allows
us to more easily check if an item is a view, sequence or table.
- Added table flag HA_CAN_TABLES_WITHOUT_ROLLBACK, needed for handlers
that want to support sequences
- Added handler calls:
- engine_name(), to simplify getting engine name for partition and sequences
- update_first_row(), to be able to do efficient sequence implementations.
- Made binlog_log_row() global to be able to call it from ha_sequence.cc
- Added handler variable: row_already_logged, to be able to flag that the
changed row has already been logged to the replication log.
- Added CF_DB_CHANGE and CF_SCHEMA_CHANGE flags to simplify
deny_updates_if_read_only_option()
- Added sp_add_cfetch() to avoid new conflicts in sql_yacc.yy
- Moved code for add_table_options() out from sql_show.cc::show_create_table()
- Added String::append_longlong() and used it in sql_show.cc to simplify code.
- Added an extra option to dd_frm_type() and ha_table_exists() to indicate if
the table is a sequence. Needed by DROP SEQUENCE to not drop a table.
Implementing cursor%ROWTYPE variables, according to the task description.
This patch includes a refactoring of how sp_instr_cpush and sp_instr_copen
work. This is needed to implement MDEV-10598 more easily later, to allow variable
declarations to go after cursor declarations (which is currently not allowed).
Before this patch, sp_instr_cpush worked as a Query_arena associated with
the cursor. sp_instr_copen::execute() switched to the sp_instr_cpush's
Query_arena when executing the cursor SELECT statement.
Now the Query_arena associated with the cursor is stored inside an instance
of a new class sp_lex_cursor (a LEX descendant) that contains the cursor SELECT
statement.
This simplifies the implementation, because:
- It's easier to follow the code when everything related to execution
of the cursor SELECT statement is stored inside the same sp_lex_cursor
object (rather than distributed between LEX and sp_instr_cpush).
- It's easier to link an sp_instr_cursor_copy_struct to
sp_lex_cursor rather than to sp_instr_cpush.
- Also, it allows performing sp_instr_cursor_copy_struct::exec_core()
without having a pointer to sp_instr_cpush, using a pointer to sp_lex_cursor
instead. This will be important for MDEV-10598, because sp_instr_cpush will
happen *after* sp_instr_cursor_copy_struct.
After MDEV-10598 is done, this declaration:
DECLARE
CURSOR cur IS SELECT * FROM t1;
rec cur%ROWTYPE;
BEGIN
OPEN cur;
FETCH cur INTO rec;
CLOSE cur;
END;
will generate roughly this code:
+-----+--------------------------+
| Pos | Instruction              |
+-----+--------------------------+
| 0   | cursor_copy_struct rec@0 | Points to sp_lex_cursor through m_lex_keeper
| 1   | set rec@0 NULL           |
| 2   | cpush cur@0              | Points to sp_lex_cursor through m_lex_keeper
| 3   | copen cur@0              | Points to sp_lex_cursor through m_cursor
| 4   | cfetch cur@0 rec@0       |
| 5   | cclose cur@0             |
| 6   | cpop 1                   |
+-----+--------------------------+
Notice, "cursor_copy_struct" and "set" will go before "cpush".
Instructions at positions 0, 2, 3 point to the same sp_cursor_lex instance.
Extended syntax so that it is now possible to set lock_wait_timeout for the
following statements:
SELECT ... FOR UPDATE [WAIT n|NOWAIT]
SELECT ... LOCK IN SHARE MODE [WAIT n|NOWAIT]
LOCK TABLE ... [WAIT n|NOWAIT]
CREATE ... INDEX ON tbl_name (index_col_name, ...) [WAIT n|NOWAIT] ...
ALTER TABLE tbl_name [WAIT n|NOWAIT] ...
OPTIMIZE TABLE tbl_name [WAIT n|NOWAIT]
DROP INDEX ... [WAIT n|NOWAIT]
TRUNCATE TABLE tbl_name [WAIT n|NOWAIT]
RENAME TABLE tbl_name [WAIT n|NOWAIT] ...
DROP TABLE tbl_name [WAIT n|NOWAIT] ...
The valid range of lock_wait_timeout and innodb_lock_wait_timeout was extended so
that 0 is an acceptable value (meaning no wait).
This is an amended AliSQL patch. We prefer the Oracle syntax [WAIT n|NOWAIT]
instead of the original [WAIT [n]|NO_WAIT].
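For illustration (hypothetical table t1; timeouts are in seconds):
  SELECT * FROM t1 WHERE a = 1 FOR UPDATE WAIT 5;  -- wait at most 5 seconds
  LOCK TABLE t1 WRITE NOWAIT;                      -- fail at once if t1 is locked
  ALTER TABLE t1 WAIT 0 ADD COLUMN b INT;          -- WAIT 0 behaves like NOWAIT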
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explicitly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
* rename to "keyread" (to avoid conflicts with tokudb),
* change from bool to uint and store the keyread index number there
* provide a bool accessor to check if keyread is enabled
mark_columns_used_by_index used to do
reset + mark_columns_used_by_index_no_reset + start keyread + set bitmaps
Now prepare_for_keyread does that, while mark_columns_used_by_index
does only reset + mark_columns_used_by_index_no_reset,
just as its name suggests.
TABLE::add_read_columns_used_by_index() is conceptually wrong,
it *adds* columns used by index to the bitmap, without clearing
it first. But it also enables keyread, meaning that *only* columns
from the index will be read. It is supposed to be used to
add columns used by an index to a bitmap that already has columns
of a primary key - for engines where a primary key is part of every
index.
The correct fix is to change mark_columns_used_by_index() to
take into account extended keys.
this reverts 1d0acc7754 and cf97cbd1db
move TABLE::key_read into handler. Because in index merge and DS-MRR
there can be many handlers per table, and some of them use
key read while others don't. "keyread" is really per handler,
not per TABLE property.
- Changed the error handler interface so that error handlers can change the
error level in the handler
- Give warnings and errors when calculating virtual columns
- On insert/update error is fatal in strict mode.
- SELECT and DELETE will only give a warning if a virtual field generates an error
- Added VCOL_UPDATE_FOR_DELETE and VCOL_UPDATE_INDEX_FOR_REPLACE to be able to
easily detect in update_virtual_fields() if we should use an error
handler to mask errors or not.
When updating a table with virtual BLOB columns, the following might
happen:
- an old record is read from the table, it has no virtual blob values
- update_virtual_fields() is run, vcol blob gets its value into the
record. But only a pointer to the value is in the table->record[0],
the value is in Field_blob::value String (but it doesn't have to be!
it can be in the record, if the column is just a copy of another
columns: ... b VARCHAR, c BLOB AS (b) ...)
- store_record(table,record[1]), old record now is in record[1]
- fill_record() prepares new values in record[0], vcol blob is updated,
new value replaces the old one in the Field_blob::value
- now both record[1] and record[0] have a pointer that points to the
*new* vcol blob value. Or record[1] has a pointer to nowhere if
Field_blob::value had to realloc.
To fix this I have introduced a new String object 'read_value' in
Field_blob. When updating virtual columns when a row has been read,
the allocated value is stored in 'read_value' instead of 'value'. The
allocated blobs for the new row is stored in 'value' as before.
I also made, as a safety precaution, the insert delayed handling of
blobs more general by using value to store strings instead of the
record. This ensures that virtual functions on delayed insert
work as in the case of a normal insert.
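A sketch of a table shape where the problem could show up (hypothetical columns):
  CREATE TABLE t1 (b VARCHAR(100), c BLOB AS (b) VIRTUAL);
  UPDATE t1 SET b = REPEAT('x', 50);
  -- before the fix, record[1] could end up pointing at the new (or freed)
  -- vcol blob value; now the value read for the old row lives in read_value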
Triggers are now properly updating the read, write and vcol maps for used
fields. This means that we don't need VCOL_UPDATE_FOR_READ_WRITE anymore
and there is no need for any other special handling of triggers in
update_virtual_fields().
To be able to test how many times virtual fields are invoked, I also
relaxed rules that one can use local (@) variables in DEFAULT and non
persistent virtual field expressions.
1. The rows of a recursive CTE at some point may overflow
the HEAP temporary table containing them. At this point
the table is converted to a MyISAM temporary table and the
new added rows are placed into this MyISAM table.
A bug in select_union_recursive::send_data prevented
the server from writing the row that caused the overflow
into the temporary table used for the result of the iteration
steps. This could lead, in particular, to a premature end
of the iterations.
2. The method TABLE::insert_all_rows_into() that was used
to copy all rows of one temporary table into another
did not take into account that the destination temporary
table must be converted to a MyISAM table at some point.
This patch fixed this problem. It also renamed the method
to TABLE::insert_all_rows_into_tmp_table() and added
an extra parameter needed for the conversion.
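For illustration, a query shape that could force the conversion (the session limit and row width are arbitrary):
  SET SESSION max_heap_table_size = 16384;
  WITH RECURSIVE r AS
  ( SELECT 1 AS n, REPEAT('x', 100) AS pad
    UNION ALL
    SELECT n + 1, pad FROM r WHERE n < 500 )
  SELECT COUNT(*) FROM r;
  -- the intermediate result overflows the HEAP limit, so the temporary table
  -- must be converted to MyISAM without losing the overflowing row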
There was duplicate code to create a TYPELIB from List<String>:
- In typelib() and mysql_prepare_create_table(), which were used to initialize
table fields.
- In create_typelib() and sp_prepare_create_field(), which were used to initialize
SP variables.
create_typelib() was incomplete and didn't check for wrong SET values.
Fix:
- Moving the code from create_typelib() and mysql_prepare_create_field()
to the new methods Column_definition::create_interval_from_interval_list()
and Column_definition::prepare_interval_field().
- Moving the code from calculate_interval_lengths() in sql_table.cc
to a new method Column_definition::calculate_interval_lengths(), as it's now
needed only in Column_definition::create_interval_from_interval_list()
- Reusing the new method Column_definition::prepare_interval_field() in both
mysql_prepare_create_table() and sp_prepare_create_field(), instead of the
old duplicate code pieces
- Removing global functions typelib() and create_typelib()
This patch also fixes:
MDEV-11155 Bad error message when creating a SET column with comma and non-ASCII characters
The problem was that ErrConvString() was called with a wrong "charset" parameter.
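For illustration, a statement that exercises the improved check and error message (hypothetical):
  CREATE TABLE t1 (a SET('x,y'));  -- rejected: a SET value must not contain a comma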
otherwise we'd need to store sql_mode *per vcol*
(consider CREATE INDEX ...), and how would SHOW CREATE TABLE
support that?
Additionally, get rid of vcol::expr_str, just to make sure
the string is always generated and never leaked in the
original form.
* don't issue an error for ER_KEY_BASED_ON_GENERATED_VIRTUAL_COLUMN
* support keyread on vcols
* callback into the server to compute vcol values from mi_check/mi_repair
* DMLs just work. Automatically.
Functions from sql/statistics.cc don't seem to expect
stat tables to fail or to have an inadequate structure.
Table open errors are now suppressed and some validity
checks were added. Invalid tables are reported to the server
log.
* revert part of the db7edfe that moved calculations from
fix_fields to val_str for Item_func_sysconst and descendants
* mark session state dependent functions in check_vcol_func_processor()
* re-run fix_fields for all such functions for every statement
* fix CURRENT_USER/CURRENT_ROLE not to use Name_resolution_context
(that is allocated on the stack in unpack_vcol_info_from_frm())
Note that NOW(), CURDATE(), etc use lazy initialization and do *not*
force fix_fields to be re-run. The rule is:
* lazy initialization is *not* allowed, if it changes metadata (so,
e.g. DAYNAME() cannot use it)
* lazy initialization is *preferable* if it has side effects (e.g.
NOW() sets thd->time_zone_used=1, so it's better to do it when
the value of NOW is actually needed, not when NOW is simply prepared)
* remove a confusing method name - Field::set_default_expression()
* remove handler::register_columns_for_write()
* rename stuff
* add asserts
* remove unlikely unlikely
* remove redundant if() conditions
* fix mark_unsupported_function() to report the most important violation
* don't scan vfield list for default values (vfields don't have defaults)
* move handling for DROP CONSTRAINT IF EXIST where it belongs
* don't protect engines from Alter_inplace_info::ALTER_ADD_CONSTRAINT
* comments
MDEV-10134 Add full support for DEFAULT
- Added support for using tables with MySQL 5.7 virtual fields,
including MySQL 5.7 syntax
- Better error messages also for old cases
- CREATE ... SELECT now also updates timestamp columns
- Blob can now have default values
- Added new system variable "check_constraint_checks", to turn off
CHECK constraint checking if needed.
- Removed some engine independent tests in suite vcol to only test myisam
- Moved some tests from 'include' to 't'. Should some day be done for all tests.
- FRM version increased to 11 if one uses virtual fields or constraints
- Changed to use a bitmap to check if a field has got a value, instead of
setting HAS_EXPLICIT_VALUE bit in field flags
- Expressions can now be up to 65K in total
- Ensure we are not referring to uninitialized fields when handling virtual fields or defaults
- Changed check_vcol_func_processor() to return a bitmap of used types
- Had to change some functions that calculated cached value in fix_fields to do
this in val() or getdate() instead.
- store_now_in_TIME() now takes a THD argument
- fill_record() now updates default values
- Add a lookahead for NOT NULL, to be able to handle DEFAULT 1+1 NOT NULL
- Automatically generate a name for constraints that don't have a name
- Added support for ALTER TABLE DROP CONSTRAINT
- Ensure that partition functions register virtual fields used. This fixes
some bugs when using virtual fields in a partitioning function
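A few examples of what now works (a sketch):
  CREATE TABLE t1 (a INT DEFAULT 1+1 NOT NULL,  -- lookahead for NOT NULL
                   b BLOB DEFAULT 'hello',      -- blobs can have default values
                   c INT);
  ALTER TABLE t1 ADD CONSTRAINT positive CHECK (c > 0);
  ALTER TABLE t1 DROP CONSTRAINT positive;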
mysqld maintains a list of TABLE objects for all temporary
tables created within a session in THD. Here each table is
represented by a TABLE object.
A query referencing a particular temporary table more
than once, however, failed with an ER_CANT_REOPEN_TABLE error,
because a TABLE_SHARE was allocated together with the TABLE,
so temporary tables always had only one TABLE per TABLE_SHARE.
This patch lifts this restriction by separating TABLE and
TABLE_SHARE objects and storing TABLE_SHAREs for temporary
tables in a list in THD, and TABLEs in a list within their
respective TABLE_SHAREs.
- unused TABLE_SHARE::deleting and TABLE_LIST::deleting flags were removed
- kill_delayed_threads_for_table() and intern_close_table() are now private
methods of table cache
- removed free_share flag of closefrm(): it was never used for temporary
tables and was rarely useful for regular tables
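For illustration, a query that used to fail and now works (hypothetical table):
  CREATE TEMPORARY TABLE t (a INT);
  SELECT * FROM t AS t1, t AS t2;  -- no more ER_CANT_REOPEN_TABLE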
Actually, mutually recursive CTEs were not functional. Now the code
for mutually recursive CTEs looks functional, but still needs
rewriting.
Added many new test cases for mutually recursive CTEs.
This fix also fixes a connection hang when trying to do INSERT DELAYED to a crashed table.
Added crash_mysqld.inc to allow easy crash+restart of mysqld