Make all system tables in mysql directory of type
engine=Aria
Privilege tables are using transactional=1
Statistical tables are using transactional=0, to allow them
to be quickly updated with low overhead.
Help tables are also using transactional=0 as these are only
updated at init time.
Other changes:
- Aria store engine is now a required engine
- Update comment for Aria tables to reflect their new usage
- Fixed that _ma_reset_trn_for_table() removes unlocked table
from transaction table list. This was needed to allow one
to lock and unlock system tables separately from other
tables, for example when reading a procedure from mysql.proc
- Don't give a warning when using transactional=1 for engines
that is using transactions. This is both logical and also
to avoid warnings/errors when doing an alter of a privilege
table to InnoDB.
- Don't abort on warnings from ALTER TABLE for changes that
would be accepted by CREATE TABLE.
- New created Aria transactional tables are marked as not movable
(as they include create_rename_lsn).
- bootstrap.test was changed to kill orignal server, as one
can't anymore have two servers started at same time on same
data directory and data files.
- Disable maria.small_blocksize as one can't anymore change
aria block size after system tables are created.
- Speed up creation of help tables by using lock tables.
- wsrep_sst_resync now also copies Aria redo logs.
MDEV-10130 Assertion `share->in_trans == 0' failed in storage/maria/ma_close.c
MDEV-10378 Assertion `trn' failed in virtual int ha_maria::start_stmt
The problem was that maria_handler->trn was not properly reset
at commit/rollback and ha_maria::exernal_lock() could get confused
because.
There was some old code in ha_maria::implicit_commit() that tried
to take care of this, but it was not bullet proof.
Fixed by adding list of all tables that is part of the maria transaction to
TRN.
A nice side effect was of the fix is that loops in
ha_maria::implict_commit() got to be much simpler.
Other things:
- Fixed a bug in mysql_admin_table() where argument open_for_modify
was wrongly reset for the next table in the chain
- rollback admin command also in case of fatal error.
- Split _ma_set_trn_for_table() to three version to simplify code
and debugging.
- Several new asserts to detect the original problem (that file was
not properly removed from trn before calling ma_close())
I was able to repeat the problem with old version of randgen
Reason for crash:
- It's not safe to change share->now_transactional if there are changed
bitmaps in the pagecache as flushing these can cause redo-entries and
the bitmap flush code checks that share->now_transactional is set.
Fixed by flushing bitmaps in _ma_tmp_disable_logging_for_table() before
we set share->now_transactional to 0
Problem was that if copy_data_between_tables() didn't do proper
clean up in case of failures:
- copy object was not properly freed
- end_bulk_insert() was not called
- mysql_trans_prepare_alter_copy_data() set THD->transaction.on to
false which was not properly restored
The last part caused a crash in Aria as Aria depends on that THD
is correct.
Other things:
- Reset info->switched_transactional after usage (safety)
- Reset bulk_insert_single_undo (safety)
Problem was that we the bitmap needs to be flushed before disabling
logging of redo entires, as writing the bitmap to disk by
background checkpoint may cause redo entries.
- Removed test if HA_FT_WTYPE == HA_KEYTYPE_FLOAT as this never worked
(HA_KEYTYPE_FLOAT is an enum)
- Define HA_FT_MAXLEN to 126 (was tested before but never defined)
Modern compilers (such as GCC 8) emit warnings that the
'register' keyword is deprecated and not valid C++17.
Let us remove most use of the 'register' keyword.
Code in 'extra/' is not touched.
This bug happened due to a defect of the implementation of the handler
function ha_delete_all_rows() for the ARIA engine.
The function maria_delete_all_rows() truncated the table, but it didn't
touch the write cache, so the cache's write offset was not reset.
In the scenario like in the function st_select_lex_unit::exec_recursive
when first all records were deleted from the table and then several new
records were added some metadata became inconsistent with the state of
the cache. As a result the table scan function could not read records
at the end of the table.
The same defect could be found in the implementation of ha_delete_all_rows()
for the MYISAM engine mi_delete_all_rows().
Additionally made late instantiation for the temporary table used to store
rows that were used for each new iteration when executing a recursive CTE.
The merge only covered 10.1 up to
commit 4d248974e0.
Actually merge the changes up to
commit 0a534348c7.
Also, remove the unused InnoDB field trx_t::abort_type.
For some simple benchmarks, a majority of time was
spend in find_head() which tries to find the best
place to put the record.
The result of this patch is a 2x or more speedup for
inserts without keys for format PAGE. All changes
are only related to how rows are stored
Should fix some of the problems mentioned in:
MDEV-8132 Temporary tables using Aria with very poor performance
MDEV-9079 Aria very slow for internal temporary tables
MDEV-5841 Mariadb very poor temporary performance
The following changes where done:
- For rows with a small row length that fits into
a page (818 bytes with 8192 pages), stop as soon as we
hit a match.
- Added markers full_head_size and full_tail_size that tells
us where to start searching on the bitmap page
- Ensure that page->used_size is correctly updated when
bitmap grows. This allows us to stop searching at used_size
- Added code to check that the bitmap variables are correct.
- Fixed a wrong test where we set "first_bitmap_with_space".
This shouldn't have caused any notable problems.
my_safe_alloca()/my_safe_afree() work as alloca() or malloc()/free()
depending on the memory size to allocate, that is, depending on
reclength here. They only work correctly if reclength doesn't
change in the middle.
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
This will make it easier to how memory allocation is done when debugging
with either DBUG or gdb.
Will especially help when debugging stored procedures
Main change is a name argument as second argument to init_alloc_root()
init_sql_alloc()
Other things:
- Added DBUG_ENTER/EXIT to some Virtual_tmp_table functions
May also fix: MDEV-14970 "MariaDB crashed with signal 11 and Aria table"
I am not able to reproduce a crash, however there was no protection in
print_keydup_error() if the storage engine reported the wrong key number.
This patch adds such a protection and should stop any further crashes
in this case.
Other things:
- Added extra protection in Aria to not set errkey to more than number of
keys. (Don't think this is cause of this crash, but better safe than
sorry)
- Extend test_if_equal_repl_errors() to handle different cases of
ER_DUP_ENTRY. This is just mainly precaution for the future.
The problem was that max_size was acciently set to 1 in some
cases.
Other things:
- Adjust max_rows if min_rows > max_rows.
- Removed not used variable varchar_length
- Adjusted max_pack_length (safety fix)
This bug happens when locking the same Aria "transactional" table
(page format) more then once with LOCK TABLES and inserting into one
of them with INSERT ... SELECT when the table is empty.
Fixed by ensuring we don't use fast bulk insert if table is opened
twice with LOCK TABLES (as this changes table->s->state)
Code changes:
- Added use_count to MARIA_USED_TABLES to be able to check if
table is opened twice for a statement/lock table
- Don't clear history or reset info->start_state if we
don't have versioning. One reason for the bug was
was that info->start_state was set to point to different
states for the two tables. If there is no versioning
info->start_state should always point to info->s->state.common.
Other things:
- Fixed also some typos that was noticed while scanning the code
- More DBUG_PRINT
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)
- Also fix printf-format warnings
Make the above mentioned warnings fatal.
- fix pthread_join on Windows to set return value.
- The line was accidently removed by dd8474b1dc
- The effect of the missing test was just a few extra malloc when creating
internal temporary tables. Nothing critical, but better got get fixed.
ATTRIBUTE_NORETURN is supported on all platforms (MSVS and GCC-like).
It declares that a function will not return; instead, the thread or
the whole process will terminate.
ATTRIBUTE_COLD is supported starting with GCC 4.3. It declares that
a function is supposed to be executed rarely. Rarely used error-handling
functions and functions that emit messages to the error log should be
tagged such.
Before this patch running full mtr generated some 70 cores (at least
on systemd). Now no cores should be generated.
- Changed DBUG_ABORT()'s used by mysql-test-run to DBUG_SUICIDE()
- Changed DBUG_ABORT() used to crash server with core to DBUG_ASSERT(0)
- DBUG_ASSERT now flushes DBUG files
If compiling a non DBUG binary with
-DDBUG_ASSERT_AS_PRINTF asserts will be
changed to printf + stack trace (of stack
trace are enabled).
- Changed #ifndef DBUG_OFF to
#ifdef DBUG_ASSERT_EXISTS
for those DBUG_OFF that was just used to enable
assert
- Assert checking that could greatly impact
performance where changed to DBUG_ASSERT_SLOW which
is not affected by DBUG_ASSERT_AS_PRINTF
- Added one extra option to my_print_stacktrace() to
get more silent in case of stack trace printing as
part of assert.
- Added sql/mariadb.h file that should be included first by files in sql
directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables
that must be done before my_global.h is included)
- Removed a lot of include my_global.h from include files
- Removed include's of some files that my_global.h automatically includes
- Removed duplicated include's of my_sys.h
- Replaced include my_config.h with my_global.h
end_io_call uses uninitialized values from the new_data_cache
As such we the buffer 0 and check this before calling end_io_cache on it.
Thanks Sergey Vojtovich for the review and for this solution.
Found by Coverity (ref 972481).
Coverity report this as:
CID 971840 (#1 of 1): Operands don't affect result (CONSTANT_EXPRESSION_RESULT)
result_independent_of_operands: 4 | (flags & 1) is always true regardless of the values of its operands. This occurs as the logical first operand of "?:".
The C order of precidence has | of higher precidence than ?:. The
intenting implies an | of the 3 terms.
Adjust to intented meaning.
- Added variable tmp_disk_table_size
- Added variable tmp_memory_table_size as an alias for tmp_table_size
- Changed internal variable tmp_table_size to tmp_memory_table_size
- create_info.data_file_length is now set with tmp_disk_table_size
- Fixed that Aria doesn't reset max_data_file_length for internal tables
- Added status flag if table is full so that we can detect this on next insert.
This ensures that the table is always 'correct', but we get the error one
row after the row that grow the table too big.
- Removed some mutex lock for internal temporary tables
These self references were previously used to avoid having to check the
IO_CACHE's type. However, a benchmark shows that on x86 5930k stock,
the type comparison is marginally faster than the double pointer dereference.
For 40 billion my_b_tell calls, the difference is .1 seconds in favor of performing the
type check. (Basically there is no measurable difference)
To prevent bugs from copying the structure using the equals(=) operator,
and having to do the bookkeeping manually, remove these "convenience"
variables.
This excludes MDEV-12472 (InnoDB should accept XtraDB parameters,
warning that they are ignored). In other words, MariaDB 10.3 will not
recognize any XtraDB-specific parameters.
- Old sequence code forced row based replication for any statements that
refered to a sequence table. What is new is that row based replication
is now sequence aware:
- NEXT VALUE is now generating a short row based event with only
next_value and round being replicated.
- Short row based events are now on the slave updated as trough
SET_VALUE(sequence_name)
- Full row based events are on the slave updated with a full insert,
which is practically same as ALTER SEQUENCE.
- INSERT on a SEQUENCE table does now a EXCLUSIVE LOCK to ensure that
it is logged in binary log before any following NEXT VALUE calls.
- Enable all sequence tests and fixed found bugs
- ALTER SEQUENCE doesn't anymore allow changes that makes the next_value
outside of allowed range
- SEQUENCE changes are done with TL_WRITE_ALLOW_WRITE. Because of this
one can generate a statement for MyISAM with both
TL_WRITE_CONCURRENT_INSERT and TL_WRITE_ALLOW_WRITE. To fix a warning
I had to add an extra test in thr_lock.c for this.
- Removed UPDATE of SEQUENCE (no need to support this as we
have ALTER SEQUENCE, which takes the EXCLUSIVE lock properly.
- Removed DBUG_ASSERT() in MDL_context::upgrade_shared_lock. This was
removed upstream in MySQL 5.6 in 72f823de453.
- Simplified test in decided_logging_format() by using sql_command_flags()
- Fix that we log DROP SEQUENCE correctly.
- Fixed that Aria works with SEQUENCE
Do not silence uncertain cases, or fix any bugs.
The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
Do not silence uncertain cases, or fix any bugs.
The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
This affected mainly MyISAM and Aria engines.
Also fixed that end_bulk_insert() detects errors from
internal mi_end_bulk_insert() and ma_end_bulk_insert()
- delete_tree() and delete_tree_element() now has an
extra argument that marks if future calls to
tree->free should be ignored.
- tree->free changed to function returning int, to be
able to signal errors.
- Restored deleting flag in MyISAM that was accidently
disabled in mi_extra(PREPARE_FOR_DROP)
The issue was that my_errno was not set properly when a repair was killed,
which confused the rpl_killed_ddl script.
I also added an extra test line in varchar.inc to ensure we don't give
duplicate error rows.
Other things
- Ensure that ut_d() is set to EXPR if ut_ad() is DEBUG_ASSERT()
If not, we will get a crash in purge_sys_t::~purge_sys_t() as
this ut_ad() code expect's that the ut_d() codes has been executed
Problem was two race condtion in Aria page cache:
- find_block() didn't inform free_block() that it had released requests
- free_block() didn't handle pinned blocks, which could happen if
free_block() was called as part of flush. This is fixed by not freeing
blocks that are pinned. This is safe as when maria_close() is called
when last thread is using a table, there can be no pinned blocks. For
other flush calls it's safe to ignore pinned blocks.
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by parser are now const strings, less chance of
accidently changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)
Changes:
- This caused some ABI changes
- lex_string_set now uses LEX_CSTRING
- Some fucntions are now taking const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
code
- Added item_empty_name and item_used_name to be able to distingush between
items that was given an empty name and items that was not given a name
This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*' is not anymore same as table_name.*
- removed not used function Item::rename()
- Added comparision of item->name_length before some calls to
my_strcasecmp() to speed up comparison
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
set when it shouldn't
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
give the error.
TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
(as part of lower_case_table_names)
This was done to make it clear that a update_row() should not change the
row.
This was not done for handler::write_row() as this function still needs
to update auto_increment values in the row. This should at some point
be moved to handler::ha_write_row() after which write_row can also have
const arguments.
Working features:
CREATE OR REPLACE [TEMPORARY] SEQUENCE [IF NOT EXISTS] name
[ INCREMENT [ BY | = ] increment ]
[ MINVALUE [=] minvalue | NO MINVALUE ]
[ MAXVALUE [=] maxvalue | NO MAXVALUE ]
[ START [ WITH | = ] start ] [ CACHE [=] cache ] [ [ NO ] CYCLE ]
ENGINE=xxx COMMENT=".."
SELECT NEXT VALUE FOR sequence_name;
SELECT NEXTVAL(sequence_name);
SELECT PREVIOUS VALUE FOR sequence_name;
SELECT LASTVAL(sequence_name);
SHOW CREATE SEQUENCE sequence_name;
SHOW CREATE TABLE sequence_name;
CREATE TABLE sequence-structure ... SEQUENCE=1
ALTER TABLE sequence RENAME TO sequence2;
RENAME TABLE sequence TO sequence2;
DROP [TEMPORARY] SEQUENCE [IF EXISTS] sequence_names
Missing features
- SETVAL(value,sequence_name), to be used with replication.
- Check replication, including checking that sequence tables are marked
not transactional.
- Check that a commit happens for NEXT VALUE that changes table data (may
already work)
- ALTER SEQUENCE. ANSI SQL version of setval.
- Share identical sequence entries to not add things twice to table list.
- testing insert/delete/update/truncate/load data
- Run and fix Alibaba sequence tests (part of mysql-test/suite/sql_sequence)
- Write documentation for NEXT VALUE / PREVIOUS_VALUE
- NEXTVAL in DEFAULT
- Ensure that NEXTVAL in DEFAULT uses database from base table
- Two NEXTVAL for same row should give same answer.
- Oracle syntax sequence_table.nextval, without any FOR or FROM.
- Sequence tables are treated as 'not read constant tables' by SELECT; Would
be better if we would have a separate list for sequence tables so that
select doesn't know about them, except if refereed to with FROM.
Other things done:
- Improved output for safemalloc backtrack
- frm_type_enum changed to Table_type
- Removed lex->is_view and replaced with lex->table_type. This allows
use to more easy check if item is view, sequence or table.
- Added table flag HA_CAN_TABLES_WITHOUT_ROLLBACK, needed for handlers
that want's to support sequences
- Added handler calls:
- engine_name(), to simplify getting engine name for partition and sequences
- update_first_row(), to be able to do efficient sequence implementations.
- Made binlog_log_row() global to be able to call it from ha_sequence.cc
- Added handler variable: row_already_logged, to be able to flag that the
changed row is already logging to replication log.
- Added CF_DB_CHANGE and CF_SCHEMA_CHANGE flags to simplify
deny_updates_if_read_only_option()
- Added sp_add_cfetch() to avoid new conflicts in sql_yacc.yy
- Moved code for add_table_options() out from sql_show.cc::show_create_table()
- Added String::append_longlong() and used it in sql_show.cc to simplify code.
- Added extra option to dd_frm_type() and ha_table_exists to indicate if
the table is a sequence. Needed by DROP SQUENCE to not drop a table.
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:
recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).
Report progress also via systemd using sd_notifyf().
Also, implement MDEV-11027 a little differently from 5.5:
recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.
recv_apply_hashed_log_recs(): Rename the parameter to last_batch.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
my_readline can fail due to missing file. Make my_readline report this
condition separately so that we can catch it and report an appropriate
error message to the user.
it was race condition prone. instead use either a pair of my_delete()
calls with already resolved paths, or a safe high-level function
my_handler_delete_with_symlink(), like MyISAM and Aria already do.
TOCTOU bug. The path is checked to be valid, symlinks are resolved.
Then the resolved path is opened. Between the check and the open,
there's a window when one can replace some path component with a
symlink, bypassing validity checks.
Fix: after we resolved all symlinks in the path, don't allow open()
to resolve symlinks, there should be none.
Compared to the old MyISAM/Aria code:
* fastpath. Opening of not-symlinked files is just one open(),
no fn_format() and lstat() anymore.
* opening of symlinked tables doesn't do fn_format() and lstat() either.
it also doesn't to realpath() (which was lstat-ing every path
component), instead if opens every path component with O_PATH.
* share->data_file_name stores realpath(path) not readlink(path). So,
SHOW CREATE TABLE needs to do lstat/readlink() now (see ::info()),
and certain error messages (cannot open file "XXX") show the real
file path with all symlinks resolved.