The problem occurred because the Spider node was incorrectly handling
timestamp values sent to and received from the data nodes.
The problem has been corrected as follows:
- Added logic to set and maintain the UTC time zone on the data nodes.
To prevent timestamp ambiguity, it is necessary for the data nodes to use
a time zone, such as UTC, that does not have daylight saving time.
- Removed the spider_sync_time_zone configuration variable, which did not
solve the problem and which interfered with the solution.
- Added logic to convert to the UTC time zone all timestamp values sent to
and received from the data nodes. This is done for both unique and
non-unique timestamp columns. It is done for WHERE clauses, applying to
SELECT, UPDATE and DELETE statements, and for UPDATE columns.
- Disabled Spider's use of direct update when any of the columns to update is
a timestamp column. This is necessary to prevent false duplicate key value
errors.
- Added a new test spider.timestamp to thoroughly test Spider's handling of
timestamp values.
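A minimal sketch of the time zone setup described above (the statements and
the use of a numeric offset are illustrative, not taken from the patch):
```
-- On each data node: pin the server and session time zone to UTC, which has
-- no daylight saving time, so TIMESTAMP values exchanged with the Spider node
-- are never ambiguous.
SET GLOBAL time_zone = '+00:00';
SET SESSION time_zone = '+00:00';

-- Verify the setting.
SELECT @@global.time_zone, @@session.time_zone;
```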
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit 97cc9d3 on branch bb-10.3-MDEV-16246
The parameter innodb_lock_schedule_algorithm was introduced in
MariaDB Server 10.1.19, 10.2.13, 10.3.4 as part of MDEV-11039.
In MariaDB 10.1, the default value of the parameter is 'fcfs',
that is, the existing algorithm is used by default. But in
later versions of MariaDB Server, the default was 'vats',
enabling the new algorithm.
Because the new algorithm is triggering a debug assertion failure
that suggests corruption of the transactional lock data structures,
we will revert to the old algorithm by default until we have
resolved the problem.
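A quick way to confirm which algorithm a given server is using (the variable
name is the one discussed above; selecting 'vats' again would be done in the
server configuration):
```
SHOW GLOBAL VARIABLES LIKE 'innodb_lock_schedule_algorithm';
-- After this change the expected default value is 'fcfs'.
```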
Problem:
========
Truncate operation holds MDL on the table (t1) and tries to
acquire InnoDB dict_operation_lock. Purge holds dict_operation_lock
and tries to acquire MDL on the table (t1) to evaluate virtual
column expressions for indexed virtual columns.
This leads to a deadlock between purge and TRUNCATE TABLE (DDL).
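A hypothetical schema on which this race can be hit is sketched below; the
column names are made up. The indexed virtual column is what forces purge to
evaluate an expression and therefore to acquire MDL on the table:
```
CREATE TABLE t1 (
  a INT PRIMARY KEY,
  b INT,
  vb INT AS (b + 1) VIRTUAL,  -- indexed virtual column: purge must evaluate b + 1
  KEY idx_vb (vb)
) ENGINE=InnoDB;

INSERT INTO t1 (a, b) VALUES (1, 10);
UPDATE t1 SET b = 20 WHERE a = 1;  -- leaves undo for the purge thread to process
TRUNCATE TABLE t1;                 -- DDL that can race with purge as described above
```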
Solution:
=========
If purge tries to acquire MDL on the table then it should do the following:
i) Purge should release all innodb latches (including dict_operation_lock)
before acquiring metadata lock on the table.
ii) After acquiring metadata lock on the table, it should check whether the
table was dropped or renamed. If the table is dropped then purge should
ignore the undo log record. If the table is renamed then it should
release the old MDL and acquire MDL on the new name.
iii) Once purge acquires the MDL, it should use the SQL table handle for all
the remaining virtual indexes of the purge record.
purge_node_t: Introduce new virtual column information to know whether
the MDL was acquired successfully.
This is joint work with Marko Mäkelä.
Add an explicit redo log flush. In this test
innodb_flush_log_at_trx_commit was 2 by default.
It is also possible that this failure occurs because of MDEV-15740.
1. The changed variant did not fail without the patch for MDEV-16629,
while the original test case did fail.
2. In any case, the test case should go into cte_recursive_not_embedded.test,
which has not been created yet.
At the end of a test, 'connection default' should be in a usable state.
This was not the case, because there was a preceding 'send' without a
'reap'. If 'reap' was added, an error would be reported because the
server was restarted after the 'send'. It is easiest to 'send' from a
separate connection and do the restart from 'connection default'.
Make dict_table_t::n_ref_count private, and protect it with
a combination of dict_sys->mutex and atomics. We want to be
able to invoke dict_table_t::release() without holding
dict_sys->mutex.
When processing a query containing WITH clauses, a call of the function
check_dependencies_in_with_clauses() before opening the tables used in the
query is necessary if the WITH clauses include specifications of recursive
CTEs.
This call was missing if such a query belonged to a stored function.
This caused misbehavior of the server: it could report a spurious error,
as in the test case for MDEV-16629, or the executed query could hang,
as in the test cases for MDEV-16661 and MDEV-15151.
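A sketch of the affected pattern (the function and the concrete recursive CTE
are made up, not taken from the MDEV test cases):
```
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES (1);

CREATE FUNCTION f1() RETURNS INT
  READS SQL DATA
  RETURN (
    -- The WITH clause inside the stored function is what requires the early
    -- call of check_dependencies_in_with_clauses().
    WITH RECURSIVE cte AS (
      SELECT a FROM t1
      UNION ALL
      SELECT cte.a + 1 FROM cte WHERE cte.a < 5
    )
    SELECT MAX(a) FROM cte
  );

SELECT f1();  -- expected to return 5
```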
In InnoDB, an INSERT will not create an explicit lock object. Instead,
the inserted record is initially implicitly locked by the transaction
that wrote its trx_t::id to the hidden system column DB_TRX_ID.
(Other transactions would check if DB_TRX_ID is referring to a
transaction that has not been committed.)
If a record was inserted in the current transaction, it would be
implicitly locked by that transaction. Only if some other transaction
requests access to the record should the implicit lock be
converted to an explicit one, so that the waits-for graph can be
constructed for detecting deadlocks and lock wait timeouts.
Before this fix, InnoDB would convert implicit locks to
explicit ones, even if no conflict exists.
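A hedged two-session illustration of the point above (table and values are
made up):
```
CREATE TABLE t (id INT PRIMARY KEY, val INT) ENGINE=InnoDB;

-- Session 1:
START TRANSACTION;
INSERT INTO t VALUES (1, 10);        -- no explicit lock object; the row is
                                     -- implicitly locked via DB_TRX_ID
UPDATE t SET val = 20 WHERE id = 1;  -- the same transaction touching its own
                                     -- insert: with this fix, no implicit-to-
                                     -- explicit conversion happens here

-- Session 2 (while session 1 is still open):
SELECT * FROM t WHERE id = 1 FOR UPDATE;  -- a real conflict: only now is the
                                          -- implicit lock converted to an
                                          -- explicit one, so this request can
                                          -- wait in the waits-for graph
```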
lock_rec_convert_impl_to_expl(): Return whether caller_trx
already holds an explicit lock that covers the record.
row_vers_impl_x_locked_low(): Avoid a lookup if the record matches
caller_trx->id.
lock_trx_has_expl_x_lock(): Renamed from lock_trx_has_rec_x_lock().
row_upd_clust_step(): In a debug assertion, check for implicit lock
before invoking lock_trx_has_expl_x_lock().
rw_trx_hash_t::find(): Make do_ref_count a mandatory parameter.
Assert that trx_id is not 0 (the caller should check it).
trx_sys_t::is_registered(): Only invoke find() if id != 0.
trx_sys_t::find(): Add the optional parameter do_ref_count.
lock_rec_queue_validate(): Avoid lookup for trx_id == 0.
Marko mentions that it could be caused by MDEV-15740, where InnoDB does not
flush the redo log as often as it should with innodb_flush_log_at_trx_commit=1.
The workaround is to use innodb_flush_log_at_trx_commit=2, which, according
to MDEV-15740, is more durable.
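The workaround, expressed as SQL for a running server (it can also be set in
the server configuration file):
```
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SELECT @@global.innodb_flush_log_at_trx_commit;
```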
For some reason, some of these suppressions would fail to apply
when the code is compiled with clang 6.0, Debug and -DWITH_ASAN=ON.
Possibly it is related to the number of '.*' patterns or the length of the
regular expression strings.
NULL values when there is no DEFAULT
- Merged the alter_non_null test case into the alter_not_null test case.
Renamed the alter_non_null_debug test case to alter_not_null_debug.
One can create a table with the same name for a `field` and a `table` `check` constraint.
For example:
`create table t(a int check(a>0), constraint a check(a>10));`
But when inserting new rows, the same error is always raised.
For example, with
`insert into t values (-1);`
and
`insert into t values (10);`
the same error `ER_CONSTRAINT_FAILED` is obtained, and it is not clear which constraint is violated.
This patch solves this so that if a field constraint is violated, the first parameter
in the error message is `table.field_name`, and if a table constraint is violated, the first
parameter in the error message is `constraint_name`.
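Re-running the example above shows the distinction (exact error text aside):
```
CREATE TABLE t (a INT CHECK (a > 0), CONSTRAINT a CHECK (a > 10));

INSERT INTO t VALUES (-1);  -- violates the field CHECK: the error now names t.a
INSERT INTO t VALUES (10);  -- violates the table CHECK: the error now names
                            -- the constraint a
```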
Correct 898a8c3c0c to work when newer debhelper-10.2 is installed from
xenial-backports (or jessie-backports).
Use the gcc version instead of the debproxy version; this is likely a gcc
issue (as disabling LTO and gcc's linker plugin fixes it).
* ignore CHECK constraint for historical rows;
* FOREIGN KEY test case.
TODO:
MDEV-16301 IB: use real table name for error messages on ALTER
Closes tempesta-tech/mariadb#491
Closes #748
Disks with native 4K sectors need 4K alignment and size for unbuffered IO
(i.e. for files opened with FILE_FLAG_NO_BUFFERING).
InnoDB opens the redo log with FILE_FLAG_NO_BUFFERING, however it always does
512-byte IOs. Thus, the IO on native 4K sectors will fail, rendering
InnoDB non-functional.
The fix is to check whether OS_FILE_LOG_BLOCK_SIZE is a multiple of the logical
sector size, and if it is not, reopen the redo log without the
FILE_FLAG_NO_BUFFERING flag.
Building this plugin, which requires run-time access to the network, uses a lot
of disk space, and is slow, was already partially disabled. This way we
also ensure at the cmake level that it never runs, even if for some
autodetection reason it at times thought it could run.
This fixes the error message:
fatal: unable to access 'https://github.com/awslabs/aws-sdk-cpp.git/':
Problem with the SSL CA cert (path? access rights?)
This complements commit ecb0e0ade4, which
disabled a bunch of plugins from being built on Travis-CI (for time
and disk space saving reasons).
When the plugins are not built, the packaging phase will fail due to
missing files. This change omits the files from packaging so the process
can complete successfully.
* Exclude some storage engines from Travis to conserve
build time and disk usage per job. Excluded:
TOKUDB MROONGA SPIDER OQGRAPH PERFSCHEMA SPHINX
* Increase travis_wait from default 20m to 30 for MTR
* Use travis_wait for long running MTR command (wait
30m instead of default 20m)
* Increase testcase-timeout to 20m for OSX, 2m for Linux
* Set ccache size only on Linux, adjust timeout again
* Increase cache push timeout to 5 mins
* Remove AWS defines, not needed
* Remove commented-out ASAN rules; ASAN has been disabled
previously since it has a significant impact on job
runtime and should be used more in buildbot instead
* Misc cleanup and fixes
Several improvements have been made so that builds run
faster and with fewer canceled jobs:
* Set ccache max size to 1GB. Was 512MB for Linux
(too low for MariaDB) and 5GB on macOS with defaults;
* Don't install libasan in Travis if not necessary.
Since ASAN is disabled for the time being, save
time/resources for other steps;
* Decrease the number of parallel processes to prevent
resource exhaustion leading to poor performance. According
to Travis docs, a max of 4 concurrent processes should be
run per job:
https://docs.travis-ci.com/user/common-build-problems/#My-build-script-is-killed-without-any-error
* Reconsider tests exec order and split huge main and rocksdb
test suites into their own job, decreasing the chance of going
over the Travis job execution limit and getting killed;
* Increase Travis testcase-timeout to 4 minutes. Occasionally
on Ubuntu target and frequently on macOS, many tests in main,
rpl, binlog suites take longer than 2 minutes, resulting in
many jobs failing, when in reality the failing tests didn't
get a chance to complete. From my testing, along with the other
speedups, i.e. increasing ccache size, a timeout of 4 minutes
should be OK. Revert to 3 minutes if necessary.
* Build with GCC and Clang version 5,6 only.
* Rename GCC_VERSION to CC_VERSION for clarity. We are using
two compilers after all, GCC and Clang.
* Stop using the somewhat obsolete Clang 4 in Travis. It was also the
reason for the failing test suites in MDEV-15430.
MDEV-7257 made a dump thread read from the binlog concurrently with
writers as long as the read bytes are below a watermark
(MYSQL_BIN_LOG::binlog_end_pos). However, it turned out to be possible for a
dump thread reader to reach for bytes past the watermark through a
feature of IO_CACHE that fills in the internal buffer; while doing
so, it could read what the reader is not supposed to see (the bytes
above MYSQL_BIN_LOG::binlog_end_pos).
The issue is fixed by constraining the IO_CACHE buffer fill to respect
the watermark.
An added unit test proves that reading from the file is bounded by an
external parameter passed to the {IO_CACHE::end_of_file} cache member.
Problem:
push_handler() created sp_handler_entry instances on THD::main_mem_root,
which is freed only after the SP instructions execution.
So in case of a CONTINUE HANDLER inside a loop (e.g. WHILE) this approach
leaked thread memory on every loop iteration.
Changes:
- Removing the sp_handler_entry declaration; it's not really needed.
- Fixing the data type of sp_rcontext::m_handlers from
Dynamic_array<sp_handler_entry*> to Dynamic_array<sp_instr_hpush_jump*>
- Fixing sp_rcontext::push_handler() to push the pointer to
an sp_instr_hpush_jump instance to the handler stack.
This instance contains everything we need.
There is no need to allocate anything else.
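A minimal sketch of the leaking pattern described in the Problem section
above (the procedure and the iteration count are made up):
```
DELIMITER //
CREATE PROCEDURE p1()
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < 100000 DO
    BEGIN
      -- Entering this block calls sp_rcontext::push_handler() on every
      -- iteration; before the fix, each push allocated an sp_handler_entry
      -- on THD::main_mem_root, which is freed only when p1() finishes.
      DECLARE CONTINUE HANDLER FOR SQLEXCEPTION BEGIN END;
      SET i = i + 1;
    END;
  END WHILE;
END //
DELIMITER ;

CALL p1();
```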