Commit graph

21543 commits

Author SHA1 Message Date
Daniel Black
de51acd037 MDEV-18726: innodb buffer pool size not consistent with large pages
Rather than add a small extra amount on the size of chunks, keep it
of the specified size. The rest of the chunk initialization code
adapts to this small size reduction. This has been made in the general
case, not just large pages, to keep it simple.

The chunks size is controlled by innodb-buffer-pool-chunk-size. In the
code increasing this by a descriptor table size length makes it
difficult with large pages. With innodb-buffer-pool-chunk-size set to 2M
the code before this commit would of added a small amount extra to this
value when it tried to allocate this. While not normally a problem it is
with large pages, it now requires addition space, a whole extra large
page. With a number of pools, or with 1G or 16G large pages this is
quite significant.

By removing this additional amount, DBAs can set
innodb-buffer-pool-chunk size to the large page size, or a multiple of
it, and actually get that amount allocated. Previously they had to fudge
a value less.

The innodb.test results show how this is fudged over a number of tests. With
this change the values are just between 488 and 500 depending on architecture
and build options.

Tested with  --large-pages --innodb-buffer-pool-size=256M
--innodb-buffer-pool-chunk-size=2M on x86_64 with 2M default large page
size. Breaking before buf_pool init, one large page was allocated in
MyISAM, by the end of the function 128 huge pages where allocated as
expected. A further 16 pages where allocated for a 32M log buffer and
during startup 1 page was allocated briefly to the redo log.
2019-03-18 21:49:53 +02:00
Marko Mäkelä
6b6fa3cdb1 MDEV-18644: Support full_crc32 for page_compressed
This is a follow-up task to MDEV-12026, which introduced
innodb_checksum_algorithm=full_crc32 and a simpler page format.
MDEV-12026 did not enable full_crc32 for page_compressed tables,
which we will be doing now.

This is joint work with Thirunarayanan Balathandayuthapani.

For innodb_checksum_algorithm=full_crc32 we change the
page_compressed format as follows:

FIL_PAGE_TYPE: The most significant bit will be set to indicate
page_compressed format. The least significant bits will contain
the compressed page size, rounded up to a multiple of 256 bytes.

The checksum will be stored in the last 4 bytes of the page
(whether it is the full page or a page_compressed page whose
size is determined by FIL_PAGE_TYPE), covering all preceding
bytes of the page. If encryption is used, then the page will
be encrypted between compression and computing the checksum.
For page_compressed, FIL_PAGE_LSN will not be repeated at
the end of the page.

FSP_SPACE_FLAGS (already implemented as part of MDEV-12026):
We will store the innodb_compression_algorithm that may be used
to compress pages. Previously, the choice of algorithm was written
to each compressed data page separately, and one would be unable
to know in advance which compression algorithm(s) are used.

fil_space_t::full_crc32_page_compressed_len(): Determine if the
page_compressed algorithm of the tablespace needs to know the
exact length of the compressed data. If yes, we will reserve and
write an extra byte for this right before the checksum.

buf_page_is_compressed(): Determine if a page uses page_compressed
(in any innodb_checksum_algorithm).

fil_page_decompress(): Pass also fil_space_t::flags so that the
format can be determined.

buf_page_is_zeroes(): Check if a page is full of zero bytes.

buf_page_full_crc32_is_corrupted(): Renamed from
buf_encrypted_full_crc32_page_is_corrupted(). For full_crc32,
we always simply validate the checksum to the page contents,
while the physical page size is explicitly specified by an
unencrypted part of the page header.

buf_page_full_crc32_size(): Determine the size of a full_crc32 page.

buf_dblwr_check_page_lsn(): Make this a debug-only function, because
it involves potentially costly lookups of fil_space_t.

create_table_info_t::check_table_options(),
ha_innobase::check_if_supported_inplace_alter(): Do allow the creation
of SPATIAL INDEX with full_crc32 also when page_compressed is used.

commit_cache_norebuild(): Preserve the compression algorithm when
updating the page_compression_level.

dict_tf_to_fsp_flags(): Set the flags for page compression algorithm.
FIXME: Maybe there should be a table option page_compression_algorithm
and a session variable to back it?
2019-03-18 14:08:43 +02:00
Marko Mäkelä
2151aed44d Follow-up fix to MDEV-12026: FIL_SPACE_FLAGS trump fil_space_t::flags
Whenever we are reading the first page of a data file, we may have to
adjust the provisionally created fil_space_t::flags to match what is
actually inside the data files. In this way, we will never accidentally
change the format of a data file.

fil_node_t::read_page0(): After validating the FIL_SPACE_FLAGS,
always assign them to space->flags.

btr_root_adjust_on_import(), Datafile::validate_to_dd(),
fil_space_for_table_exists_in_mem(): Adapt to the fix
in fil_node_t::read_page0().

fsp_flags_try_adjust(): Skip the adjustment if full_crc32 is being
used. This adjustment was introduced in MDEV-11623 for upgrading
from MariaDB 10.1.0 to 10.1.20, which used an accidentally changed
format of FIL_SPACE_FLAGS. MariaDB before 10.4.3 never set the
flag that now indicates the full_crc32 format.
2019-03-18 13:10:28 +02:00
Teemu Ollakka
1ef50a34ec 10.4 wsrep group commit fixes (#1224)
* MDEV-16509 Improve wsrep commit performance with binlog disabled

Release commit order critical section early after trx_commit_low() if
binlog is not transaction coordinator. In order to avoid two phase commit,
binlog_hton is not registered for THD during IO_CACHE population.

Implemented a test which verifies that the transactions release
commit order early.

This optimization will change behavior during recovery as the commit
is not two phase when binlog is off. Fixed and recorded wsrep-recover-v25
and wsrep-recover to match the behavior.

* MDEV-18730 Ordering for wsrep binlog group commit

Previously out of order execution was allowed for wsrep commits.
Established proper ordering by populating wait_for_commit
for every wsrep THD and making group commit leader to wait for
prior commits before proceeding to trx_group_commit_leader().

* MDEV-18730 Added a test case to verify correct commit ordering

* MDEV-16509, MDEV-18730 Review fixes

Use WSREP_EMULATE_BINLOG() macro to decide if the binlog_hton
should be registered. Whitespace/syntax fixes and cleanups.

* MDEV-16509 Require binlog for galera_var_innodb_disallow_writes test

If the commit to InnoDB is done in one phase, the native InnoDB behavior
is that the transaction is committed in memory before it is persisted to
disk. This means that the innodb_disallow_writes=ON may not prevent
transaction to become visible to other readers before commit is completely
over. On the other hand, if the commit is two phase (as it is with binlog),
the transaction will be blocked in prepare phase.

Fixed the test to use binlog, which enforces two phase commit, which
in turn makes commit to block before the changes become visible to
other connections. This guarantees that the test produces expected
result.
2019-03-15 07:09:13 +02:00
Marko Mäkelä
e012d26680 row_undo(): Do not return an undefined value
Note: any error from row_undo() will lead to the server being
killed by InnoDB.
2019-03-14 15:49:28 +02:00
Monty
ebd0eab5a3 Removing warning from Aria recovery
The warning was removed as this is a common case that happens if the table
was dropped and later created during the same checkpoint or if there was
a bulk insert done on an empty table.
2019-03-14 12:06:17 +02:00
Marko Mäkelä
e450527938 Merge 10.3 into 10.4 2019-03-12 16:14:31 +02:00
Marko Mäkelä
69b33fca8c MDEV-18878: After-merge fixes
In 10.3, all records will be processed by purge due to MDEV-12288.
But, the insert undo records do not contain a transaction identifier.

row_purge_parse_undo_rec(): Use node->trx_id=TRX_ID_MAX for the
insert undo records. We cannot skip table lookups for these records
after DISCARD TABLESPACE other than by 'detaching' the table from
the undo logs by updating SYS_TABLES.ID on both DISCARD TABLESPACE
and IMPORT TABLESPACE.

Also, remove a redundant condition that was introduced
in the merge commit 814205f306.
2019-03-12 16:02:34 +02:00
Marko Mäkelä
b32bc70e34 Merge 10.2 into 10.3 2019-03-12 14:26:34 +02:00
Marko Mäkelä
bef947b4c9 MDEV-18902 Uninitialized variable in recv_parse_log_recs()
recv_parse_log_recs(): Do not compare type if ptr==end_ptr
(we have reached the end of the redo log parsing buffer),
because it will not have been correctly initialized in that case.
2019-03-12 14:03:10 +02:00
Marko Mäkelä
e070cfe398 MDEV-18878: Fix GCC -flifetime-dse
GCC 6 and later can optimize away the memset() that is part of
mem_heap_zalloc() in a placement new call. So, instead of relying
on that kind of initialization, explicitly initialize the necessary
fields in the constructors.

que_common_t::que_common_t(): Initialize more fields in the
default constructor.

purge_vcol_info_t::purge_vcol_info_t(): Initialize all fields in
the default constructor.

purge_node_t::purge_node_t(): Initialize all necessary fields.

Reference:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71388

    https://gcc.gnu.org/ml/gcc/2016-02/msg00207.html
2019-03-12 13:56:58 +02:00
Marko Mäkelä
e374755bae Merge 10.1 into 10.2 2019-03-12 13:11:07 +02:00
Marko Mäkelä
32de60bb2e MDEV-18749: Fix GCC -flifetime-dse
row_merge_create_fts_sort_index(): Initialize dict_col_t in
an unambiguous way. GCC 6 and later appear to be able to optimize
away the memset() that is part of mem_heap_zalloc() in the
placement new call. Let us avoid using placement new in order
to ensure that the objects will actually be initialized.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71388

https://gcc.gnu.org/ml/gcc/2016-02/msg00207.html

While the latter reference hints that the optimization is only
applicable to non-POD types (and dict_col_t does not define
any member functions before 10.2), it is most consistent to
use the same initialization across all versions.
2019-03-12 13:03:20 +02:00
Sergei Golubchik
69abd43703 MDEV-17070 Table corruption or Assertion table->file->stats.records > 0 || error' or Assertion !is_set() || (m_status == DA_OK_BULK && is_bulk_op())' failed upon actions on temporary table
This was caused by a combination of factors:
* MyISAM/Aria temporary tables historically never saved the state
  to disk (MYI/MAI), because the state never needed to persist
* certain ALTER TABLE operations modify the original TABLE structure
  and if they fail, the original table has to be reopened to
  revert all changes (m_needs_reopen=1)

as a result, when ALTER fails and MyISAM/Aria temp table gets reopened,
it reads the stale state from the disk.

As a fix, MyISAM/Aria tables now *always* write the state to disk
on close, *unless* HA_EXTRA_PREPARE_FOR_DROP was done first. And
the server now always does HA_EXTRA_PREPARE_FOR_DROP before dropping
a temporary table.
2019-03-12 09:51:42 +01:00
Marko Mäkelä
58f3ff7175 Merge 10.3 into 10.4 2019-03-11 18:27:58 +02:00
Marko Mäkelä
814205f306 Merge 10.2 into 10.3 2019-03-11 17:49:36 +02:00
Marko Mäkelä
28e713dc12 MDEV-18878: Correct a condition
Initialize node->trx_id before checking if a table can be skipped.
2019-03-11 17:29:46 +02:00
Marko Mäkelä
6e76704613 MDEV-18878: Slimmer purge in non-debug builds
purge_node_t::in_progress: Replaces purge_node_t::done.
Only present in debug builds.

purge_node_t::start(): Moved from the start of row_purge_step().

purge_node_t::end(): Replaces row_purge_end().

trx_purge_attach_undo_recs(): Omit a check from non-debug builds.
2019-03-11 17:18:37 +02:00
Marko Mäkelä
1ab049e572 MDEV-18878 Purge: Optimize away futile table lookups
If a table has been dropped, rebuilt, or its tablespace has been
discarded or the table is corrupted, it does not make sense to
look up that table again while purging old undo log records.

purge_node_t::purge_node_t(): Replaces row_purge_node_create().

que_common_t::que_common_t(): Constructor.

row_import_update_index_root(): Remove the constant parameter
dict_locked=true, and update the table->def_trx_id in the cache.

purge_node_t::unavailable_table_id: The latest unavailable table ID,
to avoid future lookups.

purge_node_t::def_trx_id: The latest modification of the table
identified by unavailable_table_id, or TRX_ID_MAX.

purge_node_t::is_skipped(): Determine if a table should be skipped.

purge_node_t::skip(): Note that a table should be skipped.
2019-03-11 17:17:24 +02:00
Marko Mäkelä
c67b306e4f Merge 10.3 into 10.4 2019-03-08 11:19:48 +02:00
Marko Mäkelä
136d21c82f MDEV-13818: Revert an incorrect change
In commit d30f17af49 the change of
the loop iteration broke another error handling path that did
"goto error_handling_drop_uncached". Cover this code path with
fault injection, and revert to the correct iteration.

There are two fault injection labels innodb_OOM_prepare_inplace_alter.
Their order was swapped in MDEV-11369, so that the label that used
to be covered in an ADD INDEX code path would become unreachable
because the label that is executed for any ALTER TABLE was executed
first. Let us introduce the label innodb_OOM_prepare_add_index
for the more specific case.
2019-03-08 09:20:12 +02:00
Marko Mäkelä
2d0dd62cf7 Merge 10.2 into 10.3 2019-03-08 00:26:55 +02:00
Marko Mäkelä
b4cda8bbbc After-merge fix for GCC
GCC does not like MY_ATTRIBUTE((nonnull)) on a reference-to-pointer
parameter. clang did not flag an issue wit that.
2019-03-07 18:54:53 +02:00
Marko Mäkelä
913e33e423 Merge 10.1 into 10.2
Rewrite the MDEV-13818 fix to prevent heap-use-after-free.

Add a test case for MDEV-18272.
2019-03-07 17:52:27 +02:00
Sergei Golubchik
d30f17af49 MDEV-13818 CREATE INDEX leaks memory if running out of undo log space
if create_index_dict() fails, we need to free ctx->add_index[a] too

This fixes innodb.alter_crash and innodb.instant_alter_debug
failures in ASAN_OPTIONS="abort_on_error=1" runs
2019-03-07 15:02:15 +01:00
Marko Mäkelä
e3adf96aeb MDEV-13818 CREATE INDEX leaks memory if running out of undo log space
row_merge_create_index_graph(): Relay the internal state
from dict_create_index_step(). Our caller should free the index
only if it was not copied, added to the cache, and freed.

row_merge_create_index(): Free the index template if it was
not added to the cache. This is a safer variant of the logic
that was introduced in 65070beffd in 10.2.

prepare_inplace_alter_table_dict(): Add additional fault injection
to exercise a code path where we have already added an index
to the cache.
2019-03-07 15:35:55 +02:00
Marko Mäkelä
4c0f43f45a Merge 10.0 into 10.1 2019-03-07 12:27:42 +02:00
Marko Mäkelä
0a0eed8016 Merge 5.5 into 10.0 2019-03-07 12:04:22 +02:00
Marko Mäkelä
8024f8c6b8 MDEV-18272 InnoDB fails to rollback after exceeding FOREIGN KEY recursion depth
row_mysql_handle_errors(): Correct the wrong error handling for
the code DB_FOREIGN_EXCEED_MAX_CASCADE that was introduced in

c0923d396a

    commit 35f5429eda
    Author: Jimmy Yang <jimmy.yang@oracle.com>
    Date:   Wed Oct 6 06:55:34 2010 -0700

        Manual port Bug #Bug #54582 "stack overflow when opening many tables
        linked with foreign keys at once" from mysql-5.1-security to
        mysql-5.5-security again.

        rb://391 approved by Heikki

No known test case exists for repeating the bug before MariaDB 10.2.
The scenario should be that DB_FOREIGN_EXCEED_MAX_CASCADE is returned,
then InnoDB wrongly skips the rollback to the start of the current
row operation, and finally the SQL layer commits the transaction.
Normally the SQL layer would roll back either the entire transaction or
to the start of the statement. In the faulty scenario, InnoDB would
leave the transaction in an inconsistent state, and the SQL layer could
commit the transaction.
2019-03-07 11:57:14 +02:00
Marko Mäkelä
aa4b2c1509 Merge 10.3 into 10.4 2019-03-07 08:02:33 +02:00
Sergei Golubchik
65070beffd MDEV-13818 CREATE INDEX leaks memory if running out of undo log space
free already allocated indexes if row_merge_create_index() fails

This fixes innodb.alter_crash failure
in ASAN_OPTIONS="abort_on_error=1" runs
2019-03-06 15:28:27 +01:00
Sergei Golubchik
2b4027e633 mronga: fix a memory leak
use the correct delete operator

This fixes mroonga/storage.column_generated_stored_add_column failures
in ASAN_OPTIONS="abort_on_error=1" runs
2019-03-06 15:28:27 +01:00
Sergei Golubchik
5f105e756b MDEV-18625 ASAN unknown-crash in my_copy_fix_mb / ha_mroonga::storage_inplace_alter_table_add_column
disable inplace alter for adding stored generated columns.

This fixes mroonga/storage.column_generated_stored_add_column failures
in ASAN_OPTIONS="abort_on_error=1" runs

Also, add a test case that shows the bug without ASAN.
2019-03-06 15:28:27 +01:00
Marko Mäkelä
77103e9832 Merge 10.2 into 10.3 2019-03-06 16:20:13 +02:00
Marko Mäkelä
c155946c90 Merge 10.1 into 10.2 2019-03-06 15:15:59 +02:00
Marko Mäkelä
485dcb07d1 MDEV-18637 Assertion `cache' failed in fts_init_recover_doc
I know no test case for this bug in 10.1. So a test case will be
committed separately in 10.2

fts_reset_get_doc(): properly initialize fts_get_doc_t::cache
2019-03-06 14:46:58 +02:00
Marko Mäkelä
4b5dc47f56 MDEV-18659: Revert a non-functional change
fts_fetch_index_words(): Restore the initialization len=0.
The test innodb_fts.create in 10.2 would end up in an infinite loop
if this assignment is removed, because a following iteration of the
while() loop would assign zip->zp->avail_in=len with the original value
instead of the 0 that was reset in the previous iteration.
2019-03-06 12:45:54 +02:00
Marko Mäkelä
b761211685 MDEV-18659: Fix string truncation/overflow in InnoDB and XtraDB
Fix the warnings issued by GCC 8 -Wstringop-truncation
and -Wstringop-overflow in InnoDB and XtraDB.

This work is motivated by Jan Lindström. The patch mainly differs
from his original one as follows:

(1) We remove explicit initialization of stack-allocated string buffers.
The minimum amount of initialization that is needed is a terminating
NUL character.
(2) GCC issues a warning for invoking strncpy(dest, src, sizeof dest)
because if strlen(src) >= sizeof dest, there would be no terminating
NUL byte in dest. We avoid this problem by invoking strncpy() with
a limit that is 1 less than the buffer size, and by always writing
NUL to the last byte of the buffer.
(3) We replace strncpy() with memcpy() or strcpy() in those cases
when the result is functionally equivalent.

Note: fts_fetch_index_words() never deals with len==UNIV_SQL_NULL.
This was enforced by an assertion that limits the maximum length
to FTS_MAX_WORD_LEN. Also, the encoding that InnoDB uses for
the compressed fulltext index is not byte-order agnostic, that is,
InnoDB data files that use FULLTEXT INDEX are not portable between
big-endian and little-endian systems.
2019-03-06 11:22:27 +02:00
Marko Mäkelä
b21930fb0f MDEV-18749: Uninitialized value upon ADD FULLTEXT INDEX
row_merge_create_fts_sort_index(): Initialize dict_col_t.

This fixes an access to uninitialized dict_col_t::ind when a debug
assertion in MariaDB 10.4 invokes is_dropped() in
rec_get_converted_size_comp_prefix_low(). Older MariaDB versions
seem to be unaffected by the uninitialized values, but it should
not hurt to initialize everything.
2019-03-06 10:37:43 +02:00
Marko Mäkelä
2a791c53ad Merge 10.3 into 10.4 2019-03-06 09:00:52 +02:00
Marko Mäkelä
446b3ebdfc Merge 10.2 into 10.3
FIXME: Properly resolve conflicts between MDEV-18883
and MDEV-7742/MDEV-8305, and record the correct result for
main.log_slow
2019-03-05 12:56:05 +02:00
Marko Mäkelä
730ed9907b After-merge fix: Add explicit type conversion
Lesson learned: A HA_TOPTION_SYSVAR that is bound to
MYSQL_THDVAR_UINT does not have the type uint, but ulonglong.
2019-03-04 18:38:42 +02:00
Marko Mäkelä
a2fc36989e Merge 10.2 into 10.3 2019-03-04 17:01:00 +02:00
Marko Mäkelä
9835f7b80f Merge 10.1 into 10.2 2019-03-04 16:46:58 +02:00
Marko Mäkelä
91e4f00389 MDEV-18732 InnoDB: ALTER IGNORE returns error for NULL
Only starting with MariaDB 10.3.8 (MDEV-16365), InnoDB can actually
handle ALTER IGNORE TABLE correctly when introducing a NOT NULL
attribute to a column that contains a NULL value. Between
MariaDB Server 10.0 and 10.2, we would incorrectly return an error
for ALTER IGNORE TABLE when the column contains a NULL value.
2019-03-04 15:16:27 +02:00
Sergei Golubchik
4ca2079142 MDEV-18486 Database crash on a table with indexed virtual column
don't do anything special for stored generated columns
in MyISAM repair code.

add an assert that if there are virtual indexed columns, they
_must_ be beyond the file->s->base.reclength boundary
2019-03-01 13:23:34 -05:00
Marko Mäkelä
4a1c66290d MDEV-18775 Fix ALTER TABLE error handling for DROP INDEX
On an error (such as when an index cannot be dropped due to
FOREIGN KEY constraints), the field dict_index_t::to_be_dropped
was only being cleared in debug builds, even though the field
is available and being used also in non-debug builds.

This was a regression that was introduced by myself originally
in MySQL 5.7.6 and later merged to MariaDB 10.2.2, in
d39898de8e

An error manifested itself in the MariaDB Server 10.4 non-debug build,
involving instant ADD or DROP column. Because an earlier failed
ALTER TABLE operation incorrectly left the dict_index_t::to_be_dropped
flag set, the column pointers of the index fields would fail to be
adjusted for instant ADD or DROP column (MDEV-15562). The instant
ADD COLUMN in MariaDB Server 10.3 is unlikely to be affected by a
similar scenario, because dict_table_t::instant_add_column() in 10.3
is applying the transformations to all indexes, not skipping
to-be-dropped ones.
2019-03-01 12:16:12 +02:00
Marko Mäkelä
e39d6e0c53 MDEV-18601 Can't create table with ENCRYPTED=DEFAULT when innodb_default_encryption_key_id!=1
The problem with the InnoDB table attribute encryption_key_id is that it is
not being persisted anywhere in InnoDB except if the table attribute
encryption is specified and is something else than encryption=default.
MDEV-17320 made it a hard error if encryption_key_id is specified to be
anything else than 1 in that case.

Ideally, we would always persist encryption_key_id in InnoDB. But, then we
would have to be prepared for the case that when encryption is being enabled
for a table whose encryption_key_id attribute refers to a non-existing key.

In MariaDB Server 10.1, our best option remains to not store anything
inside InnoDB. But, instead of returning the error that MDEV-17320
introduced, we should merely issue a warning that the specified
encryption_key_id is going to be ignored if encryption=default.

To improve the situation a little more, we will issue a warning if
SET [GLOBAL|SESSION] innodb_default_encryption_key_id is being set
to something that does not refer to an available encryption key.

Starting with MariaDB Server 10.2, thanks to MDEV-5800, we could open the
table definition from InnoDB side when the encryption is being enabled,
and actually fix the root cause of what was reported in MDEV-17320.
2019-02-28 23:20:31 +02:00
Jan Lindström
f65f40bb35 Merge remote-tracking branch 'origin/10.1' into 10.2 2019-02-28 13:08:11 +02:00
Sergei Golubchik
f78c0f6f00 MDEV-18722 Assertion `templ->mysql_null_bit_mask' failed in row_sel_store_mysql_rec upon modifying indexed column into blob
don't assert that virtual columns are always nullable
2019-02-27 23:27:43 -05:00