crashes server
This bug is the result of merging the Oracle MySQL follow-up fix
BUG#22963169 MYSQL CRASHES ON CREATE FULLTEXT INDEX
without merging the base bug fix:
Bug#79475 Insert a token of 84 4-bytes chars into fts index causes
server crash.
Unlike the above mentioned fixes in MySQL, our fix will not change
the storage format of fulltext indexes in InnoDB or XtraDB
when a character encoding with mbmaxlen=2 or mbmaxlen=3
and the length of a word is between 128 and 84*mbmaxlen bytes.
The Oracle fix would allocate 2 length bytes for these cases.
Compatibility with other MySQL and MariaDB releases is ensured by
persisting the used maximum length in the SYS_COLUMNS table in the
InnoDB data dictionary.
This fix also removes some unnecessary strcmp() calls when checking
for the legacy default collation my_charset_latin1
(my_charset_latin1.name=="latin1_swedish_ci").
fts_create_one_index_table(): Store the actual length in bytes.
This metadata will be written to the SYS_COLUMNS table.
fts_zip_initialize(): Initialize only the first byte of the buffer.
Actually the code should not even care about this first byte, because
the length is set as 0.
FTX_MAX_WORD_LEN: Define as HA_FT_MAXCHARLEN * 4 aka 336 bytes,
not as 254 bytes.
row_merge_create_fts_sort_index(): Set the actual maximum length of the
column in bytes, similar to fts_create_one_index_table().
row_merge_fts_doc_tokenize(): Remove the redundant parameter word_dtype.
Use the actual maximum length of the column. Calculate the extra_size
in the same way as row_merge_buf_encode() does.
crashes server
This bug is the result of merging the Oracle MySQL follow-up fix
BUG#22963169 MYSQL CRASHES ON CREATE FULLTEXT INDEX
without merging the base bug fix:
Bug#79475 Insert a token of 84 4-bytes chars into fts index causes
server crash.
Unlike the above mentioned fixes in MySQL, our fix will not change
the storage format of fulltext indexes in InnoDB or XtraDB
when a character encoding with mbmaxlen=2 or mbmaxlen=3
and the length of a word is between 128 and 84*mbmaxlen bytes.
The Oracle fix would allocate 2 length bytes for these cases.
Compatibility with other MySQL and MariaDB releases is ensured by
persisting the used maximum length in the SYS_COLUMNS table in the
InnoDB data dictionary.
This fix also removes some unnecessary strcmp() calls when checking
for the legacy default collation my_charset_latin1
(my_charset_latin1.name=="latin1_swedish_ci").
fts_create_one_index_table(): Store the actual length in bytes.
This metadata will be written to the SYS_COLUMNS table.
fts_zip_initialize(): Initialize only the first byte of the buffer.
Actually the code should not even care about this first byte, because
the length is set as 0.
FTX_MAX_WORD_LEN: Define as HA_FT_MAXCHARLEN * 4 aka 336 bytes,
not as 254 bytes.
row_merge_create_fts_sort_index(): Set the actual maximum length of the
column in bytes, similar to fts_create_one_index_table().
row_merge_fts_doc_tokenize(): Remove the redundant parameter word_dtype.
Use the actual maximum length of the column. Calculate the extra_size
in the same way as row_merge_buf_encode() does.
There are only 3 logical states for a number. The isfinite is a
single function call rather than multiple leaving scope for compiler
/architecture optimization.
Changed the logic as follows in a few files.
my_isinf(square) || my_isnan(square) -> !isfinite(square)
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
We don't need to drop down to unoptimized crc because of valgrind now.
Valgrind-3.6.1 was released 16 February 2011.
The Power8 ASM instructions seem to be supported in 3.9.0 (31 October 2013).
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
trx_state_eq(): Add the parameter bool relaxed=false, to
allow trx->state==TRX_STATE_NOT_STARTED where a different
state is expected, if an error has been reported.
trx_release_savepoint_for_mysql(): Pass relaxed=true to
trx_state_eq(). That is, allow the transaction to be idle
when ROLLBACK TO SAVEPOINT is attempted after an error
has been reported to the client.
trx_state_eq(): Add the parameter bool relaxed=false, to
allow trx->state==TRX_STATE_NOT_STARTED where a different
state is expected, if an error has been reported.
trx_release_savepoint_for_mysql(): Pass relaxed=true to
trx_state_eq(). That is, allow the transaction to be idle
when ROLLBACK TO SAVEPOINT is attempted after an error
has been reported to the client.
MySQL 5.7 introduced WL#7943: InnoDB: Implement Information_Schema.Files
to provide a long-term alternative for accessing tablespace metadata.
The INFORMATION_SCHEMA.INNODB_* views are considered internal interfaces
that are subject to change or removal between releases. So, users should
refer to I_S.FILES instead of I_S.INNODB_SYS_TABLESPACES to fetch metadata
about CREATE TABLESPACE.
Because MariaDB 10.2 does not support CREATE TABLESPACE or
CREATE TABLE…TABLESPACE for InnoDB, it does not make sense to support
I_S.FILES either. So, let MariaDB 10.2 omit the code that was added in
MySQL 5.7. After this change, I_S.FILES will report the empty result,
unless some other storage engine in MariaDB 10.2 implements the interface.
(The I_S.FILES interface was originally created for the NDB Cluster.)
MariaDB 10.2 incorporates MySQL 5.7. MySQL 5.7.9 (the first GA release
of the series) introduced an informational field to the InnoDB redo log
header, which identifies the server version where the redo log files
were created (initialized, resized or updated), in
WL#8845: InnoDB: Redo log format version identifier.
The informational message would be displayed to the user, for example
if someone tries to start up MySQL 8.0 after killing a MariaDB 10.2 server.
In the current MariaDB 10.2 source code, the identifier string would
misleadingly say "MySQL 5.7.14" (using the hard-coded version number in
univ.i) instead of "MariaDB 10.2.3" (using the contents of the VERSION
file, the build system copies to config.h and my_config.h).
This is only a cosmetic change. The compatibility check is based on a
numeric identifier.
We should probably also change the numeric identifier and some logic
around it. MariaDB 10.2 should refuse to recover from a crashed MySQL 5.7
instance, because the redo log might contain references to shared tablespaces,
which are not supported by MariaDB 10.2. Also, when MariaDB 10.2 creates
an encrypted redo log, there should be a redo log format version tag that
will prevent MySQL 5.7 or 8.0 from starting up.
Revert the XtraDB changes, because 10.2 does not currently build with XtraDB.
Also omit some changes that need further investigation.
Ensure that all callers of partition_info::get_clone() are passing this!=NULL.
buf_block_init(): Initialize buf_page_t::flush_type.
For some reason, Valgrind 3.12.0 would seem to flag some
bits in adjacent bitfields as uninitialized, even though only
the two bits of flush_type were left uninitialized. Initialize
the field to get rid of many warnings.
buf_page_init_low(): Initialize buf_page_t::old.
For some reason, Valgrind 3.12.0 would seem to flag all 32
bits uninitialized when buf_page_init_for_read() invokes
buf_LRU_add_block(bpage, TRUE). This would trigger bogus warnings
for buf_page_t::freed_page_clock being uninitialized.
(The V-bits would later claim that only "old" is initialized
in the 32-bit word.) Perhaps recent compilers
(GCC 6.2.1 and clang 4.0.0) generate more optimized x86_64 code
for bitfield operations, confusing Valgrind?
mach_write_to_1(), mach_write_to_2(), mach_write_to_3():
Rewrite the assertions that ensure that the most significant
bits are zero. Apparently, clang 4.0.0 would optimize expressions
of the form ((n | 0xFF) <= 0x100) to (n <= 0x100). The redundant
0xFF was added in the first place in order to suppress a
Valgrind warning. (Valgrind would warn about comparing uninitialized
values even in the case when the uninitialized bits do not affect
the result of the comparison.)
In functions that declare pointer parameters as nonnull,
remove nullness checks, because GCC would optimize them away anyway.
Use #ifdef instead of #if when checking for a configuration flag.
Clang says that left shifts of negative values are undefined.
So, use ~0U instead of ~0 in a number of macros.
Some functions that were defined as UNIV_INLINE were declared as
UNIV_INTERN. Consistently use the same type of linkage.
ibuf_merge_or_delete_for_page() could pass bitmap_page=NULL to
buf_page_print(), conflicting with the __attribute__((nonnull)).
buf_block_init(): Initialize buf_page_t::flush_type.
For some reason, Valgrind 3.12.0 would seem to flag some
bits in adjacent bitfields as uninitialized, even though only
the two bits of flush_type were left uninitialized. Initialize
the field to get rid of many warnings.
buf_page_init_low(): Initialize buf_page_t::old.
For some reason, Valgrind 3.12.0 would seem to flag all 32
bits uninitialized when buf_page_init_for_read() invokes
buf_LRU_add_block(bpage, TRUE). This would trigger bogus warnings
for buf_page_t::freed_page_clock being uninitialized.
(The V-bits would later claim that only "old" is initialized
in the 32-bit word.) Perhaps recent compilers
(GCC 6.2.1 and clang 4.0.0) generate more optimized x86_64 code
for bitfield operations, confusing Valgrind?
mach_write_to_1(), mach_write_to_2(), mach_write_to_3():
Rewrite the assertions that ensure that the most significant
bits are zero. Apparently, clang 4.0.0 would optimize expressions
of the form ((n | 0xFF) <= 0x100) to (n <= 0x100). The redundant
0xFF was added in the first place in order to suppress a
Valgrind warning. (Valgrind would warn about comparing uninitialized
values even in the case when the uninitialized bits do not affect
the result of the comparison.)
Threads may fall asleep forever while acquiring InnoDB rw-lock on Power8. This
regression was introduced along with InnoDB atomic operations fixes.
The problem was that proper memory order wasn't enforced between "writers"
store and "lock_word" load:
my_atomic_store32((int32*) &lock->waiters, 1);
...
local_lock_word = lock->lock_word;
Locking protocol is such that store to "writers" must be completed before load
of "lock_word". my_atomic_store32() was expected to enforce memory order because
it issues strongest MY_MEMORY_ORDER_SEQ_CST memory barrier.
According to C11:
- any operation with this memory order is both an acquire operation and a
release operation
- for atomic store order must be one of memory_order_relaxed
memory_order_release or memory_order_seq_cst. Otherwise the behavior is
undefined.
That is it doesn't say explicitly that this expectation is wrong, but there are
indications that acquire (which is actually supposed to guarantee memory order
in this case) may not be issued along with MY_MEMORY_ORDER_SEQ_CST.
A good fix for this is to encode waiters into lock_word, but it is a bit too
intrusive. Instead we change atomic store to atomic fetch-and-store, which
does memory load and is guaranteed to issue acquire.
Simplify away recursive flag: it is not necessary for rw-locks to operate
properly. Now writer_thread != 0 means recursive.
As we only need correct value of writer_thread only in writer_thread itself
it is rather safe to load and update it non-atomically.
This patch also fixes potential reorder of "writer_thread" and "recursive"
loads (aka MDEV-7148), which was reopened along with InnoDB thread fences
simplification. Previous versions are unaffected, because they have os_rmb
in rw_lock_lock_word_decr(). It wasn't observed at the moment of writing
though.
In InnoDB and XtraDB functions that declare pointer parameters as nonnull,
remove nullness checks, because GCC would optimize them away anyway.
Use #ifdef instead of #if when checking for a configuration flag.
Clang says that left shifts of negative values are undefined.
So, use ~0U instead of ~0 in a number of macros.
Some functions that were defined as UNIV_INLINE were declared as
UNIV_INTERN. Consistently use the same type of linkage.
ibuf_merge_or_delete_for_page() could pass bitmap_page=NULL to
buf_page_print(), conflicting with the __attribute__((nonnull)).
This is not a fix, this is instrumentation to find out is MySQL frm dictionary
and InnoDB data dictionary really out-of-sync when this assertion is fired,
or is there some other reason (bug).
With "WL#6047 - Do not allocate trx id for read-only transactions"
m_prebuilt->trx->id is always 0 for read-only transactions. This makes
it useless as an index for fuzzy counters.
Use server thread id instead similarly to MySQL.
for InnoDB tables"
Don't use thr_lock.c locks for InnoDB tables. Below is list of changes that
were needed to implement this:
- HANDLER OPEN acquireis MDL_SHARED_READ instead of MDL_SHARED
- HANDLER READ calls external_lock() even if SE is not going to be locked by
THR_LOCK
- InnoDB lock wait timeouts are now honored which are much shorter by default
than server lock wait timeouts (1 year vs 50 seconds)
- with @@autocommit= 1 LOCK TABLES disables autocommit implicitely, though
user still sees @@autocommt= 1
- the above starts implicit transaction
- transactions started by LOCK TABLES are now rolled back on disconnect
(previously everything was committed due to autocommit)
- transactions started by LOCK TABLES are now rolled back by ROLLBACK
(previously everything was committed due to autocommit)
- it is now impossible to change BINLOG_FORMAT under LOCK TABLES (at least
to statement) due to running transaction
- LOCK TABLES WRITE is additionally handled by MDL
- ...in contrast LOCK TABLES READ protection against DML is pure InnoDB
- combining transactional and non-transactional tables under LOCK TABLES
may cause rolled back changes in transactional table and "committed"
changes in non-transactional table
- user may disable innodb_table_locks, which will cause LOCK TABLES to be
noop basically
Removed tests for BUG#45143 and BUG#55930 which cover InnoDB + THR_LOCK. To
operate properly these tests require code flow to go through THR_LOCK debug
sync points, which is not the case after this patch. These tests are removed
by WL#6671 as well. An alternative is to port them to different storage engine.
Issue:
======
Currently the approach we take to find the chunk corresponding to a given
pointer uses srv_buf_pool_chunk_unit based on the assumption that
srv_buf_pool_chunk_unit is the total size of all pages in a buffer pool
chunk. We first step back by srv_buf_pool_chunk_unit bytes and use
std::map::upper_bound() to find the first chunk in the map whose key >= the
resulting pointer.
However, the real size of a chunk (and thus, the total size of its pages)
may differ from the value configured with innodb_buffer_pool_chunk_size
due to rounding up to the OS page size. So, in some cases the above logic
gives us the wrong chunk.
Fix:
====
We find out the chunk corresponding to the give pointer without using
srv_buf_pool_chunk_unit. This is done by using std::map::upper_bound()
to find the next chunk in the map which appears right after the pointer and
decrementing the iterator, which would give us the chunk the pointer
belongs to.
Contribution by Alexey Kopytov.
RB: 13347
Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
No -DHAVE_LIBNUMA=1 was passed to the source compile (and the
global include/my_config.h wasn't used).
This also is Linux only so corrected the cmake macro.
Fixed indenting in cmake macro.
Removed NUMA defination from include/my_config.h as its only
in the storage engine.
Thanks Elena Stepanova and Vladislav Vaintroub for the detailed
list of bugs/questions.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
Analysis: Problem is that page is encrypted but encryption information
on page 0 has already being changed.
Fix: If page header contains key_version != 0 and even if based on
current encryption information tablespace is not encrypted we
need to check is page corrupted. If it is not, then we know that
page is not encrypted. If page is corrupted, we need to try to
decrypt it and then compare the stored and calculated checksums
to see is page corrupted or not.
Clean-up nolock.h: it doesn't serve any purpose anymore. Appropriate code moved
to x86-gcc.h and my_atomic.h.
If gcc sync bultins were detected, we want to make use of them independently of
__GNUC__ definition. E.g. XLC simulates those, but doesn't define __GNUC__.
HS/Spider: According to AIX manual alloca() returns char*, which cannot be
casted to any type with static_cast. Use explicit cast instead.
MDL: Removed namemangling pragma, which didn't let MariaDB build with XLC.
WSREP: _int64 seem to be conflicting name with XLC, replaced with _integer64.
CONNECT: RTLD_NOLOAD is GNU extention. Removed rather meaningless check if
library is loaded. Multiple dlopen()'s of the same library are permitted,
and it never gets closed anyway. Except for error, which was a bug: it may
close library, which can still be referenced by other subsystems.
InnoDB: __ppc_get_timebase() is GNU extention. Only use it when __GLIBC__ is
defined.
Based on contribution by flynn1973.
Two problems:
(1) When pushing warning to sql-layer we need to check that thd != NULL
to avoid NULL-pointer reference.
(2) At tablespace key rotation if used key_id is not found from
encryption plugin tablespace should not be rotated.
MDEV-10394: Innodb system table space corrupted
Analysis: After we have read the page in buf_page_io_complete try to
find if the page is encrypted or corrupted. Encryption was determined
by reading FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION field from FIL-header
as a key_version. However, this field is not always zero even when
encryption is not used. Thus, incorrect key_version could lead situation where
decryption is tried to page that is not encrypted.
Fix: We still read key_version information from FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
field but also check if tablespace has encryption information before trying
encrypt the page.