FULLTEXT INDEX AND CONCURRENT DML.
Problem Statement:
------------------
1) Create a table with FT index.
2) Enable concurrent inserts.
3) In multiple threads do below operations repeatedly
a) truncate table
b) insert into table ....
c) select ... match .. against .. non-boolean/boolean mode
After some time we could observe two different assert core dumps
Analysis:
--------
1)assert core dump at key_read_cache():
Two select threads operating in-parallel on same key
root block.
1st select thread block->status is set to BLOCK_ERROR
because the my_pread() in read_block() is returning '0'.
Truncate table made the index file size as 1024 and pread
was asked to get the block of count bytes(1024 bytes)
from offset of 1024 which it cannot read since its
"end of file" and retuning '0' setting
"my_errno= HA_ERR_FILE_TOO_SHORT" and the key_file_length,
key_root[0] is same i.e. 1024. Since block status has BLOCK_ERROR
the 1st select thread enter into the free_block() and will
be under wait on conditional mutex by making status as
BLOCK_REASSIGNED and goes for wait_on_readers(). Other select
thread will also work on the same block and sees the status as
BLOCK_ERROR and enters into free_block(), checks for BLOCK_REASSIGNED
and asserting the server.
2)assert core dump at key_write_cache():
One select thread and One insert thread.
Select thread gets the unlocks the 'keycache->cache_lock',
which allows other threads to continue and gets the pread()
return value as'0'(please see the explanation above) and
tries to get the lock on 'keycache->cache_lock' and waits
there for the lock.
Insert thread requests for the block, block will be assigned
from the hash list and makes the page_status as
'PAGE_WAIT_TO_BE_READ' and goes for the read_block(), waits
in the queue since there are some other threads performing
reads on the same block.
Select thread which was waiting for the 'keycache->cache_lock'
mutex in the read_block() will continue after getting the my_pread()
value as '0' and sets the block status as BLOCK_ERROR and goes to
the free_block() and go to the wait_for_readers().
Now the insert thread will awake and continues. and checks
block->status as not BLOCK_READ and it asserts.
Fix:
---
In the full text code, multiple readers of index file is not guarded.
Hence added below below code in _ft2_search() and walk_and_match().
to lock the key_root I have used below code in _ft2_search()
if (info->s->concurrent_insert)
mysql_rwlock_rdlock(&share->key_root_lock[0]);
and to unlock
if (info->s->concurrent_insert)
mysql_rwlock_unlock(&share->key_root_lock[0]);
storage/myisam/ft_boolean_search.c:
Since its a recursion function, to avoid confusion in taking and
releasing the locks, renamed _ft2_search() to _ft2_search_internal()
function. And _ft2_search() will take the lock, call
_ft2_search_internal() and release the lock in case of concurrent
inserts.
storage/myisam/ft_nlq_search.c:
Added read locks code in walk_and_match()
FULLTEXT INDEX AND CONCURRENT DML.
Problem Statement:
------------------
1) Create a table with FT index.
2) Enable concurrent inserts.
3) In multiple threads do below operations repeatedly
a) truncate table
b) insert into table ....
c) select ... match .. against .. non-boolean/boolean mode
After some time we could observe two different assert core dumps
Analysis:
--------
1)assert core dump at key_read_cache():
Two select threads operating in-parallel on same key
root block.
1st select thread block->status is set to BLOCK_ERROR
because the my_pread() in read_block() is returning '0'.
Truncate table made the index file size as 1024 and pread
was asked to get the block of count bytes(1024 bytes)
from offset of 1024 which it cannot read since its
"end of file" and retuning '0' setting
"my_errno= HA_ERR_FILE_TOO_SHORT" and the key_file_length,
key_root[0] is same i.e. 1024. Since block status has BLOCK_ERROR
the 1st select thread enter into the free_block() and will
be under wait on conditional mutex by making status as
BLOCK_REASSIGNED and goes for wait_on_readers(). Other select
thread will also work on the same block and sees the status as
BLOCK_ERROR and enters into free_block(), checks for BLOCK_REASSIGNED
and asserting the server.
2)assert core dump at key_write_cache():
One select thread and One insert thread.
Select thread gets the unlocks the 'keycache->cache_lock',
which allows other threads to continue and gets the pread()
return value as'0'(please see the explanation above) and
tries to get the lock on 'keycache->cache_lock' and waits
there for the lock.
Insert thread requests for the block, block will be assigned
from the hash list and makes the page_status as
'PAGE_WAIT_TO_BE_READ' and goes for the read_block(), waits
in the queue since there are some other threads performing
reads on the same block.
Select thread which was waiting for the 'keycache->cache_lock'
mutex in the read_block() will continue after getting the my_pread()
value as '0' and sets the block status as BLOCK_ERROR and goes to
the free_block() and go to the wait_for_readers().
Now the insert thread will awake and continues. and checks
block->status as not BLOCK_READ and it asserts.
Fix:
---
In the full text code, multiple readers of index file is not guarded.
Hence added below below code in _ft2_search() and walk_and_match().
to lock the key_root I have used below code in _ft2_search()
if (info->s->concurrent_insert)
mysql_rwlock_rdlock(&share->key_root_lock[0]);
and to unlock
if (info->s->concurrent_insert)
mysql_rwlock_unlock(&share->key_root_lock[0]);
TABLES IN INCORRECT ENGINE
PROBLEM:
CREATE/ALTER TABLE currently can move system tables like
mysql.db, user, host etc, to engines other than MyISAM. This is not
completely supported as of now, by mysqld. When some of system tables
like plugin, servers, event, func, *_priv, time_zone* are moved
to innodb, mysqld restart crashes. Currently system tables
can be moved to BLACKHOLE also!!!.
ANALYSIS:
The problem is that there is no check before creating or moving
a system table to some particular engine.
System tables are suppose to be residing in MyISAM. We can think
of restricting system tables to exist only in MyISAM. But, there could
be future needs of these system tables to be part of other engines
by design. For eg, NDB cluster expects some tables to be on innodb
or ndb engine. This calls for a solution, by which system
tables can be supported by any desired engine, with minimal effort.
FIX:
The solution provides a handlerton interface using which,
mysqld server can query particular storage engine handlerton for
system tables that it supports. This way each storage engine
layer can define their own system database and system tables.
The check_engine() function uses the new handlerton function
ha_check_if_supported_system_table() to check if db.tablename
provided in the DDL is supported by the SE.
Note: This fix has modified a test in help.test, which was moving
mysql.help_* to innodb. The primary intention of the test was not
to move them between engines.
TABLES IN INCORRECT ENGINE
PROBLEM:
CREATE/ALTER TABLE currently can move system tables like
mysql.db, user, host etc, to engines other than MyISAM. This is not
completely supported as of now, by mysqld. When some of system tables
like plugin, servers, event, func, *_priv, time_zone* are moved
to innodb, mysqld restart crashes. Currently system tables
can be moved to BLACKHOLE also!!!.
ANALYSIS:
The problem is that there is no check before creating or moving
a system table to some particular engine.
System tables are suppose to be residing in MyISAM. We can think
of restricting system tables to exist only in MyISAM. But, there could
be future needs of these system tables to be part of other engines
by design. For eg, NDB cluster expects some tables to be on innodb
or ndb engine. This calls for a solution, by which system
tables can be supported by any desired engine, with minimal effort.
FIX:
The solution provides a handlerton interface using which,
mysqld server can query particular storage engine handlerton for
system tables that it supports. This way each storage engine
layer can define their own system database and system tables.
The check_engine() function uses the new handlerton function
ha_check_if_supported_system_table() to check if db.tablename
provided in the DDL is supported by the SE.
Note: This fix has modified a test in help.test, which was moving
mysql.help_* to innodb. The primary intention of the test was not
to move them between engines.
Fixed failing tests in sys_vars as we have now stricter checking of setting of variables.
mysql-test/suite/sys_vars/r/innodb_adaptive_flushing_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/innodb_adaptive_hash_index_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/innodb_large_prefix_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/innodb_random_read_ahead_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/innodb_stats_on_metadata_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/innodb_strict_mode_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/innodb_support_xa_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/innodb_table_locks_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/rpl_semi_sync_master_enabled_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/r/rpl_semi_sync_slave_enabled_basic.result:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_adaptive_flushing_basic.test:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_adaptive_hash_index_basic.test:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_large_prefix_basic.test:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_random_read_ahead_basic.test:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_stats_on_metadata_basic.test:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_strict_mode_basic.test:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_support_xa_basic.test:
One can now only assign 0 or 1 to boolean variables
mysql-test/suite/sys_vars/t/innodb_table_locks_basic.test:
One can now only assign 0 or 1 to boolean variables
mysys/my_getsystime.c:
Merge + fixed bug that __NR_clock_gettime didn't work in 5.5
The issue was that check/optimize/anaylze did not zerofill the table before they started to work on it.
Added one more element to not often used function handler::auto_repair() to allow handler to decide when to auto repair.
mysql-test/suite/maria/r/maria-autozerofill.result:
Test case for lp:944422
mysql-test/suite/maria/t/maria-autozerofill.test:
Test case for lp:944422
sql/ha_partition.cc:
Added argument to auto_repair()
sql/ha_partition.h:
Added argument to auto_repair()
sql/handler.h:
Added argument to auto_repair()
sql/table.cc:
Let auto_repair() decide which errors to trigger auto-repair
storage/archive/ha_archive.h:
Added argument to auto_repair()
storage/csv/ha_tina.h:
Added argument to auto_repair()
storage/maria/ha_maria.cc:
Give better error & warning messages for auto-repaired tables.
storage/maria/ha_maria.h:
Added argument to auto_repair()
Always auto-repair in case of moved table.
storage/maria/ma_open.c:
Remove special handling of HA_ERR_OLD_FILE (this is now handled in auto_repair())
storage/myisam/ha_myisam.h:
Added argument to auto_repair()
mysql-test/suite/innodb/t/group_commit_crash.test:
remove autoincrement to avoid rbr being used for insert ... select
mysql-test/suite/innodb/t/group_commit_crash_no_optimize_thread.test:
remove autoincrement to avoid rbr being used for insert ... select
mysys/my_addr_resolve.c:
a pointer to a buffer is returned to the caller -> the buffer cannot be on the stack
mysys/stacktrace.c:
my_vsnprintf() is ok here, in 5.5
storage/innobase/include/sync0rw.ic:
Prerequisite for compiling with gcc4 on solaris: ignore result from
os_compare_and_swap_ulint
storage/myisam/mi_dynrec.c:
Prerequisite for compiling with gcc4 on solaris: cast to void*
Fixed wrong mutex order bug in Aria when flush_log_for_bitmap() was called when table is not yet marked for change.
include/my_base.h:
Added flag that table is opened only for status
mysql-test/r/myisam-big.result:
Test case for lp:925377
mysql-test/t/myisam-big.test:
Test case for lp:925377
sql/sql_base.cc:
If thd->version == 0 (happens only when we are opening a table that is flushed under MYSQL_LOCK_IGNORE_FLUSH), open the table in HA_OPEN_FOR_STATUS mode
storage/maria/ma_bitmap.c:
Fixed wrong mutex order bug in Aria when flush_log_for_bitmap() was called when table is not yet marked for change.
storage/maria/ma_dbug.c:
Ignore last_version <= 1 as these are either flushed or only opened for status
storage/maria/ma_open.c:
Use last_version=1 as a marker that table was opened with HA_OPEN_FOR_STATUS.
In this case we just open a new version of the table in read only mode.
storage/myisam/mi_create.c:
Update prototype
storage/myisam/mi_dbug.c:
Ignore last_version <= 1 as these are either flushed or only opened for status
storage/myisam/mi_open.c:
Use last_version=1 as a marker that table was opened with HA_OPEN_FOR_STATUS.
If HA_OPEN_FOR_STATUS is used, we will not assert if there is an old not-to-be-used version of the table existing.
In this case we just open a new version of the table in read only mode.
storage/myisam/myisamdef.h:
Updated prototype
mysql-test-run auto-disables all optional plugins.
mysql-test/include/default_client.cnf:
no @OPT.plugindir anymore
mysql-test/include/default_mysqld.cnf:
don't disable plugins manually - mtr can do it better
mysql-test/suite/innodb/t/innodb_bug47167.test:
mtr now uses suite-dir as an include path
mysql-test/suite/innodb/t/innodb_file_format.test:
mtr now uses suite-dir as an include path
mysql-test/t/partition_binlog.test:
this test uses partitions
storage/example/mysql-test/mtr/t/source.result:
update results. as mysqltest includes the correct overlayed include
storage/innobase/handler/ha_innodb.cc:
the assert is wrong
- In mi_rkey(), do correct handling of case where mi_yield_and_check_if_killed()
detects that the thread was killed (all other similar functions in MyISAM/Aria have
slightly different code and do not have this problem).
- Also fixed assignment in DBUG_ASSERT
- this is 2nd variant of the fix:
= make .result file smaller
= run KILLable statements in a separate connection, otherwise we could end up trying to
KILL the final "DROP TABLE" statement
Fixed README with link to source
Merged InnoDB change to XtraDB
README:
Added information of where to find MariaDB code
storage/archive/ha_archive.cc:
Removed memset() of rows, a MariaDB checksum's doesn't touch not used data.
common icp callback in the handler.cc.
It can also increment status counters, without making the engine
dependent on the exact THD layout (that is different in embedded).
ON 64 BIT MACHINES
PROBLEM: When sorting index during repair of
myisam tables, due to improper casting
of buffer size variables value of myisam_
sort_buffer_size is not set greater than
4GB.
SOLUTION: Proper casting of buffer size variable.
myisam_buffer_size changed to unsigned
long long to handle size > 4GB on
linux as well as windows.
ON 64 BIT MACHINES
PROBLEM: When sorting index during repair of
myisam tables, due to improper casting
of buffer size variables value of myisam_
sort_buffer_size is not set greater than
4GB.
SOLUTION: Proper casting of buffer size variable.
myisam_buffer_size changed to unsigned
long long to handle size > 4GB on
linux as well as windows.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
Don't log updates to performance schema in replication log.
Ensure that we don't call ha_update after ha_index_or_rnd_end() is called on slave.
.bzrignore:
Ignore some generated files
mysql-test/include/show_slave_status.inc:
Ensure that ./ is removed from file names
mysql-test/suite/perfschema/r/binlog_mix.result:
Updated results
mysql-test/suite/perfschema/r/binlog_row.result:
Updated results
mysql-test/suite/perfschema/r/binlog_stmt.result:
Updated results
mysql-test/suite/rpl/r/rpl_cant_read_event_incident.result:
Updated results
mysql-test/suite/rpl/r/rpl_performance_schema.result:
Ensure that we don't crash slave when we update performance schema
mysql-test/suite/rpl/t/rpl_performance_schema.test:
Ensure that we don't crash slave when we update performance schema
sql/log_event.cc:
Ensure that we don't call ha_update after ha_index_or_rnd_end() is called.
Remove old code that is not needed anymore (like restarting read loop over all rows if no matcing row is found)
Simplify code
sql/log_event_old.cc:
Ensure that we don't call ha_update after ha_index_or_rnd_end() is called.
storage/myisam/ha_myisam.cc:
More DBUG_PRINT
storage/perfschema/ha_perfschema.h:
Don't log updates to performance schema in replication log.
WITH MYISAM_USE_MMAP ENABLED
MySQL server can crash due to segmentation fault when
started with myisam_use_mmap.
The reason behind this being, while making a request to
unmap (munmap) the previously mapped memory (mmap), the
size passed was 7 bytes larger than the size requested at
the time of mapping. This can eventually unmap the adjacent
memory mapped block, belonging to some other memory-map pool.
Hence the subsequent call to mmap can map a region which was
still a valid memory mapped area.
Fixed by removing the extra 7-byte margin which was erroneously
added to the size, used for unmappping.
storage/myisam/mi_close.c:
Bug#11756764 48726: MYSQLD KEEPS CRASHING WITH SIGSEGV
WITH MYISAM_USE_MMAP ENABLED
Added a condition to call _mi_unmap_file() in case
of compressed records. mi_munmap_file() is called
otherwise.
storage/myisam/mi_packrec.c:
Bug#11756764 48726: MYSQLD KEEPS CRASHING WITH SIGSEGV
WITH MYISAM_USE_MMAP ENABLED
mi_dynmap_file() function, after successfully executing
mmap, stores the total size in info->s->mapped_length
variable. Now, if mi_dynmap_file() is invoked with a size
with an extra 7-byte margin (MEMMAP_EXTRA_MARGIN),
the margin will eventually also get stored in mapped_length.
So, un-mapping function can simply use the value stored in
mapped_length in order to unmap the previously mapped
region.