stat structure (from <sys/stat.h>) is conditionally defined
to have a different layout and size depending on the defined macros.
The correct macro is defined in my_config.h, which means it MUST be
included first (or, at least, before <features.h> - so, practically,
before including any system headers).
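A minimal sketch of the resulting include order; _FILE_OFFSET_BITS is one
example of a feature-test macro that affects struct stat, the exact set
depends on the build configuration:

  #include "my_config.h"   /* first: defines the feature-test macros */

  #include <sys/types.h>
  #include <sys/stat.h>    /* struct stat layout now matches the config */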
FROM A FUNCTION
Scenario:
In a stored procedure, the CREATE TABLE statement is not allowed, but an
exception is made for CREATE TEMPORARY TABLE: we can create a temporary
table in a stored procedure.
Let there be two stored functions f1 and f2 and two stored procedures p1 and
p2. Their properties are as follows:
. stored function f1() calls stored procedure p1().
. stored function f2() calls stored procedure p2().
. stored procedure p1() creates temporary table t1.
. stored procedure p2() does DML on t1.
Consider the following situation:
1. Autocommit mode is on.
2. select f1()
3. select f2()
Step 2: In this step, t1 would be created via p1(). A table-level transaction
lock would have been taken. ::external_lock() would not have been called
on this table. At the end of step 2, because autocommit mode is on, this
table-level lock is released.
Step 3: When we execute DML on table t1 via p2() we have two problems:
Problem 1:
The function ha_innobase::external_lock() would have been called, but since
it is a SELECT query, no table-level locks would have been taken. Hence the
following assert fails:
ut_ad(lock_table_has(thr_get_trx(thr), index->table, LOCK_IX));
Solution:
The solution would be to identify this situation, take a table-level lock,
and use the proper lock type prebuilt->select_lock_type = LOCK_X for DML
operations.
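A minimal sketch of that idea; the detection condition and the commented
lock_table() call are illustrative stand-ins, not the actual InnoDB
handler code:

  #include <stdbool.h>

  enum lock_type { LOCK_NONE, LOCK_IX, LOCK_X };

  struct prebuilt_t { enum lock_type select_lock_type; };

  void prepare_dml_on_temp_table(struct prebuilt_t *prebuilt,
                                 bool table_lock_held, bool is_dml) {
      if (is_dml && !table_lock_held) {
          /* re-take the table lock that autocommit released, e.g.
             lock_table(..., LOCK_IX), then use row locks for the DML */
          prebuilt->select_lock_type = LOCK_X;
      }
  }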
Problem 2:
Another problem is that in step 3, ha_innobase::open() is never called on
the table t1.
Solution:
The solution would be to identify this situation and re-initialize the handler
of table t1.
rb#6429 approved by Krunal.
Problem:
Creation of a table fails when innodb_strict_mode is enabled, but the same
table is created without any warning when innodb_strict_mode is disabled.
Solution:
If creation of a table fails with an error when innodb_strict_mode is
enabled, it must issue a warning when innodb_strict_mode is disabled.
rb#6723 approved by Krunal.
CHECK.
Analysis:
----------
The issue here is that, while creating or altering an InnoDB table,
if a foreign key defined on the table references a parent table on
which the user has no access privileges, the table is created without
reporting any error.
Currently the privilege level REFERENCES_ACL is unused and is not
considered during access evaluation when creating a table with a
foreign key constraint or adding a foreign key constraint to a table.
But even when no privileges at all are granted to the user, access
evaluation on the parent table is skipped.
Fix:
---------
For DMLs, regardless of this issue, support does not want any changes,
in order to avoid permission checks on every operation.
So, as a fix, added a function "check_fk_parent_table_access" to check
whether any of the SELECT_ACL, INSERT_ACL, UPDATE_ACL, DELETE_ACL or
REFERENCES_ACL privileges are granted to the user at table level. If
none of them is granted, an error is reported. This function is called
during table creation and alter operations.
* handler::get_auto_increment() was not expecting any errors from the storage engine.
That was wrong; errors could happen.
* ha_partition::get_auto_increment() was doing index lookups in partitions under a mutex.
This was redundant (engine transaction isolation was covering that anyway)
and harmful (causing deadlocks).
MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64
This problem affects only debug builds on PPC64.
There are at least two race conditions around
rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:
- rw_lock_debug_waiters was loaded/stored without appropriate
locks/memory barriers.
- there is a gap between the calls to os_event_reset() and
os_event_wait(), and in such a case we are supposed to pass the
return value of the former to the latter.
Fixed by replacing the hand-rolled spinlocks with system mutexes.
These days system mutexes offer much better performance; OTOH,
performance is not that critical for debug builds.
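A minimal sketch of the replacement, assuming plain POSIX mutexes stand in
for whatever portable wrapper the server actually uses:

  #include <pthread.h>

  /* lock/unlock already provide the acquire/release barriers that the
     hand-rolled waiters/os_event protocol was missing */
  static pthread_mutex_t rw_lock_debug_mutex = PTHREAD_MUTEX_INITIALIZER;

  void rw_lock_debug_mutex_enter(void) {
      pthread_mutex_lock(&rw_lock_debug_mutex);
  }

  void rw_lock_debug_mutex_exit(void) {
      pthread_mutex_unlock(&rw_lock_debug_mutex);
  }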
MDEV-6450 - MariaDB crash on Power8 when built with the Advance Toolchain
InnoDB's mutex_exit() function calls __sync_lock_test_and_set() to release
the lock. According to the manual, this function is supposed to create an
"acquire" memory barrier, whereas in fact we need a "release" memory
barrier at mutex_exit().
The problem isn't repeatable with gcc, because gcc creates an
"acquire-release" memory barrier for __sync_lock_test_and_set();
the Advance Toolchain creates just an "acquire" barrier.
Fixed by creating the proper barrier at mutex_exit() by using
__sync_lock_release() instead of __sync_lock_test_and_set().
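A minimal sketch of the before/after on a test-and-set lock word (not the
actual InnoDB mutex code):

  #include <stdint.h>

  typedef volatile uint32_t lock_word_t;

  static inline void mutex_enter(lock_word_t *lw) {
      while (__sync_lock_test_and_set(lw, 1))  /* acquire barrier */
          ;  /* spin until the previous value was 0 */
  }

  static inline void mutex_exit(lock_word_t *lw) {
      /* before: __sync_lock_test_and_set(lw, 0) - acquire semantics only */
      __sync_lock_release(lw);  /* stores 0 with release semantics */
  }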
Problem:
We maintain two rb trees in each dict_table_t. The foreign_rbt must be in
sync with foreign_list. The referenced_rbt must be in sync with
referenced_list. There is one function which checks this consistency and it
failed, resulting in an assert failure.
The root cause of the problem was identified: the search order was
lost in the referenced_rbt, because while renaming the table we did
not refresh this referenced_rbt.
Solution:
When a foreign key is renamed, we must delete and re-insert into both
foreign_rbt and referenced_rbt.
rb#6412 approved by Jimmy.
Part of this work is based on Stewart Smith's memory barrier and lower
priority patches for POWER8.
- Added memory synchronization for InnoDB & XtraDB on POWER8.
- Added HAVE_WINDOWS_MM_FENCE to CMakeLists.txt.
- Added os_isync to fix a synchronization problem on POWER.
- Added log_get_lsn_nowait, which is now used by srv_error_monitor_thread
so that it does not block if the log mutex is locked.
All changes were done for both InnoDB and XtraDB.
~40% of the bug fixes(*) applied
~40% of the bug fixes reverted (incorrect, or we're not buggy)
~20% of the bug fixes applied, despite us not being buggy
(*) only changes in the server code, i.e. not cmake files
This bug only happens when partitioned tables are used in LOCK TABLES and
implicit_commit() is called (as part of trying to execute a CREATE TABLE
within LOCK TABLES).
The problem was that Aria could not move the tables from one transaction to
the new one, as thd->open_tables contained partitioned tables and not Aria
tables.
Fix:
- Store a list of all open tables that are part of a share in
share->open_tables.
- In maria::implicit_commit(), use transaction->used_tables &
share->open_tables to find out which tables were part of the current
transaction, instead of using thd->open_tables, which may contain
partitioned tables.
mysql-test/suite/maria/maria_partition.result:
Added test case
mysql-test/suite/maria/maria_partition.test:
Added test case
storage/maria/ha_maria.cc:
Use trn->used_tables and share->open_tables to find out which tables were part of the current transaction, instead of using thd->open_tables.
storage/maria/ma_close.c:
Remove closed table from share->open_list
storage/maria/ma_open.c:
Add table to share->open_list
storage/maria/ma_state.c:
Added comment
storage/maria/maria_def.h:
Added share->open_list, a list of all tables that are using this share.
InnoDB: Failing assertion: mutex_own(&buf_pool->LRU_list_mutex)
and
InnoDB: Assertion failure in thread 2868898624 in file buf0lru.c line 1077
InnoDB: Failing assertion: mutex_own(&buf_pool->LRU_list_mutex)
Analysis: The function buf_LRU_free_block() might release LRU_list_mutex in
some cases to avoid mutex order problems; we need to take it back before
accessing the list.
IN RECOVERY
During redo log processing, the data dictionary is not available. We should
check for this in dict_find_table_by_space() to prevent a SEGV error.
rb#5678, approved by Jimmy.
Problem:
When a unique secondary index is scanned for duplicate checking, gap locks
were not taken if the transaction had isolation level <= READ COMMITTED.
This change was done while fixing Bug #16133801 UNEXPLAINABLE INNODB UNIQUE
INDEX LOCKS ON DELETE + INSERT WITH SAME VALUES (rb#2035). Because of this
the duplicate check logic failed, resulting in duplicate values in the unique
secondary index.
Solution:
When a unique secondary index is scanned for duplicate checking, gap locks
must be taken irrespective of the transaction isolation level. This is
achieved by reverting rb#2035.
rb#5910 approved by Jimmy
WITH CERTAIN MAX_HEAP_TABLE_SIZE VALUES
Description:
When the system variable 'max_heap_table_size'
is set to 20GB, the server crashes on creation of a
temporary table or a table using the MEMORY storage engine.
Analysis:
The variable 'max_records' determines the amount of heap
memory allocated for the records of the table. This value
is derived from the 'max_heap_table_size' variable.
'records_in_block' in turn uses 'max_records' to
determine the number of records per block.
When 'max_heap_table_size' is set to 20GB,
'records_in_block' is calculated to be 2^28.
The size of the block, determined by multiplying
'records_in_block' and 'recbuffer', overflows and
hence becomes zero. As a result, zero bytes of
heap are allocated for the table, which results
in a server crash when the table is accessed.
Fix:
The variables 'records_in_block' and 'recbuffer' are
cast to 'unsigned long' while calculating the
size of the block.
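A minimal, self-contained demonstration of the overflow; it widens to
unsigned long long, while the actual fix casts to 'unsigned long'
(equivalent on 64-bit platforms):

  #include <stdio.h>

  int main(void) {
      unsigned int records_in_block = 1U << 28;  /* ~2^28 records */
      unsigned int recbuffer        = 1U << 8;   /* 256 bytes per record */

      /* 32-bit multiply wraps: 2^28 * 2^8 = 2^36 mod 2^32 = 0 */
      unsigned int bad = records_in_block * recbuffer;

      /* widening cast before the multiply gives the intended size */
      unsigned long long ok =
          (unsigned long long) records_in_block * recbuffer;

      printf("overflowed: %u, correct: %llu\n", bad, ok);
      return 0;
  }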
than with InnoDB plugin
Fix: os0file.h in XtraDB had OS_AIO_N_PENDING_IOS_PER_THREAD set to 256,
whereas in InnoDB it is 32. Changed XtraDB to also use 32.
Analysis: Based on the crash, the buffer pool instance identifier is
not correct on the block to be freed. Hold the LRU list mutex in the
functions calling free, and add additional safety checks.
SLOW/CRASHES SEMAPHORE
Problem:
There are 200,000 tables - fk_000001, fk_000002 ... fk_200000. All of them
are related to the same parent_table through a foreign key constraint.
When the parent_table is loaded into the dictionary cache, all the child
tables will also be loaded. This takes a lot of time. Since this operation
happens while the dictionary latch is held, the scenario leads to a "long
semaphore wait" situation and the server gets killed.
Analysis:
A simple performance analysis showed that the slowness is caused by the
dict_foreign_find() function. It does a linear search on two linked lists,
table->foreign_list and table->referenced_list, looking for a particular
foreign key object with foreign->id as the key. This is called twice
for each foreign key object.
Solution:
Introduce rb trees table->foreign_rbt and table->referenced_rbt, which
serve as indexes on table->foreign_list and table->referenced_list
respectively, using foreign->id as the key. These rbt structures are
used solely by dict_foreign_find().
rb#5599 approved by Vasil
MDEV-6281 Typo in mysql_install_db scripts
and collateral changes:
* remove mysql_tableinfo.1 and references to it (there's no mysql_tableinfo)
* for debian: create manpages for mysqlrepair, mysqlanalyze, mysqloptimize
(as symlinks to mysqlcheck.1, just as the executables are symlinks to mysqlcheck)
* remove mysqlmanager.8 and references to it
* correct "very long line" error in mysqladmin.1
* simplify and fix table formatting in mysqlbinlog.1 and mysqldump.1
* fix a typo in the help text in mysql_install_db
* aria_chk: say "for Linux on x86_64", like other tools do
(not "for Linux at x86_64")
* add simple manpages for aria_* utilities
Description: Using the temporary file vulnerability an
attacker can create a file with arbitrary content at a
location of his choice. This can be used to create the
file /var/lib/mysql/my.cnf, which will be read as a
configuration file by MySQL, because it is located in the
home directory of the mysql user. With this configuration
file, the attacker can specify his own plugin_dir variable,
which then allows him to load arbitrary code via
"INSTALL PLUGIN...".
Analysis: While creating the ".TMD" file in mi_repair(), we do not
check whether the file already exists, and if it does we truncate it
and go ahead. This creates the security breach.
Fix: We need to use the O_EXCL flag along with O_RDWR and O_TRUNC,
which makes sure that if any user has already created the ".TMD" file,
the table repair fails with a "cannot create '.TMD' file" error.
We actually initialize the "param.tmpfile_createflag" member with
O_RDWR | O_TRUNC | O_EXCL in myisamchk_init(), but then modify it in
ha_myisam::repair() to O_RDWR | O_TRUNC. So we need to remove the line
which modifies "param.tmpfile_createflag".
Analysis: We can't disable the error message, because otherwise the
database may be started with an incorrect log file size.
Fix: Thus, only improve the error message to give more information
to users.
- Fixed a bug where we were using the wrong checksum algorithm when using VARCHAR with fixed-length rows
- Ensure in myisampack that HA_OPTION_NULL_FIELDS is set for tables with null fields.
mysql-test/r/myisampack.result:
Updated results
mysql-test/t/myisampack.test:
Added more tests
storage/myisam/mi_open.c:
Use the correct checksum algorithm when we have VARCHAR fields with fixed-length records
storage/myisam/myisampack.c:
Ensure HA_OPTION_NULL_FIELDS is set for tables with null fields.
(This was not set by default for non-compressed tables without checksums, to keep MyISAM tables compatible with MySQL)
on select from I_S.INNODB_CHANGED_PAGES
Analysis: limit_lsn_range_from_condition() incorrectly parses
start_lsn and/or end_lsn conditions.
Fix from SergeyP. Added some test cases.
"Table upgrade required..."
The row format is only different in the case of a very old MyISAM table with varchar fields and null fields, created with CHECKSUM=1.
The table is usable, except that CHECKSUM TABLE gives a wrong result and CHECK TABLE warns about this.
I added a test to warn when a table needs to be upgraded, but forgot to check that this is only relevant for tables with CHECKSUM=1.
This is now fixed.
storage/myisam/ha_myisam.cc:
Fixed wrong test.
archive table which is using an auto increment column, the
server hangs. In order to recover the mysqld process, it
has to be terminated abnormally using SIGKILL. The problem
is observed in mysql-5.5.
Bug #18065452 "PREPARING" STATE HOGS CPU WITH ARCHIVE
+ SUBQUERY
Analysis: This happens because the server is trapped inside
an infinite loop in the function,
"subselect_indexsubquery_engine::exec()". This function
resolves the correlated suquery by doing an index lookup
for the appropriate engine. In case of archive engine,
after reaching the end of records, "table->status" is not
set to STATUS_NOT_FOUND. As a result the loop is not
terminated.
Fix: The "table->status" is set to STATUS_NOT_FOUND when
the end of records is reached.
THE PERFORMANCE UNDER HEAVY INSERT
Problem:
There are three memset calls to initialize the memory for the system
fields on each insert.
Solution:
Instead of calling memset three times, combine them into one memset
call. This reduces CPU usage under heavy insert load.
Approved by Marko rb-4916
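A minimal sketch of the combined initialization; the three field widths are
InnoDB's documented system column sizes, but the buffer layout here is
illustrative, not the actual row-format code:

  #include <string.h>

  #define DATA_ROW_ID_LEN   6  /* DB_ROW_ID */
  #define DATA_TRX_ID_LEN   6  /* DB_TRX_ID */
  #define DATA_ROLL_PTR_LEN 7  /* DB_ROLL_PTR */

  void init_sys_fields(unsigned char *sys) {  /* sys points at DB_ROW_ID */
      /* before: three memset calls, one per field */
      /* after: the fields are adjacent, so one call covers all three */
      memset(sys, 0,
             DATA_ROW_ID_LEN + DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN);
  }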