PROBLEMS
Description:- The server variable "--lower_case_table_names",
when set to "0" on Windows, a platform that does not support
case-sensitive file operations, leads to problems. A warning
message is printed in the error log when the server is started
with "--lower_case_table_names=0". Also, according to the
documentation, setting "lower_case_table_names" to "0" on a
case-insensitive filesystem might lead to index corruption.
Analysis:- The problem reported in the bug is:-
Creating an InnoDB table 'a' and executing the query "INSERT
INTO a SELECT a FROM A;" on a server started with
"--lower_case_table_names=0" and running on a case-insensitive
filesystem sends InnoDB into an infinite loop.
The optimizer treats "a" and "A" as two different tables
because the variable "lower_case_table_names" is set to "0".
As a result, the optimizer comes up with a plan that does not
use a temporary table. However, when the same table is used in
both the SELECT and the INSERT, a temporary table is required.
This incorrect optimizer plan leads to infinite insertions.
Fix:- If the server is started with
"--lower_case_table_names" set to 0 on a case-insensitive
filesystem, the error "The server option
'lower_case_table_names' is configured to use case sensitive
table names but the data directory is on a case-insensitive
file system which is an unsupported combination. Please
consider either using a case sensitive file system for your
data directory or switching to a case-insensitive table name
mode." is printed in the server error log and the server
exits.
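A minimal sketch of the startup check, assuming the existing
lower_case_file_system flag and the unireg_abort() helper (the
exact placement in mysqld.cc may differ):

  /* Sketch: refuse to start when lower_case_table_names=0 is
     combined with a case-insensitive data directory file system. */
  if (lower_case_table_names == 0 && lower_case_file_system)
  {
    sql_print_error("The server option 'lower_case_table_names' is "
                    "configured to use case sensitive table names "
                    "but the data directory is on a case-insensitive "
                    "file system which is an unsupported combination. "
                    "Please consider either using a case sensitive "
                    "file system for your data directory or switching "
                    "to a case-insensitive table name mode.");
    unireg_abort(1);  /* exit instead of merely warning */
  }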
In versions 5.5 and 5.6, the MySQL version is not logged until
the server has started and is ready to accept connections;
exiting the server before this point leaves no server version
information in the log. In the 5.7 code, however, the server
version is logged as soon as the server_version string is
prepared and logging is initialized. This change adds the same
early version logging to the 5.5 and 5.6 code.
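A sketch of the early logging call; my_progname and
server_version are assumed to be set up already (the format
matches the startup lines shown below):

  /* Sketch: log the version as soon as logging is initialized. */
  sql_print_information("%s (mysqld %s) starting as process %lu ...",
                        my_progname, server_version,
                        (ulong) getpid());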
Test results:
================
5.5
-----
Server version will be logged as below on server startup:
141218 8:45:48 [Note] /home/praveen/WorkDir/mysql_local/bug20052694/mysql/sql/mysqld (mysqld 5.5.42-debug-log) starting as process 19697 ...
5.6
----
Server version will be logged as below on server startup:
2014-12-18 09:08:43 0 [Note] /home/praveen/WorkDir/mysql_local/bug20052694/mysql-5.6/sql/mysqld (mysqld 5.6.23-debug-log) starting as process 18474 ...
Description:
THREAD_CONCURRENCY is deprecated, but no deprecation warning
is issued when this variable is set while starting the server.
Analysis:
This variable is specific to Solaris 8 and earlier systems and
is ignored on all other platforms. But since many customers on
other platforms still have this variable in their configuration
files, it is important to issue a deprecation warning.
Fix:
THREAD_CONCURRENCY deprecation warning message is added.
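A minimal sketch of the warning, emitted from the startup option
handling; the flag name and the message wording are assumptions:

  /* Sketch: warn when the deprecated option was given. */
  if (thread_concurrency_specified)  /* hypothetical flag */
    sql_print_warning("THREAD_CONCURRENCY is deprecated and will "
                      "be removed in a future release.");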
SHOW PROCESSLIST, SHOW BINLOGS
Problem: A deadlock was occurring when 4 threads were
involved in acquiring locks in the following way:
Thread 1: Dump thread (the slave is reconnecting, so on the
          master a new dump thread is trying to kill zombie
          dump threads). It acquired the zombie thread's
          LOCK_thd_data and is about to acquire
          mysys_var->current_mutex (which is LOCK_log).
Thread 2: Application thread executing SHOW BINARY LOGS; it
          acquired LOCK_log and is about to acquire
          LOCK_index.
Thread 3: Application thread executing PURGE BINARY LOGS; it
          acquired LOCK_index and is about to acquire
          LOCK_thread_count.
Thread 4: Application thread executing SHOW PROCESSLIST; it
          acquired LOCK_thread_count and is about to acquire
          the zombie dump thread's LOCK_thd_data.
Deadlock Cycle:
Thread 1 -> Thread 2 -> Thread 3 -> Thread 4 -> Thread 1
The same deadlock was observed when Thread 4 was executing
'SELECT * FROM information_schema.processlist' instead: it
acquired LOCK_thread_count and was about to acquire the zombie
dump thread's LOCK_thd_data.
Analysis:
Four locks are involved in the deadlock: LOCK_log,
LOCK_thread_count, LOCK_index and LOCK_thd_data.
LOCK_log, LOCK_thread_count and LOCK_index are global mutexes,
whereas LOCK_thd_data is local to a thread.
These four locks can be divided into two groups.
Group 1 consists of LOCK_log and LOCK_index, and the order
should be LOCK_log followed by LOCK_index.
Group 2 consists of the other two mutexes, LOCK_thread_count
and LOCK_thd_data, and the order should be LOCK_thread_count
followed by LOCK_thd_data.
Unfortunately, MySQL defines no lock order to follow across
these two groups. In the problematic scenario above, each
thread individually acquires its locks in a valid order, but
when all 4 threads are combined they end up in a deadlock.
Fix:
Since each thread takes its locks in a valid order, this patch
changes the duration of the locks held by Thread 4 to break the
deadlock. Before the patch, the 'SHOW PROCESSLIST' code path
(mysqld_list_processes()) acquired LOCK_thread_count for the
complete duration of the function while also acquiring and
releasing each thread's LOCK_thd_data.
LOCK_thread_count protects addition and deletion of threads in
the global threads list. While SHOW PROCESSLIST is looping
through all existing threads, a thread exiting is a problem,
but a new thread being added to the system is not. Hence a new
mutex, LOCK_thd_remove, is introduced to protect deletion of a
thread from the global threads list. An exiting thread must
acquire LOCK_thd_remove followed by LOCK_thread_count. (It must
still take LOCK_thread_count because other parts of the code
assume that thread exit is protected by LOCK_thread_count; this
fix changes only the 'SHOW PROCESSLIST' logic.)
(E.g., the unlink_thd logic is now protected by
LOCK_thd_remove.)
The logic of mysqld_list_processes() (and
fill_schema_processlist()) is now protected by
'LOCK_thd_remove' instead of 'LOCK_thread_count'.
The new locking order after this patch is:
LOCK_thd_remove -> LOCK_thd_data -> LOCK_log ->
LOCK_index -> LOCK_thread_count
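A simplified sketch of the two sides of the new protocol, using
the server's mysql_mutex_* wrappers; the list container and
helper names are assumptions:

  /* Sketch: thread removal takes LOCK_thd_remove first, then
     LOCK_thread_count, so a processlist reader holding
     LOCK_thd_remove can safely touch per-THD data. */
  void unlink_thd_sketch(THD *thd)
  {
    mysql_mutex_lock(&LOCK_thd_remove);
    mysql_mutex_lock(&LOCK_thread_count);
    remove_from_global_thread_list(thd);  /* hypothetical helper */
    mysql_mutex_unlock(&LOCK_thread_count);
    mysql_mutex_unlock(&LOCK_thd_remove);
  }

  /* Sketch: SHOW PROCESSLIST no longer holds LOCK_thread_count for
     the whole scan; LOCK_thd_remove keeps threads from vanishing. */
  void list_processes_sketch()
  {
    mysql_mutex_lock(&LOCK_thd_remove);
    for (THD *tmp : global_thread_list)   /* hypothetical container */
    {
      mysql_mutex_lock(&tmp->LOCK_thd_data);
      /* ... copy out the fields to report ... */
      mysql_mutex_unlock(&tmp->LOCK_thd_data);
    }
    mysql_mutex_unlock(&LOCK_thd_remove);
  }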
Analysis
--------
Running 'mysqld --help --verbose' as the root user without
the '--user' option displays the help contents but aborts at
the end with exit code 1.
While starting the server, a validation is performed to
ensure that when the server is started as root, the '--user'
option is given; otherwise we abort. In the help case, we dump
the help contents and then abort.
Fix:
---
During the validation, we skip aborting the server if the
help option is used under the condition mentioned above.
NOTE: A test case has not been added since it requires using
the 'root' user.
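A sketch of the adjusted condition; opt_help follows the
mysqld.cc convention, while the other two names are
hypothetical stand-ins for the existing checks:

  /* Sketch: when started as root without --user, abort only if
     we are not merely printing help. */
  if (running_as_root && !opt_user_given && !opt_help)
    unireg_abort(1);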
Problem: The "--local-install" service does not perform as expected for, at least,
Windows.
Fix: A NULL pointer was dereferenced due to which there was crash.A check was introduced
for NULL string before dereferencing it.No test cases written as it is a bug during
installation.
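A minimal sketch of the guard; opt_string is a hypothetical
stand-in, since the commit does not name the affected pointer:

  /* Sketch: check the pointer before dereferencing it. */
  if (opt_string != NULL && opt_string[0] != '\0')
  {
    /* safe to use opt_string here */
  }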
Since log_throttle is not available in 5.5, the error message
logged when create_thread_to_handle_connection() fails to
create a new thread is not backported.
Since the function my_plugin_log_message() is not available in
5.5, and since there is an incompatibility between the
sql_print_XXX functions compiled with g++ and the audit log
files compiled with gcc, the changes related to the audit log
plugin are not backported.
PROBLEM:
When a large number of connections are made continuously with
a wait_timeout of 600 seconds for some hours, some connections
remain after wait_timeout has expired, and new connections get
stuck under the configuration and scenario reported in
bug#16196591.
FIX:
The cause of this bug is the issue identified and fixed by
BUG#16088658 in 5.6. The LOCK_thread_count contention issue
fixed by BUG#15921866 in 5.6 is needed in 5.5 as well. Since
the issue is not reproducible, it has been verified on the
customer configuration: the issue could not be reproduced
after a 48-hour test with a non-debug build that includes the
above two fixes backported.
Problem: If the disk becomes full while writing into the
binlog, the server instance hangs until someone frees up
space. After the user frees up the disk space, the server
crashes with an assert (m_status != DA_EMPTY).
Analysis: wait_for_free_space() is called in an infinite loop,
i.e., the server instance hangs until someone frees up space,
so there is no need to set a status bit in the diagnostics
area.
Fix: Replace my_error/my_printf_error with
sql_print_warning(), which prints the warning to the error log.
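A sketch of the waiting routine after the change, in the shape
of the mysys wait_for_free_space(); the message text is an
assumption:

  /* Sketch: warn via the error log instead of the diagnostics
     area, since no statement can complete while we wait and the
     stale error status later trips the DA_EMPTY assert. */
  static void wait_for_free_space_sketch(const char *filename)
  {
    sql_print_warning("Disk is full writing '%s' (Errcode: %d). "
                      "Waiting for someone to free space...",
                      filename, errno);
    sleep(MY_WAIT_FOR_USER_TO_FIX_PANIC);  /* pause, caller retries */
  }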
Details of BUG#11746142: CALLING MYSQLD WHILE ANOTHER
INSTANCE IS RUNNING, REMOVES PID FILE
Fix: Before removing the pid file, ensure it was created
by the same process; leave it intact otherwise.
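A self-contained sketch of the ownership check, assuming the
pid file simply holds the owner's pid as text:

  #include <stdio.h>
  #include <unistd.h>

  /* Sketch: remove pid_file only if it contains our own pid. */
  static void delete_pid_file_if_ours(const char *pid_file)
  {
    long pid= -1;
    FILE *f= fopen(pid_file, "r");
    if (f == NULL)
      return;
    if (fscanf(f, "%ld", &pid) != 1)
      pid= -1;
    fclose(f);
    if (pid == (long) getpid())
      unlink(pid_file);  /* ours: safe to remove */
    /* otherwise another running instance owns it: leave it intact */
  }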
Analysis:
When the thread cache is enabled, thd->start_utime is not
properly initialized when a thread is picked from the thread
cache. This breaks the quota management mechanism:
THD::time_out_user_resource_limits() resets
m_user_connect->conn_per_hour to 0 based on thd->start_utime.
Fix:
Initialize start_utime when a cached thread is reused.
Notes:
Re-enabled tests that were disabled because of this issue.
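A sketch of the reuse path; my_micro_time() is the mysys clock,
and the exact spot in the thread-cache code is an assumption:

  /* Sketch: give the reused THD a fresh start time, just as a
     newly created thread would have. */
  thd->set_time();                    /* refresh start_time */
  thd->start_utime= my_micro_time();  /* base for per-hour quotas */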
Analysis
---------
my_stat() calls stat(), and if the stat() call fails we try to
set the variable my_errno, which is actually thread-specific
data. We try to get the address of this thread-specific data
using my_pthread_getspecific(), but for the purge thread we
have not defined any thread-specific data, so it returns NULL,
and when dereferencing NULL we get a segmentation fault.
init_available_charsets(), seen in the core stack, is invoked
through pthread_once(); pthread_once() is used for one-time
initialization. Since free_charsets() is called before the
InnoDB plugin shutdown, the purge thread calls
init_available_charsets(), which leads to the crash.
Fix
---
Call free_charsets() after the InnoDB plugin shutdown, since
the purge threads are still using the charsets.
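A sketch of the corrected shutdown ordering (plugin_shutdown()
and free_charsets() follow the commit text; the surrounding
cleanup is elided):

  /* Sketch: shut down plugins (including InnoDB and its purge
     threads) before releasing the charset data they may touch. */
  plugin_shutdown();
  /* ... other cleanup ... */
  free_charsets();  /* moved after the plugin shutdown */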
When a client connects to a MySQL server, first a THD object is created.
If there are any idle server threads waiting, the THD object is then added
to a list and a server thread is woken up. This thread then retrieves the
THD object from the list and starts executing.
The problem was that this list of THD objects waiting for a
server thread was not working in a FIFO fashion, but rather
LIFO. This is unfair, as it means that the last THD added
(= the last client connected) will be assigned a server thread
first.
Note, however, that for this to be a problem, several clients
must be able to connect and have THD objects constructed
before any server thread manages to be woken up. This is not a
very likely scenario.
This patch fixes the problem by changing the THD list to work FIFO
rather than LIFO.
This is the 5.1/5.5 version of the patch.
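An illustrative sketch of the change, using std::deque as a
stand-in for the server's waiting list (the real patch operates
on the existing list structure):

  #include <deque>

  struct THD;                      /* opaque here */
  std::deque<THD*> waiting_thds;

  /* Producer: a new connection arrives. */
  void enqueue_connection(THD *thd)
  {
    waiting_thds.push_back(thd);   /* append at the tail */
  }

  /* Consumer: an idle server thread wakes up. */
  THD *dequeue_connection()
  {
    /* Before the patch the list behaved like a stack (most recent
       THD taken first); taking from the front makes it FIFO, so
       the longest-waiting client is served first. */
    THD *thd= waiting_thds.front();
    waiting_thds.pop_front();
    return thd;
  }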
The use of Thread_iterator did not work on Windows (linking
problems).
Solution: Change the interface between the thread pool and the
server to use only simple free functions.
This patch is for 5.5 only (it mimics a similar solution in 5.6).
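A sketch of the flattened interface, with hypothetical names;
the point is that plain free functions avoid exporting a class
across the module boundary:

  /* Sketch: simple functions instead of a Thread_iterator type. */
  extern "C" THD *thd_list_first();
  extern "C" THD *thd_list_next(THD *current);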
SHOW 2012 INSTEAD OF 2011
* Added a new macro to hold the current year:
  COPYRIGHT_NOTICE_CURRENT_YEAR
* Modified the ORACLE_WELCOME_COPYRIGHT_NOTICE macro to take
  the initial year as a parameter and pick the current year
  from the above-mentioned macro.
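A sketch of the macro pair, relying on string-literal
concatenation (the notice text is abbreviated):

  #define COPYRIGHT_NOTICE_CURRENT_YEAR "2012"

  #define ORACLE_WELCOME_COPYRIGHT_NOTICE(first_year)              \
    "Copyright (c) " first_year ", " COPYRIGHT_NOTICE_CURRENT_YEAR \
    ", Oracle and/or its affiliates. All rights reserved.\n"

Updating the year then touches a single macro.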
Problem
========
SQL statements close to the size of max_allowed_packet produce
binary log events larger than max_allowed_packet.
The failure occurs because the event length is more than the
total of max_allowed_packet + the maximum event header length.
Since the event length exceeds this size, the master's dump
thread is unable to send the packet on to the slave. This can
happen, e.g., with row-based replication in an Update_rows
event.
Fix
====
The problem was fixed by increasing the max_allowed_packet of
the slave's threads (IO/SQL) to 1GB.
This is done using a new server option that regulates the
max_allowed_packet of the slave threads (IO/SQL). This allows
the large packets to be received by the slave and applied
successfully.
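A sketch of the wiring; the option name slave_max_allowed_packet
and the initialization spot are assumptions based on the commit
text:

  /* Sketch: slave I/O and SQL threads get their own, larger limit
     instead of the session max_allowed_packet. */
  ulong slave_max_allowed_packet= 1024UL * 1024UL * 1024UL; /* 1GB */

  void init_slave_thread_sketch(THD *thd)
  {
    thd->variables.max_allowed_packet= slave_max_allowed_packet;
  }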
TO "LOCALHOST" IF LOCALHOST IS BOTH IPV4/IPV6 ENABLED.
Previous commit comments were wrong. The default value has always been NULL.
The original patch for Bug#12762885 just makes it visible in the logs.
This patch uses "0.0.0.0" string if bind-address is not set.
IF LOCALHOST IS BOTH IPV4/IPV6 ENABLED.
The original patch removed the default value of the
bind-address option, so the default value became NULL. By
coincidence, NULL resolves to 0.0.0.0 and ::, and since the
server chooses the first IPv4-address, 0.0.0.0 is chosen. So
there was no change in the behaviour.
This patch restores the default value of the bind-address
option to "0.0.0.0".
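A sketch of the restored default; the variable name
my_bind_addr_str follows mysqld conventions but is an
assumption here:

  /* Sketch: an explicit IPv4 wildcard default instead of relying
     on NULL resolving to the same thing by coincidence. */
  if (my_bind_addr_str == NULL)
    my_bind_addr_str= (char *) "0.0.0.0";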
Problem - The failure is mainly caused by the assert added as
          part of the fix for BUG#13333431. When we start the
          server with the --skip-networking option enabled,
          mysqld_port is explicitly set to 0. Since the value
          of report_port is set from mysqld_port, the
          assertion (report_port != 0) fails.
Fix - Assert the non-zero value of report_port only when the
      --skip-networking option is not used to start the mysqld
      server.
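A sketch of the relaxed assertion; the name of the
skip-networking flag is an assumption:

  /* Sketch: report_port may legitimately be 0 when networking is
     disabled. */
  DBUG_ASSERT(opt_skip_networking || report_port != 0);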
IPV4/IPV6 ENABLED
Analysis:
----------------------
The problem was that if a hostname resolves to more than one
IP-address, the server (5.5) does not start due to an error.
In 5.1 the server used to take some IP-address and start.
It's a regression and should be fixed.
5.5 supports IPv6, while 5.1 does not. However, that should
not prevent the server from starting: if a hostname has both
IPv4 and IPv6 addresses, the server should choose some
IPv4-address and start. It's been decided to prefer an
IPv4-address to be backward compatible with 5.1.
Another problem was that the 5.6 server did not report a
proper error message when the specified hostname could not be
resolved. So, the code has been changed to report a proper
error message.
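A sketch of the address selection, using standard getaddrinfo()
results; the fallback behaviour is an assumption:

  #include <netdb.h>
  #include <sys/socket.h>

  /* Sketch: walk the resolved addresses and prefer the first IPv4
     entry, for backward compatibility with 5.1. */
  static const struct addrinfo *
  choose_bind_address(const struct addrinfo *list)
  {
    for (const struct addrinfo *cur= list; cur; cur= cur->ai_next)
      if (cur->ai_family == AF_INET)
        return cur;  /* first IPv4 address wins */
    return list;     /* no IPv4: fall back to the first entry */
  }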
Testing
================================
5.5
=============================
invalid hostname (localhos):
=> Following error message reported.
120308 15:52:09 [ERROR] Can't start server: cannot resolve hostname!
120308 15:52:09 [ERROR] Aborting
invalid ip_address:
=> Following error message reported.
120308 15:56:06 [Note] Server hostname (bind-address): '123.123.123.123'; port: 3306
120308 15:56:06 [Note] - '123.123.123.123' resolves to '123.123.123.123';
120308 15:56:06 [Note] Server socket created on IP: '123.123.123.123'.
120308 15:56:06 [ERROR] Can't start server: Bind on TCP/IP port: Cannot assign requested address
Only ipv4 host configured:
=> Following message logged
120308 16:02:50 [Note] Server hostname (bind-address): 'localhost'; port: 3306
120308 16:02:50 [Note] - 'localhost' resolves to '127.0.0.1';
120308 16:02:50 [Note] Server socket created on IP: '127.0.0.1'
Only ipv6 host configured:
=> Following message logged
120308 16:04:03 [Note] Server hostname (bind-address): 'localhost'; port: 3306
120308 16:04:03 [Note] - 'localhost' resolves to '::1';
120308 16:04:03 [Note] Server socket created on IP: '::1'.
ipv4 and ipv6 host configured:
=> Following message logged
120308 16:05:02 [Note] Server hostname (bind-address): 'localhost'; port: 3306
120308 16:05:02 [Note] - 'localhost' resolves to '::1';
120308 16:05:02 [Note] - 'localhost' resolves to '127.0.0.1';
120308 16:05:02 [Note] Server socket created on IP: '127.0.0.1'.
=> Non localhost address
120308 16:08:20 [Note] Server hostname (bind-address): 'mysql_addr'; port: 3306
120308 16:08:20 [Note] - 'mysql_addr' resolves to '10.178.58.216';
120308 16:08:20 [Note] - 'mysql_addr' resolves to 'fe80::120b:a9ff:fe69:59ec';
120308 16:08:20 [Note] Server socket created on IP: '10.178.58.216'.
More than one entry for ipv4 and ipv6 address:
=> Following message logged
120308 16:06:19 [Note] Server hostname (bind-address): 'localhost'; port: 3306
120308 16:06:19 [Note] - 'localhost' resolves to '::1';
120308 16:06:19 [Note] - 'localhost' resolves to '::1';
120308 16:06:19 [Note] - 'localhost' resolves to '127.0.0.1';
120308 16:06:19 [Note] - 'localhost' resolves to '127.0.0.1';
120308 16:06:19 [Note] Server socket created on IP: '127.0.0.1'.
Problem - The default port number shown in SHOW SLAVE HOSTS is
          always 3306 even though the slave is actually
          listening on a different port number. This is a
          problem, as the user cannot be sure whether this
          port value can be trusted, and a client trying to
          read the replication topology can get confused.
Fix - 3306 ceases to be the default value of report-port;
      report-port no longer has a static default.
      Instead we initialize report-port to 0 as the new default
      value and adjust it based on two checks:
      1) If report-port is not set, the slave reports the port
         number it is listening on (i.e., if report-port is not
         set we get the actual value of the slave's port
         number).
      2) If report-port is set, we show the value report-port
         is set to as the slave's port number.
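A sketch of the two checks at slave registration time, assuming
report_port and mysqld_port globals as elsewhere in this log:

  /* Sketch: 0 means "not set"; fall back to the real listening
     port (check 1). A non-zero report_port is shown as
     configured (check 2). */
  if (report_port == 0)
    report_port= mysqld_port;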
On shutdown(), Windows can drop traffic still queued for
sending even if that wasn't specifically requested. As a
result, fatal errors (those after whose signaling the server
will drop the connection) were sometimes only seen as
"connection lost" on the client side, because the server-side
shutdown() erroneously discarded the correct error message
before sending it.
On Windows, we now use the Windows API to access the
(non-broken) equivalent of shutdown().
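For illustration, the usual graceful-close pattern on Windows
that preserves queued data; this shows the idea rather than the
exact API call chosen in the patch:

  #include <winsock2.h>

  /* Sketch: half-close the send side, then drain until the peer
     is done, so data already queued (e.g. the final error
     message) is actually transmitted before the socket is torn
     down. */
  shutdown(sock, SD_SEND);            /* stop sending, keep queue */
  char buf[512];
  while (recv(sock, buf, sizeof(buf), 0) > 0)
    ;                                 /* discard remaining input */
  closesocket(sock);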
Backport from trunk
MEMORY LEAK.
Background:
- There are caches for stored functions and stored procedures
  (SP-cache);
- There is no similar cache for events;
- Triggers are cached together with TABLE objects;
- These SP-caches are per-session (i.e. specific to each
  session);
- A stored routine is represented internally by a sp_head
  instance;
- The SP-cache basically contains the sp_head objects of
  stored routines that have been executed in a session;
- A sp_head object is added to the SP-cache before the
  corresponding stored routine is executed;
- The SP-cache is flushed at the end of the session.
The problem was that the SP-cache might grow without limit.
Although this was not a pure memory leak (the SP-cache is
flushed when the session is closed), it is still a problem,
because a user can consume a lot of memory by executing many
stored routines.
The patch fixes this problem in the least intrusive way. A
soft limit (similar to the size of the table definition cache)
is introduced, represented by the new runtime configuration
parameter 'stored_program_cache'. The value of this parameter
is stored in the new global variable stored_program_cache_size,
which is used as the upper limit the SP-cache may grow to.
The parameter 'stored_program_cache' limits the number of
cached routines per thread. It has the following
min/default/max values: min = 256, default = 256,
max = 512 * 1024.
Note that this parameter limits the size of each cache (for
stored procedures and for stored functions) separately.
The SP-cache size is checked after a top-level statement is
parsed. If the SP-cache size exceeds the limit specified by
'stored_program_cache', the SP-cache is flushed and the memory
allocated for the cache objects is freed. This approach allows
the cache to be flushed safely even when there are dependencies
among stored routines.
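A sketch of the enforcement hook, with helper names assumed in
the style of sp_cache.cc:

  /* Sketch: called once per parsed top-level statement. Flushing
     the whole cache, rather than evicting entries mid-execution,
     is what keeps this safe with inter-routine dependencies. */
  void sp_cache_enforce_limit_sketch(sp_cache *cache, ulong limit)
  {
    if (cache && sp_cache_size(cache) > limit)  /* assumed helper */
      sp_cache_flush(cache);                    /* drop all entries */
  }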