Bug#11751149 - TRYING TO START MYSQL WHILE ANOTHER INSTANCE IS STARTING: CONFUSING ERROR

DESCRIPTION
===========
When the MySQL server has processed transactions but not
yet committed them and then shuts down abnormally (because
of a crash, an external kill, etc.), the storage engine has
to perform recovery. This recovery takes place the next
time the server is started (through either mysqld or
mysqld_safe).

If another instance of mysqld_safe is started while the
first server is in the middle of recovery, the second
instance may kill the first one after a moment.

ANALYSIS
========
In the "while true" loop there is a check, performed after
the server stops, for the existence of the pid file, to
determine whether the shutdown was normal. If the file is
absent, a graceful server exit has removed it.

However, if the file is present, the script simply assumes
that it is a leftover of the "current" server. It fails to
consider that it could be a valid pid file belonging to
another running mysql server.

We need to add more checks in the latter case. The script
should extract the PID from the existing file and check
whether that process is running. If it is, an older
instance of the mysql server is running and the script
should abort.
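
A minimal sketch of such a check follows. It is not the
script's actual code: the helper name pid_file_is_live is
hypothetical, and it assumes the liveness test is a
"kill -0" style probe, which this message does not state
about @CHECK_PID@.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the pid recorded in the given file
# belongs to a live process. "kill -0" delivers no signal; it only
# reports whether the process exists and may be signalled by the caller.
pid_file_is_live() {
  file=$1
  [ -f "$file" ] || return 1        # no pid file: nothing to check
  pid=`cat "$file"`
  case "$pid" in
    ''|0|*[!0-9]*) return 1 ;;      # empty, zero or non-numeric content
  esac
  kill -0 "$pid" > /dev/null 2>&1
}
```

A caller would abort when pid_file_is_live "$pid_file"
succeeds and treat the file as stale otherwise.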

FIX
===
Check whether the process is alive by adding a @CHECK_PID@
test in that case, and abort if it is. The detailed logic is
as follows:

- The mysqld_safe script quits right at startup if it finds
an active PID, i.e. a mysql server is already running.
- The PID file is created after InnoDB recovery, so in rare
cases (when the PID file has not been created yet) more
than one server can come up. Even then, the others have to
wait until the 1st server releases the InnoDB lock it
acquired. Each of these servers will either time out
waiting for the InnoDB lock or, after the wait, find that
the 1st server is already running (by reading $pid_file)
and abort.
- The core fix is that we now check whether the mysql
server process is alive after the server stops running
within the loop of "run -> shutdown/kill/abort -> run ...",
so that only the script that owns the mysql server can
bring it down if required.
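
The decision taken after the server stops can be sketched
roughly as below. The function decide_after_exit and its
three output words are hypothetical names used only for
illustration, and "kill -0" again stands in for
@CHECK_PID@ as an assumption.

```shell
#!/bin/sh
# Hypothetical helper mirroring the check made after mysqld exits.
# Prints one of:
#   stop     pid file absent, so the shutdown was normal
#   abort    pid file names a live process, so another server owns it
#   restart  pid file is stale, so our own server crashed
decide_after_exit() {
  pid_file=$1
  if [ ! -f "$pid_file" ]; then
    echo stop
    return
  fi
  PID=`cat "$pid_file"`
  if kill -0 "$PID" > /dev/null 2>&1; then
    echo abort
  else
    echo restart
  fi
}
```

Only the "restart" outcome lets the loop continue, which is
what keeps a second script from taking over a server it does
not own.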

NOTE
====
Removed the deletion of the pid file and socket file at the
entry of the loop, as under a race condition it could lead
to the 2nd instance deleting files created by the 1st
instance. To compensate, these files are now deleted at the
end of the loop.
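
As an illustration of the cleanup that now runs at the end
of the loop, the sketch below removes each given file unless
it is a symbolic link. The helper name remove_unless_symlink
is hypothetical; the script itself repeats the test inline
for each file.

```shell
#!/bin/sh
# Hypothetical helper: delete every argument that is not a symbolic
# link. Symlinks are left untouched, matching the script's "-h" guard.
remove_unless_symlink() {
  for f in "$@"; do
    if [ ! -h "$f" ]; then
      rm -f "$f"
    fi
  done
}
```

It would be called with the pid file and the socket file,
e.g. remove_unless_symlink "$pid_file" "$safe_mysql_unix_port".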

Reverted the changes made in the patch for Bug#16776528, so
once this patch is pushed the concept of mysqld_safe.pid
goes away altogether. This was required because the script
was deleting another instance's mysqld_safe.pid, allowing
multiple mysqld_safe instances to run in parallel. This
patch fixes Bug#16776528 as well, since the resources are
guarded anyway by the InnoDB lock plus our planned 5.7
patch.
Shishir Jaiswal 2016-12-22 14:56:02 +05:30
parent 1079066b22
commit e00810b934

@@ -790,14 +790,23 @@ then
fi
if [ ! -h "$pid_file" ]; then
rm -f "$pid_file"
fi
if test -f "$pid_file"
then
log_error "Fatal error: Can't remove the pid file:
$pid_file
Please remove it manually and start $0 again;
if test -f "$pid_file"; then
log_error "Fatal error: Can't remove the pid file:
$pid_file.
Please remove the file manually and start $0 again;
mysqld daemon not started"
exit 1
exit 1
fi
fi
if [ ! -h "$safe_mysql_unix_port" ]; then
rm -f "$safe_mysql_unix_port"
if test -f "$safe_mysql_unix_port"; then
log_error "Fatal error: Can't remove the socket file:
$safe_mysql_unix_port.
Please remove the file manually and start $0 again;
mysqld daemon not started"
exit 1
fi
fi
fi
@@ -841,14 +850,6 @@ have_sleep=1
while true
do
# Some extra safety
if [ ! -h "$safe_mysql_unix_port" ]; then
rm -f "$safe_mysql_unix_port"
fi
if [ ! -h "$pid_file" ]; then
rm -f "$pid_file"
fi
start_time=`date +%M%S`
eval_log_error "$cmd"
@@ -884,6 +885,13 @@ do
if test ! -f "$pid_file" # This is removed if normal shutdown
then
break
else # self's mysqld crashed or other's mysqld running
PID=`cat "$pid_file"`
if @CHECK_PID@
then # true when above pid belongs to a running mysqld process
log_error "A mysqld process with pid=$PID is already running. Aborting!!"
exit 1
fi
fi
@@ -941,6 +949,12 @@ do
I=`expr $I + 1`
done
fi
if [ ! -h "$pid_file" ]; then
rm -f "$pid_file"
fi
if [ ! -h "$safe_mysql_unix_port" ]; then
rm -f "$safe_mysql_unix_port"
fi
log_notice "mysqld restarted"
done