Why is monit not restarting the server?

I have configured monit on an Ubuntu machine with the following configuration:
check process apache with pidfile /var/run/apache2/apache2.pid
start program = "/etc/init.d/apache2 start" with timeout 60 seconds
stop program = "/etc/init.d/apache2 stop"
if cpu > 80% for 5 cycles then restart
if children > 250 then restart
but it is not working. The server has gone offline on several occasions and monit did not appear to do anything.
Any ideas why it is not restarting?

I don't know what you mean by "the server has gone offline": it could mean that the node where Apache was running was shut down, or it could mean that http://localhost:80/ was not accessible.
If the latter is the case, then changing the configuration to
check process apache with pidfile /var/run/apache2/apache2.pid
start program = "/etc/init.d/apache2 start" with timeout 60 seconds
stop program = "/etc/init.d/apache2 stop"
if failed host 127.0.0.1 port 80 then restart
if cpu > 80% for 5 cycles then restart
if children > 250 then restart
might work. Your current configuration will not restart Apache if its process is running but, for some reason, is not reachable at http://localhost:80/.
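After changing the control file, it can also help to validate the syntax and reload monit so the new rule is picked up (standard monit commands, typically run as root):
# check the control file for syntax errors
monit -t
# reload the running daemon with the new configuration
monit reload
# show what monit currently thinks about its services
monit status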

Related

supervisor restart causes zombie uwsgi process

I have a python/Django project (myproject) running on nginx and uwsgi.
I am running the uwsgi command via supervisord. This works perfectly, but restarting supervisord leaves a zombie process behind. What am I doing wrong? What am I overlooking to do this cleanly? Any advice?
Often the supervisor restart takes too long; at that point I have found the following in the supervisor.log file:
INFO waiting for stage2_BB_wsgi, stage3_BB_wsgi, stage4_BB_wsgi to die
Point to note: I am running multiple staging servers on one machine, namely stage2 .. stageN.
supervisor.conf file extract:
[program:stage2_BB_wsgi]
command=uwsgi --close-on-exec -s /home/black/stage2/shared_locks/uwsgi_bb.sock --touch-reload=/home/black/stage2/shared_locks/reload_uwsgi --listen 10 --chdir /home/black/stage2/myproject/app/ --pp .. -w app.wsgi -C666 -H /home/black/stage2/myproject/venv/
user=black
numprocs=1
stdout_logfile=/home/black/stage2/logs/%(program_name)s.log
stderr_logfile=/home/black/stage2/logs/%(program_name)s.log
autostart=true
autorestart=true
startsecs=10
exitcodes=1
stopwaitsecs=600
killasgroup=true
priority=1000
Thanks in advance.
You will want to set your stopsignal to INT or QUIT.
By default supervisord sends out a SIGTERM when restarting a program. This will not kill uwsgi, only reload it and its workers.
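As a minimal sketch, the change is one extra line in the existing program block (all other values come from the question's config; the command line is shortened here):
[program:stage2_BB_wsgi]
command=uwsgi --close-on-exec -s /home/black/stage2/shared_locks/uwsgi_bb.sock ...
; send SIGINT instead of the default SIGTERM so uwsgi shuts down rather than reloading
stopsignal=INT
stopwaitsecs=600
killasgroup=true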

Thin processes die without message

I have two Thin servers running for a Rails app. I start them up with bundle exec thin start. My app.yml looks like this:
chdir: /[root]/current
environment: production
address: 0.0.0.0
port: 3001
timeout: 30
log: /[root]/log/thin.log
pid: tmp/pids/thin.pid
max_conns: 1024
max_persistent_conns: 100
require: []
wait: 30
threadpool_size: 20
servers: 2
daemonize: true
When I wait a few hours, usually one of the two servers is gone (e.g., I only see one with htop or with pgrep -lf thin). Even worse, sometimes both of them are gone after 10 hours or so, which results in a 500 error in the browser. Furthermore, when I start 3 or 4 servers, 2 of them die within a minute on average.
I don't see error messages in my Rails production.log or in the thin.[port] log files specified in the app.yml file.
Is there a way to keep the Thin servers running?
Are you sure your servers actually pick up app.yml when you start them with bundle exec thin start?
Try bundle exec thin -C app.yml start

Monit logs running process as server not running

I'm trying to monitor my server and just want to restart it when it's down. Following is my monit control file:
check process myserver with pidfile "/home/path/to/myserver.pid"
start program = "/etc/init.d/myserver start"
stop program = "/etc/init.d/myserver stop"
if failed host 127.0.0.1 port 8080 protocol http
then restart
But even when the server is running, monit gives errors like:
'myserver' process not running
trying to restart 'myserver'
failed to restart myserver.
How do I fix this? Am I making some mistake?
Also, when I try to use 'send' and 'expect', it gives an error like:
Error: syntax error 'send'
You might need to specify UID and GID as monit runs as root.
check process myserver
with pidfile "/home/path/to/myserver.pid"
start program = "/etc/init.d/myserver start"
as uid myserver_uid and gid myserver_gid
stop program = "/etc/init.d/myserver stop"
as uid myserver_uid and gid myserver_gid
if failed host 127.0.0.1 port 8080 protocol http
then restart
To debug, you could redirect the start/stop output to a file and check that file for more details:
check process myserver
with pidfile "/home/path/to/myserver.pid"
start program = "/etc/init.d/myserver start >> /tmp/myserver.log 2>&1"
as uid myserver_uid and gid myserver_gid
stop program = "/etc/init.d/myserver stop >> /tmp/myserver.log 2>&1"
as uid myserver_uid and gid myserver_gid
if failed host 127.0.0.1 port 8080 protocol http
then restart
For send and expect, you might not need them for an HTTP check, since the HTTP protocol is supported directly by the protocol http statement.
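If your service exposes a specific page to check, the HTTP test can also request a particular path; a minimal sketch (the "/status" path is only an example, adjust it to your application):
check process myserver with pidfile "/home/path/to/myserver.pid"
start program = "/etc/init.d/myserver start"
stop program = "/etc/init.d/myserver stop"
if failed host 127.0.0.1 port 8080 protocol http request "/status"
then restart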

Unicorn failing to spawn workers on USR2 signal

I'm sending a USR2 signal to the master process in order to achieve zero downtime deploy with unicorn. After the old master is dead, I'm getting the following error:
adding listener failed addr=/path/to/unix_socket (in use)
unicorn-4.3.1/lib/unicorn/socket_helper.rb:140:in `initialize':
Address already in use - /path/to/unix_socket (Errno::EADDRINUSE)
The old master is killed in the before_fork block of the unicorn.rb config file. The process is started via upstart without the daemon (-D) option.
Any idea what's going on?
Well, it turns out you have to run in daemonized mode (-D) if you want zero-downtime deployment. I changed a few things in my upstart script and now it works fine:
setuid username
pre-start exec unicorn_rails -E production -c /path/to/app/config/unicorn.rb -D
post-stop exec kill `cat /path/to/app/tmp/pids/unicorn.pid`
respawn
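For reference, the before_fork hook mentioned above usually follows the pattern below, which sends QUIT to the old master once the new one is up (a sketch of the common unicorn documentation example; the pid file location comes from your unicorn.rb):
# config/unicorn.rb
before_fork do |server, worker|
  # during a USR2 re-exec the old master renames its pid file to *.oldbin
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      # ask the old master to shut down gracefully
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # the old master is already gone
    end
  end
end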

How to restart individual servers in thin cluster in rails 3.1 app

I have a thin cluster set up to start 3 servers:
/etc/thin/myapp.yml
...
wait: 30
servers: 3
daemonize: true
...
and then I use thin restart -C /etc/thin/myapp.yml to restart. However, I would like to restart one server at a time, to reduce downtime.
Is there a way to restart each server individually, by pid number or by port, for example?
There is something better for you:
try the --onebyone option.
You may also add the following line to your config file:
onebyone: true
Afterwards you will be able to restart your thin cluster without any downtime, as shown below.
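A minimal example of the rolling restart, using the same config path as in the question:
thin restart -C /etc/thin/myapp.yml --onebyone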
I know the question has been answered, but I'd like to add the -o option to the mix.
So
thin restart -C /etc/thin/myapp.yml -o 3000
will only restart the server running on port 3000. If, say, you have two other servers running on ports 3001 and 3002, they will be left untouched.
-o works with start and stop commands too.
