monit: failed to restart - ruby-on-rails

I am using monit to manage Sidekiq.
When I check the monit log file, it shows this error:
[EDT Jun 18 09:50:11] error : 'sidekiq_site' process is not running
[EDT Jun 18 09:50:11] info : 'sidekiq_site' trying to restart
[EDT Jun 18 09:50:11] info : 'sidekiq_site' start: /bin/bash
[EDT Jun 18 09:51:41] error : 'sidekiq_site' failed to start
/etc/monit/conf.d/sidekiq.conf
check process sidekiq_site
with pidfile /var/www/project/shared/pids/sidekiq.pid
start program = "bash -c 'cd /var/www/project/current ; RAILS_ENV=production bundle exec sidekiq --index 0 --pidfile /var/www/project/shared/pids/sidekiq.pid --environment production --logfile /var/www/project/shared/log/sidekiq.log --daemon'" as uid root and gid root with timeout 90 seconds
stop program = "bash -c 'if [ -d /var/www/project/current ] && [ -f /var/www/project/shared/pids/sidekiq.pid ] && kill -0 `cat /var/www/project/shared/pids/sidekiq.pid`> /dev/null 2>&1; then cd /var/www/project/current && bundle exec sidekiqctl stop /var/www/project/shared/pids/sidekiq.pid 1 ; else echo 'Sidekiq is not running'; fi'" as uid root and gid root
if totalmem is greater than 200 MB for 2 cycles then restart # eating up memory?
group site_sidekiq
/etc/monit/monitrc
set daemon 30
set logfile /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set eventqueue
basedir /var/lib/monit/events
slots 100
set httpd port 2812
allow admin:""
set httpd port 2812 and
use address xx.xxx.xx.xx
allow xx.xx.xx.xx
check system trrm_server
if loadavg(5min) > 2 for 2 cycles then alert
if memory > 75% for 2 cycles then alert
if cpu(user) > 75% for 2 cycles then alert
include /etc/monit/conf.d/*

Monit runs start and stop programs with a minimal environment. It does not load your shell profile, so the PATH and other variables from your interactive session are not available.
Give every program an absolute path, including the call to bash (use /bin/bash).
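As a rough illustration of the absolute-path point above, the sketch below checks whether a start/stop command string begins with an absolute path. The helper name is_absolute_cmd is hypothetical and not part of Monit; it only inspects the string and does not run anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Monit): succeeds only when the
# command string begins with an absolute path such as /bin/bash.
is_absolute_cmd() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# The start program from the question begins with a bare "bash".
if is_absolute_cmd "bash -c 'cd /var/www/project/current'"; then
  echo "ok"
else
  echo "not absolute: use /bin/bash instead of bash"
fi

# The same command with the interpreter spelled out in full.
if is_absolute_cmd "/bin/bash -c 'cd /var/www/project/current'"; then
  echo "ok"
fi
```

Commands inside the quoted bash -c string are a separate matter: they are resolved by bash using whatever PATH it inherits, so tools such as bundle may also need a full path or an explicit PATH.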

Related

STDOUT logs not working when using symlink for log file to /proc/1/fd/1 on Kubernetes

I have a cronjob that runs every minute which will redirect output to a log file /var/log/cronjob/cron.log. Since this is running in Kubernetes, I want to redirect the log to STDOUT.
The approach I took was to use a symlink using RUN ln -sf /proc/1/fd/1 /var/log/cronjob/cron.log:
# ls -la /var/log/cronjob/cron.log
lrwxrwxrwx 1 root root 12 Jan 21 19:23 /var/log/cronjob/cron.log -> /proc/1/fd/1
When I run kubectl logs it has no output.
If I delete the symlink (within the container) and create a normal file instead, my output appears in /var/log/cronjob/cron.log as expected.
# tail -f /var/log/cronjob/cron.log
Running scheduled command: '/usr/bin/php7.3' 'artisan' sync:health_check > '/dev/null' 2>&1
Running scheduled command: ('/usr/bin/php7.3' 'artisan' compute:user_preferences > '/dev/null' 2>&1 ; '/usr/bin/php7.3' 'artisan' schedule:finish "framework/schedule-9019c9dc22ad7439efd038277fe8f370f56958e7") > '/dev/null' 2>&1 &
How can I get my log to write to STDOUT via the symlink?
I have tried other things such as:
Use /dev/stdout for the symlink
Tail the /var/log/cronjob/cron.log file within the entrypoint
Edit: More information about files/scripts:
crontab:
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
* * * * * /usr/local/bin/schedule-run.sh
# An empty line is required at the end of this file for a valid cron file
/usr/local/bin/schedule-run.sh:
#!/bin/bash
# Source container environment variables
source /tmp/export
# Run Laravel scheduler
php /var/www/api/artisan schedule:run >> /var/log/cronjob/cron.log 2>&1
Edit #2:
Currently my CMD looks like this which spawns multiple child processes:
CMD export >> /tmp/export && crontab /etc/cron.d/realty-cron && cron && tail -f /var/log/cronjob/cron.log
root@workspace-dev-condos-ca-765dc6686-h8vdl:/var/www/api# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 21:55 ? 00:00:00 /bin/sh -c export >> /tmp/export && crontab /etc/cron.d/realty-cron && cron && tail -f /var/log/cronjob/cron.log
root 8 1 0 21:55 ? 00:00:00 cron
root 9 1 0 21:55 ? 00:00:00 tail -f /var/log/cronjob/cron.log
root 170 1 0 21:59 ? 00:00:00 ssh-agent -s
root 233 0 0 22:00 pts/0 00:00:00 bash
root 249 1 0 22:00 ? 00:00:00 ssh-agent -s
root 1277 233 0 22:26 pts/0 00:00:00 ps -ef
I'm not sure if that is relevant but through trial and error testing, I noticed that sometimes echo "test1" >> /proc/1/fd/1 or echo "test2" >> /proc/1/fd/2 will output to stdout (kubectl logs) but not both at the same time. I feel like the child processes are related but don't know why.
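One way to investigate this is to look at where a process's output descriptors actually point. A minimal sketch, assuming a Linux container with /proc mounted; the helper name fd_target is made up for illustration, and reading another process's descriptors may require root.

```shell
#!/bin/sh
# Hypothetical helper: print what file descriptor $2 of process $1 points to.
fd_target() {
  readlink "/proc/$1/fd/$2"
}

# Demonstrated on the current shell. Inside the container, pass 1 as the
# PID to inspect the main process instead.
echo "stdout -> $(fd_target $$ 1)"
echo "stderr -> $(fd_target $$ 2)"
```

Comparing the targets for PID 1 with those of the cron child process may help narrow down where the writes through the symlink end up.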

Monit & Rails sunspot_solr

I've set up monit to monitor my sunspot_solr process, which seems to work at first. If I restart the monit service with sudo service monit restart, my sunspot process starts:
ps aux | grep sunspot
root 4086 0.0 0.0 9940 1820 ? Ss 12:41 0:00 bash ./solr start -f -s /ebs/staging/shared/bundle/ruby/2.3.0/gems/sunspot_solr-2.2.4/solr/solr
root 4137 45.1 4.8 1480560 185632 ? Sl 12:41 0:09 java -server -Xss256k -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80 -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/ebs/staging/shared/bundle/ruby/2.3.0/gems/sunspot_solr-2.2.4/solr/server/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/ebs/staging/shared/bundle/ruby/2.3.0/gems/sunspot_solr-2.2.4/solr/server -Dsolr.solr.home=/ebs/staging/shared/bundle/ruby/2.3.0/gems/sunspot_solr-2.2.4/solr/solr -Dsolr.install.dir=/ebs/staging/shared/bundle/ruby/2.3.0/gems/sunspot_solr-2.2.4/solr -jar start.jar --module=http
ubuntu 4192 0.0 0.0 10460 936 pts/3 S+ 12:41 0:00 grep --color=auto sunspot
However, I'm also running tail -f /var/logs/monit.log and see this at the same time:
[CST Mar 3 12:42:54] error : 'sunspot_solr' process is not running
[CST Mar 3 12:42:54] info : 'sunspot_solr' trying to restart
[CST Mar 3 12:42:54] info : 'sunspot_solr' start: /usr/bin/sudo
[CST Mar 3 12:43:25] error : 'sunspot_solr' failed to start
Plus, to make sure monit can actually restart the sunspot_solr process, I run sudo kill -9 <the pid> and monit can't restart sunspot_solr:
[CST Mar 3 12:44:25] error : 'sunspot_solr' process is not running
[CST Mar 3 12:44:25] info : 'sunspot_solr' trying to restart
[CST Mar 3 12:44:25] info : 'sunspot_solr' start: /usr/bin/sudo
[CST Mar 3 12:44:55] error : 'sunspot_solr' failed to start
Obviously something is wrong with my monit-solr_sunspot.conf file, but after messing around with it for a few hours now, I'm stumped:
check process sunspot_solr with pidfile /ebs/staging/shared/pids/sunspot-solr.pid
start program = "/usr/bin/sudo -H -u root /bin/bash -l -c 'cd /ebs/staging/releases/20160226191542; bundle exec sunspot-solr start -- -p 8983 -d /ebs/staging/shared/solr/data --pid-dir=/ebs/staging/shared/pids'"
stop program = "/usr/bin/sudo -H -u root /bin/bash -l -c 'cd /ebs/staging/releases/20160226191542; bundle exec sunspot-solr stop -- -p 8983 -d /ebs/staging/shared/solr/data --pid-dir=/ebs/staging/shared/pids'"
I've adapted this monit script to suit my needs: Sample sunspot-solr.monit but am still having no luck!
UPDATE
I've gotten monit to successfully restart sunspot_solr if I kill it, however it still produces the error that it failed to restart in the monit.log file.
I think monit runs as root. You may not want to use sudo both because it prompts for a password and because monit doesn't need it.
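To confirm which user a running process belongs to, rather than assuming it is root, a quick check along these lines may help. This is only a sketch: proc_user is a hypothetical helper name, and it is shown here against the current shell's PID rather than Monit's.

```shell
#!/bin/sh
# Hypothetical helper: print the user that owns the process with PID $1.
proc_user() {
  ps -o user= -p "$1" | tr -d ' '
}

# Shown on the current shell; substitute Monit's PID to check Monit itself.
echo "this shell runs as: $(proc_user $$)"
```

If the reported user for Monit is root, the sudo wrapper in the start and stop programs adds nothing and can be dropped.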

Monit failing to restart sidekiq

I'm trying to get monit to restart my sidekiq service on a CentOS server. After trying multiple solutions out there, I'm stumped: it still fails to start the service.
My sidekiq file from monit.d:
check process sidekiq
with pidfile /var/www/App/tmp/pids/sidekiq.pid
start program = "/bin/bash -l -c 'sudo cd /var/www/App && bundle exec sidekiq --index 0 --pidfile /var/www/App/tmp/pids/sidekiq.pid --environment production --logfile /var/www/App/log/sidekiq.log --daemon'" as uid deploy and gid deploy
stop program = "/bin/bash -l -c 'cd /var/www/App && bundle exec sidekiqctl stop /var/www/App/tmp/pids/sidekiq.pid 10'" as uid deploy and gid deploy
if totalmem is greater than 512 MB for 2 cycles then restart
if 3 restarts within 5 cycles then timeout
If I run the start program command manually, it starts sidekiq fine, but monit doesn't seem to do anything. It just comes up with:
[BST Oct 6 11:51:17] error : 'sidekiq' process is not running
[BST Oct 6 11:51:17] info : 'sidekiq' trying to restart
[BST Oct 6 11:51:17] info : 'sidekiq' start: /bin/bash
[BST Oct 6 11:52:47] error : 'sidekiq' failed to start
So it is including the file fine, but somehow doesn't manage to start the service from the script.
What can it be? Some permissions issue of sorts?
You need to update to the latest Monit version (5.14).
Remove your current monit installation and follow these instructions:
https://rtcamp.com/tutorials/monitoring/monit/
Hope it helps!
PS: Found the solution here: https://bitbucket.org/tildeslash/monit/issues/109/failed-to-stop-always-after-60-seconds
According to "Debugging monit", I found that I needed to set PATH.
My start program:
/bin/bash -c 'cd /home/vagrant/apps/skylark/current; PATH=/home/vagrant/.rbenv/shims:/home/vagrant/.rbenv/bin:$PATH bundle exec sidekiq -d -e production -C -P /home/vagrant/apps/skylark/shared/tmp/pids/sidekiq.pid -L /home/vagrant/apps/skylark/shared/log/sidekiq.log'
I think the issue is with your user. You need to execute the command as the deploy user.
check process sidekiq
with pidfile /var/www/App/tmp/pids/sidekiq.pid
start program = "/bin/su - deploy -c 'cd /var/www/App && bundle exec sidekiq --index 0 --pidfile /var/www/App/tmp/pids/sidekiq.pid --environment production --logfile /var/www/App/log/sidekiq.log --daemon'" as uid deploy and gid deploy
stop program = "/bin/su - deploy -c 'cd /var/www/App && bundle exec sidekiqctl stop /var/www/App/tmp/pids/sidekiq.pid 10'" as uid deploy and gid deploy
if totalmem is greater than 512 MB for 2 cycles then restart
if 3 restarts within 5 cycles then timeout

monit: Start or stop method not defined -- process sidekiq_site

I am using monit to manage Sidekiq.
When I check the monit log file, it shows this error:
monit: Start or stop method not defined -- process sidekiq_site
sidekiq.erb
check process sidekiq_site
with pidfile /var/www/project/shared/pids/sidekiq.pid
start program = "if [[ ! -f /var/www/project/shared/pids/sidekiq.pid ]]; then touch /var/www/project/shared/pids/sidekiq.pid; chmod 777 /var/www/project/shared/pids/sidekiq.pid; fi; cd /var/www/project/current ; bundle exec sidekiq --index 0 --pidfile /var/www/project/shared/pids/sidekiq.pid --environment production --logfile /var/www/project/shared/log/sidekiq.log --daemon" with timeout 90 seconds
stop program = "if [ -d /var/www/project/current ] && [ -f /var/www/project/shared/pids/sidekiq.pid ] && kill -0 `cat /var/www/project/shared/pids/sidekiq.pid`> /dev/null 2>&1; then cd /var/www/project/current && bundle exec sidekiqctl stop /var/www/project/shared/pids/sidekiq.pid 1 ; else echo 'Sidekiq is not running'; fi"
if totalmem is greater than 200 MB for 2 cycles then restart # eating up memory?
group site_sidekiq

Managing Resque workers with Monit on RBenv setup

I'm trying to set up Monit to manage Resque workers, but it fails to start, saying: /home/deployer/.rbenv/shims/bundle: line 4: exec: rbenv: not found
I've checked that it is running commands as deployer user and if I copy and paste the command directly via SSH everything works fine. Below is my Monit configuration. Thanks!
check process resque_worker_1
with pidfile CURRENT_PATH/tmp/pids/resque_worker_1.pid
start program = "/usr/bin/env HOME=/home/deployer RACK_ENV=production PATH=/home/deployer/.rbenv/shims:/usr/local/bin:/usr/local/ruby/bin:/usr/bin:/bin:$PATH /bin/sh -l -c 'cd CURRENT_PATH; bundle exec rake environment resque:work RAILS_ENV=production QUEUE=high,normal,low VERBOSE=1 PIDFILE=CURRENT_PATH/tmp/pids/resque_worker_1.pid >> CURRENT_PATH/log/resque_worker_.log 2>&1'"
as uid deployer and gid admin
stop program = "/bin/sh -c 'cd CURRENT_PATH && kill -9 $(cat tmp/pids/resque_worker_1.pid) && rm -f tmp/pids/resque_worker_1.pid; exit 0;'"
as uid deployer and gid admin
if totalmem is greater than 300 MB for 10 cycles then restart # eating up memory?
group resque_workers
I'm not sure if this helps, but in my monitrc start line, I have to first su to the user I want to run under. I haven't tried to use the uid and gid flags to know if that works well, so this might be a goose-chase of an answer.
I remember having the same issue as you though... everything worked from the command line, but not when monit would do its thing.
For example, in my monitrc, I am monitoring arsendmail with the following:
# arsendmail_rails3
# daemon that watches and sends mail from the rails app
check process ar_sendmail with pidfile /var/www/rak/log/ar_sendmail.pid
start program "/bin/su - mike && /bin/bash -c 'cd /var/www/rak && ar_sendmail_rails3 -b1000 -d -e production'"
stop program "/bin/ps -ef | /bin/grep ar_sendmail_rails3 | /bin/grep -v grep | /usr/bin/awk '{ print $2 }' | /usr/bin/xargs /bin/kill -9"
I saw that this topic was created in 2012, but I had a similar problem and this thread is ranked at the top by Google.
The problem is that monit launches commands with a restricted environment (run env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh to simulate it).
To use monit with rbenv you must specify the correct path before your bundle exec command.
PATH=/home/[USER]/.rbenv/bin:/home/[USER]/.rbenv/shims:$PATH bundle exec ...
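The restricted environment mentioned above can be reproduced from an ordinary shell before touching the Monit config. The sketch below wraps the env -i invocation from this answer in a small function; run_like_monit is a hypothetical name, and the PATH value is the one quoted above rather than something read from Monit itself.

```shell
#!/bin/sh
# Run a command string with an empty environment and the minimal PATH
# quoted in the answer above. run_like_monit is a hypothetical helper name.
run_like_monit() {
  env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh -c "$1"
}

# Show what the command sees: the minimal PATH, and no HOME from the caller.
run_like_monit 'echo "PATH=$PATH"; echo "HOME=${HOME:-unset}"'

# Check whether a tool such as bundle is still reachable in that environment.
run_like_monit 'command -v bundle || echo "bundle: not found on the restricted PATH"'
```

If bundle is not found here but works in your login shell, prefixing the command with the rbenv directories in PATH, as described above, is the likely fix.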
Example with unicorn:
check process unicorn_dev with pidfile /home/wizville/app/dev.wizville.fr/shared/pids/unicorn.pid
group dev
start program = "/bin/bash -c 'cd /home/wizville/app/dev.wizville.fr/current && PATH=/home/wizville/.rbenv/bin:/home/wizville/.rbenv/shims:$PATH bundle exec unicorn -c config/unicorn.rb -D'" as uid "wizville"
stop program = "/bin/bash -c 'kill -s QUIT `cat /home/wizville/app/dev.wizville.fr/shared/pids/unicorn.pid`'"
depends on mysql
This worked for me.
check process app_resque_worker with pidfile <%= resque_pid%>
start program = "/usr/bin/env HOME=/home/subcaster RACK_ENV=production PATH=/home/subcaster/.rvm/rubies/ruby-2.0.0-p247/bin/ruby:/usr/local/bin:/usr/local/ruby/bin:/usr/bin:/bin:$PATH /bin/sh -l -c \'cd <%= current_path %>; bundle exec rake environment resque:work RAILS_ENV=production BACKGROUND=yes QUEUE=* PIDFILE=<%= resque_pid %>\'"
stop program = "/bin/sh -c \'kill -9 `cat <%= resque_pid %>` && rm -f <%= resque_pid %>\'"
if totalmem is greater than 2000 MB for 10 cycles then restart

Resources