Monit config update with reload leaving zombie processes - monitoring

We ran into what I hope is a rare problem. We use monit to run our background jobs. In AWS OpsWorks, we configured Chef recipes to run in the deploy cycle, and those recipes are updated every time a deployment happens. But there is a problem whenever we update the monitrc file dynamically and issue a monit reload.
The issue goes like this (reproducible):
Initially the monitrc file has job configs, say J1, J2, J3, J4.
Now update the monitrc file so it contains only the J1, J2, J3 configs.
Now run monit reload. This simply reloads the configuration, which now covers only the J1, J2, J3 jobs.
Here comes the issue: after this reload, monit does not clear out the J4 job. Now we trigger a deploy. The new code reaches all of J1, J2, J3 because the recipes issue commands like monit restart <J_ID>, but not J4, because J4 is no longer monitored by monit. That leaves the J4 process running in the background with the old source code.
So if anybody has faced this problem before, please help me with your solution.
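One workaround worth trying (a sketch, not from this thread): stop the job through monit while monit still knows about it, and only then drop it from monitrc. The job name J4 is just the example from above.

monit stop J4       # monit stops the process it still manages
# now rewrite monitrc without the J4 entry (e.g. from the Chef recipe)
monit reload        # reload the trimmed configuration; nothing is left running unmanaged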

Related

Rails - Old cron job keeps running, can't delete it

So I'm using Rails and I have a few Sidekiq workers, but none are enabled. I'm using the sidekiq-cron gem, which requires you to put files in app/workers/, configure a sidekiq scheduler in config/sidekiq_schedule.yml, and also add a few lines in config/initializers/sidekiq.rb. However, I've commented everything out from sidekiq_schedule.yml and also commented the following lines out from sidekiq.rb:
# Sidekiq scheduler.
# schedule_file = 'config/sidekiq_schedule.yml'
# if File.exists?(schedule_file) && Sidekiq.server?
# Sidekiq::Cron::Job.load_from_hash! YAML.load_file(schedule_file)
# end
However, if I launch Sidekiq, every minute (which is the old schedule), I see this in the prompt:
2018-01-19T02:54:04.156Z 22197 TID-ovsidcme8 ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-8609429b89db2a91793509ea INFO: start
2018-01-19T02:54:04.164Z 22197 TID-ovsidcme8 ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-8609429b89db2a91793509ea INFO: fail: 0.008 sec
and it fails because it's trying to launch a job that's not supposed to be launched.
I went to the Rails console (rails c) and tried to find the job, but nothing's in there:
irb(main):001:0> Sidekiq::Cron::Job.all
=> []
so I'm not quite sure why it's constantly trying to launch a job. If I go to the Rails interface on my application, I don't see anything in the queue: nothing being processed, busy, retried, or enqueued. Nothing.
Any suggestions would be greatly appreciated. I've been trying to hunt this down for about the last hour and have had no success. I even removed ALL of the workers from the workers directory, and yet it's still trying to launch one of them.
Because you have already loaded the jobs, I think their configuration is still in Redis. Check this assumption by opening a new terminal tab and running redis-cli:
KEYS '*cron*'
If those keys exist in Redis, clearing them will fix your issue.
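If they are there, something like this clears them (a sketch; --scan avoids blocking Redis the way KEYS can on a large dataset):

redis-cli --scan --pattern '*cron*' | xargs redis-cli del    # delete every matching cron key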
Since you mentioned a cron job in your title but not in the question, I'm assuming there's a cron job running the background Sidekiq task.
Try running crontab -l in the terminal to see all your cron jobs. If you see something like "* * * * *", that means there's a job running every minute.
Then use crontab -r to clear your crontab and delete all scheduled tasks.
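Note that crontab -r wipes every entry for the current user. If you only want to drop the one offending line, editing in place is the safer option (a sketch):

crontab -l    # list the current entries first
crontab -e    # open the crontab in $EDITOR and delete just the offending line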

Clear worker cache in delayed jobs in production

I am using delayed jobs in my Rails application. It works fine, but an issue occurred on the production server. I created a class in lib and call its method from a controller to generate a CSV file through a delayed job. It was working fine when I ran delayed jobs on the local and production servers, but then I made some changes to this class for the file naming convention and restarted delayed jobs locally and then on production. Now when I call that method through a delayed job, it sometimes works according to the latest changes I made to the class and sometimes it uses the old file naming logic.
What could be the issue?
Delayed job has a hidden "feature" which is to ignore any changes to your app, and just use old settings, env-variables, email-templates, etc. You can clear every cache and restart your server, and it still holds onto data which no longer exists anywhere in your app's codebase.
delayed_job - Performs not up to date code?
Also be aware that DJ's "restart" does not always kill and restart all the workers, so you need to hunt them down and kill them all manually with
ps aux | grep delay
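For example (a sketch, not from the answer), once that command lists the leftover workers you can kill them by PID:

ps aux | grep '[d]elayed_job'    # the [d] keeps the grep process itself out of the results
kill <pid>                       # send TERM first; use kill -9 <pid> only if a worker refuses to exit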
See: Rails + Delayed Job => email view template does not get updated
I have not yet found a "clear delayed job cache" function. If it exists, someone please post it here.
In my case, I just spent almost 4 hours trying everything to delete failing delayed_jobs on Heroku. In case you get here trying to kill a zombie delayed_job but you're on Heroku, this won't work.
You cannot run ps aux like you would on a regular server, nor can you run rake jobs:clear, and if you check via the Rails console you'll see the jobs there, but not in the database, so there's nothing you can do there either.
What I did was place the app in maintenance mode, make a deployment that completely removed the delayed_job gem and all its references, and then another deployment reverting that change. That cleared the zombie cache and did the trick.
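For reference, the Heroku side of those steps looks roughly like this (a sketch assuming the Heroku CLI; the app name is a placeholder and the deploys in between are ordinary git pushes):

heroku maintenance:on -a your-app     # put the app in maintenance mode
# deploy a commit with delayed_job removed, then deploy the revert
heroku maintenance:off -a your-app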
I had a similar issue in Dokku. My solution was to remove the worker=1 entry from my DOKKU_SCALE file (so all it contained was web=1) and also to remove the worker: bundle exec rake jobs:work line from my Procfile.
I pushed that to my production server, reversed the changes above, pushed again and it was fixed.

Rails.root points to the wrong directory in production during a Resque job

I have two jobs that are queued simultaneously, and one worker runs them in succession. Both jobs copy some files from the builds/ directory in the root of my Rails project and place them into a temporary folder.
The first job always succeeds, never have a problem - it doesn't matter which job runs first either. The first one will work.
The second one receives this error when trying to copy the files:
No such file or directory - /Users/apps/Sites/my-site/releases/20130829065128/builds/foo
That releases folder is two weeks old and should not still be on the server. It is empty, housing only a public/uploads directory and nothing else. I have killed all of my workers and restarted them multiple times, and have redeployed the Rails app multiple times. When I delete that releases directory, it makes it again.
I don't know what to do at this point. Why would this worker always create/look in this old releases directory? Why would only the second worker do this? I am getting the path by using Rails.root.join('builds'), and Rails.root is apparently a two-week-old Capistrano release? I should also mention this only happens in the production environment. What can I do?
Resque is not being restarted (stopped and started) on deployments, which causes old versions of the code to run. Each stale worker continues to service the queue, resulting in strange errors or behavior.
Based on the path name it looks like you are using Capistrano for deploying.
Are you using the capistrano-resque gem? If not, you should give that a look.
I had exactly the same problem and here is how I solved it:
In my case the problem was how capistrano is handling the PID-files, which specify which workers currently exist. These files are normally stored in tmp/pids/. You need to tell capistrano NOT to store them in each release folder, but in shared/tmp/pids/. Otherwise resque does not know which workers are currently running, after you make a new deployment. It looks into the new release's pids-folder and finds no file. Therefore it assumes that no workers exist, which need to be shut down. Resque just creates new workers. And all the other workers still exist, but you cannot see them in the Resque-Dashboard. You can only see them, if you check the processes on the server.
Here is what you need to do:
Add the following lines in your deploy.rb (btw, I am using Capistrano 3.5)
append :linked_dirs, ".bundle", "tmp/pids"
set :resque_pid_path, -> { File.join(shared_path, 'tmp', 'pids') }
On the server, run htop in the terminal and then press T to see all the processes which are currently running. It is easy to spot all the resque worker processes; you can also see the release folder's name attached to them.
You need to kill all worker processes by hand. Get out of htop and type the following command to kill all resque processes (I like to have it completely clean):
sudo kill -9 `ps aux | grep [r]esque | grep -v grep | cut -c 10-16`
Now you can make a new deploy. You also need to start the resque-scheduler again.
I hope that helps.
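A quick sanity check after the next deploy (a sketch; the paths are assumptions based on a standard Capistrano layout, not from the answer):

ls /var/www/your_app/shared/tmp/pids/    # the resque PID files should now land in shared/, not in a release
ps aux | grep '[r]esque'                 # worker command lines should point at the current release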

How does Delayed Job work in Ruby on Rails?

I am new to this and am a little confused about how Delayed Job works.
I know it creates a table and puts the jobs in the table and then I need to run
rake jobs:work
to start the background process. Now my questions are:
Does the DJ script check the table every minute and, when the time matches the job_at time, run that job?
How is it different from cron (the whenever gem) if the script is just checking the table every minute?
Thanks
Does the DJ script check the table every minute and, when the time matches the job_at time, run that job?
When you run rake jobs:work, Delayed Job will poll the delayed_jobs table and perform any job whose run_at value (the job_at you mention) has passed. This part you're correct about.
How is it different from cron (the whenever gem) if the script is just checking the table every minute?
whenever is a gem that helps you configure a crontab. It has nothing directly to do with performing tasks on your server on a periodic basis.
You could set up a cron to run whatever tasks exist in the queue every minute, but leaving a delayed_job daemon running has multiple benefits:
Even if the cron ran every minute, the delayed_job daemon will see and perform any jobs queued within that one-minute window between cron runs.
Every time the cron runs, it has to boot a fresh Rails environment in which to perform the jobs. This is a waste of time and resources when the daemon can just sit there, immediately ready to perform a newly queued job.
If you want to run delayed_job through a cron every minute, you can add something like this to your crontab:
* * * * * RAILS_ENV=production script/delayed_job start --exit-on-complete
Every minute, delayed_job will spin up, perform whatever jobs are ready (or which it must retry from a previously failed run), and then exit. I don't recommend this, though; running delayed_job as a daemon is the right way to go.
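For comparison, running it as a long-lived daemon looks like this (a sketch; it uses the same script/delayed_job wrapper as the crontab line above, which newer apps ship as bin/delayed_job):

RAILS_ENV=production script/delayed_job start        # start one background worker
RAILS_ENV=production script/delayed_job -n 2 start   # or start two workers
RAILS_ENV=production script/delayed_job restart      # restart on deploy so workers pick up new code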
Does the DJ script check the table every minute and, when the time matches the job_at time, run that job?
Yes, it checks the database every 5 seconds by default.
How is it different from cron (the whenever gem) if the script is just checking the table every minute?
In the context of background jobs, they are not that different. Their main difference is how they usually run the jobs.
DJ                            | Crontab
------------------------------|------------------------------------------
uses additional database      | you should either set up a rake task
table but that's it. easier   | or a runner which can be called on the
to code compared to crontab   | crontab
------------------------------|------------------------------------------
requires you to run a worker  | requires you to set up your cron, which
that will poll the database   | you can easily do using the whenever gem
------------------------------|------------------------------------------
since this uses a table, it   | you have to set up some sort of logging so
is easier to debug errors     | that you have an idea what caused the error
when they happen              |
------------------------------|------------------------------------------
the worker should always be   | as long as your crontab is set up properly,
running to perform the job    | you should have no issues
------------------------------|------------------------------------------
harder to set up recurring    | easy to set up recurring tasks
tasks                         |
------------------------------|------------------------------------------

Why would my rake tasks running via cron get invoked twice?

I have a Rails app with the whenever gem installed to set up cron jobs which invoke various rake tasks. For reasons unbeknownst to me, each rake task gets invoked twice at precisely the same time. So my db backup task backs up the db twice at 4:00am.
Inspecting the crontab reveals correct syntax for all of the cron jobs, so I don't think this is an issue with the whenever gem not correctly configuring the cron jobs. Also confusing is that in both staging and production environments I can invoke the tasks on the command line and they only run once.
Any thoughts on what would cause this? I'm at a complete loss troubleshooting wise.
The number of cron jobs that run depends on the number of application instances running on the server box. Do you have two instances of the Rails application running on the same server box?
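A quick way to check for duplicated entries (a sketch; the db:backup task name is just an illustration, substitute whatever whenever writes for you):

crontab -l | sort | uniq -d          # print any crontab line that appears more than once
crontab -l | grep -c 'db:backup'     # count how many entries fire the backup task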
