delayed_job process silently quits - ruby-on-rails

I wish I had more information to put here, but I'm kind of just casting out the nets and hoping someone has some ideas on what I can try or which direction to look in. Basically, I have a Rails app that uses delayed_job. It offloads a process that takes about 10 or 15 minutes to a background task. It was working fine up until yesterday. Now every time I log onto the server, I find that there are no delayed job processes running. I've restarted, stopped and started, etc. a dozen times and am getting nowhere. The second it tries to process the first item in the queue, the process gets killed, and nothing gets logged to the log file.
I tried running it like this:
RAILS_ENV=production script/delayed_job run
Instead of the normal daemon:
RAILS_ENV=production script/delayed_job start
and that did not give me any more info. Here is the output:
delayed_job: process with pid 4880 started.
Killed
It runs for probably 10 seconds before it just gets killed. I have no idea where to start on this. I've tried a number of things, like downgrading the daemons gem to 1.0.10 as suggested in other posts.
Any ideas would be amazing.

The solution to this, in case anyone else comes across it, was that the process was simply running out of memory and the OS was killing it.
I ran the jobs a couple of times and watched top while waiting. I saw the memory usage for that pid slowly climb until the process was killed and all the memory was released.
Hope that might help someone.
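In case it helps anyone diagnosing something similar, here is a rough sketch of how you could confirm the memory growth from inside the job itself instead of sitting on top. The job class, the Item model, and the process call are made up for illustration; the only real trick is logging the resident set size as the job progresses, so the log shows how far it got and how big it grew before the OOM killer stepped in.

class LongRunningJob < Struct.new(:batch_id)
  def perform
    processed = 0
    Item.where(batch_id: batch_id).find_each do |item|
      process(item) # stand-in for the real per-item work
      processed += 1
      # Log resident set size every 100 items so the log shows how far the
      # job got and how much memory it was holding before being killed.
      Rails.logger.info("delayed_job RSS: #{rss_kb} KB after #{processed} items") if (processed % 100).zero?
    end
  end

  private

  # Resident set size in kilobytes, as reported by ps (Linux and macOS).
  def rss_kb
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end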

Related

Ruby delayed_job gem how to stop process

I am currently using the delayed_job_active_record gem to run some long-running scheduled tasks. The processes run in the background on a separate worker dyno on Heroku and rarely go wrong, but in some cases I would like to be able to stop a process mid-run. I have been running the processes locally, and because of my setup, the scheduled tasks only kick off the process, which is essentially a very long loop.
Using
bin/delayed_job stop
only stops the worker from picking up new jobs, but since the process has already started, it doesn't stop it.
Because of this, I can't seem to stop the process once it has got going without restarting the entire dyno. This seems a bit excessive but is my only option at the moment.
Any help is greatly appreciated
I don't think there's any way to interrupt it without essentially killing the process like you are doing. I would usually delete the job record in the database and then terminate the worker running it so it doesn't just retry the job (if you've got retries enabled for that job).
Another option: since you know it's long-running and, I imagine, has multiple steps, modularize the operation and/or add periodic checks for a 'cancelled' flag that you put somewhere in the model(s). If you detect the cancellation request, you can give up and do any cleanup needed. This is probably preferable anyway, since it lets you manage what happens on abort more explicitly.
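A rough sketch of that cancelled-flag idea as a delayed_job custom job; the Export model, its cancelled and status columns, and the per-row work are assumptions, but the shape is the important part: the loop re-reads the flag periodically so a cancel request takes effect without killing the worker.

class ExportJob < Struct.new(:export_id)
  def perform
    export = Export.find(export_id)
    processed = 0
    export.rows.find_each do |row|
      row.process! # stand-in for the real per-row work
      processed += 1
      # Re-check the flag every 100 rows; a cancel request then takes
      # effect within at most 100 rows instead of only at the very end.
      if (processed % 100).zero? && export.reload.cancelled?
        export.update_attributes(status: "cancelled")
        return # bail out here and do whatever cleanup is needed
      end
    end
    export.update_attributes(status: "finished")
  end
end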

running fork in delayed job

We use delayed_job in our web application, and we need multiple delayed job workers running in parallel, but we don't know how many will be needed.
The solution I'm currently trying is running one worker and calling fork/Process.detach inside the task that needs it.
I previously tried running fork directly in the Rails application, but it didn't work well with Passenger.
This solution seems to work well. Could there be any caveats in production?
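For reference, the pattern described above usually ends up looking something like the sketch below (the job name and the work it does are placeholders; the database connection handling is the part that tends to bite people when combining fork with Rails):

class HeavyTask < Struct.new(:record_id)
  def perform
    pid = fork do
      # The child inherits the parent's database connection; give it its
      # own so the worker's connection isn't closed out from under it.
      ActiveRecord::Base.establish_connection
      do_heavy_work(record_id) # stand-in for the actual long-running work
      exit!(0)                 # skip at_exit hooks inherited from the worker
    end
    Process.detach(pid) # don't leave a zombie; the worker moves on immediately
  end
end

Note that nothing here caps the number of concurrent children, which is exactly the failure mode described in the answer below: a backlog of queued jobs forks one child each, almost simultaneously.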
One issue that happened to me today, and that anyone trying this should watch out for, was the following:
I noticed the worker was down, so I started it. Something I didn't think about was that there were 70 jobs waiting in the queue, and since the processes are forked, they pretty much killed our server for around half an hour by all starting almost immediately and eating all the memory in the process. :]
So ensuring that god is watching over the worker is important.
Also, the worker seems to die often, but I'm not sure yet whether that's connected with the forking.

Long-running Sidekiq jobs keep dying

I'm using the sidekiq gem to process background jobs in Rails. For some reason, the jobs just hang after a while -- the process either becomes unresponsive, showing up in top but not doing much else, or mysteriously vanishes without errors (nothing is reported to airbrake.io).
Has anyone had experience with this?
Use the TTIN signal to get a backtrace of all threads in the process so you can figure out where the workers are stuck.
https://github.com/mperham/sidekiq/wiki/Signals
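Assuming Sidekiq is writing a pid file at tmp/pids/sidekiq.pid (adjust the path to your setup), sending the signal is just:
kill -TTIN $(cat tmp/pids/sidekiq.pid)
The thread backtraces are then dumped to Sidekiq's log output, which usually shows exactly which call the stuck workers are blocked in.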
I've experienced this, and haven't found a solution/root cause.
I couldn't resolve this cleanly, but came up with a hack.
I configured God to monitor my Sidekiq processes, and to restart them if a file changed.
I then set up a cron job that ran every 5 minutes and checked all the current Sidekiq workers for a queue. If a certain percentage of the workers had a job start time 5 or more minutes in the past (i.e. they had been busy longer than any of my jobs should take), it meant those workers had hung for some reason. If that happened, I touched a file, which made God restart Sidekiq. For me, 5 minutes was ideal, but it depends on how long your jobs typically run.
This is the only way I could resolve hanging Sidekiq jobs without manually checking on them every hour, and restarting it myself.
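Roughly, the check the cron job ran looked something like the sketch below. This is a reconstruction, not the original script: the threshold, the 50% cutoff, the script path, and the trigger file are all assumptions, and it relies on Sidekiq's Workers API exposing each busy job's run_at timestamp.

# Run from cron, e.g.:
#   cd /path/to/app && RAILS_ENV=production bundle exec rails runner script/check_sidekiq_workers.rb
require "sidekiq/api"
require "fileutils"

THRESHOLD_SECONDS = 5 * 60                 # jobs busy longer than this are treated as hung
RESTART_TRIGGER   = "/tmp/restart_sidekiq" # god watches this file's mtime

workers = Sidekiq::Workers.new
busy    = workers.size
hung    = workers.count do |_process_id, _thread_id, work|
  Time.now.to_i - work["run_at"] > THRESHOLD_SECONDS
end

# If at least half of the busy workers have been stuck past the threshold,
# touch the trigger file so god restarts the Sidekiq process.
if busy > 0 && hung.to_f / busy >= 0.5
  FileUtils.touch(RESTART_TRIGGER)
end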

daemon process killed automatically

Recently I ran the command below to start a daemon process which runs once every three days:
RAILS_ENV=production lib/daemons/mailer_ctl start, which was working, but when I came back after a week I found that the process had been killed.
Is this normal or not?
Thanks in advance.
Nope. As far as I understand it, this daemon is supposed to run until you kill it. The idea is for it to work at regular intervals, right? So the daemon is supposed to wake up, do its work, then go back to sleep until needed. So if it was killed, that's not normal.
The question is why it was killed and what you can do about it. The first question is one of the toughest ones to answer when debugging detached processes. Unless your daemon leaves some clues in the logs, you may have trouble finding out when and why it terminated. If you look through your logs (and if you're lucky) there may be some clues -- I'd start right around the time when you suspect it last ran and look at your Rails production.log, any log file the daemon may create, and also the system logs.
Let's assume for a moment that you never can figure out what happened to this daemon. What to do about it becomes an interesting question. The first step is: Log as much as you can without making the logs too bulky or impacting performance. At a minimum log wakeup, processing, and completion events, as well as trapping signals and logging them. Best to log to somewhere other than the Rails production.log. You may also want to run the daemon at a shorter interval than 3 days until you are certain it is stable.
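As a sketch of what that minimum level of logging could look like in the daemon's run loop (the log path, the Mailer.process_queue call, and the hourly interval are assumptions rather than the asker's actual code):

require "logger"

log = Logger.new("log/mailer_daemon.log")

shutdown_signal = nil
%w[TERM INT QUIT].each do |sig|
  # Just record the signal here; doing real logging inside a trap handler is unsafe.
  Signal.trap(sig) { shutdown_signal = sig }
end

loop do
  log.info("Waking up")
  begin
    Mailer.process_queue # stand-in for whatever the daemon actually does
    log.info("Run completed")
  rescue => e
    log.error("Run failed: #{e.class}: #{e.message}")
  end
  sleep 60 * 60 # hourly while debugging; an incoming signal cuts the sleep short
  if shutdown_signal
    log.warn("Received SIG#{shutdown_signal}, shutting down")
    break
  end
end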
Look into using a process monitoring tool like monit (http://mmonit.com/monit/) or god (http://god.rubyforge.org/). These tools can "watch" the status of daemons and if they are not running can automatically start them. You still need to figure out why they are being killed, but at least you have some safety net.
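For example, a minimal god config for this particular daemon might look roughly like the following; the app path and pid file location are guesses and would need to match however mailer_ctl actually daemonizes.

God.watch do |w|
  w.name     = "mailer_daemon"
  w.dir      = "/path/to/app/current"
  w.start    = "RAILS_ENV=production lib/daemons/mailer_ctl start"
  w.stop     = "RAILS_ENV=production lib/daemons/mailer_ctl stop"
  w.pid_file = "/path/to/app/current/tmp/pids/mailer.pid"
  w.interval = 30.seconds
  w.behavior(:clean_pid_file) # remove a stale pid file before starting again

  # Start the daemon whenever god notices it is not running.
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.running = false
    end
  end
end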

Workling processes multiplying uncontrollably

We have a Rails app running on Passenger, and we background process some tasks using a combination of RabbitMQ and Workling. The Workling worker process is started using the script/workling_client command. There is always only one worker process started, and script/workling_client has a :multiple => false option, thus allowing only one instance. But sometimes, under mysterious circumstances which I haven't been able to track down, more worklings spawn. If I let the system run for some time, more and more worklings appear. I'm not sure if these rogue worklings cause any problems, but it is still unsettling not to know why it is happening. We are using Monit to monitor the workling process, so if it dies, Monit will start it up again. But this still does not explain how come there are suddenly more than one of them.
So my question is: does anyone know what the cause of this can be, and how to make it stop? Is it possible that workling sometimes dies by itself without deleting its pid file? Could there be something wrong with the Daemons gem that workling_client is built upon?
Not an answer - I have the same problems running RabbitMQ + Workling.
I'm using God to monitor the single workling process as well (:multiple => false)...
I found that the multiple worklings were eating up huge amounts of memory and causing serious resource problems, so it's important that I find a solution for this.
You might find this message thread helpful: http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/ed8edd0368066292/5b17d91cc85c3ada?show_docid=5b17d91cc85c3ada&pli=1
