daemon process killed automatically - ruby-on-rails

Recently I ran the command below to start a daemon process that runs once every three days:
RAILS_ENV=production lib/daemons/mailer_ctl start
It was working, but when I came back after a week I found that the process had been killed.
Is this normal or not?
Thanks in advance.

Nope. As far as I understand it, this daemon is supposed to run until you kill it. The idea is for it to work at regular intervals, right? So the daemon is supposed to wake up, do its work, then go back to sleep until needed. So if it was killed, that's not normal.
The question is why it was killed and what you can do about it. The first part is one of the toughest to answer when debugging detached processes. Unless your daemon leaves some clues in the logs, you may have trouble finding out when and why it terminated. If you look through your logs (and if you're lucky) there may be some clues -- I'd start right around the time you suspect it last ran and look at your Rails production.log and any log file the daemon itself creates, but also at the system logs.
Let's assume for a moment that you can never figure out what happened to this daemon. What to do about it becomes an interesting question. The first step is: log as much as you can without making the logs too bulky or impacting performance. At a minimum, log wakeup, processing, and completion events, and trap signals and log them too. It's best to log somewhere other than the Rails production.log. You may also want to run the daemon at a shorter interval than three days until you're certain it is stable.
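A minimal sketch of that kind of logging, assuming a daemon body you control; the log path and the deliver_mailings method are made up for illustration:

```ruby
LOG_PATH = "/tmp/mailer_daemon.log"  # point this at your app's log directory

def trap_signals
  %w[TERM INT QUIT].each do |sig|
    Signal.trap(sig) do
      # Plain File I/O is safe inside a trap handler (Logger's internal
      # mutex is not), so append the line directly.
      File.open(LOG_PATH, "a") { |f| f.puts("#{Time.now} caught SIG#{sig}, exiting") }
      exit
    end
  end
end

def run
  require "logger"
  logger = Logger.new(LOG_PATH)
  trap_signals
  loop do
    logger.info("waking up")
    # deliver_mailings          # hypothetical: the real work goes here
    logger.info("run complete, sleeping")
    sleep 3 * 24 * 60 * 60     # three days
  end
end
# call run from your daemon script to start the loop
```

With this in place, a kill -TERM at least leaves a timestamped line behind, and a gap after "waking up" with no "run complete" tells you the work itself crashed.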
Look into using a process monitoring tool like monit (http://mmonit.com/monit/) or god (http://god.rubyforge.org/). These tools can "watch" the status of daemons and automatically restart them if they are not running. You still need to figure out why they are being killed, but at least you have a safety net.
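For monit, a watch might look something like this; the app path, pid-file location, and user are assumptions, and the pid-file path in particular depends on how your daemons-gem script is configured:

```
check process mailer_daemon with pidfile /var/www/app/tmp/pids/mailer.pid
  start program = "/bin/bash -lc 'cd /var/www/app && RAILS_ENV=production lib/daemons/mailer_ctl start'"
  stop program  = "/bin/bash -lc 'cd /var/www/app && RAILS_ENV=production lib/daemons/mailer_ctl stop'"
  if 5 restarts within 5 cycles then timeout
```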

Related

identifying current passenger actions / interpreting passenger-status output

We frequently have what we call "apache hangs", where our passenger queue starts to grow like crazy and the application appears not to be responding. We believe it is due to all passenger processes happening to be serving very long-running requests simultaneously.
We are trying to figure out how to identify what the actual current request is that each active passenger process is currently trying to serve. Note that I have read Phusion's frozen process blog, and am still digesting it, experimenting with SIGQUIT and other suggestions there.
My question here is, can passenger-status --show=server tell me what each process is doing, and if so, can anyone here help me understand how to interpret its results.
It looks to me like this command is reporting a ton of stuff that is not actually going on at the moment, in addition to whatever is, and I can't figure out how to suss out which of these really are current requests being handled by active passenger processes.
The output of passenger-status --show=server is too long to paste here, so here is a gist, including the base passenger-status output too.

delayed_job process silently quits

I wish I had more information to put here, but I'm kind of just casting out the nets and hoping someone has some ideas on what I can try or what direction to look in. Basically I have a Rails app that uses delayed jobs. It offloads a process that takes about 10 or 15 minutes to a background task. It was working fine up until yesterday. Now every time I log onto the server, I find that there are no delayed job processes running. I've restarted, stopped and started, etc. a dozen times and am getting nowhere. The second it tries to process the first item in the queue, the process gets killed, and nothing gets logged to the log file.
I tried running it like this:
RAILS_ENV=production script/delayed_job run
Instead of the normal daemon:
RAILS_ENV=production script/delayed_job start
and that did not give me any more info. Here is the output:
delayed_job: process with pid 4880 started.
Killed
It runs for probably 10 seconds before it is just killed. I have no idea where to start on this. I've tried a number of things, like downgrading the daemons gem to 1.0.10 as suggested in other posts.
Any ideas would be amazing.
The solution to this, in case anyone else comes across it: it was simply running out of memory and the OS was killing it.
I ran the jobs a couple of times and watched top while waiting. I saw the memory usage for that pid slowly climb until the process was killed and all the memory was released.
Hope that might help someone.
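A quick way to reproduce that observation without staring at top is to poll the process's resident memory until it dies. This is a sketch; it watches a short-lived `sleep` as a stand-in, and you would instead set PID from your worker's pid file, e.g. PID=$(cat tmp/pids/delayed_job.pid):

```shell
# Print the watched process's RSS (in KB) once a second until it exits.
sleep 2 &
PID=$!
while kill -0 "$PID" 2>/dev/null; do
  ps -o rss= -p "$PID"
  sleep 1
done
echo "process $PID exited"
```

If the OOM killer was responsible, the kernel log usually confirms it: look for "Out of memory" or "Killed process" lines in dmesg or /var/log/syslog.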

running fork in delayed job

We use delayed_job in our web application and we need multiple delayed job workers running in parallel, but we don't know how many will be needed.
The solution I'm currently trying is running one worker and calling fork/Process.detach inside the task that needs it.
I previously tried running fork directly in the Rails application, but it didn't play well with Passenger.
This solution seems to work well. Could there be any caveats in production?
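The approach being described might be sketched like this; the class and method names are made up for illustration:

```ruby
class HeavyTask
  def perform
    pid = fork do
      # Anything that doesn't survive a fork must be re-established here;
      # in a Rails app that typically means the database connection, e.g.
      # ActiveRecord::Base.establish_connection.
      do_heavy_work
    end
    Process.detach(pid)  # returns a Thread that reaps the child for us
  end

  def do_heavy_work
    sleep 0.1  # stand-in for the real 10-15 minute task
  end
end
```

Process.detach is what keeps the finished children from piling up as zombies, since the worker never calls Process.wait itself.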
One issue that happened to me today, and that anyone trying this should watch out for, was the following:
I noticed the worker was down, so I started it. What I didn't think about was that there were 70 jobs waiting in the queue. Since the processes are forked, they pretty much killed our server for about half an hour by all starting almost immediately and eating all the memory in the process. :]
So making sure that god is watching over the worker is important.
Also, the worker seems to die often, but I'm not sure yet whether that's connected with the forking.
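One way to guard against that stampede is to cap how many children can be forked at once. A sketch, where the limit and the helper names are made up:

```ruby
MAX_CHILDREN = 4
CHILD_PIDS = []

# Drop pids whose process has already exited (signal 0 only checks existence).
def reap_finished
  CHILD_PIDS.select! do |pid|
    begin
      Process.kill(0, pid)
      true
    rescue Errno::ESRCH
      false
    end
  end
end

# Block until a slot is free, then fork and detach the child.
def fork_with_cap(&work)
  loop do
    reap_finished
    break if CHILD_PIDS.size < MAX_CHILDREN
    sleep 0.5
  end
  pid = fork(&work)
  Process.detach(pid)  # reaps the child so it doesn't linger as a zombie
  CHILD_PIDS << pid
  pid
end
```

With this in place, a backlog of 70 queued jobs drains MAX_CHILDREN at a time instead of forking all at once.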

How do you go about setting up monitoring for a non-web frontend process?

I have a worker process running on a server with no web frontend. What is the best way to set up monitoring for it? It recently died and was down for 3 days, and I did not know about it.
There are several ways to do this. One simple option is to run a cron job that checks timestamps on the process's logs (and, of course, make sure the process logs something routinely).
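A sketch of that cron-based check; the log path, threshold, and mail address are assumptions, and you would schedule it from cron with something like */10 * * * * /usr/local/bin/check_worker_log.sh:

```shell
# Alert if the worker's log hasn't been written in the last 60 minutes.
LOG=/var/log/my_worker.log
if [ -z "$(find "$LOG" -mmin -60 2>/dev/null)" ]; then
  # Replace `mail` with whatever alerting you actually use.
  echo "worker log stale or missing: $LOG" | mail -s "worker down?" ops@example.com || true
fi
```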
Roll your own reincarnation job. Have your background process get its PID and write it to a specific, pre-determined location when it starts. Have another process (perhaps cron) read that PID, then check the symbolic link /proc/{pid}/exe. If that link does not exist or does not point to your process, it needs to be restarted.
With PHP, you can use posix_getpid() to obtain the PID, then fopen()/fwrite() to write it to a file. Use readlink() to read the symbolic link (take care to check for FALSE as a return value).
Here's a simple bash-ified example of how the symlink works:
tpost@tpost-desktop:~$ echo $$
13737
tpost@tpost-desktop:~$ readlink /proc/13737/exe
/bin/bash
So, once you know the PID the process started with, you can check that it's still alive and that Linux has not recycled the PID (you generally only see PID recycling on systems that have been running for a very long time, but it does happen).
This is a very, very cheap operation, so feel free to run it every minute, every 30 seconds, or at even shorter intervals.
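Since this is a Rails-flavored thread, the same check in Ruby might look like the following; the pid-file path is whatever your worker writes at startup, and the check relies on Linux's /proc filesystem:

```ruby
# Returns true only if the pid in pid_file is alive AND still points at
# the expected binary (guarding against PID recycling).
def worker_alive?(pid_file, expected_exe)
  pid = File.read(pid_file).to_i
  File.readlink("/proc/#{pid}/exe") == expected_exe
rescue Errno::ENOENT, Errno::EACCES, Errno::ESRCH
  false  # no pid file, pid gone, or /proc entry unreadable
end
```

A tiny cron wrapper can call this and restart the worker whenever it returns false.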

Workling processes multiplying uncontrolably

We have a Rails app running on Passenger, and we background some tasks using a combination of RabbitMQ and Workling. The Workling worker process is started using the script/workling_client command. Only one worker process is ever started, and script/workling_client has a :multiple => false option, allowing only one instance. But sometimes, under mysterious circumstances which I haven't been able to track down, more worklings spawn. If I let the system run for some time, more and more worklings appear. I'm not sure if these rogue worklings cause any problems, but it is still unsettling not to know why it is happening. We are using Monit to monitor the workling process, so if it dies, Monit will start it again. But this still does not explain how there are suddenly more than one of them.
So my question is: does anyone know what the cause of this can be and how to make it stop? Is it possible that workling sometimes dies by itself without deleting its pid file? Could there be something wrong with the Daemons gem that workling_client is built upon?
Not an answer - I have the same problems running RabbitMQ + Workling.
I'm using God to monitor the single workling process as well (:multiple => false)...
I found that the multiple worklings were eating up huge amounts of memory and causing serious resource problems, so it's important that I find a solution for this.
You might find this message thread helpful: http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/ed8edd0368066292/5b17d91cc85c3ada?show_docid=5b17d91cc85c3ada&pli=1
