ruby on rails, fork, long running background process - ruby-on-rails

I am initiating long running processes from a browser and showing results after it completes. I have defined in my controller :
def runthejob
pid = Process.fork
if pid.nil? then
#Child
output = execute_one_hour_job()
update_output_in_database(output)
# Exit here as this child process shouldn't continue anymore
exit
else
#Parent
Process.detach(pid)
#send response - job started...
end
The request in parent completes correctly. But , in child, there is always a "500 internal server error" . rails reports "Completed 500 Internal Server Error in 227192ms" . I am guessing this happens because the request response cycle of the child process is not completed as there is a "exit" in child. How do I fix this?
Is this the correct way to execute long running processes ? Is there any better way to do it?
When child is running, if I do "ps -e | grep rails" , I see that there are two instances of "rails server" . (I use the command "rails server" to start my rails server.)
ps -e | grep rails
75678 ttys002 /Users/xx/ruby-1.9.3-p194/bin/ruby script/rails server
75696 ttys002 /Users/xx/ruby-1.9.3-p194/bin/ruby script/rails server
Does this mean that there are two servers running ? How are the requests handled now? Wont the request go to the second server?
Thanks for helping me.

Try your code in production and see if the same error comes up. If not, your error may be from the development environment being cleared when the first request completes, whilst your forks still need the environment to exist. I haven't verified this with forks, but that is what happens with threads.
This line in your config/development.rb will retain the environment, but any code changes will require server restarts.
config.cache_classes = true

there are a lot of frameworks to do this in rails: DelayedJob, Resque, Sidekiq etc.
you can find a whole lot of examples on railscasts: http://railscasts.com/

Related

Rails - Old cron job keeps running, can't delete it

So I'm using Rails and I have a few Sidekiq workers, but none are enabled. I'm using the sidekiq-cron gem, which requires you to put files in app/workers/, configure a sidekiq scheduler in config/sidekiq_schedule.yml, and also add a few lines in config/initializers/sidekiq.rb. However, I've commented everything out from sidekiq_schedule.yml and also commented the following lines out from sidekiq.rb:
# Sidekiq scheduler.
# schedule_file = 'config/sidekiq_schedule.yml'
# if File.exists?(schedule_file) && Sidekiq.server?
# Sidekiq::Cron::Job.load_from_hash! YAML.load_file(schedule_file)
# end
However, if I launch Sidekiq, every minute (which is the old schedule), I see this in the prompt:
2018-01-19T02:54:04.156Z 22197 TID-ovsidcme8 ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-8609429b89db2a91793509ea INFO: start
2018-01-19T02:54:04.164Z 22197 TID-ovsidcme8 ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-8609429b89db2a91793509ea INFO: fail: 0.008 sec
and it fails because it's trying to launch code a job that's not supposed to be launching.
I've went to the rails console prompt (rails -c) and tried to find the job, but nothing's in there:
irb(main):001:0> Sidekiq::Cron::Job.all
=> []
so I'm not quite sure why it's constantly trying to launch a job. If I go to the rails interface on my application, I don't see anything in the queue, nothing being processed, busy, retries, enqueued, nothing.
Any suggestions would be greatly appreciated. I've been trying to hunt this down for like the last hour and have no success. I even removed ALL of the workers from the workers directory, and yet it's still trying to launch one of them.
Because you have already load jobs, I think that those jobs configuration are still in REDIS. Checking this assumption by opening a new terminal tab with redis-cli:
KEYS '*cron*'
If there are those keys on REDIS, clear them will fix your issue.
Since you mentioned a cron job in your title but not in the question, I'm assuming there's a cronjob running the background sidekiq task.
Try running crontab - l in Terminal to see all your cron jobs. If you see something like "* * * * *", that means there's a job that is running every minute.
Then, use crontab - r to clear your cron tab and delete all scheduled tasks.

Clear worker cache in delayed jobs in production

I am using delayed jobs in my rails application. it works fine but there is an issue occurred on production server. I created a class in lib and call its method from controller to generate a csv file through delayed jobs. It was working fine when I ran the delayed jobs on local and production server but then I made some changes to this class for file naming convention and restarted the delayed jobs on local and then on production server. Now when I call that method through delayed job then it works according to latest changes I made to the class and sometimes it uses the old logic of file naming convention.
What could be the issue?
Delayed job has a hidden "feature" which is to ignore any changes to your app, and just use old settings, env-variables, email-templates, etc. You can clear every cache and restart your server, and it still holds onto data which no longer exists anywhere in your app's codebase.
delayed_job - Performs not up to date code?
Also be aware that DJ's "restart" does not always kill and restart all the workers, so you need to hunt them down and kill them all manually with
ps aux | grep delay
See: Rails + Delayed Job => email view template does not get updated
I have not yet found a "clear delayed job cache" function. If it exists, someone please post it here.
In my case, I just spent almost 4 hours trying everything to delete failing delayed_jobs in Heroku. In case you get here trying to kill a zombie delayed_job, but you're in Heroku, this won't work.
You can not do ps aux like you'd do in a regular server, nor you can do rake jobs:clear, and if you check via Rails console, you'll see the jobs there, but not in the Database, so nothing you can do there either.
What I did was placing the app in maintenance mode, made a deployment totally uninstalling delayed_job gem and all its references, and then another deployment reverting that change. That cleared the zombie cache, and that did the trick.
I had a similar issue in Dokku. My solution was to remove the worker=1 entry from my DOKKU_SCALE file (so all it contained was web=1) and also to remove the worker: bundle exec rake jobs:work line from my Procfile.
I pushed that to my production server, reversed the changes above, pushed again and it was fixed.

Ruby on Rails config change while server is running?

Hi I'm new to Ruby on Rails. I have installed the Testia Tarantula application and am trying to read up on Ruby.
My question is how do I start/stop the server.
For example: I want to change the Admin email, so I execute the following command to change the configuration of the app:
RAILS_ENV=production rake db:config:app
But is this command ok to execute while the server is running, it has 'db' in the command which is what would warn me that I shouldn't run it while the server is up. Anyone have some helpful tips for learning Ruby on Rails server app management?
Welcome to Rails!
You can run rake db:xxxxx while the server is running and it won't hurt anything. However I usually stop my server, run my rake command and then start it back up to ensure that all changes will be picked up. If running in production, I would think you may want to restart the server just to make sure. I believe that the schema is generated/updated upon server startup, just fyi.
As far as starting and stopping the server, if you are attached to it you can just use ctrl + c. If it is detached, you can search for the pid and then kill -9 .
Running rake db:anything will load rails on its own. It doesn't matter if you have a server up or not. This will happen in the background. Think of it as the same as running a sql script while the server is running. It's a separate process.

delayed_job -i via cron script through ruby will not start after stopping previous processes

So I have a weird situation, I have delayed_job 2.0.7 and daemons 1.0.10 and ruby 1.87 & rails 2.3.5 running on Scientific Linux release 6.3 (Carbon).
I have a rake task that restarts delayed jobs each night and then does a bunch of batch processing. I used to just do ruby script/delayed_job stop and then start. I have added a backport of named queues that has allowed me to do named queues. So because of this I want to start several processes of each type of named queue. To do this, it seems the best way I found is to use -i to name each process differently so they don't collide.
I wrote some ruby code to do this looping and it works great in dev, it works great on the command line, it works great when called from the rails console. But when called from cron it fails silently, the call returns false but no error message.
# this works
system_call_result1 = %x{ruby script/delayed_job stop}
SnapUtils.log_to_both "result of stop all - #{system_call_result1} ***"
# this works
system_call_result2 = %x{mv log/delayed_job.log log/delayed_job.log.#{Date.today.to_s}}
SnapUtils.log_to_both "dj log file has been rotated"
# this fails, result is empty string, if I use system I get false returned
for x in 1..DELAYED_JOB_MAX_THREAD_COUNT
system_call_result = %x{ruby script/delayed_job -i default-#{x} start}
SnapUtils.log_to_both "result of start default queue iteration #{x} - #{system_call_result} ***"
end
# this fails the same way
for y in 1..FOLLOWERS_DELAYED_JOB_MAX_THREAD_COUNT
system_call_result = %x{ruby script/delayed_job --queue=follower_ids -i follower_ids-#{y} start}
SnapUtils.log_to_both "result of start followers queue iteration #{y} - #{system_call_result} ***"
end
So I did lots of trial and found that this problem only happens if I used -i - named processes and only happens if I stop them, then try to start them. If I remove the stops then everything works fine.
Again this is only when I use cron.
If I use command line or console to run, it works fine.
So my question is, what could cron be doing differently that causes these named dj processes not to start if you previously stopped them in the same ruby process?
thanks
Joel
Ok, I finally figured this out, when checking to see if cron would send email, we found that sendmail was broken, the version of mysql that sendmail wanted was not installed, so we fixed that and then our problem magically went away. I would still offer the bounty to anyone that can explain exactly why..

How do I clear stuck/stale Resque workers?

As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of seconds.
I'm not sure why they won't clear or how to manually remove them.
I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.
None of these solutions worked for me, I would still see this in redis-web:
0 out of 10 Workers Working
Finally, this worked for me to clear all the workers:
Resque.workers.each {|w| w.unregister_worker}
In your console:
queue_name = "process_numbers"
Resque.redis.del "queue:#{queue_name}"
Otherwise you can try to fake them as being done to remove them, with:
Resque::Worker.working.each {|w| w.done_working}
EDIT
A lot of people have been upvoting this answer and I feel that it's important that people try hagope's solution which unregisters workers off a queue, whereas the above code deletes queues. If you're happy to fake them, then cool.
You probably have the resque gem installed, so you can open the console and get current workers
Resque.workers
It returns a list of workers
#=> [#<Worker infusion.local:40194-0:JAVA_DYNAMIC_QUEUES,index_migrator,converter,extractor>]
pick the worker and prune_dead_workers, for example the first one
Resque.workers.first.prune_dead_workers
Adding to answer by hagope, I wanted to be able to only unregister workers that had been running for a certain amount of time. The code below will only unregister workers running for over 300 seconds (5 minutes).
Resque.workers.each {|w| w.unregister_worker if w.processing['run_at'] && Time.now - w.processing['run_at'].to_time > 300}
I have an ongoing collection of Resque related Rake tasks that I have also added this to: https://gist.github.com/ewherrmann/8809350
Run this command wherever you ran the command to start the server
$ ps -e -o pid,command | grep [r]esque
you should see something like this:
92102 resque: Processing ProcessNumbers since 1253142769
Make note of the PID (process id) in my example it is 92102
Then you can quit the process 1 of 2 ways.
Gracefully use QUIT 92102
Forcefully use TERM 92102
* I'm not sure of the syntax it's either QUIT 92102 or QUIT -92102
Let me know if you have any trouble.
I just did:
% rails c production
irb(main):001:0>Resque.workers
Got the list of workers.
irb(main):002:0>Resque.remove_worker(Resque.workers[n].id)
... where n is the zero based index of the unwanted worker.
I had a similar problem that Redis saved the DB to disk that included invalid (non running) workers. Each time Redis/resque was started they appeared.
Fix this using:
Resque::Worker.working.each {|w| w.done_working}
Resque.redis.save # Save the DB to disk without ANY workers
Make sure you restart Redis and your Resque workers.
Started working on https://github.com/shaiguitar/resque_stuck_queue/ recently. It's not a solution to how to fix stuck workers but it addresses the issue of resque hanging/being stuck, so I figured it could be helpful for people on this thread. From README:
"If resque doesn't run jobs within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you."
Been used in production and works pretty well for me thus far.
Here's how you can purge them from Redis by hostname. This happens to me when I decommission a server and workers do not exit gracefully.
Resque.workers.each { |w| w.unregister_worker if w.id.start_with?(hostname) }
I ran into this issue and started down the path of implementing a lot of the suggestions here. However, I discovered the root cause that was creating this issue was that I was using the gem redis-rb 3.3.0. Downgrading to redis-rb 3.2.2 prevented these workers from getting stuck in the first place.
I've cleared them out from redis-cli directly. Luckily redistogo.com allows access from environments outside heroku.
Get dead worker ID from the list. Mine was
55ba6f3b-9287-4f81-987a-4e8ae7f51210:2
Run this command in redis directly.
del "resque:worker:55ba6f3b-9287-4f81-987a-4e8ae7f51210:2:*"
You can monitor redis db to see what it's doing behind the scenes.
redis xxx.redistogo.com> MONITOR
OK
1380274567.540613 "MONITOR"
1380274568.345198 "incrby" "resque:stat:processed" "1"
1380274568.346898 "incrby" "resque:stat:processed:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*" "1"
1380274568.346920 "del" "resque:worker:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*"
1380274568.348803 "smembers" "resque:queues"
Second last line deletes the worker.
In resque 2.0.0, here's one way that seems to work to remove only actually appearantly-dead workers in resque 2.0.0:
Resque::Worker.all_workers_with_expired_heartbeats.each { |w| w.unregister_worker }
I am not an expert in what's going, it's possible there's a better way to do this or that this will have problems. I'm just trying to figure this out too.
This seems to remove workers that haven't sent a "heartbeat" in much longer than expected from the resque worker list.
If the phantom worker was in a "running" state, then a new entry in the "failed" job queue will be created corresponding to phantom job.
I had stuck/stale resque workers here too, or should I say 'jobs', because the worker is actually still there and running fine, it's the forked process that is stuck.
I chose the brutal solution of killing the forked process "Processing" since more than 5min, via a bash script, then the worker just spawn the next in queue, and everything keeps on going
have a look at my script here: https://gist.github.com/jobwat/5712437
If you are using newer versions of Resque, you'll need to use the following command as the internal APIs have changed...
Resque::WorkerRegistry.working.each {|work| Resque::WorkerRegistry.remove(work.id)}
This avoids the problem as long as you have a resque version newer than 1.26.0:
resque: env QUEUE=foo TERM_CHILD=1 bundle exec rake resque:work
Keep in mind that it does not let the currently running job finish.
If you use Docker, you can also use this command:
<id> is the worker id.
docker stop <id>
docker start <id>

Resources