Ruby How to create daemon process that will spawn multiple workers - ruby-on-rails

I have a script called 'worker.rb'. When run, this script performs processing for a while (an hour, let's say) and then dies.
I need to have another script which is going to be responsible for spawning the worker script above. Let's call this script 'runner.rb'. 'runner.rb' will be called with an argument dictating how many workers it is allowed to spawn.
I'd like runner.rb to do the following: (e.g. 'ruby runner.rb 5')
- Query the database for specific values (e.g. got 100 values)
- Spawn 5 instances of 'worker.rb' (passing the first 5 values respectively)
- Keep checking for any of the instances of 'worker.rb' spawned above to finish and then call 'worker.rb' again with the 6th value from the database and continue this process indefinitely.
I'm using the Daemons gem but am lost as to the best way to go about this. The 'runner' script should definitely be daemonized, but should the worker also be daemonized?
How should 'runner' go about checking if 'worker' has finished or not? Can this be done using a PID stored in a file?

I have used the Daemons gem before, but it did not do well at keeping the number of child processes constant. So I wrote another one, called light_daemon. You can have light_daemon prefork a certain number of worker processes. If one of the workers dies for any reason, light_daemon spawns a new one to replace it. If your worker process might leak memory, you can have the worker exit voluntarily before it gets too big; the parent process keeps the number of worker processes constant. I used it on the production site of one of my projects, and it worked pretty well.
The following is an example daemon using the light-daemon gem.
require 'rubygems'
require 'light_daemon'
class Client
  def initialize
    @count = 0
  end
  # Called repeatedly by the daemon; a false return value ends this worker.
  def call
    `echo "process: #{Process.pid}" >> /tmp/light-daemon.txt`
    sleep 3
    @count += 1
    @count < 100
  end
end
LightDaemon::Daemon.start(Client.new, :children => 2, :pid_file => "/tmp/light-daemon.pid")
In the daemon, the worker process dies after the method "call" is invoked 100 times. Then a new worker process is spawned and the process continues.
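For comparison, the behaviour asked about in the question (a fixed number of workers, each fed the next value as soon as a slot frees up) can be sketched with the standard library alone. This is a minimal sketch and not how light_daemon is implemented: `run_pool` is a hypothetical helper, and its block is assumed to return the command line for one worker. Because the workers are direct children of the runner, `Process.wait2` reports when any of them exits, so no PID file is needed for that check.

```ruby
# Run one child process per value, keeping at most `max_workers` alive.
# The block receives a value and returns the command (an array of strings)
# to run for it. Returns [value, exit_status] pairs in completion order.
def run_pool(values, max_workers)
  queue = values.dup
  running = {}    # pid => value currently being processed
  finished = []

  until queue.empty? && running.empty?
    # Top up the pool while there is work left and a free slot.
    while running.size < max_workers && !queue.empty?
      value = queue.shift
      pid = Process.spawn(*yield(value))
      running[pid] = value
    end

    # Block until any child exits, then record its result and free the slot.
    pid, status = Process.wait2
    finished << [running.delete(pid), status.exitstatus]
  end

  finished
end
```

In runner.rb this might be called as `run_pool(values, ARGV.fetch(0).to_i) { |v| ["ruby", "worker.rb", v.to_s] }`, where `values` is whatever the database query returned. Under this approach only the runner would need to be daemonized; the workers stay ordinary child processes.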

Related

Rails Initializer: An infinite loop in a separate thread to update records in the background

I want to run an infinite loop on a separate thread that starts as soon as the app initializes (in an initializer). Here's what it might look like:
# in config/initializers/item_loop.rb
Thread.new do
  loop do
    Item.find_each do |item|
      # Get price from third-party api and update record.
      item.update_price!
      # Need to wait a little between requests to avoid getting throttled.
      sleep 5
    end
  end
end
I tend to accomplish this by running batch updates in recurring background jobs. But this doesn't make sense since I don't really need parallelization, downtime, or queueing, I just want to update one item at a time in a single thread, forever.
Yet there are multiple things that concern me:
Leaked Connections: Should I open up a new connection_pool for the thread? Should I use a gem like safely to avoid crashing the thread?
Thread Safety: Should I be worried about race conditions? Should I make use of Mutex and synchronize? Does using ActiveRecord::Base.transaction impact thread safety?
Deadlock: Should I use Rails.application.executor.wrap?
Concurrent Ruby/Sleep Intervals: Should I use TimerTask from concurrent-ruby gem instead of sleep or something other than Thread.new?
Information on any of these subjects is appreciated.
Usually, a job that should run in a background (non web server) process is handled by a background worker manager. Rails has a standard interface for such managers called ActiveJob, and there are several implementations: Sidekiq, DelayedJob, Resque, etc. Sidekiq is preferred. Coming back to the actual problem: you can create a schedule that runs UpdatePriceJob at a fixed interval using the sidekiq-scheduler gem. Another nice extension, for throttling Sidekiq workers, is sidekiq-throttler.
Some code snippets:
# app/workers/update_price_worker.rb
# Actual Worker class
class UpdatePriceWorker
  include Sidekiq::Worker
  sidekiq_options throttle: { threshold: 720, period: 1.hour }
  def perform(item_id)
    Item.find(item_id).update_price!
  end
end
# app/workers/update_price_master_worker.rb
# Master worker that loops over items
class UpdatePriceMasterWorker
  include Sidekiq::Worker
  def perform
    Item.find_each { |item| UpdatePriceWorker.perform_async item.id }
  end
end
# config/sidekiq.yml
:schedule:
  update_price:
    cron: '0 */4 * * *' # Runs once per 4 hours - depends on how many Items are there
    class: UpdatePriceMasterWorker
The idea of this setup: we run the master worker every 4 hours (this depends on how long it takes to update all items). The master worker creates one job per item to update that item's price. UpdatePriceWorker is throttled to a maximum of 720 requests per hour.
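The throttle threshold and the cron interval are linked, which is why the schedule depends on the number of items. A rough sketch of that arithmetic, using hypothetical helper names and assuming a single throttled worker class as above:

```ruby
# Average spacing between requests implied by a per-hour throttle threshold.
def seconds_per_request(threshold_per_hour)
  3600.0 / threshold_per_hour
end

# Upper bound on how many items can be updated between two cron runs.
def max_items_per_cycle(threshold_per_hour, cron_interval_hours)
  threshold_per_hour * cron_interval_hours
end

seconds_per_request(720)     # => 5.0
max_items_per_cycle(720, 4)  # => 2880
```

So 720 requests per hour matches the `sleep 5` from the question, and with a 4-hour schedule the queue only drains in time if there are at most about 2880 items; beyond that the cron interval would need to grow.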
I use rails runner together with a process supervisor (the god gem or k8s) in a similar case.
rails runner runs in a separate process, so we do not have to worry about connection leaks or thread safety.
The god gem or k8s handles concurrency and monitors for job failures. Running one process with a fixed sleep time keeps you within the third-party API's throttling limits (running N processes with N API keys could speed things up).
I think deadlock can happen in any concurrent setup.
I do not think the loop + sleep approach is a design flaw, because:
cron always starts jobs on schedule, so long-running jobs can end up running simultaneously, and we would need extra logic to avoid overlapping jobs. A plain loop + sleep keeps maximum throughput without any overlap.
ActiveJob is good for one-shot long-running tasks, but it is not a good fit for a daemon.
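A loop + sleep daemon of this kind could look roughly like the sketch below. It is plain Ruby under stated assumptions: `run_update_loop` and its `stop` callable are hypothetical names, the block stands in for `item.update_price!`, and in a Rails process `items` would be something like `Item.find_each`. A failing update is logged and skipped so that one bad item does not end the loop.

```ruby
# Repeatedly walk `items`, yielding each one and pausing between requests.
# `stop` is any callable that returns true when the loop should end
# (for example a flag set from a signal handler). Returns the update count.
def run_update_loop(items, pause:, stop: -> { false })
  updated = 0
  until stop.call
    items.each do |item|
      break if stop.call
      begin
        yield item
        updated += 1
      rescue StandardError => e
        # Log and carry on; the supervisor only needs to handle hard crashes.
        warn "update failed for #{item.inspect}: #{e.message}"
      end
      sleep pause   # spacing between requests to stay under the API limit
    end
  end
  updated
end
```

Started via rails runner under god or k8s, this might be invoked as `run_update_loop(Item.find_each, pause: 5) { |item| item.update_price! }`. Since only one pass runs at a time, there is nothing to overlap.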

Memory consumption inside sidekiq worker

Can loading multiple models in a Sidekiq worker cause a memory leak? Do they get garbage collected?
For example:
class Worker
  include Sidekiq::Worker
  def perform
    Model.find_each do |item|
    end
  end
end
Can using ActiveRecord::Base.connection inside a worker cause problems? Or does the connection close automatically?
I think you are running into a problem that I also had with a "worker" - the actual problem was the code, not Sidekiq in any way, shape or form.
In my problematic code, I thoughtlessly just loaded up a boatload of models with a big, fat, greedy query (hundreds of thousands of instances).
I fixed my worker/code quite simply. In my case, I switched my DB call from all to find_in_batches with a smaller number of objects pulled per batch.
Model.find_in_batches(batch_size: 100) do |records|
  # ... I like find_in_batches better than find_each because you can use lower numbers for the batch size
  # ... other programming stuff
end
As soon as I did this, a job that would bring down Sidekiq after a while (running out of memory on the box) has run with find_in_batches for 5 months without me even having to restart Sidekiq ... Ok, I may have restarted Sidekiq some in the last 5 months when I've deployed or done maintenance :), but not because of the worker!
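Why batching keeps memory flat can be illustrated without ActiveRecord. The sketch below is only a plain-Ruby analogy: `each_slice` stands in for `find_in_batches`, and `process_in_batches` is a hypothetical helper. At any moment only one batch is held, so the number of live objects is bounded by the batch size rather than by the size of the whole data set.

```ruby
# Yield every record, but hold at most `batch_size` of them at a time.
# Returns the size of the largest batch that was ever held.
def process_in_batches(records, batch_size:)
  largest_batch = 0
  records.each_slice(batch_size) do |batch|
    largest_batch = [largest_batch, batch.size].max
    batch.each { |record| yield record }
    # `batch` goes out of scope here, so it can be garbage collected
    # before the next slice is built.
  end
  largest_batch
end
```

Called as `process_in_batches(1..250, batch_size: 100) { |n| ... }`, the block sees all 250 values while no more than 100 are held at once.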

Resque job not actually backgrounding

It is instead taking up my processor, and then effectually timing out.
I have in my controller :
after_save :handle_file
def handle_file
  Resque.enqueue UnpackFileOnS3, parent.id
end
It hits this mark, and then the entire app waits for it to set up and upload the files as prescribed inside my Job. Then it predictably times out because it takes awhile to upload it.
This occurs in my console as well.. If I run :
Resque.enqueue UnpackFileOnS3, 4
Then instead of enqueue'ing it, it locks up my console as it tries to run the entire file. I think that normally, console would just enqueue it to a worker and redis..
Why isn't this actually happening in the background? As I assume if that were the case, the timeouts would not occur.
My guess is that you are running Resque in inline mode. In this mode queuing is disabled and jobs run immediately in the calling process. Check your configs for this kind of code:
Resque.inline = ENV['RAILS_ENV'] == "cucumber"
#or whatever, important part is the inline option

How to run non blocking command from Ruby?

User goes to page A to create a new multiplayer game
The script in page A generates a unique ID for the game, and creates a worker for it. Something like: rails runner GameWorker.new(:game_id => game_id).start_game
The script in page A redirects the user to page B, where he can see the newly created game, and others can join.
The worker should be alive until the end of the game.
What would be the proper way to run the command that starts the worker? It must be non blocking and ideally redirect output to the log file, in case something goes wrong.
I'm using Rails 3, if it matters.
UPDATE
I'm going to rephrase my question: how do I run a Linux command from within Ruby without waiting for the command to end? I mean the equivalent of &>>. In PHP, for instance, &>> works fine and I don't need any special PHP function, but in Ruby it seems to get overridden, and the script waits for the command to end and grabs the output.
I HIGHLY recommend not running a process per game. If you want a non-blocking game that is not turn based, then you probably want to look at event-machine, or something like https://github.com/celluloid/celluloid-io
With either, you'll be creating threads that you'll process at future points in time.
But -- if you do want to just fire off a process in ruby, here you go.. from How to fire and forget a subprocess?
pid = Process.fork
if pid.nil? then
  # In child
  exec "whatever --take-very-long"
else
  # In parent
  Process.detach(pid)
end
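The question also asks for output to go to a log file. One way to sketch that is with `Process.spawn`, which accepts redirection options directly. `spawn_detached` and its `log_path` parameter are hypothetical names used only for illustration; the redirection options and `Process.detach` are standard library features.

```ruby
# Start `command` without waiting for it, appending its stdout and stderr
# to `log_path`. Returns the thread created by Process.detach, which reaps
# the child when it exits (thread.pid gives the child's PID).
def spawn_detached(*command, log_path:)
  pid = Process.spawn(*command,
                      out: [log_path, "a"],   # append stdout to the log file
                      err: [:child, :out])    # send stderr to the same place
  Process.detach(pid)
end
```

For the game worker from the question, a call might look something like `spawn_detached("rails", "runner", "GameWorker.new(:game_id => #{game_id}).start_game", log_path: "log/game_worker.log")`, where the log path is an assumption. The caller returns immediately and never reads the child's output.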

Permanent daemon for querying a web resource

I have a Rails 3 application and looked around the internet for daemons but didn't find the right one for me.
I want a daemon which continuously fetches data (exchange rates) from a web resource and saves it to the database.
like:
while true
  Model.update_attribute(:course, http::get.new("asdasd").response)
end
I've only seen cron-like jobs, but they only run at specific times. I want it to run permanently, with each iteration starting as soon as the previous query finishes.
Do you understand what i mean?
The gem light-daemon I wrote should work very well in your case.
http://rubygems.org/gems/light-daemon
You can write your code in a class which has a perform method, use a queue system such as Resque, and at application startup enqueue the job with Resque.enqueue(Updater).
Obviously the job won't end until the application is stopped. Personally I don't like that, but it works if this is the requirement.
For this reason, if you need to execute other tasks, you should configure more than one worker process and optionally more than one queue.
If you can change your requirements and find a trigger for the update mechanism, the same approach still works; you only have to remove the while true loop.
Sample class needed:
class Updater
  @queue = :endless_queue
  def self.perform
    while true
      Model.update_attribute(:course, http::get.new("asdasd").response)
    end
  end
end
Finally I found a cool solution for my problem:
I use the god gem -> http://god.rubyforge.org/
with a bash script for starting / stopping a simple rake task (with an infinite loop in it).
Now it works fine, and I even have some monitoring with god, which ensures that the rake task keeps running.
