How to queue rails calls so function isn't run in parallel - ruby-on-rails

I have some Rails code that calls model1.func1(). A controller action calls this, where multiple people can be hitting it, as does a scheduled rake task. I want to make sure that model1.func1() cannot be called in parallel: if another thread needs to call it at the same time, it should wait for model1.func1() to finish. I guess I want to queue these calls. I was going to use Sidekiq for this, but with only one worker. However, I read on a forum that
Sidekiq is not appropriate for the serial job and I don't want to make
it appropriate. Different tools are useful for different reasons,
jack of all trades master of none, etc.
What do you guys recommend instead?

I would consider beanstalkd with one worker process.
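If you go that route, a minimal sketch with the beaneater gem might look like the following; the tube name and payload are hypothetical. Because exactly one worker process reserves jobs from the tube, the calls run strictly one at a time.

require 'beaneater'
require 'json'

beanstalk = Beaneater.new('localhost:11300')
tube = beanstalk.tubes['func1-calls']   # hypothetical tube name

# Producer (controller action or rake task): enqueue instead of calling directly.
tube.put({ model1_id: 42 }.to_json)

# Consumer: run exactly ONE of these processes so jobs execute serially.
loop do
  job = tube.reserve                    # blocks until a job is available
  id  = JSON.parse(job.body)['model1_id']
  Model1.find(id).func1
  job.delete                            # remove the job once it has finished
end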

Related

How to make multiple parallel concurrent requests with Rails and Heroku

I am currently developing a Rails application which takes a long list of links as input, scrapes them using a background worker (Resque), then serves the results to the user. However, in some cases there are numerous URLs, and I would like to be able to make multiple requests in parallel / concurrently so that it takes much less time, rather than waiting for one request to complete, scraping the page, and moving on to the next one.
Is there a way to do this in heroku/rails? Where might I find more information?
I've come across resque-pool, but I'm not sure whether it would solve this issue and/or how to implement it. I've also read about using different types of servers to run Rails in order to make concurrency possible, but I don't know how to modify my current setup to take advantage of this.
Any help would be greatly appreciated.
Don't use Resque. Use Sidekiq instead.
Resque runs in a single-threaded process, meaning the workers run synchronously, while Sidekiq runs in a multithreaded process, meaning the workers run asynchronously/simultaneously in different threads.
Make sure you assign one URL to scrape per worker; it's no use if one worker scrapes multiple URLs.
With Sidekiq, you can pass the link to a worker, e.g.
LINKS = [...]
LINKS.each do |link|
  ScrapeWorker.perform_async(link)
end
The perform_async call doesn't actually execute the job right away. Instead, the link is put in a queue in Redis along with the worker class name and so on, and later (possibly just milliseconds later) a worker thread is assigned to execute each queued job by running the perform instance method of ScrapeWorker. Sidekiq will make sure to retry the job if an exception occurs during execution.
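For reference, a minimal sketch of the worker class assumed above; the scraping body is hypothetical:

require 'net/http'

class ScrapeWorker
  include Sidekiq::Worker

  def perform(link)
    # runs in a worker thread; fetch the page and persist whatever you need
    response = Net::HTTP.get_response(URI(link))
    # ... parse response.body and store the results
  end
end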
PS: You don't have to pass a link to the worker. You can store the links in a table and then pass the ids of the records to the workers.
More info about sidekiq
Adding these two lines to your code will also let you wait until the last job is complete before proceeding:
this line ensures that your program waits until at least one job has been enqueued before checking that all jobs are completed, so that an as-yet-empty queue isn't misread as the completion of all jobs
sleep(0.2) until Sidekiq::Queue.new.size > 0 || Sidekiq::Workers.new.size > 0
this line ensures your program waits until all jobs are done
sleep(0.5) until Sidekiq::Workers.new.size == 0 && Sidekiq::Queue.new.size == 0

How to schedule Sidekiq Worker.perform.now?

I have a worker with one perform method. In the controller I call that worker class. When I call it with worker.perform.now I see in the console that the perform method is executed as I want.
How do I schedule this call in the controller so that it is performed every day at ten o'clock?
PS: When I call worker.perform_async it doesn't do anything.
PS: When I call worker.perform_async it doesn't do anything.
I guess that you didn't start the Sidekiq server. Type sidekiq in the console to start the server.
As for scheduling Sidekiq jobs to run at some regular interval, there are several ways to do it.
The one recommended by Sidekiq's author is to use cron with the Whenever gem. He even provides an example here: https://github.com/mperham/sidekiq/blob/master/examples/scheduling.rb
Another way is to use the Sidekiq-cron gem. I've found it easier to set up, but it emulates cron, which is why the first solution is more solid.
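For example, with Sidekiq-cron a daily 10:00 schedule might look like this sketch (the job name, file location, and worker class are assumptions):

# config/initializers/sidekiq_cron.rb
Sidekiq::Cron::Job.create(
  name:  'my_worker - daily at 10am',
  cron:  '0 10 * * *',     # minute hour day-of-month month day-of-week
  class: 'MyWorker'        # the worker whose perform method should run
)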

How do I create a worker daemon which waits for jobs and executes them?

I'm new to Rails and multithreading and am curious about how to achieve the following in the most elegant way.
I couldn't find any nice tutorials which explained in detail what's the best design decision for the following task:
I have a couple of HTTP requests which will be run for a user in the background, for example parsing a couple of websites and getting some information like the HTTP response code and response time, then returning the results. For performance reasons, I decided to split the total number of URLs into batches of 25 each, execute each batch in a thread, join the threads, and write the results to a database.
I decided to use the following gem (http://rubygems.org/gems/thread) to ensure that there's a maximum number of threads that are run simultaneously. So far so good.
The problem is, if two users start their analysis in parallel, the number of threads is twice the maximum of my thread pool.
My solution (imho) is to create a worker daemon which runs on its own and waits for jobs from the clients.
My question is, what's the best way to achieve this in Rails?
Maybe create a Rake task and use it as a daemon (see: "Daemonising a rake task") and (how?) add jobs to it?
Thank you very much in advance!
I'd build a queue in a table in the database, and a bit of code that is periodically started by cron, which walks that table, passing requests to Typhoeus and Hydra.
Here's how the author summarizes the gem:
Like a modern code version of the mythical beast with 100 serpent heads, Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic.
As users add requests, append them to the table. You'll want fields like:
A "processed" field so you can tell which were handled in case the system goes down.
A "success" field so you can tell which requests were processed successfully, so you can retry if they failed.
A "retry_count" field so you can retry up to "n" times, then flag that URL as unreachable.
A "next_scan_time" field that says when the URL should be scanned again so you don't DOS a site by hitting it continuously.
Typhoeus and Hydra are easy to use, and do make it easy to handle multiple requests.
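A minimal sketch of the Typhoeus/Hydra side, assuming pending_urls holds the unprocessed rows from that table:

require 'typhoeus'

hydra = Typhoeus::Hydra.new(max_concurrency: 25)

pending_urls.each do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete do |response|
    # update the row here: processed, success, retry_count, next_scan_time
    puts "#{url}: #{response.code} in #{response.total_time}s"
  end
  hydra.queue(request)
end

hydra.run   # blocks until every queued request has completed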
There are a bunch of libraries for Rails that can manage queues of long-running background jobs for you. Here are a few:
Sidekiq uses Redis for job storage and supports multiple worker threads.
Resque also uses Redis and a single worker thread.
delayed_job manages a job queue through ActiveRecord (or Mongoid).
Once you've chosen one, I'd recommend using Foreman to simplify launching multiple daemons at once.

rails periodic task

I have a ruby on rails app in which I'm trying to find a way to run some code every few seconds.
I've found lots of info and ideas using cron or cron-like implementations, but these are only accurate down to the minute and/or require external tools. I want to kick the task off every 15 seconds or so, and I want it to be entirely self-contained within the application (if the app stops, the tasks stop, and no external setup).
This is being used for background generation of cache data. Every few seconds, the task will assemble some data, and then store it in a cache which gets used by all the client requests. The task is pretty slow, so it needs to run in the background and not block client requests.
I'm fairly new to ruby, but have a strong perl background, and the way I'd solve this there would be to create an interval timer & handler which forks, runs the code, and then exits when done.
It might be even nicer to just simulate a client request and have the rails controller fork itself. This way I could kick off the task by hitting the URI for it (though since the task will be running every few seconds, I doubt I'll ever need to, but might have future use). Though it would be trivial to just have the controller call whatever method is being called by the periodic task scheduler (once I have one).
I'd suggest the whenever gem https://github.com/javan/whenever
It allows you to specify a schedule like:
every 15.minutes do
  MyClass.do_stuff
end
There's no hand-scheduling of cron jobs or monkeying with external services.
Generally speaking, there's no built in way that I know of to create a periodic task within the application. Rails is built on Rack and it expects to receive http requests, do something, and then return. So you just have to manage the periodic task externally yourself.
I think given the frequency that you need to run the task, a decent solution could be just to write yourself a simple rake task that loops forever, and to kick it off at the same time that you start your application, using something like Foreman. Foreman is often used like this to manage starting up/shutting down background workers along with their apps. On production, you may want to use something else to manage the processes, like Monit.
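A sketch of such a rake task, with hypothetical task and method names:

# lib/tasks/cache_refresh.rake
namespace :cache do
  desc 'Rebuild the cache roughly every 15 seconds, forever'
  task refresh: :environment do
    loop do
      CacheBuilder.rebuild   # hypothetical: your slow cache-generation code
      sleep 15
    end
  end
end

Started with bundle exec rake cache:refresh (for example from a Foreman Procfile), it lives and dies with the rest of the app.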
You can either write your own method, something like
class MyWorker
  def self.work
    loop do
      # do your work here
      sleep 15
    end
  end
end
Run it with rails runner MyWorker.work and there will be a separate process running in the background.
Or you can use something like Resque, but that's a different approach. It works like this: something adds a task to the queue; meanwhile a worker fetches whatever jobs are in the queue and tries to finish them.
So it depends on your own needs.
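If you do go the Resque route, a minimal sketch (queue and class names hypothetical):

class CacheRefreshJob
  @queue = :cache   # the Resque queue this job is pushed onto

  def self.perform(key)
    # rebuild the cache entry for key; runs inside a Resque worker process
  end
end

# Somewhere in the app, enqueue the work:
Resque.enqueue(CacheRefreshJob, 'front_page')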
I know it is an old question, but maybe this answer will be helpful to someone. There is a gem called Crono.
Crono is a time-based background job scheduler daemon (just like Cron) for Ruby on Rails.
Crono is pure Ruby. It doesn't use Unix Cron and other platform-dependent things. So you can use it on all platforms supported by Ruby. It persists job states to your database using Active Record. You have full control of jobs performing process. It's Ruby, so you can understand and modify it to fit your needs.
The awesome thing about Crono is that its code is self-explanatory. To run a task periodically you can just do:
Crono.perform(YourJob).every 2.days
Maybe you can also do:
Crono.perform(YourJob).every 30.seconds
Anyway you really can do a lot of things. Another example could be:
Crono.perform(TestJob).every 1.week, on: :monday, at: "15:30"
I suggest this gem instead of Whenever because Whenever uses the Unix cron table, which is not always available.
Throwing out a solution just because it looks somewhat elegant and answers the question without any extra gems. In my scenario I wanted to run some code, but only after all my Sidekiq workers were done doing their thing.
First I defined a method to check if any workers were working...
def workers_working?
  workers = Sidekiq::Workers.new.map do |_process_id, _thread_id, work|
    work
  end
  workers.size > 0
end
Then we just call the method with a loop which sleeps between calls.
sleep 5 while workers_working?
Use something like delayed job, and requeue it every so often?
Use Thin or another server which uses EventMachine, then just use the timers that are part of EventMachine. Example, in config/application.rb:
EM.add_periodic_timer(2) do
  do_this_every_2_sec
end

Regular delayed jobs

I'm using Delayed Job to manage background work.
However I have some tasks that need to be executed at a regular interval: every hour, every day, or every week, for example.
For now, when I execute the task, I create a new one to be executed in one day/week/month.
However I don't really like it. If for any reason the task isn't completely executed, we don't create the next one and we might lose the task's execution entirely.
How do you manage this kind of thing (with Delayed Job) in your Rails apps to be sure your list of regular tasks remains correct?
If you have access to Cron, I highly recommend Whenever
http://github.com/javan/whenever
You specify what you want to run and at what frequency in dead simple ruby, and whenever supplies rake tasks to convert this into a crontab and to update your system's crontab.
If you don't have access to frequent cron (like I don't, since we're on Heroku), then DJ is the way to go.
You have a couple options.
Do what you're doing. DJ will retry each task a certain number of times, so you have some leniency there.
Put the code that creates the next DJ job in an ensure block, to make sure it gets created even after an exception or other bad event (see the sketch after this list).
Create another DJ that runs periodically, checks to make sure the appropriate DJs exist, and creates them if they don't. Of course, this is just as error-prone as the other options, since the monitor and the actual DJ are both running in the same environment, but it's something.
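A sketch of the ensure approach from option 2; the job class and interval are hypothetical:

class WeeklyTask
  def perform
    do_the_weekly_work   # hypothetical: the actual task body
  ensure
    # runs even if do_the_weekly_work raised, so the chain never breaks
    Delayed::Job.enqueue(WeeklyTask.new, run_at: 1.week.from_now)
  end
end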
Is there any particular reason why you wouldn't use cron for this type of thing?
Or maybe something more Rubyish like rufus-scheduler, which is quite easy to use and very reliable.
If you don't need queuing, these tools are the way to go, I think.
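With rufus-scheduler, a recurring task is only a few lines; the block body is yours to fill in:

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

scheduler.every '1d' do
  # the recurring task goes here
end

scheduler.join   # keep the process alive (omit inside a Rails server)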
