How to add a rate limit to a Resque queue?

I have a Resque worker that calls an API; the problem is that the API has a rate limit of 2 requests per second.
Is there a way to add a delay between each job processed in a specific queue?
P.S. The queue could have thousands of pending jobs.

Why not just sleep for a given amount of time at the end of each job? Well, perhaps you want your Resque worker to be doing something useful instead. In CPU time, half a second is a lot: the worker could have spent it processing a job from another queue that isn't rate limited.
I have this same problem myself, so I'm motivated to find a solution. There seem to be two easy-ish ways to do it. The first is to use resque-scheduler and pre-compute each job's run time before inserting it; that seems error-prone to me. The second is to use a gem like https://github.com/flyerhzm/resque-restriction (disclaimer: I just found it through some googling and haven't used it yet) and rate-limit as you pull jobs off the queue. That seems like a robust solution in theory: if a job can't execute yet, it never comes off the queue, so the worker pulls something else instead, which is a much more efficient use of your workers.
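For the resque-restriction route, a minimal sketch for the 2-requests-per-second case, going by the gem's README (ApiJob and call_the_api are placeholder names, and :per_1 follows the gem's per-<seconds> key convention, so verify it against the current README):

require 'resque-restriction'

class ApiJob
  extend Resque::Plugins::Restriction
  restrict :per_1 => 2  # at most 2 executions per 1-second window

  @queue = :api

  def self.perform(args)
    call_the_api(args)  # placeholder for the rate-limited API call
  end
end

The gem re-queues over-limit jobs rather than blocking the worker, so the worker is free to pull other work in the meantime.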

Per my comment, I'd recommend just sleeping for a given number of seconds at the end of each Resque perform method.
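A minimal sketch of that, with ApiJob and call_the_api as placeholder names; at 2 requests per second, sleeping half a second after each call keeps a single worker under the limit (note it does not coordinate multiple workers on the same queue):

class ApiJob
  @queue = :api

  def self.perform(args)
    call_the_api(args)  # placeholder for the rate-limited API call
    sleep(0.5)          # one worker stays under 2 requests/second
  end
end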

Related

How long should a Sidekiq job last?

In the Sidekiq wiki it is stated:
Make your jobs small and simple
I get simple, I get idempotent and transactional, but what is small? Maybe required memory and computing time is a good measure? My Sidekiq jobs take between 10 seconds and 30 minutes.
I think 10 seconds is okay, but what about the long-running 30-minute task? I am loading all the data of a certain type from the database into memory, running lengthy computations on it, and then writing back the results, all three steps in one worker job.
Is that fine? Or should I instead have one worker job invoke multiple worker jobs that run the small computations? The problem is that these small computations may need some complex hash tables, and it is suggested not to persist those in Redis, only small, simple values.
That depends on how often you want or have to invoke the job, and on whether it is acceptable to you that it takes so long.
If you run the job at shorter intervals than it takes to finish, it is certainly too long.
Splitting it up into multiple workers would only help if you could improve the total run time (e.g. if some of the work can run in parallel).
So the rule of thumb, as always: as long as it suits your needs, it is okay.
However:
on long jobs, you should consider that the job might fail mid-execution for whatever reason (server crashes, etc.).
Can you still continue the lengthy job from where it stopped, or will it be rolled back properly?
Also: what happens if the data changes while you are executing the job?
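If you do split the work up, a minimal fan-out sketch (Record, its columns, and expensive_computation are assumptions, not from the question) is to pass only an id to each small job, so each worker reloads its own data instead of persisting hash tables in Redis:

class FanOutWorker
  include Sidekiq::Worker

  def perform
    # One small job per record; a mid-run crash loses at most one record's work
    Record.where(processed: false).pluck(:id).each do |id|
      ComputationWorker.perform_async(id)
    end
  end
end

class ComputationWorker
  include Sidekiq::Worker

  def perform(record_id)
    record = Record.find(record_id)
    record.update!(result: expensive_computation(record))  # placeholder computation
  end
end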

Grails non-time-based queuing

I need to process files which get uploaded, and processing can take as little as 1 second or as much as 10 minutes. Currently my solution is to make a Quartz job with a 30-second timer and then process an arbitrary job whenever it fires. There are several problems with this.
One: if the job will take less than a few seconds, it is wasteful to make it wait 30 seconds in the job queue.
Two: if there is only one long job in the queue, it could feasibly be attempted twice.
What I want is a timeless queue: when things are added, they are started immediately if there is a free worker. Is there a solution for this? I was looking at Jesque, but I couldn't tell if it can do this.
What you are looking for is a basic message queue. There are lots of options out there, but my favorite for Grails is RabbitMQ. The Grails plugin for it is quite good and it performs well in my experience.
In general, message queues allow you to have N producers (things creating jobs) adding work messages to a queue and M consumers pulling jobs off the queue and processing them. When a worker completes its job, it simply asks the queue for the next job to process, and if there is none, it just waits for the queue to give it something to do. The queue also keeps track of the success or failure of message processing (you can control this) so that you don't give the same message to more than one worker.
This has the advantage of not relying on polling (so you can start processing as soon as things come in) and it's also much more scalable. You can scale both your producers and consumers up or down as needed, decoupling the inputs from the outputs, so that you can take a traffic spike and then work your way through it as you have the resources (workers) available.
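To make the producer/consumer shape concrete, here is a rough sketch in Ruby with the bunny gem (to match the rest of this page; the Grails RabbitMQ plugin exposes the same concepts). The queue name and process_file are placeholders:

require 'bunny'

conn = Bunny.new
conn.start
channel = conn.create_channel
queue = channel.queue('file_uploads', durable: true)

# Producer: publish one message per uploaded file
channel.default_exchange.publish('/uploads/report.csv',
                                 routing_key: queue.name, persistent: true)

# Consumer: ack only after processing succeeds, so a crashed worker's
# message is redelivered instead of lost, and no two workers share it
queue.subscribe(manual_ack: true, block: true) do |delivery_info, _properties, body|
  process_file(body)  # placeholder for the actual file processing
  channel.ack(delivery_info.delivery_tag)
end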
To solve problem one, just make the job check for newly uploaded files every 5 seconds (or 3 seconds, or 1 second). If the check for uploaded files is quick, there is no reason you can't run it often.
For problem two, you just need to record when you start processing a file to ensure it doesn't get picked up twice. You could create a table in the database, or store the information in memory somewhere. A sketch of that claim step follows.
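Sketching the claim step in ActiveRecord-flavored Ruby to match the rest of this page (the Upload model and its columns are assumptions; the same atomic UPDATE works from GORM): a single UPDATE marks the file as taken, so two overlapping polls cannot both start it.

# Returns the number of rows changed: 1 means this worker owns the file,
# 0 means another worker claimed it first
claimed = Upload.where(id: upload.id, status: 'pending')
                .update_all(status: 'processing', started_at: Time.now)
process_file(upload) if claimed == 1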

delayed_job, daemons or other gem for recurring background jobs

I need to build a background job that goes through a list of RSS feeds and analyzes them, say, every 10 minutes.
I have been using delayed_job for handling background jobs and I have liked it a lot, but I believe it's not built for recurring background jobs. I guess I could auto-schedule a background job at the end of each one (maybe with begin..rescue just to ensure it gets executed), or pre-schedule, say, a month's worth of jobs in advance and have another job that reschedules the next month, etc.
This raised some concerns for me: what if the server goes down in the middle of execution and the jobs don't get scheduled?
I have also looked at the Daemons gem, which seems like it just runs simple Ruby scripts with start/stop commands. I like the way delayed_job schedules jobs and handles retries.
What do you recommend using in this case? What do you think is the best way to design such a system with recurring background jobs? Also, do you know a way I can monitor that background process and get notified if it stops?
I just implemented delayed_job for a similar task (using :run_at => 2.days.from_now) and found it to be a perfect fit. The easiest way to handle your concern about a process failing is to make the first step of the job create the next job. Also, you can create a has_many relationship to the delayed_job model, which would allow you to access :last_error. Or look at the "Hooks" section of the README; it has a perfect example for failure.
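A minimal sketch of that first-step-reschedules pattern (FeedAnalyzer is a placeholder for your RSS work, and the options-hash form of enqueue assumes a reasonably recent delayed_job): because the next run is enqueued before the analysis starts, a crash mid-analysis cannot break the chain.

class AnalyzeFeedsJob
  def perform
    # Schedule the next run first, so a failure below can't stop the cycle
    Delayed::Job.enqueue(AnalyzeFeedsJob.new, :run_at => 10.minutes.from_now)
    FeedAnalyzer.run_all  # placeholder for the actual RSS analysis
  end
end

# Kick the cycle off once, e.g. from a console or a deploy task:
Delayed::Job.enqueue(AnalyzeFeedsJob.new)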
I think that this was a similar question: A cron job for rails: best practices? - not only are there answers, but also links to railscasts about background jobs in rails.
I used cron + delayed_job, but my scheduled tasks were supposed to run only a few times a day, mostly just once.
Take a look at SimpleWorker. It's an elastic scheduling and background processing worker queue. It's cloud based and has persistence and redundancy so you don't need to worry if your servers go down or are restarted.
Very flexible in terms of scheduling, provides great introspection of jobs in the queue as well as notifications on status and errors.
Full disclosure: I work at SimpleWorker.

Ruby long running process to react to queue events

I have a rails 3 app that writes certain events to a queue.
Now on the server I want to create a service that polls the queue every x seconds, and performs other tasks on a scheduled basis.
Other than creating a ruby script and running it via a cron job, are there other alternatives that are stable?
Although spinning up a persistent Rails-based task is an option, you may want to look at more orderly systems like delayed_job or Starling to manage your workload.
I'd advise against running something from cron, since the expense of spinning up a whole Rails stack can be significant. Running it every few seconds isn't practical, as the ramp-up time for Rails is usually 5-15 seconds depending on your hardware. Doing this a few times a day is usually no big deal, though.
A simple alternative is to create a work loop in a script you can engage with runner:
interval = 15.minutes
next_time = Time.now + interval
loop do
  do_stuff if stuff_to_do?
  # Figure out how much time is left before the next iteration
  delay = next_time.to_i - Time.now.to_i
  # If ahead of schedule, take a break
  sleep(delay) if delay > 0
  # Advance the deadline, otherwise delay goes negative and the loop spins
  next_time += interval
end
The downside to this is that the Rails stack will remain in memory as long as this background process is running, but this is a trade-off between huge CPU hits and a memory hit.
You have several options for that, including DelayedJob and Resque.
Resque relies on Redis and is the solution I use all the time (and am very happy with).
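For reference, a basic Resque setup for this is small (the queue name, EventProcessor, and handle_event are made up): a job is a plain class with a queue name and a self.perform, and a rake task runs workers that poll Redis for you.

class EventProcessor
  @queue = :events

  def self.perform(event_id)
    handle_event(event_id)  # placeholder for your event handling
  end
end

# In the Rails app, instead of writing to your own queue:
Resque.enqueue(EventProcessor, event.id)

# On the server, a worker polls the queue:
#   QUEUE=events rake resque:work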
I would recommend Ryan Bates' railscast on this subject which talks about beanstalkd and the stalker wrapper for it:
http://railscasts.com/episodes/243-beanstalkd-and-stalker
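Going by that railscast, the stalker API looks roughly like this (the job name and Mailer are illustrative): you enqueue named jobs from the app and define handlers in a jobs file run by the stalk binary.

# In the app:
require 'stalker'
Stalker.enqueue('email.send', :to => 'user@example.com')

# In jobs.rb, run with `stalk jobs.rb`:
require 'stalker'
include Stalker

job 'email.send' do |args|
  Mailer.deliver(args['to'])  # placeholder mailer
end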
To add to the possibilities here: using a more heavy-duty queuing system like AMQP (RabbitMQ) is made easy by the 'minion' gem, which works similarly to beanstalkd:
https://github.com/orionz/minion
@Blankman, you should check out http://www.simpleworker.com; it's made for things like this and takes the burden of running/scheduling/monitoring your processes off you. And it's very stable.

Rails 3 + Heroku + Delayed jobs - Help me understand!

I'm having problems understanding this article: http://blog.darkhax.com/2010/07/30/auto-scale-your-resque-workers-on-heroku .
I don't quite get why I need Redis + Resque when I have delayed jobs provided by Heroku.
From my understanding, I still have to pay for the workers, correct? What's the main advantage of that solution?
Regards.
If you don't know why you need Resque, then you don't need it ;)
Resque is for high scalability. delayed_job is fine for smaller-scale stuff, but once you get to the size of, say, GitHub, you will need something like Resque. If delayed_job works for you, then stay with it; you don't need to worry about replacing it until your background job queue gets to around 30,000 or so.
To autoscale Heroku workers using delayed_job, you can hook into its enqueue and after hooks and use the Heroku API to query and update the number of workers.
For the most basic implementation: on enqueue, check whether there are workers and, if not, add one. On after, check whether there are other delayed jobs and, if not, reduce the workers to 0.
You can obviously make this more sophisticated in the way that you scale.
Here is a basic implementation: https://github.com/phaza/Heroku-Delayed-Job-Autoscale
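The shape of that implementation, sketched with delayed_job's lifecycle hooks and the legacy heroku gem (the app name, credentials, and the exact Heroku::Client calls are assumptions; see the linked repo for a maintained version):

class AutoscaledJob
  def enqueue(job)
    # A job just entered the queue: make sure one worker is running
    heroku.set_workers('myapp', 1)
  end

  def after(job)
    # This job is finishing: if it is the last one, scale back to zero
    heroku.set_workers('myapp', 0) if Delayed::Job.count <= 1
  end

  private

  def heroku
    @heroku ||= Heroku::Client.new(ENV['HEROKU_USER'], ENV['HEROKU_PASS'])
  end
end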
hirefireapp is a newish, simple drop-in solution for auto-scaling workers.
It spawns workers for you based on queue size (configurable) and then "fires" them when they are no longer necessary. You pay for the dyno time (to the nearest second) and for the hirefireapp service. In theory you could roll your own using the open-source hirefire gem, too.
It also handles scaling the web side if you choose, so you can spawn more web dynos based on current latency.
You can also use Hirefireapp.com to monitor and scale your apps.
