How to throttle Delayed job to not anger the Facebook API - ruby-on-rails

I'm building an app that will be hitting the Facebook Graph API a lot. I learned that it has a rate limit of 600 requests every 600 seconds.
I'm using delayed job for all my background processing. What is a good way to schedule delayed job to stay under the fb api rate limit? Are there any tricks with delayed job or do I need to build a separate background task processor to not go over my rate limit?
Thanks

600 requests every 600 sec is 1 per sec on avg.
Not very fast!
1) Depending on your company's size and heft, I'd investigate with FB to see if you can get the limit raised for you.
2) You can stick with DelayedJob, no need to re-invent the wheel. You just need to change the scheduler.
In my DelayedJob installation, I use the "run_at" column for more than just setting the time to retry the jobs--I also use it as the time to run the job in the first place. You can also use it to throttle your jobs.
Changed in the DelayedJob file job.rb:
# added run_at param
# eg Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...'), 0,
#      Delayed::Job.db_time_now + 15.minutes
def self.enqueue(object, priority = 0, run_at = nil)
  unless object.respond_to?(:perform)
    raise ArgumentError, 'Cannot enqueue items which do not respond to perform'
  end
  Job.create(:payload_object => object, :priority => priority,
             :run_at => run_at)
end
For your goal, I would keep track of the run_at time of the last FB API call that was enqueued, and schedule the next one with a run_at at least one second later.
Benefit: you would be able to interleave other, non-FB tasks, in with the FB api calls.
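A minimal sketch of that bookkeeping, assuming a single process: ThrottledSchedule and FacebookJob are made-up names, not part of DelayedJob, and in a multi-process app the last scheduled time would have to live in shared storage such as the database.

```ruby
# Hypothetical helper (not part of DelayedJob): hands out run_at times
# that are at least `min_interval` seconds apart.
class ThrottledSchedule
  def initialize(min_interval: 1)
    @min_interval = min_interval
    @last_run_at = nil
    @mutex = Mutex.new
  end

  # Returns the earliest time the next throttled job may run.
  def next_run_at(now = Time.now)
    @mutex.synchronize do
      earliest = @last_run_at ? @last_run_at + @min_interval : now
      # Never schedule in the past if the queue has been idle for a while.
      @last_run_at = [earliest, now].max
    end
  end
end

schedule = ThrottledSchedule.new(min_interval: 1)
start = Time.at(0)
run_times = 3.times.map { schedule.next_run_at(start) }

# With the modified enqueue shown above, each value would be passed as run_at:
#   Delayed::Job.enqueue FacebookJob.new(id), 0, schedule.next_run_at
```

Three calls made at the same instant get run_at values one second apart, while non-FB jobs can still be enqueued with no run_at and run in between.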

A bit of a shameless plug but you might want to try SimpleWorker, a cloud-based background processing / worker queue for Ruby apps. You can schedule one or more jobs to come off queue and hit the FB api when you need to. All the scheduling and queue management is handled by SimpleWorker and processing is also done in the cloud.
It's built for just this type of use.
Suggest you also check out the mini_fb gem for working with FB (Appoxy is the creator and maintainer).
Let us know if you need any help.
Ken @ SimpleWorker

Related

How can I use Sidekiq delay with a worker

I have a situation where I have a worker that makes multiple calls to an external API. The problem is that there is a threshold on how many calls we can make to this API per hour.
What I'd like to do is to create a worker which will make these many sequential calls to this external API. If in between these calls we get an error because we've reached the number of connections we're allowed in that hour, the worker would then save the document and schedule a new worker to complete the remaining API calls at a later time (maybe 1, 2 hours later. Ideally this should be configurable e.g.: 10mins, 1hour, etc).
Is there some way I could achieve this?
With Sidekiq you can schedule when a job will be executed, using a friendly API:
MyWorker.perform_in(3.hours, 'mike', 1) # Expect a duration
MyWorker.perform_at(3.hours.from_now, 'mike', 1) # Expect a date
Check it out : Scheduled Jobs
You want Sidekiq Enterprise and its Rate Limiting API. The alternative is tracking the rate limit yourself and rescheduling the job manually.
https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting
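The manual alternative could look roughly like the sketch below. It is plain Ruby so it runs without Sidekiq; RateLimitReached and process_until_limited are hypothetical names, and the real API client would need to raise (or be wrapped to raise) such an error when the hourly quota is hit.

```ruby
# Hypothetical error raised when the external API reports that the quota is used up.
class RateLimitReached < StandardError; end

# Processes items in order. Stops at the first rate-limit error and returns
# the items that still need to be done (empty array if everything finished).
def process_until_limited(items)
  items.each_with_index do |item, index|
    begin
      yield item
    rescue RateLimitReached
      return items.drop(index)
    end
  end
  []
end

# Fake API that only allows three calls, to show the flow.
calls_left = 3
done = []
remaining = process_until_limited(%w[a b c d e]) do |item|
  raise RateLimitReached if calls_left.zero?
  calls_left -= 1
  done << item
end

# Inside a Sidekiq worker's perform, the leftover work would then be
# rescheduled with a configurable delay, for example:
#   MyWorker.perform_in(retry_delay, remaining) unless remaining.empty?
```

Here `retry_delay` stands for whatever configurable wait you choose (10 minutes, 1 hour, and so on).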

enqueuing jobs using sucker punch

I have a question about enqueuing jobs with Sucker Punch.
I have 2000+ search keywords in my database, and I want to know the Google and Bing ranking for each of them. For this I'm using the AuthorityLabs API, but AuthorityLabs will only process 1000 POST requests per hour. I'm sending each request to AuthorityLabs as a background job using Sucker Punch. How can I ensure that only 1000 jobs run per hour, with the remaining jobs starting only after the hour is up? I also want to run these jobs daily to analyse rank changes.
Rate limiting is not a concern of your queue system, much less of SuckerPunch that is not designed to handle advanced delaying/queuing stuff, it just moves asynchronous jobs to a thread from a thread pool.
If you really want to have rate limiting, use a real queue system like Sidekiq, and put some actual code to work.
Sidekiq Enterprise supports it natively: https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting
Sidekiq-throttler seems to provide the same functionality: https://github.com/gevans/sidekiq-throttler
But you can also just delay execution (pre-emptively limiting the rate) by enqueuing jobs at specific times in the future (each executing about 4 seconds after the previous one, since 3600 seconds / 1000 requests = 3.6 seconds), or by enqueuing just one job that performs the next outstanding request and then re-enqueues itself with the same delay.
As always with open source, check the code and decide by yourself.
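As a rough illustration of the pre-emptive approach: with a limit of 1000 requests per hour, the spacing works out to 3600 / 1000 = 3.6 seconds per request. The sketch below only computes the delays; staggered_delays and RankCheckJob are made-up names, and the commented-out call assumes an ActiveJob-style backend that supports delayed execution.

```ruby
LIMIT_PER_HOUR = 1000
SPACING = 3600.0 / LIMIT_PER_HOUR # seconds between two requests

# Returns the delay (in seconds from now) for each of `count` jobs.
def staggered_delays(count, spacing = SPACING)
  (0...count).map { |i| (i * spacing).round(1) }
end

delays = staggered_delays(2000)

# Each keyword would then be enqueued with its own delay, for example:
#   keywords.zip(delays).each do |keyword, delay|
#     RankCheckJob.set(wait: delay.seconds).perform_later(keyword)
#   end
```

With 2000 keywords the last request is scheduled just under two hours out, so the whole batch fits comfortably into a daily run.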
Could you do something like this?
YourProcessingJob.set(wait: 1.hours).perform_later
Possibly in a custom rake task...

How to lock Resque jobs to one server

I have a "cluster" of Resque servers in my infrastructure. They all have the same exact job priorities etc. I automagically scale the number of Resque servers up and down based on how many pending jobs there are and available resources on the servers to handle said jobs. I always have a minimum of two Resque servers up.
My issue is that when I do a quick, one off job, sometimes both the servers process that job. This is bad.
I've tried adding a lock to my job with something like the following:
require 'resque-lock-timeout'
class ExampleJob
  extend Resque::Plugins::LockTimeout
  def self.perform
    # some code
  end
end
This plugin works for longer running jobs. However, for these super tiny one-off jobs, processing happens right away. Neither Resque server sees the lock set by its sister server: both set a lock, process the job, unlock, and are done.
I'm not entirely sure what to do at this point or what solutions there are except for having one dedicated server handle this type of job. That would be a serious pain to configure and scale. I really want both the servers to be able to handle it, but once one of them grabs it from the queue, ensure the other does not run it.
Can anyone suggest some viable solution(s)?
Write your lock interpreter to wait T milliseconds before it looks for a lock with a unique_id less than the value of the lock it made.
This will determine who won the race, and the loser will self-terminate.
T is the parallelism latency between all N servers in the pool of a given queue. You can determine this heuristically by scaling back from 1000 milliseconds until you again find the job happening in-duplicate. Give padding for latency variation.
This is called the Busy-Wait solution to mutex thread safety. It is considered one of the trade-offs acceptable given the various scenarios in which one must solve Mutex (e.g. Locking, etc)
I'll post some links when off mobile. Wikipedia entry on mutex should explain all this.
If this won't work for you, then:
1. Use a scheduler to control duplication.
2. Classify short-running jobs to a queue designed to run them in serial.
TL;DR there is no perfect solution, only good trade-off for your conditions.
It should not be possible for two workers to get the same 'payload' because items are dequeued with an atomic Redis list pop (Resque 1.x uses LPOP). Redis will only hand a queued item to the first client that pops it. It sounds like you are enqueueing the job more than once and therefore two workers are able to acquire different payloads with the same arguments. The purpose of 'resque-lock-timeout' is to assure that payloads that have the same method and arguments do not run concurrently; it does not, however, stop the second payload from being worked if the first job releases the lock before the second job tries to acquire it.
It would make sense that this only happens to short running jobs. Here is what might be happening:
payload 1 is enqueued
payload 2 is enqueued
payload 1 is locked
payload 1 is worked
payload 1 is unlocked
payload 2 is locked
payload 2 is worked
payload 2 is unlocked
Whereas in long-running jobs the following scenario might happen:
payload 1 is enqueued
payload 2 is enqueued
payload 1 is locked
payload 1 is worked
payload 2 fails to get lock
payload 1 is unlocked
Try turning off Resque and enqueueing your job. Take a look in redis at the list for your Resque queue (or monitor Redis using redis-cli monitor). See if Resque has queued more than one payload. If you still only see one payload then monitor the list to see if another one of your resque workers is calling recreate on failed jobs.
If you want to have 'resque-lock-timeout' hold the lock for longer than the duration it takes to process the job you can override the release_lock! method to set an expiry on the lock instead of just deleting it.
module Resque
  module Plugins
    module LockTimeout
      def release_lock!(*args)
        lock_redis.expire(redis_lock_key(*args), 60) # expire lock after 60 seconds
      end
    end
  end
end
https://github.com/lantins/resque-lock-timeout/blob/master/lib/resque/plugins/lock_timeout.rb#l153-155
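To see why holding the lock past the end of the job helps with very short jobs, here is a small in-memory simulation. ExpiringLock is a made-up stand-in for the Redis key with a TTL, and times are plain numbers of seconds; it is not how resque-lock-timeout is implemented.

```ruby
# In-memory stand-in for a Redis lock key with an expiry (illustration only).
class ExpiringLock
  def initialize
    @expires_at = {}
  end

  # Takes the lock unless a live one exists. Returns true on success.
  def acquire(key, now, ttl)
    return false if @expires_at.key?(key) && @expires_at[key] > now
    @expires_at[key] = now + ttl
    true
  end

  # Instead of deleting the lock, keep it alive for `hold_for` more seconds.
  def release(key, now, hold_for)
    @expires_at[key] = now + hold_for
  end
end

lock = ExpiringLock.new
first  = lock.acquire("ExampleJob", 0, 300)  # payload 1 takes the lock at t=0
lock.release("ExampleJob", 1, 60)            # job finishes at t=1, lock lives until t=61
second = lock.acquire("ExampleJob", 2, 300)  # duplicate payload at t=2 is rejected
later  = lock.acquire("ExampleJob", 62, 300) # after expiry the lock is free again
```

The duplicate payload that arrives one second after the first job finished is turned away, which is the case a delete-on-release lock misses.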

How to run n jobs each x time with delayed job gem?

My question is: how can I run x jobs every n seconds with the delayed_job gem?
Example
I want to send a mass newsletter. However, by around the 50th email (for example), Gmail stops sending them (they are treated as spam), even though the content of each one is slightly different.
So if I send the newsletter in groups of twenty users every 3 minutes, perhaps Gmail will send the emails correctly.
Thanks in advance
delayed_job has a :run_at option (https://github.com/collectiveidea/delayed_job) that you can use to set the time at which the job should run. It doesn't guarantee that the job will run at exactly that time, but the job will not run before it.
So, 20 mails/3 minutes = 1 email/9 seconds.
You can do something like this:
now = Time.now
count = 0
# then for each newsletter schedule it at intervals of 9 secs
users.each do |u|
  WhateverMailer.delay(:run_at => now + count*9).news_letter(u)
  count += 1
end
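If you would rather send real groups of twenty every 3 minutes instead of one email every 9 seconds, the run_at times can be computed per batch. This is only a sketch: batch_schedule and the constants are hypothetical names, and the integers stand in for user records.

```ruby
BATCH_SIZE = 20
BATCH_INTERVAL = 180 # seconds (3 minutes)

# Returns [user, run_at] pairs. Every user in a batch shares the same run_at.
def batch_schedule(users, start)
  users.each_slice(BATCH_SIZE).with_index.flat_map do |batch, batch_index|
    run_at = start + batch_index * BATCH_INTERVAL
    batch.map { |user| [user, run_at] }
  end
end

users = (1..45).to_a
plan = batch_schedule(users, Time.at(0))

# With delayed_job each pair would become something like:
#   plan.each do |user, run_at|
#     WhateverMailer.delay(:run_at => run_at).news_letter(user)
#   end
```

For 45 users this yields two full batches and one batch of five, scheduled at 0, 3 and 6 minutes after the start time.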
You can do this type of thing with SimpleWorker, a cloud-based scheduling and background processing service. It offers DelayedJob-like capabilities but without having to manage servers and queues. (I work at Iron.io, makers of SimpleWorker).
You can schedule a job to run every X seconds or schedule multiple jobs to come onto queue at specific times (different priorities have different target run windows). Rather than pre-schedule a lot of jobs though, you'll probably want to have one or more master jobs that run on regular schedules and then queue up one or more slave jobs to send the actual emails (each checking the database to pick up the next set to send).
You can use the same approach when facing thresholds while fetching data via APIs. Happy to discuss further if you'd like.
Ken

Ruby long running process to react to queue events

I have a rails 3 app that writes certain events to a queue.
Now on the server I want to create a service that polls the queue every x seconds, and performs other tasks on a scheduled basis.
Other than creating a ruby script and running it via a cron job, are there other alternatives that are stable?
Although spinning up a persistent Rails-based task is an option, you may want to look at more orderly systems like delayed_job or Starling to manage your workload.
I'd advise against running something in cron since the expense of spinning up a whole Rails stack can be significant. Running it every few seconds isn't practical as the ramp-up time on Rails is usually 5-15 seconds depending on your hardware. Doing this a few times a day is usually no big deal, though.
A simple alternative is to create a work loop in a script you can engage with runner:
interval = 15.minutes
next_time = Time.now + interval
while (true)
  if (stuff_to_do?)
    do_stuff
  end
  # Figure out how much time is left before the next iteration
  delay = next_time.to_i - Time.now.to_i
  if (delay > 0)
    # If ahead of schedule, take a break
    sleep(delay)
  end
  # Advance the schedule so the next pass waits for a fresh interval
  next_time += interval
end
The downside to this is that the Rails stack will remain in memory as long as this background process is running, but this is a trade-off between huge CPU hits and a memory hit.
You have several options for that, including DelayedJob and Resque.
Resque relies on Redis and is the solution I use all the time (and am very happy with).
I would recommend Ryan Bates' railscast on this subject which talks about beanstalkd and the stalker wrapper for it:
http://railscasts.com/episodes/243-beanstalkd-and-stalker
To add to the possibilities here, using a more heavy-duty queuing system like AMQP (RabbitMQ) is made easy by the 'minion' gem. It is similar to beanstalkd:
https://github.com/orionz/minion
@Blankman, you should check out http://www.simpleworker.com, it's made for things like this and takes the burden of running/scheduling/monitoring your processes off of you. And it's very stable.

Resources