How can I use Sidekiq delay with a worker - ruby-on-rails

I have a worker that makes multiple calls to an external API. The problem is that we have a threshold on how many calls we can make to this API per hour.
What I'd like to do is create a worker that makes these sequential calls to the external API. If one of these calls fails because we've reached the number of connections we're allowed in that hour, the worker would save the document and schedule a new worker to complete the remaining API calls at a later time (maybe 1 or 2 hours later; ideally this should be configurable, e.g. 10 minutes, 1 hour, etc.).
Is there some way I could achieve this?

With Sidekiq you can schedule when a job will be executed with a friendly API:
MyWorker.perform_in(3.hours, 'mike', 1) # Expects a duration
MyWorker.perform_at(3.hours.from_now, 'mike', 1) # Expects a date
Check it out: Scheduled Jobs

You want Sidekiq Enterprise and its Rate Limiting API. The alternative is tracking the rate limit yourself and rescheduling the job manually.
https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting
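If you go the manual route, a rough sketch of catching the rate-limit error and rescheduling could look like this (ExternalApi, RateLimitError and Document are placeholders for your own API client and model, not real libraries):

class ApiSyncWorker
  include Sidekiq::Worker

  # retry_in is the configurable delay (e.g. 10.minutes.to_i or 1.hour.to_i)
  def perform(document_id, retry_in = 1.hour.to_i)
    document = Document.find(document_id)
    document.pending_calls.each do |call|
      ExternalApi.execute(call)        # one of the sequential API calls
      document.mark_call_done!(call)
    end
  rescue ExternalApi::RateLimitError
    document.save!
    # re-enqueue this same worker to finish the remaining calls later
    self.class.perform_in(retry_in, document_id, retry_in)
  end
end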

Related

Sending http requests every n seconds in my Rails app

What's the best way to send HTTP requests to an external API every n seconds, where n changes after every request?
I have an infinite loop that calculates the time interval and sends the HTTP request, but I don't know the best way to use it in a Rails app.
I thought Sidekiq would be the perfect solution, with chained jobs where each job sends the request, calculates the time interval, and schedules another job with set(wait: n). But it looks like Sidekiq has a polling interval, so set(wait: n) does not run the request in exactly n seconds.
How would you do something like this?
You are totally right about Sidekiq; I think it will be the best solution. The polling interval can be configured via average_scheduled_poll_interval; see the documentation.
Do it like this:
Create an async job
After the job completes, queue the same job again and ask Sidekiq to wait some time: SMSDelegationJob.set(wait: 10.seconds).perform_later (see the sketch after this list)
Don't forget to develop good logic for exception handling
Don't forget to set a low polling interval
Start the root job manually or via the console.
Good luck with it.
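Here is a minimal sketch of that chained-job idea with ActiveJob; compute_next_interval and send_request stand in for your own logic:

class SMSDelegationJob < ApplicationJob
  queue_as :default

  def perform
    send_request                      # your HTTP call
    n = compute_next_interval         # n changes after every request
    # schedule the next run; note that set(wait:) comes before perform_later
    self.class.set(wait: n.seconds).perform_later
  end
end

# start the root job manually or from the console
SMSDelegationJob.perform_later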
Is it n seconds between requests (i.e. from when the last one completed to when the next one starts), or should they start every n seconds, regardless of how long the last one took (or if it was successful or not)?
Answering that question should tell you whether the requests need to be made in parallel (using some form of concurrency), or whether you could just do it from a single long-lasting process.

enqueuing jobs using sucker punch

I have a question about enqueuing jobs using sucker punch.
I have 2000+ search keywords in my database, and I want to know the Google and Bing ranking for each one. For this I'm using the Authority Labs API, but AuthorityLabs will only process 1000 POST requests per hour. I'm sending each request to AuthorityLabs as a background job using sucker punch. How can I limit it so only 1000 jobs run in 1 hour, with the remaining jobs starting only after that hour? I also want to run these jobs daily to analyse rank changes.
Rate limiting is not a concern of your queue system, much less of SuckerPunch, which is not designed to handle advanced delaying/queuing: it just hands asynchronous jobs to a thread from a thread pool.
If you really want to have rate limiting, use a real queue system like Sidekiq, and put some actual code to work.
Sidekiq Enterprise supports it natively: https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting
Sidekiq-throttler seems to provide the same functionality: https://github.com/gevans/sidekiq-throttler
But you can also just delay execution (pre-emptively limiting the rate) by enqueuing jobs at specific times in the future (each executing 4 minutes after the other), or by enqueuing a single job that executes itself (handling the next outstanding request) and then enqueues itself again with a 4-minute delay.
As always with open source, check the code and decide by yourself.
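To sketch the pre-emptive spacing approach with Sidekiq (Keyword and RankCheckWorker are hypothetical names, and the 4-minute gap mirrors the suggestion above):

class RankCheckWorker
  include Sidekiq::Worker

  def perform(keyword_id)
    # call the AuthorityLabs API for this keyword here
  end
end

# enqueue every keyword, each 4 minutes after the previous one
Keyword.find_each.with_index do |keyword, i|
  RankCheckWorker.perform_in(i * 4.minutes, keyword.id)
end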
Could you do something like this?
YourProcessingJob.set(wait: 1.hours).perform_later
Possibly in a custom rake task...
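For example, a hypothetical rake task along those lines, batching 1000 keywords per hour (Keyword and RankCheckJob are stand-in names):

# lib/tasks/rank_check.rake
namespace :rank_check do
  desc 'Queue rank checks, 1000 per hour, to stay under the API limit'
  task queue_all: :environment do
    Keyword.find_in_batches(batch_size: 1000).with_index do |batch, i|
      batch.each do |keyword|
        RankCheckJob.set(wait: i.hours).perform_later(keyword.id)
      end
    end
  end
end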

How to run n jobs each x time with delayed job gem?

My question is: how could I run x jobs every n seconds with the delayed_job gem?
Example
I want to send a mass newsletter. However, when I get to around the 50th email, Gmail stops delivering them (marks them as spam), even though the content of each one is a bit different.
So if I send the newsletter in groups of twenty users every 3 minutes, perhaps Gmail will deliver the emails correctly.
Thanks in advance
delayed_job has a :run_at option (https://github.com/collectiveidea/delayed_job); you can use it to set what time the job should run. It doesn't guarantee the job will run at exactly that time, but it surely won't run before it.
So, 20 mails/3 minutes = 1 email/9 seconds.
You can do something like this:
now = Time.now
count = 0
# then for each newsletter, schedule it at intervals of 9 seconds
users.each do |u|
  WhateverMailer.delay(:run_at => now + count * 9).news_letter(u)
  count += 1
end
You can do this type of thing with SimpleWorker, a cloud-based scheduling and background processing service. It offers DelayedJob-like capabilities but without having to manage servers and queues. (I work at Iron.io, makers of SimpleWorker).
You can schedule a job to run every X seconds or schedule multiple jobs to come onto queue at specific times (different priorities have different target run windows). Rather than pre-schedule a lot of jobs though, you'll probably want to have one or more master jobs that run on regular schedules and then queue up one or more slave jobs to send the actual emails (each checking the database to pick up the next set to send).
You can use the same approach when facing thresholds while fetching data via APIs. Happy to discuss further if you'd like.
Ken

what would be the possible approach to go : SQS or SNS?

I am going to build a Rails application which integrates with Amazon's cloud services.
I have explored Amazon's SNS service, which offers public subscriptions, but that's not what I want; I want to notify only a particular subscriber.
For example, if I have 5 subscribers on one topic, a notification should go to only a particular subscriber.
I have also explored Amazon's SQS, where I would have to write a poller that monitors the queue for messages. SQS also has a lock mechanism, but the problem is that it is distributed, so there is a chance of receiving the same message again from another copy of the queue.
I want to know which would be the better approach.
SQS sounds like what you want.
You can run multiple "worker" processes that compete over messages in the queue. Each message is only consumed once. The logic behind the "lock" / timeout that you mention is as follows: if one of your workers were to die after downloading a message, but before processing it, then you want that message to eventually time out and be re-downloaded for processing on another node.
Yes, SQS is built on a polling model. For example, I have a number of use cases in which I use a minutely cron job to poll for new messages in the queue and take action on any messages found. This pattern is stupid simple to build and works wonders for a bunch of use cases -- a handy little "client" script that pushes a message into the queue, and the cron activated script that will process that message within a minute or so.
If your message pattern is extremely sparse -- eg, only a few messages a day -- it may seem wasteful to poll constantly while the queue is empty. It hardly matters.
My original calculation was that a minutely cron job would cost $0.04 (now $0.02) per month. Since then, SQS added a "Long-Polling" feature that lets you achieve sub-second latency on processing new messages by sending 1 "long-poll" message every 20 seconds to poll an idle queue. Plus, they dropped the price 50%. So per month, that's 131k messages (~$0.06), a little bit more expensive, but with near realtime request processing.
Keep in mind that the minutely cron job I described only costs ~$0.04 / month in request load (30d * 24h * 60m * 1¢ / 10k msgs). So at a minutely clip, cost shouldn't really be a concern here. Even polling every second, the price rises only to $2.59 / mo, not exactly a bank buster.
However, it is possible to avoid frequent polling using a webservice that takes an SNS HTTP message. Such an architecture would work as follows: client pushes message to SNS, which pushes message to SQS and routes an HTTP request to your webservice, triggering it to drain the queue. You'd still want to poll the queue hourly or daily, just in case an HTTP request was dropped. In the end though, I'm not sure I can think of any scenario which really justifies such complexity. I'd much rather pay $0.04 a month to have a dirt simple cron job polling my queue.
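For reference, a bare-bones poll with the aws-sdk-sqs gem might look like this (the queue URL and process method are placeholders for your own setup):

require 'aws-sdk-sqs'

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue'
sqs = Aws::SQS::Client.new(region: 'us-east-1')

resp = sqs.receive_message(
  queue_url: QUEUE_URL,
  max_number_of_messages: 10,
  wait_time_seconds: 20          # long polling: wait up to 20s for messages
)

resp.messages.each do |msg|
  process(msg.body)              # your handler
  # delete only after successful processing; otherwise the message becomes
  # visible again after the visibility timeout and another worker retries it
  sqs.delete_message(queue_url: QUEUE_URL, receipt_handle: msg.receipt_handle)
end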

How to throttle Delayed job to not anger the Facebook API

I'm building an app that will be hitting the Facebook graph api a lot. I learned they have a rate limit of 600 requests every 600 seconds.
I'm using delayed job for all my background processing. What is a good way to schedule delayed job to stay under the fb api rate limit? Are there any tricks with delayed job or do I need to build a separate background task processor to not go over my rate limit?
Thanks
600 requests every 600 sec is 1 per sec on avg.
Not very fast!
1) Depending on your company's size and heft, I'd investigate with FB to see if you can get the limit raised for you.
2) You can stick with DelayedJob, no need to re-invent the wheel. You just need to change the scheduler.
In my DelayedJob installation, I use the "run_at" column for more than just setting the time to retry the jobs--I also use it as the time to run the job in the first place. You can also use it to throttle your jobs.
The change, in DelayedJob's job.rb file:
# added run_at param
# eg Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...'), 0,
#    Delayed::Job.db_time_now + 15.minutes
def self.enqueue(object, priority = 0, run_at = nil)
  unless object.respond_to?(:perform)
    raise ArgumentError, 'Cannot enqueue items which do not respond to perform'
  end

  Job.create(:payload_object => object, :priority => priority,
             :run_at => run_at)
end
For your goal, I would keep track of the last time an FB API call was enqueued, and schedule the next one with a run_at at least one second later.
Benefit: you would be able to interleave other, non-FB tasks with the FB API calls.
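A rough sketch of that idea, using the modified enqueue above (FacebookApiJob is a placeholder payload object, and the class-level @last_run_at only works within a single enqueuing process):

class FacebookApiScheduler
  # schedule FB jobs at least one second apart
  def self.enqueue(payload)
    now = Delayed::Job.db_time_now
    @last_run_at = @last_run_at ? [@last_run_at + 1.second, now].max : now
    Delayed::Job.enqueue(payload, 0, @last_run_at)
  end
end

# e.g., for some user you want to fetch from the Graph API
FacebookApiScheduler.enqueue(FacebookApiJob.new(some_user_id))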
A bit of a shameless plug, but you might want to try SimpleWorker, a cloud-based background processing / worker queue for Ruby apps. You can schedule one or more jobs to come off the queue and hit the FB API when you need to. All the scheduling and queue management is handled by SimpleWorker, and processing is also done in the cloud.
It's built for just this type of use.
Suggest you also check out the mini_fb gem for working with FB (Appoxy is the creator and maintainer).
Let us know us if you need any help.
Ken @ SimpleWorker
