I have an app that makes scheduled calls to a number of APIs once a day. This works very nicely, but I'm aware that some of the APIs I'm calling (Twitter, for example) have a rate limit. As the number of calls I'm making is set to grow continually, can anyone recommend a way to throttle my calls so I can send them in bursts of x per hour, minute, etc.?
I've found the Glutton Ratelimit gem. Is anyone using it, and is it any good? Are there others I should be looking at?
If you're using some kind of background worker to perform your API calls, you could reschedule the task to run again in the next time slot, once the rate limits have been reset.
class TwitterWorker
  include Sidekiq::Worker

  def perform(status_id)
    status = Twitter.status(status_id)
    # ...
  rescue Twitter::Error::TooManyRequests
    # Reschedule the query to be performed in the next time slot
    TwitterWorker.perform_in(15.minutes, status_id)
  end
end
This is not a rigorous solution, though: if you try to perform many more API calls in a day than the rate limit allows, a query might be rescheduled every time. But until then, something easy might do the trick!
Another solution is to buy proxies, which allow you to send requests from different IP addresses.
Use the standard HTTP library: http://ruby-doc.org/stdlib-2.0/libdoc/net/http/rdoc/Net/HTTP.html#method-c-Proxy
I am not sure that you will not be blocked, but it may be worth a try. A randomly chosen IP should increase your limits.
Unless you're making concurrent requests, there's not much to it:
Figure out how much delay you need per request.
Check the time before the request, subtract it from the time after the request, and sleep for the remainder of the delay.
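The two steps above can be sketched roughly as follows. This is a minimal illustration for the non-concurrent case only; the `Throttle` class, the `interval_for` helper and the 180-per-15-minutes figure are made up for the example and are not taken from any particular API or gem.

```ruby
# Seconds to leave between requests so that `limit` requests fit in `window` seconds.
def interval_for(limit, window)
  window.to_f / limit
end

# Pads each call so that consecutive calls start at least `min_interval` seconds apart.
class Throttle
  def initialize(min_interval)
    @min_interval = min_interval
  end

  # Runs the block, then sleeps for whatever is left of the interval.
  def call
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    remaining = @min_interval - elapsed
    sleep(remaining) if remaining > 0
    result
  end
end

# Illustrative numbers: 180 requests per 15 minutes means one request every 5 seconds.
production_interval = interval_for(180, 15 * 60)

# A short interval is used here so the demo finishes quickly;
# the block stands in for the real API call.
throttle = Throttle.new(0.05)
results = [1, 2, 3].map { |id| throttle.call { id * 2 } }
```

If a request takes longer than the interval, no sleep happens and the next request starts immediately, so the throttle never adds delay on top of an already slow call.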
With concurrent requests you can be more accurate; I once blogged about that here.
I know this is an old question, but wanted to mention something in case it helps others with the same question.
If the work can be queued as jobs using Resque, you could use the gem I've just released, which pauses a queue when you hit a rate limit and unpauses it some time later.
https://github.com/pavoni/resque-rate_limited_queue
Related
I have an App that has the locations of 10 different places.
Given your current location, the app should return the estimated arrival time for those 10 locations.
However, Apple has said that
Note: Directions requests made using the MKDirections API are server-based and require a network connection.
There are no request limits per app or developer ID, so well-written apps that operate correctly should experience no problems. However, throttling may occur in a poorly written app that creates an extremely large number of requests.
The problem is that they do not define what a well-written app is. Is 10 requests bad? Is 20 requests an extremely large number?
Has anyone built an app like this before who could provide some guidance? If Apple does begin throttling the requests, people will blame my app and not Apple. Some advice, please.
Hi, take a look at the MKRoute class; it contains all the information you need.
This object contains
expectedTravelTime
You should also handle the LoadingThrottled error:
The data was not loaded because data throttling is in effect. This
error can occur if an app makes frequent requests for data over a
short period of time.
To prevent your requests from being throttled, reduce the number of requests.
Try using completion handlers to find out when a request has finished, and only then send another request or cancel the previous one. From my experience, handle this request like any regular network request and just make sure you are not spamming the Apple API with unnecessary requests. This is not a 100% guarantee that Apple won't throttle your requests, though.
Is there a way in Rails to send out a request at a certain time?
I'm using an external credit card charging API, and I want to adjust each monthly subscription based on how many referrals they have (10% each, 10 referrals max). The API has a beta referral system built in, but it doesn't seem to work the way I need it to. Plus, there are just too many unknowns that I'd rather not get into at the moment. I just want to get it up and working, and since my system is fairly simple, I'd rather just do it manually.
There's a billing date for each subscription, and what I want to do is manually adjust the price of the subscription based on how many active users have the referral code of the user being charged. I'd like to send this request to the API just before they're billed, sometime around subscription.next_billing_at - 1.minute.
Then I would just set subscription.price to price - (price * User.where(referral_code: current_user_code).count / 10).
I'm aware this is far from an optimal approach, considering the number of extra requests being made each month, but since we're small right now, it shouldn't be a problem. Again, it's just a temporary solution so we can get things running now.
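The formula above, together with the stated cap of 10 referrals, can be sketched as a small helper. This is only an illustration: the method name and constants are hypothetical, and it assumes the price is stored as an integer number of cents.

```ruby
MAX_REFERRALS = 10          # cap from the question: 10 referrals max
DISCOUNT_PER_REFERRAL = 10  # percent per referral, from the question

# Returns the discounted price in cents for a given referral count.
# Referral counts above the cap (or below zero) are clamped.
def discounted_price_cents(price_cents, referral_count)
  capped = referral_count.clamp(0, MAX_REFERRALS)
  price_cents - (price_cents * capped * DISCOUNT_PER_REFERRAL / 100)
end
```

Multiplying before dividing keeps the integer arithmetic from truncating the discount to zero. Note that with ten referrals the price becomes zero, which is what the original formula implies.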
There are two options which directly answer your question.
Write a rake task and run it daily with cron via the Whenever gem. If you take this approach, the task will have to load all subscriptions that are due to be billed in the next cycle and update them as required.
Alternatively, use something like Resque-scheduler, which would allow you to run some task at next_billing_at - 1.minute or something.
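For the daily-task option, the selection step might look roughly like this. It is a plain Ruby sketch: the `Subscription` struct is a hypothetical stand-in for the real model, and in a Rails app the same filter would more likely be a `where` query on `next_billing_at`.

```ruby
# Hypothetical stand-in for the real model; only the fields used here.
Subscription = Struct.new(:id, :next_billing_at)

# Returns the subscriptions whose billing date falls within the next `window`
# seconds, which is the set a once-a-day task would need to update.
def due_for_billing(subscriptions, now: Time.now, window: 24 * 60 * 60)
  subscriptions.select do |sub|
    sub.next_billing_at > now && sub.next_billing_at <= now + window
  end
end
```

Running the task once a day with a 24-hour window means every subscription is picked up exactly once before its billing date, as long as the task does not skip a day.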
But if you are small, why not just update the price every time a new referral is created, using a callback? Unless there are specific rate or query limits on this API, I doubt a card processor is going to be affected by the traffic you generate. Of course, if there are other requirements (for example, a referral only applies after a month), you are going to be stuck with one of the first two options, and the cron + rake task is probably the best solution in that case.
I would like to maintain a local record of the price of all the phone calls that my application makes.
I am not sure what a good pattern for this would be. It looks like the price is not available in the arguments provided to the status callback when the call is closed. I assume this means I'll need to query Twilio's servers to find the price of the call. Can I do this immediately, or do I need to wait a certain amount of time for the price to populate?
Is there another pattern that would be simpler, require fewer steps, or be less error prone that I am not seeing here?
Thanks!
Twilio evangelist here.
I'd recommend checking out the Usage Records API. These handy APIs give you an easy way to get rollup data for your account, like how much your account spent yesterday or how many outbound calls it made.
You can also set up Usage Triggers to proactively notify you when thresholds are met.
Hope that helps.
In my Rails application, I have a long calculation requiring a lot of database access.
In short, the calculation takes 25 seconds.
When I run the same calculation in a background job (one big single worker), it takes twice as long (i.e. 50 seconds). I have tried several techniques for moving the job to a background process (Delayed Job, Sidekiq, and running the work in a thread created inside my Rails process), but none of them made a difference: all show the same 2x slowdown.
This performance difference only exists in the Rails 'production' environment. It looks like Rails applies an optimisation that is not applied in my background job.
My technical environment is as follows:
I am using ruby 2.0 / rails 4
I am using unicorn (but I have same problem without it).
The job is using Rails.cache to store some partial computation.
I am using postgresql
Does anybody have a clue where this slowdown might come from?
I'm assuming you're comparing the background job speed to the speed of running the operation during a web request? If so, you're likely benefiting from Rails's QueryCache, which caches db queries during a web request. Try disabling it as described here:
Disabling Rails SQL query caching globally
If that causes the web request version of the job to take as long as the background job, you've found your culprit. You can then enable the query cache on your background job to speed it up (if it makes sense for your application).
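To make the effect concrete, here is a toy model of a per-request query cache. It is not Rails's implementation: `FakeDatabase` and its methods are invented for the illustration. In a real app, wrapping the job body in `ActiveRecord::Base.cache { ... }` is one way to turn the query cache on for a block of code.

```ruby
# Toy illustration of a per-request query cache (not Rails's implementation).
class FakeDatabase
  attr_reader :hits

  def initialize
    @hits = 0
    @cache = nil
  end

  # Enables caching for the duration of the block, then discards the cache.
  def with_cache
    @cache = {}
    yield
  ensure
    @cache = nil
  end

  # Returns cached rows for identical SQL while a cache is active.
  def select(sql)
    return run(sql) unless @cache
    @cache.fetch(sql) { @cache[sql] = run(sql) }
  end

  private

  def run(sql)
    @hits += 1 # counts queries that actually reach the "database"
    "rows for #{sql}"
  end
end
```

A calculation that repeats the same query many times pays for it once inside `with_cache` and every time outside it, which is the kind of gap that could account for a web request outrunning a background job.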
A background job is not something to use for speeding things up. Its main purpose is 'fire and forget': it removes 25 seconds of synchronous calculation from the request and replaces it with somewhat more asynchronous calculation. That way you can tell the user that her request is being processed and come back with the result later.
You can gain speed from background jobs by splitting a big task into smaller ones and running them at the same time. In your case I think that is impossible because of the dependencies between the operations in your calculation.
So if you want to speed up your calculation, you need to look into denormalizing your data structure: store some precalculated values for the big calculation at the moment its source data is updated. You will then calculate less when the user requests results and more when data is stored. That is a good place to use a background job: once you finish updating the data, create a background task to update the caches. If a user's request for the calculation comes in before this task has finished, you will still need to wait for the cache to fill.
Update: I think I still need to answer your main question. Basically, the additional time in background task processing comes from the implementation. Because of the 'fire and forget' approach, nobody wants the background task scheduler to consume a large amount of processor time just monitoring for new jobs. I am not completely sure, but I think that if your calculation were twice as complex, the extra time would still be the same 25 seconds.
My guess is that the extra time is coming from the need for your background worker to load Rails and all of your application. My clue is that you said the difference was greatest with Rails in production mode. In production mode, subsequent calls to the app make use of the app and class cache.
How to check this hypothesis:
Change your background job to do the following:
1. print a log message before you initiate the worker
2. start the worker
3. run your calculation. As part of your calculation startup, print a log message
4. print another log message
5. run your calculation again
6. print another log message
Then compare the two times for running your calculation.
Of course, you'll also gain some extra time benefits from database caching, code might remain resident in memory, etc. But if the second run is much much faster, then the fact that the second run didn't restart Rails is more significant.
The time between the log messages from steps 1 and 3 will also help you understand the startup time.
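The measurement itself can be sketched in plain Ruby as below. The helper names are made up for the example and the block passed in is only a placeholder for the real calculation; inside a worker you would write to the Rails or Sidekiq logger rather than standard output.

```ruby
# Returns the wall-clock seconds taken by the block.
def elapsed_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Runs the block twice, logging around each run, and returns both durations.
def time_two_runs(log: $stdout)
  log.puts "worker started"
  first = elapsed_seconds { yield }
  log.puts format("first run: %.3fs", first)
  second = elapsed_seconds { yield }
  log.puts format("second run: %.3fs", second)
  [first, second]
end

# The block is a placeholder for the real calculation.
cold, warm = time_two_runs { (1..100_000).sum }
```

Comparing `cold` and `warm` inside the same process separates one-off startup and cache-warming costs from the cost of the calculation itself.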
Fixes
Why wait?
Most important: why do you need the results faster? For example, tell your user that the result will be emailed to them after it is calculated. Or let your user see that the calculation is proceeding in the background, and show them the result later.
The key for any long running calculation is to do it in the background and encourage the user to not wait for the result. They should be able to do something else until they get the result.
Start the calculation automatically
As soon as the user logs in, or after they do something interesting, start the calculation. That way, when (and if) the user asks for the calculation, the answer will either be already done or will soon be done.
Cache the result and bust the cache as needed
Similar to the above, start the calculation periodically and automatically. If the user changes some data, then restart the calculation by busting the cache. There are also ways to halt any on-going calculation if data is changed during the calculation.
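A bare-bones version of caching with explicit busting might look like this. The `CachedCalculation` class is hypothetical and holds the value in memory only; since the question already uses Rails.cache, the same idea there would probably be a `Rails.cache.fetch` with a block plus a `Rails.cache.delete` when the data changes.

```ruby
# Remembers the result of an expensive block until `bust!` is called.
class CachedCalculation
  def initialize(&calculation)
    @calculation = calculation
    @cached = false
  end

  # Returns the cached value, computing it first if necessary.
  def result
    unless @cached
      @value = @calculation.call
      @cached = true
    end
    @value
  end

  # Call this whenever the underlying data changes.
  def bust!
    @cached = false
  end
end
```

A separate `@cached` flag is used rather than checking `@value` for nil, so that a calculation which legitimately returns nil or false is not recomputed on every call.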
Pre-calculate part of the calculation
Why are you taking 25 seconds or more for a DBMS calculation? It could be that you should change the calculation. Investigate adding indexes, summary tables, de-normalizing, splitting the calculation into smaller steps that can be pre-calculated, etc.
I'd like to infrequently open a Twitter streaming connection with TweetStream and listen for new statuses for about an hour.
How should I go about opening the connection, keeping it open for an hour, and then closing it gracefully?
Normally for background processes I would use Resque or Sidekiq, but from my understanding those are for completing tasks as quickly as possible, not chilling and keeping a connection open.
I thought about using a global variable like $twitter_client but that wouldn't horizontally scale.
I also thought about building a second application that runs on one box to handle this functionality, but that seems excessive if it can be integrated into the main app somehow.
To clarify, I have no trouble starting a process, capturing tweets, and using them appropriately. I'm just not sure what I should be starting. A new app? A daemon of some sort?
I've never encountered a problem like this, and am completely lost. Any direction would be much appreciated!
Although not a direct fix, this is what I would look at:
Time
You're working with time, so I'd look at what time-centric processes could be used to hold the connection open for an hour.
Specifically, I'd look at running some sort of job on the server, which you could fire at specific times (programmatically if required), to open and close the connection. I only have experience with Resque, but as you say, it's probably not up to the job. If I find any better solutions, I'll certainly update the answer.
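Whatever ends up starting the job, the "open, listen for a fixed time, close" part can be sketched generically. The `TimedListener` class below is made up for the illustration and knows nothing about TweetStream; the block stands in for the code that reads from the stream, and it is expected to check `stop_requested?` regularly so it can shut down cleanly.

```ruby
# Runs a listener block in its own thread for `duration` seconds,
# then asks it to stop and waits for it to finish.
class TimedListener
  def initialize(duration)
    @duration = duration
    @stop = false
  end

  def stop_requested?
    @stop
  end

  def run
    worker = Thread.new { yield self }
    sleep @duration
    @stop = true  # signal the listener to finish its current iteration
    worker.join   # wait for a graceful shutdown before returning
  end
end

# Demo with a very short duration; a real run would use 60 * 60 seconds.
received = 0
TimedListener.new(0.1).run do |listener|
  until listener.stop_requested?
    received += 1 # stand-in for handling one status from the stream
    sleep 0.01
  end
end
```

Because `run` blocks until the listener has returned, any cleanup placed after the loop inside the block (closing the connection, flushing stored tweets) completes before the job ends.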
Storage
Once you've connected to TweetStream, you'll want to look at how you can capture the tweets for that time period. It seems a waste to create a database table just for the job, so I'd be inclined to use something like Redis to store the tweets that you need.
This can then be used to output the tweets you need, allowing you to simulate storing / capturing them, but then delete them after the hour window has passed.
Delivery
I don't know what context you're using this feature in, so I'll just give you as generic a process idea as possible.
To display the tweets, I'd personally create some sort of record in the DB to show the time you're pinging TweetStream that day (if it changes; if it's constant, just set a constant in an initializer), and then include some logic to try to get the tweets from Redis. If you're able to collect them, show them as you wish; otherwise don't print anything.
Hope that gives you a broader spectrum of ideas?