How to control Rails app requests to external api? - ruby-on-rails

I am building a Rails 4 (Postgres) app on the back of a third-party API. For now, the third-party API allows 100 requests per minute.
The roundtrip for the user takes about 2000 ms, so I want to move this into a worker.
I considered using Sidekiq, but with each new user and new background thread comes the possibility that I'll exceed my API quota.
What is the best way to control my application's interaction with the third-party API? Do I need a single serial queue to control the rate limit effectively?

I assume you'll get an error (i.e. an exception) when you are over the 100-request limit. If all API requests are in a Sidekiq worker, the worker will automatically retry on error. Initially the retry will be quite soon, but you can override the retry time with something like:
sidekiq_retry_in do
  rand(60..75)
end
In this way each retry will be 60 to 75 seconds after the error.
You can read more about Sidekiq's error handling here: https://github.com/mperham/sidekiq/wiki/Error-Handling
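To make that concrete, here is a minimal sketch of how it might look inside a worker class (the class name, queue name, and ThirdPartyClient call are illustrative, not from the question):

class ThirdPartyApiWorker
  include Sidekiq::Worker

  # Funnel all third-party calls through one dedicated queue.
  sidekiq_options queue: :third_party_api, retry: 10

  # Spread retries out so a burst of failures waits for the next quota window.
  sidekiq_retry_in do |_count|
    rand(60..75)
  end

  def perform(user_id)
    # Hypothetical client call; substitute your real third-party API client.
    ThirdPartyClient.fetch_data(user_id)
  end
end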

Related

Microsoft Graph API - Throttling

Is there a specific number of requests/minute (specific to a tenant) that an application can make to Microsoft Graph APIs before requests start getting throttled?
No, not specific to a tenant (at least not for the Outlook-related parts of the Graph). Throttling is done per user per app. The threshold is 10000 requests every 10 minutes.
https://blogs.msdn.microsoft.com/exchangedev/2017/04/07/throttling-coming-to-outlook-api-and-microsoft-graph/
For non-Outlook stuff, I'm not sure what the limits are. All Graph has to say about it is here:
https://developer.microsoft.com/en-us/graph/docs/concepts/throttling
The takeaway here is that you should not depend on a specific threshold, since we can always change it if we need to in order to protect the integrity of the service. Ensure that your app can gracefully handle being throttled by handling the 429 error response properly.
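For instance, a rough Ruby sketch of handling a 429 by honoring the Retry-After header (the URL, token handling, and retry cap are placeholder assumptions, not part of the answer above):

require "net/http"
require "json"

# Minimal sketch: GET a Graph resource and back off when throttled (429).
def get_graph_resource(url, token, max_attempts: 5)
  max_attempts.times do
    uri = URI(url)
    request = Net::HTTP::Get.new(uri)
    request["Authorization"] = "Bearer #{token}"

    response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      http.request(request)
    end

    return JSON.parse(response.body) unless response.code == "429"

    # Graph normally includes a Retry-After header on 429 responses.
    sleep((response["Retry-After"] || "10").to_i)
  end
  raise "Still throttled after #{max_attempts} attempts"
end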

What is FedEx API limits? How many times I can call the APIs?

I have gone through the documentation provided at http://www.fedex.com/us/developer/web-services/index.html (this is the documentation I follow), but I am not able to find how many times I can query the FedEx APIs.
Does anyone have any idea or experience with this?
I don't want to put a bug into production code, so I'm taking precautions.
Thanks
There are no hard API limits for the web services. FedEx does keep audit logs, so they will shut you down if you're sending too many requests, especially with tracking.
It seems they do have rate limits now (as of 2022). According to this page:
The throttling limit is set to 250 transactions over 10 seconds. If this limit is reached in the first few seconds, HTTP error code 429 Too many requests will be returned and transactions will be restricted until 10 seconds is reached; transactions will then resume again. For example, if we receive 250 requests in the first four seconds, an HTTP error code 429 Too many requests ('We have received too many requests in a short duration. Please wait a while to try again.') will be returned and transactions will be restricted for the next six seconds and then resume again.
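If you would rather stay under a window like that proactively instead of reacting to 429s, something along these lines could work as a naive in-process throttle (the class is illustrative; the 250-per-10-seconds numbers come from the quote above, and a multi-process app would need a shared store such as Redis instead):

# Blocks until a request can be made without exceeding `limit` calls per `window` seconds.
class SlidingWindowThrottle
  def initialize(limit: 250, window: 10)
    @limit = limit
    @window = window
    @timestamps = []
    @mutex = Mutex.new
  end

  def wait_for_slot
    loop do
      sleep_time = @mutex.synchronize do
        now = Time.now.to_f
        @timestamps.reject! { |t| t < now - @window }
        if @timestamps.size < @limit
          @timestamps << now
          nil
        else
          (@timestamps.first + @window) - now
        end
      end
      return if sleep_time.nil?
      sleep(sleep_time)
    end
  end
end

throttle = SlidingWindowThrottle.new
throttle.wait_for_slot # call before each FedEx request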

When to hand off work from web server to worker

I'm trying to solidify my understanding of what blocking means in terms of requests to a web server and when it's smart to hand off requests to a separate worker (i.e. sidekiq).
Consider the following examples:
Login with Facebook
def sign_in
  response = Faraday.get("https://graph.facebook.com/me?fields=id,email&access_token=#{some_token}")
  user_info = JSON.parse(response.body)
  @user = User.find_by(uid: user_info["id"])
  ...
end
Send push notification through Google Firebase
def send_push_notification
  ...
  fcm = FCM.new(FCM_KEY)
  registration_ids = [recipient_token]
  resp = fcm.send(registration_ids, data: { body: notification_params[:body] })
  ...
end
In both examples, the web requests to a 3rd-party service are synchronous and possibly costly. Intuitively, I would try to handle these cases with a separate worker because they block the main application. But, I am not 100% sure what blocking means. Does it mean that when there are 100 users trying to sign_in and each Faraday.get call takes 1 second, it will take 100 seconds for all the users to sign in?
Does it mean that when there are 100 users trying to sign_in and each Faraday.get call takes 1 second, it will take 100 seconds for all the users to sign in?
Simplistic answer: yes.
In a very simple scenario, the 1st user will wait 1 second, the 2nd user will wait 2 seconds and so on.
If your application/web server doesn't abort the user request, the 100th user will wait for 100 seconds.
A bit more detailed: depends.
Today, modern web servers (like Puma) run more than one worker process on your machine. This means that your application is able to handle more than one request concurrently.
For example: if you have Puma configured to use 2 workers, your application will handle the requests of 2 users at the same time.
Thus, the 1st and 2nd users will wait 1 second, the 3rd and 4th users will wait 2 seconds, and the 99th and 100th users will wait 50 seconds.
As each Puma process consumes a considerable amount of CPU and memory, you cannot have an unlimited number of workers. That's why it's very useful to have a background process send these push notifications.
With Sidekiq (for example), the cost of delegating a job to a worker is extremely low, and thus the users of your website won't be penalized.
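As a rough illustration of that hand-off, the push-notification example from the question could be split along these lines (the worker name and arguments are made up for the example):

# In the controller: enqueue and answer immediately instead of waiting on FCM.
def send_push_notification
  PushNotificationWorker.perform_async(recipient_token, notification_params[:body])
  head :accepted
end

# app/workers/push_notification_worker.rb
class PushNotificationWorker
  include Sidekiq::Worker

  def perform(recipient_token, body)
    fcm = FCM.new(FCM_KEY)
    fcm.send([recipient_token], data: { body: body })
  end
end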

Using Puma and Sidekiq in a backend Rails app

I have a backend Rails server with Sidekiq, which serves as API server. The app works as follow:
My Rails server receives many requests from incoming API clients at the same time.
For each of these requests, the Rails server will allocate jobs to a Sidekiq server. The Sidekiq server makes requests to external APIs (such as Facebook) to get data, analyzes it, and returns a result to the Rails server.
For example, if I receive 10 incoming requests from my API clients, for each request, I need to make 10 requests to external API servers, get data and process it.
My challenge is to make my app respond to incoming requests concurrently. That is, for each incoming request, my app should process in parallel: make calls to external APIs, get the data, and return a result.
Now, I know that Puma can add concurrency to a Rails app, while Sidekiq is multi-threaded.
My question is: Do I really need Sidekiq if I already have Puma? What would be the benefit of using both Puma and Sidekiq?
In particular, with Puma, I just invoke my external API calls, data processing etc. from my Rails app, and they will automatically be concurrent.
Yes, you probably do want to use Puma and Sidekiq. There are really two issues at play here.
Concurrency (as it seems you already know) is the number of web requests that can be handled simultaneously. Using an app server like Puma or Unicorn will definitely help you get better concurrency than the default WEBrick server.
The other issue at play is the length of time that it takes your server to process a web request.
The reason that these two things are related is that the number of requests per second that your app can process is a function of both the average processing time for each request and the number of worker processes that are accepting requests. Say your average response time is 100ms. Then a single web worker can process 10 requests per second. If you have 5 workers, then you can handle 50 requests per second. If your average response time is 500ms, then you can handle 2 reqs/sec with a single worker, and 10 reqs/sec with 5 workers.
Interacting with external APIs can be slow at times, and in the worst cases it can be very unreliable, with unresponsive servers on the remote end, or network outages or slowdowns. Sidekiq is a great way to insulate your application (and your end users) from the possibility that the remote API is responding slowly. Imagine that the remote API is running slowly for some reason and that the average response time from it has slowed down to 2 seconds per request. In that case you'd only be able to handle 2.5 reqs/sec with 5 workers. With any more traffic than that, your end users might start to have a long wait before any page on your app could respond, even pages that don't make remote API calls, because all of your web workers might be waiting for the slow remote API to respond. As traffic continues to increase, your users would start getting connection timeouts.
The idea with using Sidekiq is that you separate the time spent waiting on the external API from your web workers. You'd basically take the request for data from your user, pass it to Sidekiq, and then immediately return a response to the user that basically says "we're processing your request". Sidekiq can then pick up the job and make the external request. After it has the data it can save that data back into your application. Then you can use web sockets to push a notification to the user that the data is ready. Or even push the data directly to them and update the page accordingly. (You could also use polling to have the page continually asking "is it ready yet?", but that gets very inefficient very quickly.)
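A rough sketch of that enqueue-and-acknowledge flow, using hypothetical names for the job, model, and external call:

class ReportsController < ApplicationController
  # Accept the request, enqueue the slow work, and respond right away.
  def create
    report = Report.create!(user: current_user, status: "processing")
    FetchRemoteDataJob.perform_async(report.id)
    render json: { id: report.id, status: report.status }, status: :accepted
  end

  # The client polls (or gets a web socket push) to learn when it's done.
  def show
    report = current_user.reports.find(params[:id])
    render json: { id: report.id, status: report.status, data: report.data }
  end
end

class FetchRemoteDataJob
  include Sidekiq::Worker

  def perform(report_id)
    report = Report.find(report_id)
    data = SlowRemoteApi.fetch(report.user_id) # hypothetical remote call
    report.update!(status: "done", data: data)
  end
end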
I hope this makes sense. Let me know if you have any questions.
Sidekiq, like Resque and Delayed Job, is designed to provide asynchronous job processing from a queue.
If you don't need jobs to be queued up and run asynchronously, there's no substantial benefit (or harm) to using Sidekiq.
If the tasks need to run synchronously (which it sounds like they might; it's not clear whether clients are waiting for data or just requesting that jobs run), Sidekiq and its relatives are likely the wrong tool for the job. There is no guaranteed processing time when using Sidekiq or other queue-based solutions; jobs are pushed onto the end of the queue, however long that may be, and won't be processed until their turn comes up. If clients are waiting for data, they may time out long before your worker pool ever processes their jobs.

Deferring blocking Rails requests

I found a question that explains how Play Framework's await() mechanism works in 1.2. Essentially if you need to do something that will block for a measurable amount of time (e.g. make a slow external http request), you can suspend your request and free up that worker to work on a different request while it blocks. I am guessing once your blocking operation is finished, your request gets rescheduled for continued processing. This is different than scheduling the work on a background processor and then having the browser poll for completion, I want to block the browser but not the worker process.
Regardless of whether or not my assumptions about Play are true to the letter, is there a technique for doing this in a Rails application? I guess one could consider this a form of long polling, but I didn't find much advice on that subject other than "use node".
I had a similar question about long requests that block workers from taking other requests. It's a problem for all web applications. Even Node.js may not be able to solve the problem of a worker consuming too much time, or it could simply run out of memory.
A web application I worked on has a web interface that sends requests to a Rails REST API; the Rails controller then has to call a Node.js REST API that runs a heavy, time-consuming task to get some data back. A request from Rails to Node.js could take 2-3 minutes.
We are still trying out different approaches, but maybe the following could work for you, or you can adapt some of the ideas (I would love to get some feedback too):
The frontend makes a request to the Rails API with a generated identifier [A] within the same session (this identifier helps to identify previous requests from the same user session).
The Rails API proxies the frontend request and the identifier [A] to the Node.js service.
The Node.js service adds this job to a queue system (e.g. RabbitMQ or Redis); the message contains the identifier [A]. (Here you should think about your own scenario; this also assumes some system will consume the queued job and save the results.)
If the same request is sent again, then depending on the requirement you can either kill the current job with the same identifier [A] and schedule/queue the latest request, ignore the latest request and wait for the first one to complete, or make whatever other decision fits your business requirement.
The frontend polls the REST API at an interval to check whether the data processing for identifier [A] has completed; these requests are lightweight and fast (see the sketch after this list).
Once Node.js completes the job, you can either use a message subscription system or wait for the next status-check request and return the result to the frontend.
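Here is a hedged sketch of that identifier-based flow from the Rails side (the controller, cache keys, and NodeServiceClient proxy call are all hypothetical, just to show the shape of it):

class LongTasksController < ApplicationController
  # Steps 1-2: accept the request, remember the identifier, proxy to Node.js.
  def create
    identifier = params.require(:identifier)
    Rails.cache.write("task:#{identifier}:status", "queued")
    NodeServiceClient.enqueue(identifier, params[:payload]) # hypothetical proxy call
    render json: { identifier: identifier, status: "queued" }, status: :accepted
  end

  # Step 5: the lightweight status endpoint the frontend polls on an interval.
  def show
    identifier = params[:id]
    status = Rails.cache.read("task:#{identifier}:status") || "unknown"
    result = Rails.cache.read("task:#{identifier}:result")
    render json: { identifier: identifier, status: status, result: result }
  end
end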
You can also use a load balancer, e.g. Amazon's load balancer or HAProxy. 37signals has a blog post and video about using HAProxy to offload some long-running requests so that they do not block shorter ones.
GitHub uses a similar strategy to handle long requests for generating commit/contribution visualisations. They also set a limit on polling time: if it takes too long, GitHub displays a message saying it's taking too long and has been cancelled.
YouTube has a nice message for longer queued tasks: "This is taking longer than expected. Your video has been queued and will be processed as soon as possible."
I think this is just one solution. You can also take a look at the EventMachine gem, which helps to improve performance and handle parallel or asynchronous requests.
Since this kind of problem may involve one or more services, think about ways of improving performance between those services (e.g. database, network, message protocol, etc.). If caching may help, try caching frequent requests or pre-calculating results.
