How to do parallel HTTP requests in Heroku? - ruby-on-rails

I'm building a Ruby on Rails app that accesses about 6-7 APIs, grabs information from them based on the user's input, compares the results, and displays them to the user (the information is not saved in the database). I will be using Heroku to deploy the app. I would like the HTTP requests to those APIs to be made in parallel rather than sequentially, so that the response time is better. What do you think is the best way to achieve this on Heroku?
Thank you very much for any suggestions!

If you want to actually do the requests on the server side (tfe's JavaScript solution is a good idea), your best bet would be using EventMachine. EventMachine gives you a simple way to do non-blocking IO.
Also check out EM-Synchrony for a set of Ruby 1.9 fiber-aware clients (including HTTP).
All you need to do for a non-blocking HTTP request is something like:
```ruby
require "em-synchrony"
require "em-synchrony/em-http"

EM.synchrony do
  concurrency = 2
  urls = ['http://url.1.com', 'http://url2.com']

  # iterator will execute async blocks until completion, .each, .inject also work!
  results = EM::Synchrony::Iterator.new(urls, concurrency).map do |url, iter|
    # fire async requests, on completion advance the iterator
    http = EventMachine::HttpRequest.new(url).aget
    http.callback { iter.return(http) }
    http.errback { iter.return(http) }
  end

  p results # all completed requests
  EventMachine.stop
end
```
Good luck!

You could always make the requests client-side using JavaScript. Then not only can you run them in parallel, but you won't even need the round-trip to your own server.

I haven't tried parallelizing requests like that, but I've tried the parallel gem on Heroku and it works like a charm! Here is my simple blog post about it:
http://olemortenamundsen.wordpress.com/2010/10/17/spawning-multiple-threads-at-heroku-using-parallel/
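For comparison, here is a minimal stdlib-only sketch of thread-based fan-out, similar in spirit to what the parallel gem does in thread mode but not its actual API. `fetch_all` and its `fetcher:` keyword are hypothetical names; the fetcher is injectable so the logic can be tried without network access, and by default it does a plain GET with Net::HTTP.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: run one thread per URL and return the results in
# input order. A failing request yields its exception object instead of
# aborting the whole batch.
def fetch_all(urls, fetcher: ->(url) { Net::HTTP.get_response(URI(url)) })
  threads = urls.map do |url|
    Thread.new(url) do |u|
      begin
        fetcher.call(u)
      rescue StandardError => e
        e
      end
    end
  end
  threads.map(&:value) # Thread#value joins the thread and returns the block's result
end
```

Because MRI releases its global VM lock while a thread is blocked on network IO, the requests overlap, so the total wait is roughly that of the slowest API rather than the sum of all 6-7. One thread per URL is fine for a handful of APIs; for long URL lists you would want to cap the concurrency.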

Have a look at creating each request as a background job:
http://blog.heroku.com/archives/2009/7/15/background_jobs_with_dj_on_heroku/
The more 'Workers' you buy from Heroku, the more background jobs can be processed concurrently, leaving your 'Dynos' to serve your users.

Related

Rails avoid blocking worker in slow controller

Generally DB/file IO and even external HTTP requests are pretty quick, but I am finding that slower ones can hold up all my workers (and memory limits how many Ruby instances I can run), while creating large numbers of threads per worker has other issues (CPU- or memory-heavy actions clog up the system).
Can I have Rails process these actions asynchronously (more like Node.js), or else introduce threads for that action in some way?
Since I want to respond to the original request, neither workers nor just spawning another thread myself seems appropriate, since Rails will ensure the original thread sends a response when it returns from the controller.
```ruby
# Current, blocking version
def my_action
  @data1 = get_data("https://slow.com/data") # e.g. Net::HTTP
  @data2 = get_data("https://slow.com/data2?group_id=#{@data1["id"]}")
  render
end

# Desired, promise-style (pseudocode: get_data would return a promise)
def my_action
  get_data("https://slow.com/data").then do |data1| # e.g. internal thread, not sure on other options
    get_data("https://slow.com/data2?group_id=#{data1["id"]}").then do |data2|
      @data1 = data1
      @data2 = data2
      render # Appears to have no effect
    end
  end
  # Rails does an implicit "render" on return
end

# Explicit thread
def my_action
  Thread.new do # explicit thread just for this request
    @data1 = get_data("https://slow.com/data")
    @data2 = get_data("https://slow.com/data2?group_id=#{@data1["id"]}")
    render
  end
end
```
In a Rails application, you're better off relying on an external process to run background jobs rather than using Ruby threads.
Sidekiq is a pretty standard gem now for this purpose.
If it takes 10 seconds to process a request, and you want to send your response to the original HTTP request, then you've got to hold open that HTTP connection for 10 seconds. You can't get around that. If your server can handle X HTTP connections, and you have X+1 people making these slow requests... someone is going to get blocked.
There are only three possible solutions:

1. Figure out a way to process the requests faster. This is ideal, if you can do it.
2. Don't hold open the HTTP connection. Run a background task (using Sidekiq or a similar gem) to do the work. When it's done, send the result via websocket, or have the client poll for it. It makes your API more complicated for the client, but as a client I'd rather deal with a little complexity than have my requests blocked and maybe time out.
3. Scale up your server until it can handle the traffic. This is the "throw money at the problem" solution. I generally disapprove of it, since you'll have to keep throwing more money at it every time demand grows. But if your organization has more money than dev time, it might work for a while.
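A minimal sketch of option 2's enqueue-then-poll flow, assuming an in-memory store. `JobStore`, `enqueue`, and `status` are hypothetical names. In a real app the work would run in a Sidekiq (or similar) worker process and the status would live in Redis or the database; the thread below only stands in for that worker so the sketch is self-contained.

```ruby
require "securerandom"

# Hypothetical in-memory job registry illustrating the protocol:
# the controller enqueues and returns an id at once, and the client
# polls a status endpoint until the job is :done or :failed.
class JobStore
  def initialize
    @jobs = {}
    @lock = Mutex.new
  end

  # Record the job and return its id immediately instead of holding
  # the HTTP connection open. (Stand-in worker: a plain Thread.)
  def enqueue(&work)
    id = SecureRandom.uuid
    update(id, status: :pending, result: nil)
    Thread.new do
      begin
        update(id, status: :done, result: work.call)
      rescue StandardError => e
        update(id, status: :failed, result: e.message)
      end
    end
    id
  end

  # What a polling endpoint (e.g. GET /jobs/:id) would return.
  def status(id)
    @lock.synchronize { @jobs.fetch(id, { status: :unknown, result: nil }).dup }
  end

  private

  def update(id, **state)
    @lock.synchronize { @jobs[id] = state }
  end
end
```

The client-side cost mentioned above is visible here: the first response only carries an id, and the client has to ask again (or be notified over a websocket) to get the result.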
Those are your options.

Asynchronous GET request in Rails

I'm working on a Ruby on Rails app that relies on my app making some simple URL calls for user metrics. For part of the tracking I need to make a server-side call prior to the rendering of my index page. This is achieved by calling a specially formatted URL. Currently I'm achieving this in the following way:
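```ruby
require "net/http"
require "openssl"

url = URI.parse('https://example.tracking.url')
# NOTE: VERIFY_NONE disables TLS certificate verification
result = Net::HTTP.start(url.host, use_ssl: true, verify_mode: OpenSSL::SSL::VERIFY_NONE) do |http|
  http.get(url.request_uri, 'User-Agent' => 'MyLib v1.2')
end
```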
The loading of my page seems to be somewhat delayed at times. Unless it's a database latency issue, I assume the URL sometimes takes extra time to respond, and because this is a synchronous request the page has to wait for it. What is the best way to make asynchronous requests in Rails? Threads, maybe? Thanks.
Have you looked into using a delayed job or Thread.new?
I would move it to a helper method and then call that helper inside a Thread.new block. Personally, I like using delayed_job for handling things that may cause a delay in the user interface.
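A minimal sketch of the thread-based suggestion, assuming the page never needs the tracking call's result. `track_async` and its `requester:`/`logger:` keywords are hypothetical names; the default requester does a Net::HTTP GET similar to the question's, but leaves certificate verification on.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fire the tracking request in a background thread
# so rendering does not wait for it. Errors are logged rather than
# raised, because nothing is waiting on the result.
def track_async(url, logger: $stderr, requester: nil)
  requester ||= lambda do |uri|
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.get(uri.request_uri, "User-Agent" => "MyLib v1.2")
    end
  end

  Thread.new do
    begin
      requester.call(URI(url))
    rescue StandardError => e
      logger.puts("tracking call failed: #{e.class}: #{e.message}")
      nil
    end
  end
end
```

The controller just calls `track_async(url)` and renders. The trade-off is the one that motivates delayed_job: a bare thread has no retry, and the call is lost if the process restarts before it finishes.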

Testing rate-limited external API calls with VCR and RSpec

In my Rails project, I'm using VCR and RSpec to test HTTP interactions against an external REST web service that only allows calls to it once per second.
So far, this means I end up running my test suite until it fails with a "number of calls exceeded" error from the web service. By then at least some cassettes have been recorded, so I keep re-running the suite until eventually all of them are recorded and the suite can run using only cassettes (my default_cassette_options = { record: :new_episodes }). This doesn't seem like an optimal way to do things, especially if I find I need to re-record my cassettes often in the future, and I worry that constant calls could land me on a blacklist with the web service (as far as I know, they don't offer a test server).
So I tried putting calls to sleep(1) in my RSpec it blocks directly before each call to the web service, and then refactored those calls up into the VCR configuration:
```ruby
# spec/support/vcr.rb
VCR.configure do |c|
  # ...
  c.after_http_request do |request, response|
    sleep(1)
  end
end
```
Although this seems to work fine, is there a better way to do this? At the moment, if a call to an external service that doesn't have a cassette yet is the final test in the suite, the suite sleeps unnecessarily for 1 second. Likewise, if the time between 2 web service calls without cassettes is already more than one second, there's another unnecessary pause. Has anyone written logic to test for these conditions, or is there an elegant way to do this in the VCR configuration?
First off, I would recommend against using :new_episodes as your record mode. It has its uses, but the default (:once) is generally what you want. For accuracy, you want to record a cassette as a sequence of HTTP requests that were made in a single pass. With :new_episodes, you can wind up with cassettes that contain HTTP interactions that were recorded months apart but are now being played back together, and the real HTTP server may not respond in that same fashion.
Secondly, I'd encourage you to listen to the pain exposed by your tests, and find ways to decouple most of your test suite from these HTTP requests. Can you find a way to make it so that just the tests focused on the client, and the end-to-end acceptance tests make the requests? If you wrap the HTTP stuff in a simple interface, it should be easy to substitute a test double for all the other tests, and more easily control your inputs.
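A minimal sketch of that decoupling, assuming a JSON endpoint that returns a list of objects with a `name` field. All class names, the `/items` path, and the `q` parameter are hypothetical. Only `ThrottledApiClient` touches HTTP (so only its specs would need VCR cassettes); everything else accepts any object that responds to `#item_names`, so a fake can be passed in.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical thin wrapper: the only class that knows about HTTP.
# The transport is injectable so the client's own tests can stub it.
class ThrottledApiClient
  def initialize(base_url, transport: ->(uri) { Net::HTTP.get(uri) })
    @base_url = base_url
    @transport = transport
  end

  def item_names(query)
    uri = URI.join(@base_url, "/items")
    uri.query = URI.encode_www_form(q: query)
    JSON.parse(@transport.call(uri)).map { |item| item["name"] }
  end
end

# Test double with the same interface and no HTTP at all.
class FakeApiClient
  def initialize(names)
    @names = names
  end

  def item_names(_query)
    @names
  end
end

# Code under test depends on "something with #item_names", not on HTTP.
class ItemReport
  def initialize(client)
    @client = client
  end

  def summary(query)
    names = @client.item_names(query)
    "#{names.size} result(s): #{names.join(', ')}"
  end
end
```

With this split, specs for `ItemReport` run instantly against `FakeApiClient.new(%w[a b])`, and the rate limit only matters for the few client specs.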
That's a longer term fix, though. In the short term, you can tweak your VCR config like so:
```ruby
VCR.configure do |vcr|
  allow_next_request_at = nil
  filters = [:real?, lambda { |r| URI(r.uri).host == 'my-throttled-api.com' }]

  vcr.after_http_request(*filters) do |request, response|
    allow_next_request_at = Time.now + 1
  end

  vcr.before_http_request(*filters) do |request|
    if allow_next_request_at && Time.now < allow_next_request_at
      sleep(allow_next_request_at - Time.now)
    end
  end
end
```
This uses hook filters (as documented) to run the hooks only on real requests to the API host. allow_next_request_at is used to sleep the minimum amount of time necessary.
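The same minimum-wait logic can also be pulled out of the hooks into a small object that is easy to unit-test. This is a sketch, not part of VCR: `MinIntervalThrottle` and its methods are hypothetical names, and it swaps `Time.now` for a monotonic clock so wall-clock adjustments can't skew the wait.

```ruby
# Hypothetical standalone version of "sleep only as long as necessary".
# clock and sleeper are injectable; by default they use a monotonic
# clock and Kernel#sleep.
class MinIntervalThrottle
  def initialize(interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @interval = interval
    @clock = clock
    @sleeper = sleeper
    @allow_next_at = nil
  end

  # Call before a real request. Returns how many seconds it slept.
  def before_request
    return 0 unless @allow_next_at

    wait = @allow_next_at - @clock.call
    return 0 unless wait.positive?

    @sleeper.call(wait)
    wait
  end

  # Call after a real request completes.
  def after_request
    @allow_next_at = @clock.call + @interval
  end
end
```

With `throttle = MinIntervalThrottle.new(1)`, the hooks above reduce to `vcr.before_http_request(*filters) { throttle.before_request }` and `vcr.after_http_request(*filters) { throttle.after_request }`.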
An alternative may be to use APICache as a proxy around your HTTP library, as it will handle rate limiting on your behalf.
```ruby
APICache.get("my_albums", period: 1) do
  FlickrRb.get_all_sets
end
```
This will raise APICache::CannotFetch when you attempt to call the API more often than your limit.
Here's a link to the APICache GitHub repo

Rails 2.3.X - Execute code after request was rendered and returned?

Is it possible in Rails 2.3.x to start a new chain of commands after a request has been rendered and returned to the requester?
I need this in order to work with an asynchronous API on the other side: they expect a response to their request, and after that response is sent my Rails app should send a new HTTP request to them (post something to their API)...
What are the possibilities here? Is there something like a after_render hook?
Should I make use of threads or background tasks and how could this be done?
I would be very glad for some solutions :-)
Kind regards
UPDATE: The return code (e.g. 200) should be sent to the requester before the other calls are executed.
The easiest thing to do is spawn a new thread. This is assuming that it is a lightweight call and you don't need advanced error logging or retry logic.
```ruby
Thread.new do
  puts "call the api"
end
```
The two most popular solutions for this are Delayed Job (that Lars mentioned), and Resque:
https://github.com/tobi/delayed_job
https://github.com/defunkt/resque
How about using something like Delayed Job?
I could be wrong, but I think code execution continues after a render unless you put a return. This is why you get an error if you try to render twice.
Are you rendering html? If so, maybe you can insert some javascript into the rendered page to make a new request to your controller and initiate the further action that you need to take.

My web site need to read a slow web site, how to improve the performance

I'm writing a web site with Rails that lets visitors enter some domains and check whether they have been registered.
When a user clicks the "Submit" button, my web site posts some data to another web site and reads the result back. But that website is slow for me: each request takes 2 or 3 seconds, so I'm worried about performance.
For example, if my web server allows at most 100 processes, then only 30 or 40 users can visit my website at the same time. This is not acceptable. Is there any way to improve the performance?
PS:
At first I wanted to use AJAX to read that web site, but because of the cross-domain problem it doesn't work. So I have to use this "ajax proxy" solution.
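As a rough back-of-envelope check (an illustration added here, assuming each check holds one server process for its whole 2 to 3 seconds and ignoring the app's own processing time), Little's law relates the number of requests in flight L, the sustainable arrival rate λ, and the time W each request spends in the system:

$$\lambda_{\max} = \frac{L}{W}, \qquad \frac{100}{3\ \text{s}} \approx 33\ \text{req/s} \;\le\; \lambda_{\max} \;\le\; \frac{100}{2\ \text{s}} = 50\ \text{req/s}$$

So 100 processes can sustain roughly 33 to 50 domain checks per second, but no more than 100 can be in progress at any instant; past that point requests queue, and every blocked process is one that can't serve any other page.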
It's a bit more work, but you can use something like DelayedJob to process the requests to the other site in the background.
DelayedJob creates separate worker processes that look at a jobs table for stuff to do. When the user clicks submit, such a job is created, and starts running in one of those workers. This off-loads your Rails workers, and keeps your website snappy.
However, you will have to create some sort of polling mechanism in the browser while the job is running. Perhaps using a refresh or some simple AJAX. That way, the visitor could see a message such as “One moment, please...”, and after a while, the actual results.
Rather than posting some data to the websites, you could use an HTTP HEAD request, which (I believe) should return only the header information for that URL.
I found this code by googling around a bit:
```ruby
require "net/http"

req = Net::HTTP.new('google.com', 80)
p req.request_head('/')
```
This will probably be faster than a POST request, and you won't have to wait to receive the entire contents of that resource. You should be able to determine whether the site is in use based on the response code.
Try using Typhoeus rather than AJAX to get the body. You can POST the domain names to that site using Typhoeus and parse the response you get back. It's extremely fast compared to other solutions. A snippet I ripped from the wiki page of the GitHub repo http://github.com/pauldix/typhoeus shows that you can run requests in parallel (which is probably what you want, considering that each request takes 2 or 3 seconds!):
```ruby
require "typhoeus"
require "json"

hydra = Typhoeus::Hydra.new

first_request = Typhoeus::Request.new("http://localhost:3000/posts/1.json")
first_request.on_complete do |response|
  post = JSON.parse(response.body)
  third_request = Typhoeus::Request.new(post["links"].first) # get the first url in the post
  third_request.on_complete do |link_response|
    # do something with that
  end
  hydra.queue third_request
  post # the block's value becomes the handled response
end

second_request = Typhoeus::Request.new("http://localhost:3000/users/1.json")
second_request.on_complete do |response|
  JSON.parse(response.body)
end

hydra.queue first_request
hydra.queue second_request
hydra.run # this is a blocking call that returns once all requests are complete

first_request.handled_response  # the value returned from the on_complete block
second_request.handled_response # the value returned from the on_complete block (parsed JSON)
```
Also Typhoeus + delayed_job = AWESOME!
