Testing rate-limited external API calls with VCR and RSpec - ruby-on-rails

In my Rails project, I'm using VCR and RSpec to test HTTP interactions against an external REST web service that only allows calls to it once per second.
What this means so far is that I end up running my test suite until it fails due to a "number of calls exceeded" error from the web service. At that stage though, at least some cassettes get recorded, so I just continually run the test suite until eventually I get them all recorded and the suite can run using only cassettes (my default_cassette_options = { record: :new_episodes }). This doesn't seem like an optimal way to do things, especially if I find I need to re-record my cassettes in the future often, and I worry that constant calls could land me on a blacklist with the web service (there's no test server they have that I know about).
So, I tried putting calls to sleep(1) in my RSpec it blocks directly before each call to the web service, and then refactored those calls up into the VCR configuration:
spec/support/vcr.rb
VCR.configure do |c|
  # ...
  c.after_http_request do |request, response|
    sleep(1)
  end
end
Although this seems to work fine, is there a better way to do this? At the moment, if a call to an external service that doesn't already have a cassette is the final test in the suite, then the suite sleeps unnecessarily for 1 second. Likewise, if the time between 2 web service calls without cassettes in the test suite is more than one second, then there's another unnecessary pause. Has anyone written any kind of logic to test for these kinds of conditions, or is there a way to do this elegantly in the VCR configuration?

First off, I would recommend against using :new_episodes as your record mode. It has its uses, but the default (:once) is generally what you want. For accuracy, you want to record a cassette as a sequence of HTTP requests that were made in a single pass. With :new_episodes, you can wind up with cassettes that contain HTTP interactions that were recorded months apart but are now being played back together, and the real HTTP server may not respond in that same fashion.
Secondly, I'd encourage you to listen to the pain exposed by your tests, and find ways to decouple most of your test suite from these HTTP requests. Can you find a way to make it so that just the tests focused on the client, and the end-to-end acceptance tests make the requests? If you wrap the HTTP stuff in a simple interface, it should be easy to substitute a test double for all the other tests, and more easily control your inputs.
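As a rough illustration of wrapping the HTTP calls behind a simple interface, here is a minimal sketch. The names AlbumClient, FakeAlbumClient and AlbumReport, and the URL layout, are hypothetical and not from the question; only the wrapper would ever talk to the network, and everything else can be tested against the fake.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical wrapper: the only class that knows about HTTP.
class AlbumClient
  def initialize(base_url)
    @base_url = base_url
  end

  # Performs the real request; only client-focused specs (with VCR) exercise this.
  def albums_for(user_id)
    body = Net::HTTP.get(URI("#{@base_url}/users/#{user_id}/albums"))
    JSON.parse(body)
  end
end

# Test double with the same interface; returns canned data, makes no requests.
class FakeAlbumClient
  def initialize(albums)
    @albums = albums
  end

  def albums_for(_user_id)
    @albums
  end
end

# Code under test depends on the interface, not on HTTP.
class AlbumReport
  def initialize(client)
    @client = client
  end

  def titles_for(user_id)
    @client.albums_for(user_id).map { |album| album["title"] }.sort
  end
end

report = AlbumReport.new(FakeAlbumClient.new([{ "title" => "Zoo" }, { "title" => "Alps" }]))
puts report.titles_for(42).inspect
```

With this shape, only the specs for AlbumClient itself need cassettes, so far fewer tests are subject to the rate limit when re-recording.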
That's a longer term fix, though. In the short term, you can tweak your VCR config like so:
VCR.configure do |vcr|
  allow_next_request_at = nil
  filters = [:real?, lambda { |r| URI(r.uri).host == 'my-throttled-api.com' }]
  vcr.after_http_request(*filters) do |request, response|
    allow_next_request_at = Time.now + 1
  end
  vcr.before_http_request(*filters) do |request|
    if allow_next_request_at && Time.now < allow_next_request_at
      sleep(allow_next_request_at - Time.now)
    end
  end
end
This uses hook filters (as documented) to run the hooks only on real requests to the API host. allow_next_request_at is used to sleep the minimum amount of time necessary.
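To make the "sleep only as long as necessary" logic easier to see in isolation, here is a small sketch of the same idea outside of VCR. The Throttle class and its injectable clock and sleeper are my own hypothetical names, added only so the behaviour can be checked without actually waiting; this is not part of VCR.

```ruby
# Hypothetical helper: tracks when the next real request is allowed.
class Throttle
  def initialize(min_interval, clock: -> { Time.now }, sleeper: ->(secs) { sleep(secs) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @allow_next_at = nil
  end

  # Call before a real request: sleeps only for the time still remaining.
  # Returns the number of seconds it asked to sleep (0 if none).
  def wait
    return 0 unless @allow_next_at

    remaining = @allow_next_at - @clock.call
    return 0 if remaining <= 0

    @sleeper.call(remaining)
    remaining
  end

  # Call after a real request completes.
  def mark
    @allow_next_at = @clock.call + @min_interval
  end
end

# Demo with a fake clock (plain numbers instead of Time) and a recording sleeper.
now = 100.0
slept = []
throttle = Throttle.new(1, clock: -> { now }, sleeper: ->(secs) { slept << secs })

throttle.wait  # no request made yet, so no sleep
throttle.mark  # a request finished at t = 100.0
now = 100.25
throttle.wait  # only 0.25s have passed, so it sleeps the remaining 0.75s
now = 102.0
throttle.wait  # more than a second has passed, so no sleep
puts slept.inspect
```

The last test in the suite never pays for a trailing sleep, because the wait happens before a request instead of after it.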

An alternative may be to use APICache as a proxy around your HTTP library, as it will handle rate limiting on your behalf.
APICache.get("my_albums", :period => 1) do
  FlickrRb.get_all_sets
end
This will raise APICache::CannotFetch when you attempt to call the API more often than your limit.
Here's a link to the APICache Github repo

Related

Rails avoid blocking worker in slow controller

Generally any DB/file IO, and even external HTTP requests, are pretty quick, but I am finding that slower ones can hold up all my workers (and memory limits how many Ruby instances I can run), and creating large numbers of threads per worker has other issues (with CPU- or memory-heavy actions clogging up the system).
Can I have Rails process these actions in an async manner (more like NodeJS) or else introduce threads for that action in some way?
Since I want to respond to the original request, neither workers nor just spawning another thread myself seems appropriate, since Rails will ensure the original thread sends a response when it returns from the controller.
def my_action
  @data1 = get_data("https://slow.com/data") # e.g. Net::HTTP
  @data2 = get_data("https://slow.com/data2?group_id=#{@data1["id"]}")
  render
end
def my_action
  get_data("https://slow.com/data").then do |data1| # e.g. internal thread, not sure on other options
    get_data("https://slow.com/data2?group_id=#{data1["id"]}").then do |data2|
      @data1 = data1
      @data2 = data2
      render # Appears to have no effect
    end
  end
  # Rails does an implicit "render" on return
end
def my_action
  Thread.new do # explicit thread just for this request
    @data1 = get_data("https://slow.com/data")
    @data2 = get_data("https://slow.com/data2?group_id=#{@data1["id"]}")
    render
  end
end
In a Rails application, you're better off relying on an external process to run background jobs rather than using Ruby Threads.
Sidekiq is a pretty standard gem now for this purpose.
If it takes 10 seconds to process a request, and you want to send your response to the original HTTP request, then you've got to hold open that HTTP connection for 10 seconds. You can't get around that. If your server can handle X HTTP connections, and you have X+1 people making these slow requests... someone is going to get blocked.
There are only three possible solutions:
Figure out a way to process the requests faster. This is ideal, if you can do it.
Don't hold open the HTTP connection. Run a background task (using Sidekiq or similar gem) to do the work. When it's done, send it via websocket, or have the client poll for it. It makes your API more complicated for the client, but as a client I'd rather deal with a little complexity than having my requests blocked and maybe time out.
Scale up your server until it can handle the traffic. This is the "throw money at the problem" solution. I generally disapprove of this, since you'll have to keep throwing more money every time demand grows. But if your organization has more money than dev time, it might work for a while.
Those are your options.
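To illustrate the second option (don't hold the connection open, let the client poll), here is a minimal in-memory sketch. JobBoard and its methods are hypothetical stand-ins: in a real app the queue would be Sidekiq (or similar) running in a separate process, and the status would live in a database or Redis.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a job backend plus a result store.
class JobBoard
  def initialize
    @jobs = {}
    @queue = []
  end

  # What the controller would do: record the job and return immediately with an id.
  def enqueue(&work)
    id = SecureRandom.hex(4)
    @jobs[id] = { status: "queued", result: nil }
    @queue << [id, work]
    id
  end

  # What a separate worker process would do: run queued work outside the request cycle.
  def work_off
    until @queue.empty?
      id, work = @queue.shift
      @jobs[id] = { status: "done", result: work.call }
    end
  end

  # What a polling endpoint would return to the client.
  def status(id)
    @jobs.fetch(id)
  end
end

board = JobBoard.new
job_id = board.enqueue { "slow data" }
puts board.status(job_id)[:status] # the request has already returned at this point
board.work_off
puts board.status(job_id)[:status]
```

The original request returns as soon as the job id is handed back, so no web worker is tied up while the slow work runs.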

Pull/push status in rails 3

I have a long-running task in the background. How exactly would I pull the status from my background task, or would it be better to somehow communicate the task completion to my front end?
Background :
Basically my app uses third party service for processing data, so I want this external web service workload not to block all the incoming requests to my website, so I put this call inside a background job (I use sidekiq). And so when this task is done, I was thinking of sending a webhook to a certain controller which will notify the front end that the task is complete.
How can I do this? Is there a better solution for this?
Update:
My app is hosted on heroku
Update II:
I've done some research on the topic and found out that I can create a separate app on Heroku which will handle this. I found this example:
https://github.com/heroku-examples/ruby-websockets-chat-demo
This long running task will be run per user, on a website with a lot of traffic, is this a good idea?
I would implement this using a pub/sub system such as Faye or Pusher. The idea behind this is that you would publish the status of your long running job to a channel, which would then cause all subscribers of that channel to be notified of the status change.
For example, within your job runner you could notify Faye of a status change with something like:
client = Faye::Client.new('http://localhost:9292/')
client.publish('/jobstatus', {id: jobid, status: 'in_progress'})
And then in your front end you can subscribe to that channel using javascript:
var client = new Faye.Client('http://localhost:9292/');
client.subscribe('/jobstatus', function(message) {
  alert('the status of job #' + message.id + ' changed to ' + message.status);
});
Using a pub/sub system in this way allows you to scale your realtime page events separately from your main app - you could run Faye on another server. You could also go for a hosted (and paid) solution like Pusher, and let them take care of scaling your infrastructure.
It's also worth mentioning that Faye uses the Bayeux protocol, which means it will use WebSockets where they are available, and long-polling where they are not.
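If the publish/subscribe idea itself is unfamiliar, here is a tiny in-process sketch of it. The Broker class is hypothetical and only mimics the shape of the pattern; Faye or Pusher would do the same thing across the network between your job runner and the browser.

```ruby
# Hypothetical in-process broker; Faye or Pusher would play this role over the network.
class Broker
  def initialize
    @subscribers = Hash.new { |hash, channel| hash[channel] = [] }
  end

  # Register a handler to be called for every message on the channel.
  def subscribe(channel, &handler)
    @subscribers[channel] << handler
  end

  # Deliver the message to every handler subscribed to the channel.
  def publish(channel, message)
    @subscribers[channel].each { |handler| handler.call(message) }
  end
end

broker = Broker.new
received = []
broker.subscribe("/jobstatus") { |message| received << message }

# The job runner publishes status changes without knowing who is listening.
broker.publish("/jobstatus", { id: 7, status: "in_progress" })
broker.publish("/jobstatus", { id: 7, status: "done" })
puts received.inspect
```

The publisher and the subscribers only share a channel name, which is what lets you move the realtime part onto separate infrastructure.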
We have this pattern and use two different approaches. In both cases background jobs are run with Resque, but you could likely do something similar with DelayedJob or Sidekiq.
Polling
In the polling approach, we have a javascript object on the page that sets a timeout for polling with a URL passed to it from the rails HTML view.
This causes an Ajax ("script") call to the provided URL, which means Rails looks for the JS template. So we use that to respond with state and fire an event for the object to respond to when available or not.
This is somewhat complicated and I wouldn't recommend it at this point.
Sockets
The better solution we found was to use WebSockets (with shims). In our case we use PubNub, but there are numerous services to handle this. That keeps the polling/open connections off your web server and is much more cost-effective than running the servers needed to handle these connections.
You've stated you are looking for front-end solutions and you can handle all the front-end with PubNub's client JavaScript library.
Here's a rough idea of how we notify PubNub from the backend.
class BackgroundJob
  @queue = :some_queue
  def perform
    # Do some action
  end
  def after_perform
    publish some_state, client_channel
  end
  private
  def publish some_state, client_channel
    Pubnub.new(
      publish_key: Settings.pubnub.publish_key,
      subscribe_key: Settings.pubnub.subscribe_key,
      secret_key: Settings.pubnub.secret_key
    ).publish(
      channel: client_channel,
      message: some_state.to_json,
      http_sync: true
    )
  end
end
The simplest approach that I can think of is that you set a flag in your DB when the task is complete, and your front-end (view) sends an ajax request periodically to check the flag state in db. In case the flag is set, you take appropriate action in the view. Below are code samples:
Since you said this long-running task needs to run per user, let's add a boolean column, task_complete, to the users table. When you add the job to Sidekiq, you unset the flag:
# Sidekiq worker: app/workers/task.rb
class Task
  include Sidekiq::Worker
  def perform(user_id)
    user = User.find(user_id)
    # Long running task code here, which executes per user
    user.task_complete = true
    user.save!
  end
end
# When adding the task to sidekiq queue
user = User.find(params[:id])
# flag would have been set to true by previous execution
# In case it is false, it means sidekiq already has a job entry. We don't need to add it again
if user.task_complete?
  Task.perform_async(user.id)
  user.task_complete = false
  user.save!
end
In the view you can periodically check whether the flag was set using ajax requests:
<script type="text/javascript">
  var complete = false;
  (function worker() {
    $.ajax({
      url: 'task/status/<%= @user.id %>',
      success: function(data) {
        // update the view based on ajax request response in case you need to
      },
      complete: function() {
        // Schedule the next request when the current one completes. Once the global variable 'complete' is true, the task is done and we stop polling.
        if (!complete) {
          setTimeout(worker, 5000); // in milliseconds
        }
      }
    });
  })();
</script>
# status action which returns the status of task
# GET /task/status/:id
def status
  @user = User.find(params[:id])
end
# status.js.erb - add view logic based on what you want to achieve, given whether the task is complete or not
<% if @user.task_complete? %>
  $('#success').show();
  complete = true;
<% else %>
  $('#processing').show();
<% end %>
You can set the timeout based on the average execution time of your task. If your task takes 10 minutes on average, there's no point in checking it every 5 seconds.
Also in case your task execution frequency is something complex (and not 1 per day), you may want to add a timestamp task_completed_at and base your logic on a combination of the flag and timestamp.
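One way the flag and timestamp could be combined is sketched below. TaskState, needs_run? and the max_age parameter are hypothetical names for illustration, and the rule itself (re-run once the last result is older than max_age seconds) is just an assumption about what "more complex" might mean for you.

```ruby
# Hypothetical record with the two fields discussed above.
TaskState = Struct.new(:task_complete, :task_completed_at)

# Enqueue only when no job is pending and the last result is missing or stale.
def needs_run?(state, now:, max_age:)
  return false unless state.task_complete      # false means a job is already queued or running
  return true if state.task_completed_at.nil?  # flag set but no completion recorded yet

  now - state.task_completed_at >= max_age
end

now = Time.now
fresh   = TaskState.new(true, now - 60)
stale   = TaskState.new(true, now - 7200)
pending = TaskState.new(false, now - 7200)

puts needs_run?(fresh, now: now, max_age: 3600)
puts needs_run?(stale, now: now, max_age: 3600)
puts needs_run?(pending, now: now, max_age: 3600)
```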
As for this part:
"This long running task will be run per user, on a website with a lot of traffic, is this a good idea?"
I don't see a problem with this approach, though architectural changes like executing jobs (sidekiq workers) on separate hardware will help. These are lightweight ajax calls, and some intelligence built into your javascript (like the global complete flag) will avoid the unnecessary requests. In case you have huge traffic, and DB reads/writes are a concern then you may want to store that flag directly into redis instead (since you already have it for sidekiq). I believe that will resolve your read/write concerns, and I don't see that it is going to cause problems. This is the simplest and cleanest approach I can think of, though you can try achieving the same via websockets, which are supported by most modern browsers (though can cause problems in older versions).

Testing this rails controller - While making API Calls?

I am no stranger to testing. I pride myself on having 97% - 100% test coverage. In fact anything below 95% is poor (but that's off topic). I have the following Rails controller:
module Api
  module Internal
    class TwitterController < Api::V1::BaseController
      # Returns you 5 tweets with tons of information.
      #
      # We want 5 specific tweets with the hash of #AisisWriter.
      def fetch_aisis_writer_tweets
        tweet_array = [];
        tweet = twitter_client.search("#AisisWriter").take(5).each do |tweet|
          tweet_array.push(tweet)
        end
        render json: tweet_array
      end
      private
      # Create a twitter client connection.
      def twitter_client
        client = Twitter::REST::Client.new do |config|
          config.consumer_key = ENV['CONSUMER_KEY']
          config.consumer_secret = ENV['CONSUMER_SECRET_KEY']
          config.access_token = ENV['ACCESS_TOKEN']
          config.access_token_secret = ENV['ACCESS_TOKEN_SECRET']
        end
      end
    end
  end
end
It's easy to see what's going on. Now I could write the RSpec tests to say: call this action, and I expect json['bla']['text'] to eql bla.
But there are a couple of issues. In order to effectively test this you need Twitter API credentials. That's coupling my code with another service that I am hoping is up and running.
In fact my controller is essentially coupled to Twitter.
So my question is: without having to mock a web service or an API call (I have seen some blog posts out there on this, and for this piece of code, I feel they are overkill), how would you test this?
Some people have suggested VCR. Any thoughts on testing API calls like this?
I've found VCR to be a great tool for tests like this - where you don't need a ton of control over what the external service returns, because you don't have a lot of cases to test. You just want to eliminate test flakiness based on whether or not the service is up, and you want to make sure that you get exactly the same fake response every time. I wouldn't say VCR is overkill at all, it's very simple to use - you just wrap your test in a use_cassette block, run your test, and VCR records the actual response from the service and uses it as the mocked response from then on.
I will say that the "cassettes" that VCR uses to store the mocked responses are fairly complex YAML, and they're not super readable/easy to edit. If you want to be able to easily manipulate the data that's returned so that you can test several code paths, and easily read it so that your mocked data can serve as documentation of the code, I'd look into something more like HttpMock.
One other option, of course, would be to just stub out the private method that calls the external service and have it return your mock data directly. Usually I'd avoid that, so that you can refactor your private method and still be covered, but it might be an option in some cases where the private method is dead simple and unlikely to change, and stubbing it out makes for significantly cleaner tests.
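For the stubbing option, the idea looks roughly like the sketch below. It uses plain Ruby so it runs on its own; TweetFetcher and FakeClient are hypothetical stand-ins for the controller and the Twitter client, and in an RSpec suite you would more likely write allow(object).to receive(:twitter_client).and_return(fake) instead of define_singleton_method.

```ruby
# Hypothetical stand-in for the controller logic.
class TweetFetcher
  def latest(count)
    twitter_client.search("#AisisWriter").take(count)
  end

  private

  # In the real code this would build a Twitter::REST::Client.
  def twitter_client
    raise "would hit the real API"
  end
end

# Fake client exposing only the one method the code under test calls.
FakeClient = Struct.new(:tweets) do
  def search(_query)
    tweets
  end
end

fetcher = TweetFetcher.new
fake = FakeClient.new(%w[a b c d e f])

# Replace the private method on this one instance so no request is made.
fetcher.define_singleton_method(:twitter_client) { fake }
puts fetcher.latest(5).inspect
```

The trade-off mentioned above still applies: the test now knows about a private method, so renaming or restructuring twitter_client breaks the test even if behaviour is unchanged.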

writing spec for method that hits a web service

I'm writing a spec to verify that my Video model will create a proper thumbnail for a vimeo video when it is created. It looks something like this:
it "creates thumbnail url" do
  vimeo_url = "http://player.vimeo.com/video/12345"
  vid = Factory.build(:video, video_url: vimeo_url)
  # thumbnail created when saved
  vid.save!
  expect do
    URI.parse(vid.thumbnail_url)
  end.to_not raise_error
end
The problem is that my test is super slow because it has to hit vimeo.com. So I'm trying to stub the method that calls to the server. So two questions:
1) Is this the correct way/time to stub something
2) If yes, how do I stub it? In my Video model I have a method called get_vimeo_thumbnail() that hits vimeo.com. I want to stub that method. But if in my spec I do vid.stub(:get_vimeo_thumbnail).and_return("http://someurl.com") it doesn't work. When I run the test it still hits vimeo.com.
The VCR gem is probably worth considering. It hits the real Web service first time you run it and records the response so that it can be replayed next time you run the test (making subsequent tests fast).
I can't see anything wrong with the stub call you are making if you are calling stub before save!.
I also second the use of the 'vcr' gem.
There's also a (pro)-episode of Railscast available about VCR:
http://railscasts.com/episodes/291-testing-with-vcr
VCR can be used to record all outgoing webservice calls into "cassettes" (fixtures) that will be replayed when the tests are run again. So you get the initial set of "real-world" responses but will not hit the remote api anymore.
It also has options to do "on demand" requests when there is no recorded response available locally, and also to make explicit "live" requests.
You can, and should, run tests against the live endpoint from time to time to verify.

How to do parallel HTTP requests in Heroku?

I'm building a Ruby on Rails app that accesses about 6-7 APIs, grabs information from them based on the user's input, then compares and displays the results to the user (the information is not saved in the database). I will be using Heroku to deploy the app. I would like the HTTP requests to those APIs to be done in parallel, so the response time is better than doing them sequentially. What do you think is the best way to achieve this on Heroku?
Thank you very much for any suggestions!
If you want to actually do the requests on the server side (tfe's javascript solution is a good idea), your best bet would be using EventMachine. Using EventMachine gives a simple way to do non-blocking IO.
Also check out EM-Synchrony for a set of Ruby 1.9 fiber aware clients (including HTTP).
All you need to do for a non-blocking HTTP request is something like:
require "em-synchrony"
require "em-synchrony/em-http"
EM.synchrony do
  concurrency = 2
  urls = ['http://url.1.com', 'http://url2.com']
  # iterator will execute async blocks until completion, .each, .inject also work!
  results = EM::Synchrony::Iterator.new(urls, concurrency).map do |url, iter|
    # fire async requests, on completion advance the iterator
    http = EventMachine::HttpRequest.new(url).aget
    http.callback { iter.return(http) }
    http.errback { iter.return(http) }
  end
  p results # all completed requests
  EventMachine.stop
end
Good luck!
You could always make the requests client-side using Javascript. Then not only can you run them in parallel, but you won't even need the round-trip to your own server.
I haven't tried parallelizing requests like that. But I've tried parallel on heroku, works like a charm! This is my simple blog post about it.
http://olemortenamundsen.wordpress.com/2010/10/17/spawning-multiple-threads-at-heroku-using-parallel/
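For reference, the thread-based approach can be sketched with nothing but the standard library. The fetch_all helper is a hypothetical name, and the lambdas below only simulate slow API calls with sleep; in a real app each one would wrap an HTTP call such as Net::HTTP.get.

```ruby
# Run one thread per source and collect the results in input order.
# Thread#value waits for the thread to finish and re-raises any error it hit.
def fetch_all(fetchers)
  threads = fetchers.map { |fetcher| Thread.new { fetcher.call } }
  threads.map(&:value)
end

# Simulated slow API calls; each takes about 0.2 seconds.
fetchers = [
  -> { sleep(0.2); "result from API 1" },
  -> { sleep(0.2); "result from API 2" },
  -> { sleep(0.2); "result from API 3" }
]

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = fetch_all(fetchers)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results.inspect
puts format("took %.2fs", elapsed)
```

Because the threads spend their time waiting on IO, the waits overlap and the total time should be close to the slowest single call, not the sum of all of them.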
Have a look at creating each request as a background job:
http://blog.heroku.com/archives/2009/7/15/background_jobs_with_dj_on_heroku/
The more 'Workers' you buy from Heroku, the more background jobs can be processed concurrently, leaving your 'Dynos' to serve your users.
