Need caching techniques for REST service calls - ruby-on-rails

I am building a Ruby on Rails application where I need to consume a REST API to fetch some data in (Atom) feed format. The REST API has a limit on the number of calls made per second as well as per day, and considering the amount of traffic my application may get, I would easily exceed those limits.
The solution would be to cache the REST API response feed locally and expose a local service (Sinatra) that serves the cached feed exactly as it was received from the REST API. And of course a sweeper would periodically refresh the cached feed.
There are two problems here.
1) One of the REST APIs is a search API whose results are returned as an Atom feed. The API takes several parameters, including the search query. What should my caching strategy be so that a cached feed can be uniquely identified by its parameters? For example, say I search with
/search?q=Obama&page=3&per_page=25&api_version=4
and I get a feed response for these parameters. How do I cache the feed so that a later call with exactly the same parameters returns the cached feed, while a call with different parameters triggers a new request to the REST API?
2) The other problem is the sweeper. I don't want to sweep a cached feed that is rarely used. For example, the search query Best burgers in Somalia would obviously be in far less demand than, say, Barack Obama. I do have data on how many consumers have subscribed to each feed, so the strategy should be to sweep cached feeds at a rate based on how large that subscriber count is. Since the caching needs to happen in the Sinatra application, how would one go about implementing this kind of sweeping strategy? Some code would help.
I am open to any ideas here. I want these mechanisms to perform very well. Ideally I would want to do this without a database, using pure page caching; however, I am open to trying other things.

Why would you want to replicate the REST service as a Sinatra app? You could easily just make a model inside your existing Rails app to cache the Atom feeds (storing the whole feed as a string, for example).
For example, a CachedFeed model that is refreshed whenever its updated_at timestamp is old enough.
You could even use page caching for your CachedFeed controller to reduce the strain on your system.
Having the cache inside your Rails app would greatly reduce complexity when it comes to deciding when to renew the cache, and it even lets you count the requests performed against the REST API you query.
You could have model logic that distributes your allotted calls to the most popular feeds, as in the sketch below. The search parameters could simply be an attribute of your model so you can easily find and distinguish feeds.
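A minimal sketch of that idea, assuming a CachedFeed model with body, param_digest, subscriber_count, and updated_at columns; the column names and the SearchApi wrapper are hypothetical:

    require "digest/sha1"

    class CachedFeed < ActiveRecord::Base
      # Build a deterministic key by sorting the query parameters, so that
      # ?q=Obama&page=3 and ?page=3&q=Obama map to the same cache entry.
      def self.digest_for(params)
        normalized = params.sort.map { |k, v| "#{k}=#{v}" }.join("&")
        Digest::SHA1.hexdigest(normalized)
      end

      # Return a fresh-enough cached feed, or fetch and store a new one.
      def self.fetch_feed(params)
        feed = find_or_initialize_by(param_digest: digest_for(params))
        if feed.new_record? || feed.stale?
          feed.body = SearchApi.get(params) # hypothetical wrapper for the REST API
          feed.save!
        end
        feed
      end

      # Popular feeds (many subscribers) go stale quickly and are refreshed
      # often; rarely used feeds linger until someone actually asks for them.
      def stale?
        ttl = subscriber_count.to_i > 100 ? 15.minutes : 12.hours
        updated_at < ttl.ago
      end
    end

Sorting the parameters before hashing makes the key order-independent, which answers the first question; the subscriber-weighted TTL is one way to express the sweeping strategy from the second.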

Related

Should I query and filter on the back-end (Rails API) or front-end (React/Redux)

I have an app that allows users to sort and filter through 30,000 items of data. Right now I make fetch requests from Redux actions to my Rails API, with the queries handled by scope methods on the Rails end. My instructor recommends that I move all my querying to the front end for efficiency, but I'm wondering if it will really be more performant to manage a Redux state object with 30,000 objects in it, each with 50 attributes of its own.
(A couple extra notes: Right now I've only run the app locally and I'm doing the pagination server-side so it runs lightning fast, but I'm a bit nervous about when I launch it somewhere like Heroku. Also, I know that if I move my querying to the front-end I'll have more options to save the query state in the URL with react-router, but I've already sort of hacked a way around that with my existing set-up.)
Let's have a look at the pros and cons of each approach:
Querying on Front End
👍 Querying does not need another network request
👎 Network requests are slower because there is more data to send
👎 App must store much more data in memory
👎 Querying is not necessarily more efficient, because the client has to do the filtering and usually lacks the mechanisms to do so effectively (such as caching and indexing).
Querying on Back End
👍 Less data to send to client
👍 Querying can be quite fast if database indexes are set up properly
👍 App is more lightweight, it only holds the data it needs to display
👎 Each query will require a network request
The pros of querying on the back end heavily outweigh those of querying on the front end, so I have to disagree with your instructor. Imagine if, when you searched on Google, it sent every relevant result to your browser and did the pagination and sorting there; your browser would feel extremely sluggish. With proper caching and database indexes on your data, the extra network requests will not be a huge disadvantage.
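To make this concrete, here is a minimal sketch of the back-end approach, assuming a Rails Item model; the column names, scope, and sort whitelist are illustrative, and the whitelist is the kind of input sanitization the query parameters need:

    class Item < ActiveRecord::Base
      # Scoped, indexed filtering keeps the heavy lifting in the database.
      scope :in_category, ->(category) { category ? where(category: category) : all }
    end

    class ItemsController < ApplicationController
      PER_PAGE = 50
      SORTABLE = %w[name price created_at].freeze

      # GET /items?category=books&sort=price&page=2
      def index
        sort  = SORTABLE.include?(params[:sort]) ? params[:sort] : "name"
        page  = [params.fetch(:page, 1).to_i, 1].max
        items = Item.in_category(params[:category])
                    .order(sort)
                    .offset((page - 1) * PER_PAGE)
                    .limit(PER_PAGE)
        render json: items
      end
    end

With indexes on category and the sortable columns, the database does the filtering, sorting, and pagination, and only 50 rows ever travel over the wire.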

Best practice for Rails third party API calls?

I have a rails app that calls a third party API for weather.
The problem is that the API call is generally very slow and sometimes fails.
Showing the weather is not a necessity but it adds a nice bit of extra and pertinent information.
Right now I call the Wunderground API using the Barometer gem in the controller, which means the page takes forever to load if the API is slow or fails.
I was hoping to move this call to an AJAX call made from the page once it has loaded. I don't mind if the information shows up a bit delayed because, as mentioned, it is not hugely important.
I was just curious about the best practices for making such a call. What is the Rails way?
The recommended way is to call the API in the background (using a scheduler) and save the result in the database. Then in the controller you can get the data from the database, and there won't be any delay.
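A minimal sketch of that pattern, assuming a WeatherSnapshot model and a hypothetical WeatherClient wrapper around the weather API; a scheduler (cron, whenever, clockwork, etc.) would call run periodically:

    class RefreshWeatherJob
      # Runs on a schedule instead of during a web request, so a slow or
      # failed API call never blocks a page load.
      def self.run(zip)
        data = WeatherClient.fetch(zip) # hypothetical API wrapper
        snapshot = WeatherSnapshot.find_or_initialize_by(zip: zip)
        snapshot.update!(payload: data.to_json, fetched_at: Time.now)
      rescue StandardError => e
        # Keep serving the last good snapshot if the refresh fails.
        Rails.logger.warn("Weather refresh failed for #{zip}: #{e.message}")
      end
    end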
I would say that you are quite correct in moving to an AJAX call from the browser: that way your page load is unaffected, and the call can take as long as it likes without your server having to wait on it. This is a classic case for loading data asynchronously (through callbacks and/or jQuery's deferred approach) so that everything else is available while the data loads, and your users aren't waiting on information they might not be very interested in to start with.
In terms of keeping it Rails, your main consideration is whether you can and/or want to make the call directly from the browser to the service, or whether you want to proxy it through your application to some degree, which would avoid potential cross-domain request problems. Again, this is very much your decision and will depend on whether you have any API keys you need to transmit with requests, and so on; but if the request can run directly from the user to the weather API, that would let you cut out the intermediate step on your part.
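If you do proxy the call through your own app for the AJAX route, the controller side can stay very thin; a sketch, with WeatherClient again standing in for whatever wraps the weather API:

    class WeatherController < ApplicationController
      # GET /weather/:zip.json - requested from the page via AJAX after load,
      # so the main page render never waits on the weather API.
      def show
        render json: WeatherClient.fetch(params[:zip])
      rescue StandardError
        # The page still works without weather; the client just shows nothing.
        render json: {}, status: :service_unavailable
      end
    end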

Multiple RESTful Web Service Calls vs. MySQL JOINs

I am currently constructing a RESTful web service using node.js for one of my current iPhone applications. At the moment, the system works as follows:
client makes requests to node.js server, server does appropriate computations and MySQL lookups, and returns data
client's reactor handles the response and updates the UI
One thing that I've been thinking about is the difference (in terms of performance and best practice) between making multiple API calls to my server and making one call that executes multiple join statements in the MySQL database and returns a constructed object.
For example:
Let's say I am loading a user profile to display in the UI. A user has a profile picture, basic info, and news feed items. Using option 1, I would do the following:
Make a getUser request to the server, which would do a query in the DB like this:
SELECT * FROM user JOIN user_info ON user.user_id = user_info.user_id LEFT JOIN user_profile_picture ON user_profile_picture.user_id = user.user_id;
The server would then return a constructed user object containing the info from each table
Client waits for a response from the server and updates everything at once
Option 2 would be:
Make 3 asynchronous requests to the server:
getUser
getUserInfo
getUserProfile
Whenever any of the requests are received, the UI is updated
So given these 2 options, I am wondering which would offer better scalability.
At the moment, I am thinking of going with option 2 for these reasons:
Each of the async requests will be faster than the single joined query in option 1, so something can be displayed to the user sooner
I am also integrating Memcached, and I feel that the three separate calls will make it easier to cache specific results (e.g. not caching a whole user profile, but caching user, user_info, and user_profile_picture separately).
Any thoughts or experiences?
I think the key question here is whether or not these API calls will always be made together. If they are, it makes more sense to set up a single endpoint and perform a join. However, if that is not the case, then you should keep them separate.
Now, what you can of course do is use a query syntax that lets you specify whether or not a particular endpoint should give you more data, and combine that with a join. This does require more input sanitization, but it might be worth it, since you could then minimize requests and still have an adaptable system; see the sketch below.
On the server side, it's unlikely that either of your two approaches will be noticeably slower than the other unless you're dealing with thousands of rows at a time.
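The question is about node.js, but the expandable-endpoint idea is language-agnostic; here is a minimal sketch in Ruby/Sinatra, where UserStore is a hypothetical data-access helper and the whitelist provides the input sanitization mentioned above:

    require "sinatra"
    require "json"

    # GET /users/42                              -> just the user row
    # GET /users/42?expand=info,profile_picture  -> user plus joined extras
    get "/users/:id" do
      allowed = %w[info profile_picture]
      expand  = (params["expand"] || "").split(",") & allowed

      result = UserStore.find(params[:id]) # hypothetical helper returning a Hash
      expand.each { |name| result[name] = UserStore.related(params[:id], name) }

      content_type :json
      result.to_json
    end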

Caching when consuming a webservice

Let's say I have a weather web service that I'm hitting (consuming) on every page load. That's not very efficient or smart, and it will probably exceed my API limit or make the webservice owners mad. So instead of fetching directly from a controller action, I have a helper / job / method (some layer) that has the chance to cache the data a little bit. Let's also say that I don't care too much about the real-time-ness of the data.
Now what I’ve done in the past is simply store the attributes from the weather service in a table and refresh the data every so often. For example, the weather service might look like this:
Weather for 90210 (example primary key)
-----------------------------
Zip Name: Beverly Hills
Current Temperature: 90
Last Temp: 89
Humidity: 0
... etc.
So in this case, I would create columns for each attribute and store them when I fetch from the webservice. I could have an expiring Rails action (page caching) do the refresh, or I could use a background job.
This simple approach works well unless the webservice has a large list of attributes (say 1,000). Then I'm spending a lot of time creating and maintaining DB columns that just repeat someone else's attributes. What would be great is if I could simply cache the whole response and refer to it as a simple Hash when I need it. Then I'd have every attribute the webservice offers cached for "free", because all the capabilities of the web service would be in my Hash instead of just a cached subset.
To do this, I could maybe fetch the webservice response, serialize it (YAML, maybe), and then fetch the serialized object if it exists. Meh, not great; serializing can get weird with special characters. It'd be really cool if I could just follow a memcached-type model, but I don't think you can store complex objects in memcached, right? I'd also like to limit the amount of software introduced, so a stand-alone proxy layer would be suboptimal imo.
Anyone done something similar or have a name for this?
If the API you're hitting is RESTful and respects caching, don't reinvent the wheel. HTTP has caching built into it (see RFC 2616), so try to use it as far as possible. You have two options:
Just stick a squid proxy between your app and the API and you're done.
Use Wrest - we wrote it to support RFC 2616 caching, and it's the only Ruby HTTP wrapper I know of that does.
If the API doesn't respect caching (most do), then the other advice you've received makes sense. What you actually use to hold your cache (MongoDB/memcached/whatever) depends on a bunch of other factors, so really, it depends on your situation.
You can use MongoDB (or another JSON datastore), get the results of the API in JSON, and store them in a Mongo collection. Then read the data and attributes that you care about and ignore the rest.
For your weather API call, you can check whether that city already exists in your Mongo collection, and if not, fetch it via the API (and then store it in Mongo).
It would be a modification of the Rails.cache pattern.
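On the "can't store complex objects in memcached" doubt from the question: Rails.cache serializes Ruby objects for you (via Marshal), so you can cache the whole parsed response as a Hash without creating any columns. A minimal sketch, with WeatherClient as a hypothetical wrapper that returns a parsed Hash:

    # e.g. in config/environments/production.rb:
    #   config.cache_store = :mem_cache_store, "localhost:11211"

    def weather_for(zip)
      # The entire parsed response is cached; every attribute the service
      # exposes comes along "for free", with no schema to maintain.
      Rails.cache.fetch("weather/#{zip}", expires_in: 30.minutes) do
        WeatherClient.fetch(zip) # hypothetical wrapper returning a Hash
      end
    end

    weather_for("90210")["Current Temperature"] # => 90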

Searching for a song while using multiple APIs

I'm going to attempt to create an open project which compares the most common MP3 download providers.
This will require a user to enter a track/album/artist name, e.g. Deadmau5, and it will then pull the relevant prices from the APIs.
I have a few questions that some of you may have encountered before:
Should I have one server-side page that requests all the data so it all loads simultaneously? If so, how would you deal with timeouts or any other problems that may arise? Or should the page load first, with each price pulled in one by one (AJAX)? What are your experiences when running a comparison check?
The main feature will be to compare prices, but how can I be sure that the products are the same? I was thinking of using running time and track numbers, but I would still have to set one source as my primary.
I'm making this a wiki; please add and edit any issues that you can think of.
Thanks for your help. Look out for a future blog!
I would check Amazon first. They will give you a SKU (the barcode on the back of the album; I think Amazon calls it an EAN). If the other providers use this, you can make sure they are looking at the same item.
I would cache all results in a database and expire them after a reasonable time. This way, when you get 100 requests for Britney Spears, you don't have to hammer the other sites and slow down your application.
You should also make sure you are parallelizing whatever requests you do server-side. curl, for instance, allows you to pull multiple URLs at once, with a user-defined callback per request. I'd have each callback send back some data so you can update your page as the results come in (e.g. GETTUNES => the curl callback returns some data for each URL while its connection is open, and you parse it on the client side).
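One way to do that parallel fetching in Ruby is Typhoeus's Hydra, which queues requests and runs them concurrently with a callback per response; the provider URLs and response parsing here are placeholders:

    require "typhoeus"
    require "json"
    require "cgi"

    # Placeholder endpoints; the real URLs come from each provider's API docs.
    PROVIDERS = {
      "amazon" => "https://api.amazon.example/search?q=",
      "itunes" => "https://api.itunes.example/search?q="
    }

    def fetch_prices(query)
      prices = {}
      hydra  = Typhoeus::Hydra.new

      PROVIDERS.each do |name, base_url|
        request = Typhoeus::Request.new(base_url + CGI.escape(query), timeout: 5)
        request.on_complete do |response|
          # A provider that fails or times out simply drops out of the comparison.
          prices[name] = JSON.parse(response.body) if response.success?
        end
        hydra.queue(request)
      end

      hydra.run # blocks until every queued request has finished
      prices
    end

    fetch_prices("Deadmau5")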
