Let’s say I have a weather web service that I’m hitting (consuming) on every page load. That’s not very efficient or smart, and it will probably exceed my API limit or make the web service owners mad. So instead of fetching directly from a controller action, I have a helper / job / method (some layer) that has a chance to cache the data a little. Let’s also say that I don’t care too much about how real-time the data is.
Now what I’ve done in the past is simply store the attributes from the weather service in a table and refresh the data every so often. For example, the weather service might look like this:
Weather for 90210 (example primary key)
-----------------------------
Zip Name: Beverly Hills
Current Temperature: 90
Last Temp: 89
Humidity: 0
... etc.
So in this case, I would create columns for each attribute and store them when I fetch from the webservice. I could have an expiring rails action (page caching) to do the refresh or I could do a background job.
This simple approach works well unless the webservice has a large list of attributes (say 1000). Then I’m spending a lot of time creating and maintaining DB columns that just duplicate attributes someone else already defines. What would be great is if I could simply cache the whole response and refer to it as a plain Hash when I need it. Then I’d have every attribute the webservice offers cached for “free”, because all the capabilities of the web service would be in my Hash instead of just a subset.
To do this, I could maybe fetch the webservice response, serialize it (YAML maybe) and then fetch the serialized object if it exists. Meh, not great. Serializing can get weird with special characters. It'd be really cool if I could just follow a memcached-style model, but I don't think you can store complex objects in memcached, right? I'd also like to limit the amount of software introduced, so a stand-alone proxy layer would be suboptimal imo.
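What I'm picturing is roughly this sketch (the endpoint, cache key and 30-minute expiry are made up, and I'm assuming the cache store can hold a marshalled Hash, e.g. Rails.cache sitting on top of memcached):

require 'net/http'
require 'json'

# Sketch: cache the whole parsed response as a Hash and read it back later.
# The URL, key format and expiry are placeholders, not a real API.
def weather_for(zip)
  Rails.cache.fetch("weather/#{zip}", expires_in: 30.minutes) do
    body = Net::HTTP.get(URI("https://example-weather.test/api/#{zip}.json"))
    JSON.parse(body)  # the parsed Hash is what gets cached
  end
end

# Later, anywhere in the app:
# weather_for("90210")["Current Temperature"]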
Anyone done something similar or have a name for this?
If the API you're hitting is RESTful and respects caching, don't reinvent the wheel. HTTP has caching built into it (see RFC 2616), so try to use it as far as possible. You have two options:
Just stick a squid proxy between your app and the API and you're done.
Use Wrest - we wrote it to support RFC 2616 caching, and it's the only Ruby HTTP wrapper I know of that does.
If the API doesn't respect caching (most do), then the other advice you've received makes sense. What you actually use to hold your cache (MongoDB/memcached/whatever) depends on a bunch of other factors, so that really comes down to your situation.
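For illustration, the conditional-GET part of that built-in HTTP caching looks roughly like this with plain Net::HTTP (a sketch, not Wrest's API; how you persist the cached body/ETag between calls is up to you):

require 'net/http'

# Sketch of a conditional GET: re-validate a cached response instead of
# re-downloading it. `cached` is assumed to hold the previous body and headers.
def fetch_with_validation(uri, cached)
  req = Net::HTTP::Get.new(uri)
  req['If-None-Match']     = cached[:etag]          if cached[:etag]
  req['If-Modified-Since'] = cached[:last_modified] if cached[:last_modified]

  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end

  if res.is_a?(Net::HTTPNotModified)   # 304: our copy is still good
    cached[:body]
  else
    cached[:etag]          = res['ETag']
    cached[:last_modified] = res['Last-Modified']
    cached[:body]          = res.body
  end
end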
You can use MongoDB (or another JSON datastore): get the results of the API in JSON, store them in your Mongo collection, then pull out the data and attributes you care about and ignore the rest.
For your weather API call, you can check whether that city already exists in your Mongo collection, and if not, fetch it via the API (and then store it in Mongo).
It would be a modification of the Rails.cache pattern.
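A minimal sketch of that pattern with the mongo Ruby driver (the collection name, zip lookup and fetch_weather_from_api helper are hypothetical):

require 'mongo'

# Assumed setup; connection details are placeholders.
client  = Mongo::Client.new(['127.0.0.1:27017'], database: 'myapp')
weather = client[:weather_cache]

def cached_weather(weather, zip)
  doc = weather.find(zip: zip).first
  return doc if doc

  # fetch_weather_from_api is a hypothetical method returning the parsed JSON Hash
  doc = fetch_weather_from_api(zip).merge('zip' => zip, 'fetched_at' => Time.now.utc)
  weather.insert_one(doc)
  doc
end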
Related
I have an app that allows users to sort and filter through 30,000 items of data. Right now I make fetch requests from Redux actions to my rails API, with the queries being handled by scope methods on my rails end. My instructor is recommending that I move all my querying to my front-end for efficiency, but I'm wondering if it really will be more performant to manage a Redux state object with 30,000 objects in it, each with 50 of their own attributes.
(A couple extra notes: Right now I've only run the app locally and I'm doing the pagination server-side so it runs lightning fast, but I'm a bit nervous about when I launch it somewhere like Heroku. Also, I know that if I move my querying to the front-end I'll have more options to save the query state in the URL with react-router, but I've already sort of hacked a way around that with my existing set-up.)
Let's have a look at the pros and cons of each approach:
Querying on Front End
👍 Querying does not need another network request
👎 Network requests are slower because there is more data to send
👎 App must store much more data in memory
👎 Querying is not necessarily more efficient because the client has to do the filtering and it usually does not have the mechanisms to do so effectively (caching and indexing).
Querying on Back End
👍 Less data to send to client
👍 Querying can be quite fast if database indexes are set up properly
👍 App is more lightweight, it only holds the data it needs to display
👎 Each query will require a network request
The pros of querying on the Back End heavily outweigh those of the Front End, so I would have to disagree with your instructor. Imagine if, when you searched Google, it sent every relevant result to your browser and did the pagination and sorting there; your browser would feel extremely sluggish. With proper caching and database indexes on your data, the network requests will not be a huge disadvantage.
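To make that concrete, here is a rough sketch of the back-end approach with indexes and scope-based, paginated filtering (the Item model, its columns and the page size of 50 are made up for illustration):

# db/migrate/xxxx_add_indexes_to_items.rb -- index the columns you filter/sort on
class AddIndexesToItems < ActiveRecord::Migration[5.2]
  def change
    add_index :items, :category
    add_index :items, :price
  end
end

# app/models/item.rb
class Item < ApplicationRecord
  scope :in_category,  ->(cat) { where(category: cat) if cat.present? }
  scope :cheaper_than, ->(max) { where('price <= ?', max) if max.present? }
end

# app/controllers/items_controller.rb -- the server does the filtering and paging
class ItemsController < ApplicationController
  PER_PAGE = 50

  def index
    page  = params.fetch(:page, 1).to_i
    items = Item.in_category(params[:category])
                .cheaper_than(params[:max_price])
                .order(:price)
                .limit(PER_PAGE)
                .offset((page - 1) * PER_PAGE)
    render json: items
  end
end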
We've built a dynamic questionnaire with an Angular front-end and a RoR backend. Since there are a lot of dynamic parts to handle, it was impossible to use the ActionView or jbuilder cache helpers. With each questionnaire request there are quite a lot of queries to be done, such as checking the validity of answers, checking dependencies, etc. Is there a recommended strategy for caching dynamic JSON responses?
To give an idea..
controller code:
def advance
  # Decrypt and parse parameters
  request = JSON.parse(decrypt(params[:request]))

  # Process passed parameters
  if request.key?('section_index')
    @result_set.start_section(request['section_index'].to_i)
  elsif request.key?('question_id')
    if valid_answer?(request['question_id'], request['answer_id'])
      @result_set.add_answer(request['question_id'],
                             request['answer_id'],
                             request['started_at'],
                             request['completed_at'])
    else
      return invalid_answer
    end
  end

  render_item(@result_set.next_item)
end
The next_item could be a question or section, but progress indicator data and possibly a previously given answer (navigation is possible) are returned as well. Also, data is sent encrypted from and to the front-end.
We've also built an admin area with an Angular front-end. In this area, results from the questionnaire can be viewed and compared. Quite a few queries are done to find subquestions, comparable questions, etc., which we found hard to cache. After clicking around with multiple simultaneous users, you could fill up the server's memory.
The app is deployed on Passenger and we've fine-tuned the config based on the server configuration. The results are stored in a Postgres database.
TLDR: In production, we found that memory usage becomes an issue. Some optimisations to queries (specifically includes) are possible, but is there a recommended strategy for caching dynamic JSON responses?
Without much detail as to how you are storing and retrieving your data, it is a bit tough to say. But it sounds like your next_item method is CPU- and memory-intensive when it tries to find the next item. Is that correct? Assuming so, you might want to take a look at a linked list. Each node (polymorphic) would have a link to the next node. You could implement it as a doubly linked list if you need to step forwards and backwards.
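A bare-bones sketch of that idea with ActiveRecord (the table, column and association names are hypothetical):

# A doubly linked list of questionnaire nodes; each node points at a
# polymorphic item (a question or a section).
class QuestionnaireNode < ApplicationRecord
  belongs_to :item, polymorphic: true            # Question or Section
  belongs_to :next_node, class_name: 'QuestionnaireNode', optional: true
  belongs_to :prev_node, class_name: 'QuestionnaireNode', optional: true
end

# Walking forward becomes a single indexed lookup per step:
# current   = QuestionnaireNode.find(current_id)
# next_item = current.next_node&.item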
How often does the data change? If you can cache big parts of it and you can find a trigger attribute (e.g. updated_at), you can do fragment caching in the view, or even better, HTTP caching in the controller. You can mix both.
It's a bit complex. Please have a look at http://www.xyzpub.com/en/ruby-on-rails/4.0/caching.html
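For example, a minimal sketch of HTTP caching in the controller (the Questionnaire model and show action are hypothetical):

class QuestionnairesController < ApplicationController
  def show
    @questionnaire = Questionnaire.find(params[:id])
    # Responds with 304 Not Modified when the client's cached copy is current.
    fresh_when etag: @questionnaire, last_modified: @questionnaire.updated_at
  end
end

# In the view, fragment caching keyed on updated_at would look like:
#   <% cache @questionnaire do %> ... <% end %>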
Is there an ATOM Client or framework that enables capture of a feed entry EXACTLY once?
If not, what is the best architecture?
We would like to parse an ATOM feed, persisting all syndication feed entries locally as individual records (in a database or the file system). These records may have to be deleted periodically for efficiency, so the client must keep track of which entries it has already looked at, independently of that persistence.
Have you looked at Superfeedr? It's a Software-as-a-Service platform that does just that: it fetches feeds, parses them, and sends the new entries to your endpoints when they're available.
Answering my own question, based on the working solution I developed. In one word, the architectural solution to capturing only new and unique entries from a syndication feed is: CACHING.
Specifically, the entries must be stored by the client to support the logic "does the feed have anything new for me?". I believe there is no shortcut to this "client-side" solution.
Conditional GET is not a complete solution, even if it is supported server-side by the syndicated feed. For instance, if the client does not send the exact If-Modified-Since timestamp, the server can ignore the header and simply generate all entries again. Per Chris Berry and Bryon Jacob (updated 10/27/08):
...a better way to do this is to use the start-index parameter, where the client sets this value to the end-Index, returned in the previous page. Using start-index ensures that the client will never see the same response twice, or miss Entries, since several Entries could have the same "update date", but will never have the same "update index".
In short, there is no standard server-side solution guaranteeing "new/unique". The very idea of "uniqueness" is a client-side concern anyway, and the server may not share the same notion, so from that perspective it would be impossible for the server to satisfy all clients. In any case, the question is not about building a better syndication server but a smarter client; therefore, caching is the way to go.
The cache implementation must persist between feed polls, and the time-to-live (TTL) and time-to-idle (TTI) properties of entries stored in the cache must be set appropriately, both to limit the size and performance cost of the cache and to adequately cover the feed's oldest entries between polling cycles. The cache could be memory-resident, a database, a file system, or a network array. A product like Ehcache (ehcache.org) provides just about all the functionality needed.
The feed entries may be persisted as-is, but the best (most efficient) method is to identify the contents, or combinations thereof, that make them unique. Methods like Java serialization or Google's Protocol Buffers may be used to create unique, compact keys to persist in the cache. In my simple solution, I did not even bother storing the entries, just the keys, generated as an MD5 hash of a couple of entry fields by which I defined how an entry would be unique for my purposes.
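A rough Ruby sketch of the same idea (Rails.cache stands in for Ehcache here; the choice of fields and the one-week TTL are assumptions):

require 'digest/md5'

# Build a compact uniqueness key from the fields that define "new" for us.
def entry_key(entry)
  Digest::MD5.hexdigest([entry[:id], entry[:updated]].join('|'))
end

# Process an entry only if we haven't seen its key before; expires_in plays
# the role of the TTL so old keys age out between polling cycles.
def process_if_new(entry)
  key = "feed-entry/#{entry_key(entry)}"
  return if Rails.cache.exist?(key)

  handle_new_entry(entry)                       # hypothetical downstream handler
  Rails.cache.write(key, true, expires_in: 1.week)
end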
Hope this helps.
For a Rails 3 app I am building, a user gets to share a post which has numerous different parameters. Some parameters are optional, others are required. While the user is filling out the parameters, I want to generate a preview of how the post will look, on the fly. Some parameters are URLs which need to be sent back to the server for processing, so basically the preview cannot be 100% generated client-side.
I was wondering what is the best way to go about this. Since it could be a lot of data, I don't want to send all the data back to the server every time something changes in order to regenerate the preview. I would rather send only the data that has changed. But in that case, where is the rest of the data stored? In a session, perhaps? Also, I would prefer not to rebuild the model object with all the data every time. Is there a way to persist the model object that represents the post as it is being created?
Thanks.
How big is that "a lot of data"? If you send it all, does it have a noticeable impact on performance, or are you just imagining that it would?
Since you haven't provided too much information, here's the basic approach I would take:
Process client-side, as much as possible.
Data that can't be processed on the client: send it to the server (only that part, not the rest of it), receive the result of the processing, and incorporate it into what you've already built.
No sessions, partially built models, or any other state on the server. Stateless protocols are simple, and simplicity is a prerequisite for reliability.
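A rough sketch of that stateless, process-only-what-you-must idea (the controller, parameter names and the resolve_url_metadata helper are all hypothetical):

# Sketch: a stateless preview endpoint. The client keeps the full draft and
# posts only the URLs it cannot process itself; nothing is stored in the
# session or the database.
class PreviewsController < ApplicationController
  def create
    urls = Array(params[:urls])
    # resolve_url_metadata is a hypothetical helper (e.g. fetching a page title)
    fragments = urls.map { |url| { url: url, meta: resolve_url_metadata(url) } }
    render json: { fragments: fragments }
  end
end

The client then merges the returned fragments into the preview it has already built locally.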
I am building a Ruby on Rails application where I need to consume a REST API to fetch some data in (Atom) feed format. The REST API has a limit on the number of calls made per second as well as per day, and considering the amount of traffic my application may have, I would easily exceed the limit.
The solution to that would be to cache the REST API response feed locally and expose a local service (Sinatra) that provides the cached feed as it is received from the REST API. And of course a sweeper would periodically refresh the cached feed.
There are 2 problems here.
1) One of the REST APIs is a search API where search results are returned as an Atom feed. The API takes several parameters, including the search query. What should my caching strategy be so that a cached feed can be uniquely identified by its parameters? For example, if I search for, say,
/search?q=Obama&page=3&per_page=25&api_version=4
and I get a feed response for these parameters, how do I cache the feed so that when the exact same parameters are passed in a call some time later, the cached feed is returned, and when the parameters change, a new call is made to the REST API?
2) The other problem is the sweeper. I don't want to sweep a cached feed which is rarely used. That is, the search query Best burgers in Somalia would obviously be requested far less than, say, Barack Obama. I do have data on how many consumers have subscribed to each feed, so the strategy should be to sweep the cached feeds based on how large the number of subscribers to the search query is. Since the caching needs to happen in the Sinatra application, how would one go about implementing this kind of sweeping strategy? Some code would help.
I am open to any ideas here. I want these mechanisms to perform really well. Ideally I would want to do this without a database, using pure page caching. However, I am open to trying other things.
Why would you want to replicate the REST service as a Sinatra app? You could easily just make a model inside your existing Rails app to cache the Atom feeds (storing the whole feed as a string, for example):
a CachedFeed model which is refreshed whenever its updated_at is old enough.
You could even use static page caching for your CachedFeed controller to reduce the strain on your system.
Having the cache inside your Rails app would greatly reduce complexity in terms of deciding when to renew your cache, and would even let you count the requests performed against the REST API you query.
You could have model logic that distributes the calls you have available toward the most popular feeds. The search parameters could just be attributes of your model, so you can easily find and distinguish the cached feeds.
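A rough sketch of such a CachedFeed model, keyed on a normalised digest of the search parameters so identical queries hit the cache while different ones trigger a fresh API call (the column names, one-hour freshness window and fetch_feed_from_api helper are all assumptions):

require 'digest/md5'

# Columns assumed: params_digest (indexed, unique), query_params (text),
# body (text), subscriber_count (integer), updated_at.
class CachedFeed < ApplicationRecord
  FRESH_FOR = 1.hour

  def self.fetch(query_params)
    digest = digest_for(query_params)
    feed   = find_or_initialize_by(params_digest: digest)

    if feed.new_record? || feed.updated_at < FRESH_FOR.ago
      # fetch_feed_from_api is a hypothetical client for the rate-limited API.
      feed.update!(query_params: query_params.to_json,
                   body: fetch_feed_from_api(query_params))
    end
    feed.body
  end

  # Sort the params so q=Obama&page=3 and page=3&q=Obama map to the same key.
  def self.digest_for(query_params)
    Digest::MD5.hexdigest(query_params.sort.map { |k, v| "#{k}=#{v}" }.join('&'))
  end
end

A periodic job could then refresh or delete rows based on subscriber_count, so rarely used queries simply age out instead of being re-fetched.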