We've built a dynamic questionnaire with an Angular front-end and a RoR back-end. Since there are a lot of dynamic parts to handle, it was impossible to utilise the ActionView or jbuilder cache helpers. Each questionnaire request triggers quite a lot of queries, such as checking the validity of answers, checking dependencies, etc. Is there a recommended strategy for caching dynamic JSON responses?
To give an idea, here's the controller code:
def advance
  # Decrypt and parse parameters
  request = JSON.parse(decrypt(params[:request]))

  # Process passed parameters
  if request.key?('section_index')
    @result_set.start_section(request['section_index'].to_i)
  elsif request.key?('question_id')
    if valid_answer?(request['question_id'], request['answer_id'])
      @result_set.add_answer(request['question_id'],
                             request['answer_id'],
                             request['started_at'],
                             request['completed_at'])
    else
      return invalid_answer
    end
  end

  render_item(@result_set.next_item)
end
The next_item could be a question or section, but progress indicator data and possibly a previously given answer (navigation is possible) are returned as well. Also, data is sent encrypted from and to the front-end.
We've also built an admin area with an Angular front-end. In this area, results from the questionnaire can be viewed and compared. Quite a few queries are run to find subquestions, comparable questions, etc., which we found hard to cache. After clicking around with multiple simultaneous users, the server's memory could fill up.
The app is deployed on Passenger and we've fine-tuned the config based on the server configuration. The results are stored in a Postgres database.
TL;DR: In production, we found that memory usage becomes an issue. Some optimisations to queries (specifically includes) are possible, but is there a recommended strategy for caching dynamic JSON responses?
Without much detail as to how you are storing and retrieving your data, it is a bit tough. But it sounds like you are saying that your next_item method is CPU- and memory-intensive in trying to find the next item. Is that correct? Assuming that, you might want to take a look at a linked list. Each node (polymorphic) would have a link to the next node. You could implement it as a doubly linked list if you needed to step forwards and backwards.
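For illustration only, a minimal sketch of that idea, assuming hypothetical Question/Section models and a QuestionnaireNode join model (none of these names come from the question):

# Hypothetical migration: each node points at the next (and previous) node.
class CreateQuestionnaireNodes < ActiveRecord::Migration[6.0]
  def change
    create_table :questionnaire_nodes do |t|
      t.references :item, polymorphic: true, null: false  # Question or Section
      t.references :next_node, foreign_key: { to_table: :questionnaire_nodes }
      t.references :previous_node, foreign_key: { to_table: :questionnaire_nodes }
      t.timestamps
    end
  end
end

class QuestionnaireNode < ApplicationRecord
  belongs_to :item, polymorphic: true
  belongs_to :next_node, class_name: 'QuestionnaireNode', optional: true
  belongs_to :previous_node, class_name: 'QuestionnaireNode', optional: true
end

# Finding the next item is then a single indexed lookup rather than a search:
# current_node.next_node&.item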
How often does the data change? If you can cache big parts of it and you can find a trigger attribute (e.g. updated_at), you can do fragment caching in the view, or even better, HTTP caching in the controller. You can mix both.
It's a bit complex. Please have a look at http://www.xyzpub.com/en/ruby-on-rails/4.0/caching.html
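As a rough sketch of the HTTP-caching part (the model and trigger attribute are placeholders, not from the question), Rails' stale? helper lets the controller answer 304 Not Modified instead of rebuilding the JSON:

def show
  questionnaire = Questionnaire.find(params[:id])
  # stale? sets the ETag/Last-Modified headers and returns false (sending
  # 304 Not Modified) when the client's cached copy is still current,
  # so the JSON is only rebuilt when something actually changed.
  if stale?(last_modified: questionnaire.updated_at, etag: questionnaire)
    render json: questionnaire
  end
end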
Related
I have an app that allows users to sort and filter through 30,000 items of data. Right now I make fetch requests from Redux actions to my Rails API, with the queries being handled by scope methods on the Rails end. My instructor is recommending that I move all my querying to the front-end for efficiency, but I'm wondering if it really will be more performant to manage a Redux state object with 30,000 objects in it, each with 50 of their own attributes.
(A couple extra notes: Right now I've only run the app locally and I'm doing the pagination server-side so it runs lightning fast, but I'm a bit nervous about when I launch it somewhere like Heroku. Also, I know that if I move my querying to the front-end I'll have more options to save the query state in the URL with react-router, but I've already sort of hacked a way around that with my existing set-up.)
Let's have a look at the pros and cons of each approach:
Querying on Front End
👍 Querying does not need another network request
👎 Network requests are slower because there is more data to send
👎 App must store much more data in memory
👎 Querying is not necessarily more efficient because the client has to do the filtering and it usually does not have the mechanisms to do so effectively (caching and indexing).
Querying on Back End
👍 Less data to send to client
👍 Querying can be quite fast if database indexes are set up properly
👍 App is more lightweight, it only holds the data it needs to display
👎 Each query will require a network request
The pros of querying on the Back End heavily outweigh those of querying on the Front End, so I have to disagree with your instructor. Imagine if, when you searched for something on Google, Google sent every relevant result to your browser and did the pagination and sorting there; your browser would feel extremely sluggish. With proper caching and database indexes on your data, the extra network requests will not be a huge disadvantage.
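As a rough illustration of the back-end approach (the Item model, column names, and page size are all made up), the database does the filtering and sorting over an index and only one page of rows ever reaches the client:

# Hypothetical migration: index the columns used for filtering and sorting.
class AddCategoryIndexToItems < ActiveRecord::Migration[6.0]
  def change
    add_index :items, [:category, :created_at]
  end
end

class Item < ApplicationRecord
  scope :in_category, ->(category) { where(category: category) }
  scope :newest_first, -> { order(created_at: :desc) }
end

# Controller action: filter, sort and paginate in the database,
# returning only one page of JSON to the front end.
def index
  page  = params.fetch(:page, 0).to_i
  items = Item.in_category(params[:category]).newest_first.limit(50).offset(page * 50)
  render json: items
end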
I have a Rails app that searches a static set of documents, and I need to figure out the best way to cache result sets. I can't just use the native ActiveRecord caching.
My situation:
I'm using the will_paginate gem, and at the moment, the query is running every time the user changes pages. Some queries are too complex for this to be responsive, so I need to cache the results, at least during an individual session. Also, the user may have multiple tabs open, running separate queries simultaneously. The document set is static; the size of the set is on the order of tens of thousands of documents.
Why straight-forward ActiveRecord caching won't work:
The user can search the contents of the documents, or search based on metadata restrictions (like a date range), or both. The metadata for each document is stored in an ActiveRecord, so those criteria are applied with an ActiveRecord query.
But if they add a search term for the document content, I run that search using a separate FastCGI application, because I'm doing some specialized search logic. So I pass the term and the winnowed-down document list to the FastCGI application, which responds with the final result list. Then I do another ActiveRecord query: where("id IN (?)", returnedIds)
By the way, it's these FastCGI searches that are sometimes complex enough to be unresponsive.
My thoughts:
There's the obvious-to-a-newbie approach: I can use the metadata restrictions plus the search term as a key; they're already stored in a hash. They'd be paired up with the returnedIds array. And this guide at RubyOnRails.org mentions the cache stores that are available. But it's not clear to me which store is best, and I'm also assuming there's a gem that's better for this.
I found the gem memcached, but it's not clear to me whether it would work for caching the results of my FastCGI request.
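A minimal sketch of the keyed-cache idea described above, assuming Rails.cache is backed by memcached; the Document model, fastcgi_search helper and key format are hypothetical:

def cached_result_ids(criteria, term)
  # Build a deterministic key from the metadata restrictions and the search term.
  key = "doc_search/#{Digest::SHA1.hexdigest([criteria.sort, term].to_json)}"

  # Cache only the array of matching ids; the documents themselves stay in the database.
  Rails.cache.fetch(key, expires_in: 1.hour) do
    candidate_ids = Document.where(criteria).pluck(:id)
    term.present? ? fastcgi_search(term, candidate_ids) : candidate_ids
  end
end

# Each page then reuses the cached ids instead of re-running the expensive search:
# Document.where(id: cached_result_ids(criteria, term)).paginate(page: params[:page])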
Let’s say I have a weather web service that I’m hitting (consuming) on every page load. That's not very efficient or smart, and I'm probably going to exceed my API limit or make the web service owners mad. So instead of fetching directly from a controller action, I have a helper / job / method (some layer) that has a chance to cache the data a little bit. Let’s also say that I don’t care too much about the real-time-ness of the data.
Now what I’ve done in the past is simply store the attributes from the weather service in a table and refresh the data every so often. For example, the weather service might look like this:
Weather for 90210 (example primary key)
-----------------------------
Zip Name: Beverly Hills
Current Temperature: 90
Last Temp: 89
Humidity: 0
... etc.
So in this case, I would create columns for each attribute and store them when I fetch from the webservice. I could have an expiring rails action (page caching) to do the refresh or I could do a background job.
This simple approach works well except if the webservice has a large list of attributes (say 1000). Now I’m spending a lot of time creating and maintaining DB columns repeating someone else’s attributes that already exist. What would be great is if I could simply cache the whole response and refer to it as a simple Hash when I need it. Then I’d have all the attributes cached that the webservice offers for “free” because all the capabilities of the web service would be in my Hash instead of just caching a subset.
To do this, I could maybe fetch the web service response, serialize it (YAML maybe) and then fetch the serialized object if it exists. Meh, not great; serializing can get weird with special characters. It’d be really cool if I could just follow a memcached-type model, but I don’t think you can store complex objects in memcached, right? I'd also like to limit the amount of software introduced, so a stand-alone proxy layer would be suboptimal imo.
Anyone done something similar or have a name for this?
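For what it's worth, Rails.cache can hold arbitrary Ruby objects (it marshals them for you), so the "cache the whole response as a Hash" idea can be sketched roughly like this (the URL, key format and expiry are placeholders):

require 'net/http'
require 'json'

def weather_for(zip)
  # Cache the parsed response as a plain Hash; Rails.cache serializes it for us.
  Rails.cache.fetch("weather/#{zip}", expires_in: 30.minutes) do
    body = Net::HTTP.get(URI("https://example-weather.test/api?zip=#{zip}"))
    JSON.parse(body)
  end
end

# weather_for('90210')['current_temperature']  # any attribute the service returns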
If the API you're hitting is RESTful and respects caching, don't reinvent the wheel. HTTP has caching built into it (see RFC 2616), so try to use it as far as possible. You have two options:
Just stick a squid proxy between your app and the API and you're done.
Use Wrest - we wrote it to support RFC 2616 (HTTP) caching, and it's the only Ruby HTTP wrapper I know of that does.
If the API doesn't respect caching (most do), then the other advice you've received makes sense. What you actually use to hold your cache (MongoDB/memcached/whatever) depends on a bunch of other factors, so really, that depends on your situation.
You can use MongoDB (or another JSON datastore), get the results of the API in JSON, and store them in your Mongo collection. Then get the data and attributes that you care about, and ignore the rest.
For your weather API call, you can check to see if that city exists in your Mongo collection, and if not, fetch it via the API (and then store it in Mongo).
It would be a modification of the Rails.cache pattern.
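A rough sketch of that pattern with the Ruby mongo driver (database, collection and field names are made up):

require 'mongo'
require 'net/http'
require 'json'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'weather_cache')

def weather_for(collection, zip)
  cached = collection.find('zip' => zip).first
  return cached if cached

  # Not cached yet: hit the API once and store the whole JSON document as-is.
  doc = JSON.parse(Net::HTTP.get(URI("https://example-weather.test/api?zip=#{zip}")))
  collection.insert_one(doc.merge('zip' => zip))
  doc
end

weather_for(client[:weather], '90210')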
I am building a Ruby on Rails application where I need to consume a REST API to fetch some data in (Atom) feed format. The REST API has a limit on the number of calls made per second as well as per day. And considering the amount of traffic my application may have, I would easily exceed the limit.
The solution to that would be to cache the REST API response feed locally and expose a local service (Sinatra) that provides the cached feed as it is received from the REST API. And of course a sweeper would periodically refresh the cached feed.
There are 2 problems here.
1) One of the REST APIs is a search API where search results are returned as an Atom feed. The API takes in several parameters, including the search query. What should my caching strategy be so that a cached feed can be uniquely identified by its parameters? That is, for example, if I search for say
/search?q=Obama&page=3&per_page=25&api_version=4
and I get a feed response for these parameters. How do I cache the feed so that for the exact same parameters passed in a call some time later, the cached feed is returned, and if the parameters change, a new call is made to the REST API?
2) The other problem is regarding the sweeper. I don't want to sweep a cached feed that is rarely used. That is, the search query Best burgers in Somalia would obviously be requested far less often than, say, Barack Obama. I do have data on how many consumers have subscribed to each feed. The strategy here should be that, given the number of subscribers to a search query, cached feeds are swept based on how large this number is. Since the caching needs to happen in the Sinatra application, how would one go about implementing this kind of sweeping strategy? Some code would help.
I am open to any ideas here. I want these mechanisms to be very good on performance. Ideally I would want to do this without database and by pure page caching. However, I am open to possibility of trying other things.
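For problem 1, one common approach (my sketch, not from the question) is to derive the cache key from the normalised search parameters, e.g. in the Sinatra layer:

require 'sinatra'
require 'json'
require 'digest'
require 'active_support'
require 'active_support/cache'

CACHE = ActiveSupport::Cache::MemoryStore.new  # or a memcached-backed store

get '/search' do
  # The same parameters, in any order, always map to the same cache key.
  normalised = params.sort.to_h
  key = "search_feed/#{Digest::SHA1.hexdigest(normalised.to_json)}"

  content_type 'application/atom+xml'
  CACHE.fetch(key, expires_in: 600) do
    fetch_feed_from_rest_api(normalised)  # hypothetical helper that calls the real API
  end
end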
Why would you want to replicate the REST service as a Sinatra app? You could easily just make a model inside your existing Rails app to cache the Atom feeds (storing the whole feed as a string inside for example).
For example, a CachedFeed model that is refreshed whenever its updated_at timestamp is old enough.
You could even use static (page) caching for your CachedFeed controller to reduce the strain on your system.
Having the cache inside your Rails app would greatly reduce complexity in terms of deciding when to renew your cache, or even counting the requests performed against the REST API you query.
You could have model logic to distribute the calls you are allowed to make towards the most popular feeds. The search parameters could just be attributes of your model, so you can easily find and distinguish cached feeds.
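A rough sketch of such a CachedFeed model (column names, the TTL and the FeedClient wrapper are my own, not from the answer):

class CachedFeed < ApplicationRecord
  # columns: query_digest:string, body:text, subscriber_count:integer (+ timestamps)
  TTL = 15.minutes

  def self.fetch(query_params)
    digest = Digest::SHA1.hexdigest(query_params.sort.to_h.to_json)
    feed = find_or_initialize_by(query_digest: digest)

    if feed.new_record? || feed.updated_at < TTL.ago
      feed.body = FeedClient.search(query_params)  # hypothetical wrapper around the REST API
      feed.save!
    end
    feed.body
  end
end

# A background job could refresh the most popular feeds more aggressively, e.g.
# CachedFeed.order(subscriber_count: :desc).limit(50).each { ... }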
Yesterday morning I noticed Google Search was using hash parameters:
http://www.google.com/#q=Client-side+URL+parameters
which seems to be the same as the more usual search (with search?q=Client-side+URL+parameters). (It seems they are no longer using it by default when doing a search using their form.)
Why would they do that?
More generally, I see hash parameters cropping up on a lot of web sites. Is it a good thing? Is it a hack? Is it a departure from REST principles? I'm wondering if I should use this technique in web applications, and when.
There's a discussion by the W3C of different use cases, but I don't see which one would apply to the example above. They also seem undecided about recommendations.
Google has many live experimental features that are turned on/off based on your preferences, location and other factors (probably random selection as well.) I'm pretty sure the one you mention is one of those as well.
What happens in the background when a hash is used instead of a query string parameter is that it queries the "real" URL (http://www.google.com/search?q=hello) using JavaScript, then modifies the existing page with the content. This appears much more responsive to the user since the page does not have to reload entirely. The reason for the hash is that browser history and state are maintained. If you go to http://www.google.com/#q=hello you'll find that you actually get the search results for "hello" (even though your browser is really only requesting http://www.google.com/). With JavaScript turned off, however, it wouldn't work, and you'd just get the Google front page.
Hashes are appearing more and more as dynamic web sites are becoming the norm. Hashes are maintained entirely on the client and therefore do not incur a server request when changed. This makes them excellent candidates for maintaining unique addresses to different states of the web application, while still being on the exact same page.
I have been using them myself more and more lately, and you can find one example here: http://blixt.org/js -- If you have a look at the "Hash" library on that page, you'll see my implementation of supporting hashes across browsers.
Here's a little guide for using hashes for storing state:
How?
Maintaining state in hashes implies that your application (I'll call it application since you generally only use hashes for state in more advanced web solutions) relies on JavaScript. Without JavaScript, the only function of hashes would be to tell the browser to find content somewhere on the page.
Once you have implemented some JavaScript to detect changes to the hash, the next step would be to parse the hash into meaningful data (just as you would with query string parameters.)
Why?
Once you've got the state in the hash, it can be modified by your code (or your user) to represent the current state in your application. There are many reasons for why you would want to do this.
One common case is when only a small part of a page changes based on a variable, and it would be inefficient to reload the entire page to reflect that change (Example: You've got a box with tabs. The active tab can be identified in the hash.)
Other cases are when you load content dynamically in JavaScript, and you want to tell the client what content to load (Example: http://beta.multifarce.com/#?state=7001, will take you to a specific point in the text adventure.)
When?
If you have a look at my "JavaScript realm" you'll see a borderline overkill case. I did it simply because I wanted to cram as much JavaScript dynamics into that page as possible. In a normal project I would be conservative about when to do this, and only do it when you will see positive changes in one or more of the following areas:
User interactivity
Usually the user won't see much difference, but the URLs can be confusing
Remember loading indicators! Loading content dynamically can be frustrating to the user if it takes time.
Responsiveness (time from one state to another)
Performance (bandwidth, server CPU)
No JavaScript?
Here comes a big deterrent. While you can safely rely on 99% of your users to have a browser capable of using your page with hashes for state, there are still many cases where you simply can't rely on this. Search engine crawlers, for example. While Google is constantly working to make their crawler work with the latest web technologies (did you know that they index Flash applications?), it still isn't a person and can't make sense of some things.
Basically, you're at a crossroads between compatibility and user experience.
But you can always build a road in between, which of course requires more work. In less metaphorical terms: implement both solutions, so that there is a server-side URL for every client-side URL that outputs relevant content. For compatible clients, it would redirect them to the hash URL. This way, Google can index the "hard" URLs, and when users click them, they get the dynamic state stuff!
Recently Google also stopped serving direct links in search results, offering redirects instead.
I believe both have to do with gathering usage statistics: which searches were performed by the same user, in what sequence, which of the search results the user followed, etc.
P.S. Now, that's interesting: direct links are back. I absolutely remember seeing only redirects there in the last couple of weeks. They are definitely experimenting with something.