Pulling multiple levels of data efficiently in Rails


I am trying to build a print process, which consists of printing a batch of financial applications using Rails.
I am printing around 100 applications, which consist of multiple levels of data (the application itself, sub-models, and their sub-models).
At the moment the page is very slow because it performs a lot of N+1 queries.
My question is: is there an efficient way of getting this data out of the database? I've tried loading the forms with includes() for all the sub-models, but this doesn't help with the models below them (for instance, income_line_items on a financial_history model).
Any ideas?

Have you tried using a nested hash to access the sub-sub-models? Something like:
@app = Application.includes(:first_submodel,
                            {:second_submodel => [:first_sub_submodel, :second_sub_submodel]},
                            {:third_submodel => :third_sub_submodel})
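To see why the nested hash helps, here is a plain-Ruby sketch (no Rails, no database) of what preloading does conceptually: one query per association level, with the children attached to their parents by grouping on the foreign key in memory. The table names and data are hypothetical stand-ins borrowed from the question, and ActiveRecord's real preloader is considerably more involved.

```ruby
# Hypothetical in-memory "tables" standing in for database rows.
APPLICATIONS = [{ id: 1 }, { id: 2 }].freeze
FINANCIAL_HISTORIES = [
  { id: 10, application_id: 1 },
  { id: 11, application_id: 2 }
].freeze
INCOME_LINE_ITEMS = [
  { id: 100, financial_history_id: 10, amount: 500 },
  { id: 101, financial_history_id: 10, amount: 250 },
  { id: 102, financial_history_id: 11, amount: 75 }
].freeze

# Simulates "SELECT * FROM table WHERE fk IN (ids)". Returns copies so the
# rows can be annotated with their children.
def fetch_where_in(table, fk, ids)
  table.select { |row| ids.include?(row[fk]) }.map(&:dup)
end

# Preloads two nested levels (application -> financial_histories ->
# income_line_items) with three "queries", however many rows there are.
def preload_applications
  queries = 1 # the applications query
  apps = APPLICATIONS.map(&:dup)

  histories = fetch_where_in(FINANCIAL_HISTORIES, :application_id, apps.map { |a| a[:id] })
  queries += 1

  items = fetch_where_in(INCOME_LINE_ITEMS, :financial_history_id, histories.map { |h| h[:id] })
  queries += 1

  # Attach children to their parents by grouping on the foreign key in memory.
  items_by_history = items.group_by { |i| i[:financial_history_id] }
  histories.each { |h| h[:income_line_items] = items_by_history.fetch(h[:id], []) }
  histories_by_app = histories.group_by { |h| h[:application_id] }
  apps.each { |a| a[:financial_histories] = histories_by_app.fetch(a[:id], []) }

  [apps, queries]
end

apps, queries = preload_applications
puts queries # => 3
puts apps.first[:financial_histories].first[:income_line_items].size # => 2
```

Adding a fourth level would add one more query, not one query per parent row, which is the whole point of nesting the hash.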

Related

Rails Heroku app spending lots of time in Ruby running app code

I have a production Rails app running on Heroku, and some API endpoints are taking a long time to respond (~1-2 s).
It's a standard RESTful GET action, cases#index. The method looks like this:
@cases = cases_query
meta = {
  total: @cases.total_count,
  count: params[:count],
  page: params[:page],
  sort: params[:order],
  filter: pagination[:filter],
  params: params
}
render json: @cases,
       root: 'cases',
       each_serializer: CaseSerializer,
       meta: meta
The method runs an ActiveRecord query to select data, serializes each record and renders JSON. Skylight, a Rails profiling and performance-monitoring tool, is telling me that this endpoint, among others, spends 70% of its time in the controller method (in Ruby) and 30% in the database.
What in this code or in my app's setup is causing this to spend so much time in the app code? Could it be a gem?
[Screenshot of Skylight analytics for this endpoint: the bulk of the time is spent in Ruby in the controller action.]

ActiveRecord can generate a huge number of Ruby objects from a query. Your profiler tracks the time it takes for the database to return results, which may be ~20% of your request, but much of the rest could still be ActiveRecord converting those results into Ruby objects.
Does your query for this request return a lot of rows? Are those rows very wide (as when you select table1.*, table2.*, table3.* across a join)?
In the past I've also seen serializers really crush performance. Finding the culprit usually ends up being a line-by-line hunt.
To troubleshoot this issue, I recommend finding a way to get real-time or near-real-time feedback on your performance. The newrelic_rpm gem has a /newrelic page you can view in development mode, which should provide feedback similar to Skylight's. Skylight may have a similar development-mode page worth looking into.
There's also a gem called Peek that adds a small performance bar to each page view; you can extend it with plugin gems that show specific slices of the request, such as DB, views, and even garbage collection: https://github.com/peek/peek. Check it out, especially the GC plugin.
Once you have that real-time feedback set up and can see something that roughly maps to your Skylight output, you can binary-search through your code to isolate the performance problem.
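A crude way to do that bisection outside the request cycle is to time each phase on its own. This is a plain-Ruby sketch using a monotonic clock and the standard JSON library; `load_records` is a hypothetical stand-in for `cases_query`, and the record shape is invented, so treat the numbers as relative only.

```ruby
require 'json'

# Returns the wall-clock seconds the block takes, using a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical stand-in for cases_query: builds plain hashes rather than
# ActiveRecord objects, just to have something to time.
def load_records(count)
  Array.new(count) { |i| { id: i, title: "Case #{i}", status: 'open' } }
end

# Times loading and serialization separately; whichever phase dominates
# is the half to keep bisecting.
def profile_phases(count)
  records = nil
  json = nil
  load_time = elapsed { records = load_records(count) }
  serialize_time = elapsed { json = JSON.generate(records) }
  { load: load_time, serialize: serialize_time, bytes: json.bytesize }
end

p profile_phases(10_000)
```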
In your controller, skip the rendering step with something like:
render json: {}
and look at the timing of that request. If performance improves dramatically, your issue is probably in the serialization phase.
If not, then maybe ActiveRecord is allocating a huge number of Ruby objects. Search for Ruby ObjectSpace profiling and you'll find ways to troubleshoot this.
If that's your problem, try to narrow down the results returned by your query: select only the columns you need to render in this response, and try to eliminate joins where possible (for example, by returning a foreign key instead of an associated object).
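To check whether object allocation is the problem, one core-Ruby option is to compare `GC.stat(:total_allocated_objects)` before and after the code in question (gems such as memory_profiler give a much more detailed breakdown). The sketch below, with hypothetical row shapes, also shows why narrowing the selected columns helps: fewer attributes per row means fewer allocated objects.

```ruby
# Counts the objects allocated while the block runs. GC.stat is core Ruby;
# the count includes everything the block allocates, directly or not.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical comparison: "wide" rows (many columns) vs "narrow" rows.
wide = count_allocations do
  Array.new(1_000) { |i| { id: i, a: "x#{i}", b: "y#{i}", c: "z#{i}", d: "w#{i}" } }
end
narrow = count_allocations do
  Array.new(1_000) { |i| { id: i, a: "x#{i}" } }
end

puts "wide: #{wide}, narrow: #{narrow}"
```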
If serialization is your problem... good luck. In my experience this one is particularly hard to troubleshoot. You could try a faster JSON gem like Oj, or hand-write your serializers rather than using ActiveModel::Serializers (last resort!).
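If you do end up hand-writing serializers, one possible shape is a method that builds a plain hash of exactly the fields the response needs and passes it to a JSON encoder. This is a sketch with the standard JSON library; `CaseRecord`, `CaseJson` and the fields are hypothetical, and whether it is actually faster than your current serializers is something to measure, not assume.

```ruby
require 'json'

# Hypothetical record type standing in for an ActiveRecord model.
CaseRecord = Struct.new(:id, :title, :status, :internal_notes)

# Hand-written serializer: the output fields are listed explicitly, so the
# response contains exactly these keys and no serializer framework runs.
module CaseJson
  def self.as_json(record)
    { id: record.id, title: record.title, status: record.status }
  end

  def self.render(records, meta)
    JSON.generate({ cases: records.map { |r| as_json(r) }, meta: meta })
  end
end

cases = [CaseRecord.new(1, 'Late payment', 'open', 'not for export')]
puts CaseJson.render(cases, { total: 1 })
# => {"cases":[{"id":1,"title":"Late payment","status":"open"}],"meta":{"total":1}}
```

The trade-off is maintenance: every new field has to be added by hand, which is why this is the last resort.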
Good luck!

Database queries commonly cause this kind of issue, so revisit your database queries and try to optimize them, applying joins where you can.
Also try using the Puma web server on Heroku to improve your server's performance.

Optimising export of DB using Rails

I have a RoR application that contains an API to manage applications, each of which contains recipes (and groups, ingredients, measurements).
Once the user has finished managing the recipes, they download a JSON file of the entire application. Because each application could have hundreds of recipes, the files can be large. It also means there are a lot of DB calls to get all the data required for the export.
Because of this, the request to download the application can take upwards of 30 seconds, sometimes more.
My current code looks something like this:
application.categories.each do |c|
  c.recipes.each do |r|
    r.groups.each do |g|
      g.ingredients.each do |i| # store i's data in the hash
      end
    end
  end
end
Within each loop I store the data in a hash, which I then give to the user.
My question is: where do I go from here?
Is there a way to grab all the data I require from the DB in one query? From looking at the log, I can see it is running hundreds of queries.
If the above solution is still slow, is this something I should put into a background process, and then email the user a link (or similar)?

There are of course ways to grab more data at once. This is done with Rails includes or joins, depending on your needs. See this article for some detailed information.
The basic idea is that you load the associated tables together so that new queries aren't generated on every iteration. When you call application.categories, that's one query. Then, for each of those categories, c.recipes runs another query: this creates N+1 queries, where N is the number of categories you have. Instead, you can include them from the get-go, so Rails runs a small, fixed number of queries regardless of N (how many depends on whether Rails preloads each association separately or uses a single JOIN).
The basic syntax is easy:
Application.includes(:categories => :recipes).each do |application| ...
This loads all applications, their categories, and each category's recipes up front, either in one JOIN query or in one query per table, depending on how Rails decides to eager load (again, see the article). You can tack on the groups and ingredients too.
As for putting the work in the background, I'd suggest simply showing a loading image, or getting fancy with a progress bar.
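To make the query arithmetic concrete, here is a plain-Ruby sketch with a fake query counter (no Rails involved). `FakeDB` and its methods are invented for illustration: loading recipes per category costs 1 + N queries, while loading them for all categories in one IN-style lookup costs 2, whatever N is.

```ruby
# Hypothetical fake database that counts how many "queries" run.
class FakeDB
  attr_reader :queries

  def initialize(categories, recipes)
    @categories = categories
    @recipes = recipes
    @queries = 0
  end

  def all_categories
    @queries += 1
    @categories
  end

  # Like "SELECT * FROM recipes WHERE category_id IN (...)".
  def recipes_for(category_ids)
    @queries += 1
    @recipes.select { |r| category_ids.include?(r[:category_id]) }
  end
end

categories = (1..50).map { |id| { id: id } }
recipes = categories.flat_map { |c| [{ category_id: c[:id] }, { category_id: c[:id] }] }

# N+1: one query for the categories, then one per category.
n_plus_one = FakeDB.new(categories, recipes)
n_plus_one.all_categories.each { |c| n_plus_one.recipes_for([c[:id]]) }

# Preloaded: one query for the categories, one for all of their recipes.
preloaded = FakeDB.new(categories, recipes)
cats = preloaded.all_categories
by_category = preloaded.recipes_for(cats.map { |c| c[:id] }).group_by { |r| r[:category_id] }

puts n_plus_one.queries # => 51
puts preloaded.queries  # => 2
puts by_category.size   # => 50
```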

First of all, I'm assuming that the required has_many and belongs_to associations exist.
Generally you can do something like
c.recipes.includes(:groups)
or even
c.recipes.includes(:groups => :ingredients)
which will load the recipes together with their groups (and ingredients) up front.
But since your data set is quite big, IMO it would be better to limit that technique to the deepest levels.
The most useful approach would be to use find_each and includes together.
(find_each fetches the records in batches, 1,000 at a time by default, to keep memory usage low.)
Perhaps something like:
application.categories.each do |c|
  c.recipes.find_each do |r|
    r.groups.includes(:ingredients).each do |g|
      g.ingredients.each do |i|
        ...
      end
    end
  end
end
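find_each itself walks the table by primary key: it fetches a batch, remembers the last id, and asks for the next batch of rows with a greater id. A plain-Ruby sketch of that keyset-batching idea; `ROWS` and `fetch_batch` are hypothetical stand-ins for the table and the batched query, not ActiveRecord's implementation.

```ruby
# Hypothetical rows standing in for the recipes table, ordered by id.
ROWS = (1..2_500).map { |id| { id: id } }.freeze

# Simulates "SELECT ... WHERE id > after_id ORDER BY id LIMIT batch_size".
def fetch_batch(after_id, batch_size)
  ROWS.select { |r| r[:id] > after_id }.first(batch_size)
end

# Keyset batching in the spirit of find_each: only one batch is held in
# memory at a time, and each batch starts after the last id seen.
# Returns the number of batches fetched.
def each_in_batches(batch_size: 1_000)
  last_id = 0
  batches = 0
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?

    batches += 1
    batch.each { |row| yield row }
    last_id = batch.last[:id]
  end
  batches
end

seen = 0
batches = each_in_batches { |_row| seen += 1 }
puts "#{seen} rows in #{batches} batches" # => 2500 rows in 3 batches
```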
Even with find_each and includes, this can take quite a long time for an HTTP request, so you could consider async processing: the client's request is handled on the server as a background job, and when the job is done you provide a download link (or send an email) to the client.
Resque is one possible solution for handling the async part.
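Resque needs Redis and a separate worker process, so purely to show the shape of the pattern (enqueue from the request, do the slow work elsewhere, publish where the result can be fetched), here is an in-process sketch built on Ruby's Thread and Queue. It is a stand-in for a real job library, not a replacement; `ExportWorker` and the export path format are hypothetical.

```ruby
# Minimal in-process job queue: jobs are pushed onto a thread-safe Queue
# and processed by a single worker thread.
class ExportWorker
  def initialize
    @jobs = Queue.new
    @results = {}
    @mutex = Mutex.new
    @thread = Thread.new { run }
  end

  # Called from the request: enqueue and return immediately.
  def enqueue(application_id)
    @jobs << application_id
  end

  # Hypothetical: where the finished export can be downloaded from,
  # or nil if the job hasn't finished.
  def export_path(application_id)
    @mutex.synchronize { @results[application_id] }
  end

  # Stops the worker once the jobs queued so far have been processed.
  def shutdown
    @jobs << :stop
    @thread.join
  end

  private

  def run
    while (id = @jobs.pop) != :stop
      # The slow export (building the big hash) would happen here.
      path = "/exports/application-#{id}.json"
      @mutex.synchronize { @results[id] = path }
    end
  end
end

worker = ExportWorker.new
worker.enqueue(42)
worker.shutdown
puts worker.export_path(42) # => /exports/application-42.json
```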

Rails ActiveRecord caching in memory

I have a model with :through relationships to another model via two different foreign keys. I do a lot of lookups on those tables (which have 3-4K rows) while importing a lot of data, and I'm trying to eliminate spurious repeated database lookups.
(Ideally, my database would be doing async writes/INSERTs only)
I have played with doing my own caching by ID, and recently switched to using Rails.cache (with MemoryStore for now; I have no need to sync with other instances, and I have plenty of RAM on the import machine). However, I find that I get multiple copies of the same associated records, and I'd like to get rid of this.
For instance:
irb> p = Phone.includes([:site => :client, :btn => :client]).first
irb> p.site.client.object_id => 67190640
irb> p.btn.client.object_id => 67170780
Ideally, I'd like these to point to the same object in memory.
Rails.cache serializes objects in and out, which really just makes this worse, but the duplicate copies surprised me. Could I override find_by_id() or something similar so that the association proxies would use my cache?
Maybe there is another caching module that I'm missing?
(please note that there is no web front end involved in this process. It's all models and ORM)

Try using IdentityCache (see https://github.com/Shopify/identity_cache). We currently have a similar issue: we're using JRuby because it's fast, but mallocs are expensive in our target environment, which makes caching these record instances all the more necessary.
There used to be an IdentityMap within ActiveRecord, but it was removed due to issues with unexpected behaviour around associations.
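For reference, the core of an identity map is small: one registry per unit of work, keyed by class and primary key, that always hands back the object already loaded for that key. The plain-Ruby sketch below shows only that idea; it is not how IdentityCache or the old ActiveRecord IdentityMap were implemented, and `Client`, `SimpleIdentityMap` and the loader are hypothetical.

```ruby
# Hypothetical model class.
Client = Struct.new(:id, :name)

# Returns the same Ruby object for the same [class, id] pair, so two
# association paths that reach one record share a single instance.
class SimpleIdentityMap
  def initialize
    @objects = {}
  end

  # Looks up [klass, id]; calls the loader block only on a miss.
  def fetch(klass, id)
    @objects[[klass, id]] ||= yield(id)
  end

  def size
    @objects.size
  end
end

loads = 0
map = SimpleIdentityMap.new
loader = ->(id) { loads += 1; Client.new(id, "Client #{id}") }

via_site = map.fetch(Client, 7, &loader) # e.g. phone.site.client
via_btn = map.fetch(Client, 7, &loader)  # e.g. phone.btn.client

puts via_site.equal?(via_btn) # => true
puts loads                    # => 1
```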
Just noticed you asked this in August; did you find a good solution?

How to improve the performance of a single-page application?

Introduction
I have a (mostly) single-page application built with BackboneJS and a Rails backend.
Because most of the interaction happens on one page of the web app, when the user first visits the page I basically have to pull a ton of information out of the database in one large, deeply joined query.
This is causing rather extreme load times on this one page.
New Relic appears to be telling me that most of my problems come from 457 individual fast method calls.
I've done all the eager loading I can (I checked with the Bullet gem), and I still have a problem.
These method calls are most likely occurring in my RABL serializer, which I use to serialize a bunch of JSON to embed into the page for initializing Backbone. You don't need to understand all of it, but suffice it to say it could add up to 457 method calls.
object @search
attributes :id, :name, :subscription_limit
# NOTE: Include a list of the members of this search.
child :searchers => :searchers do
  attributes :id, :name, :gravatar_icon
end
# Each search has many concepts (there could be over 100 of them).
child :concepts do |search|
  attributes :id, :title, :search_id, :created_at
  # The person who suggested each concept.
  child :suggester => :suggester do
    attributes :id, :name, :gravatar_icon
  end
  # Each concept has many suggestions (approx. 4 each).
  node :suggestions do |concept|
    # Here I'm scoping suggestions to only ones which meet certain conditions.
    partial "suggestions/show", object: concept.active_suggestions
  end
  # Add a boolean flag to tell if the concept is a favourite or not.
  node :favourite_id do |concept|
    # Another method call which occurs for each concept.
    concept.favourite_id_for(current_user)
  end
end
# Each search has subscriptions to certain services (approx. 4).
child :service_subscriptions do
  # This contains a few attributes and 2 fairly innocuous method calls.
  extends "service_subscriptions/show"
end
So it seems that I need to do something about this, but I'm not sure what approach to take. Here is a list of potential ideas I have:
Performance Improvement Ideas
Dumb-Down the Interface
Maybe I can come up with ways to present information to the user that don't require the actual data to be present. I don't see why I should absolutely need to do this, though; other single-page apps such as Trello have incredibly complicated interfaces.
Concept Pagination
If I paginate concepts, it will reduce the amount of data extracted from the database each time. It would produce an inferior user interface, though.
Caching
At the moment, refreshing the page just extracts the entire search out of the DB again. Perhaps I can cache parts of the app to reduce DB hits. This seems messy, though, because not much of the data I'm dealing with is static.
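One common way to cache data that isn't static is key-based expiration: put the record's last-modified timestamp into the cache key, so any change produces a new key and the stale entry is simply never read again (Rails' `cache_key` works along these lines). Plain-Ruby sketch with a Hash as the store; `ConceptRow`, `FragmentCache` and the render step are hypothetical.

```ruby
# Hypothetical record with an id and a last-modified timestamp.
ConceptRow = Struct.new(:id, :title, :updated_at)

# Key-based expiration: the key changes whenever updated_at changes,
# so an edited record misses the cache and its fragment is rebuilt.
class FragmentCache
  attr_reader :builds

  def initialize
    @store = {}
    @builds = 0
  end

  def fetch(record)
    key = "concept/#{record.id}-#{record.updated_at.to_i}"
    @store[key] ||= begin
      @builds += 1
      yield record
    end
  end
end

cache = FragmentCache.new
concept = ConceptRow.new(1, 'Pricing', Time.at(1_000))
render = ->(c) { %({"id":#{c.id},"title":"#{c.title}"}) }

cache.fetch(concept, &render)
cache.fetch(concept, &render) # served from the cache
concept.updated_at = Time.at(2_000)
concept.title = 'Pricing v2'
cache.fetch(concept, &render) # new key, so rebuilt

puts cache.builds # => 2
```

Old entries are never deleted in this sketch; a real store such as memcached evicts least-recently-used entries when it needs space.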
Multiple Requests
It is technically worse to serve the page without the JSON embedded in it, but perhaps the user will feel like things are happening faster if I load the page unpopulated and then fetch the data.
Indexes
I should make sure that I have indexes on all my foreign keys. I should also think about other places where indexes would help (such as favourites?) and add them.
Move Method Calls into DB
Perhaps I can store some of the results of the iteration I do in my view layer in the DB and just read them back instead of computing them. Or I could update things on write rather than on read.
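Doing the work on write can be as simple as maintaining a derived value whenever the underlying data changes, so reading it is a field lookup rather than a loop (Rails' `counter_cache` option is a built-in example of this pattern). Plain-Ruby sketch; the `active` flag and `active_suggestions_count` are hypothetical names.

```ruby
# Hypothetical concept whose active-suggestion count is maintained on
# write, so reading it never iterates over the suggestions.
class Concept
  attr_reader :active_suggestions_count

  def initialize
    @suggestions = []
    @active_suggestions_count = 0
  end

  def add_suggestion(active:)
    @suggestions << { active: active }
    @active_suggestions_count += 1 if active
  end

  # Deactivating an already inactive suggestion leaves the count alone.
  def deactivate(index)
    suggestion = @suggestions.fetch(index)
    return unless suggestion[:active]

    suggestion[:active] = false
    @active_suggestions_count -= 1
  end
end

concept = Concept.new
concept.add_suggestion(active: true)
concept.add_suggestion(active: false)
concept.add_suggestion(active: true)
concept.deactivate(0)
puts concept.active_suggestions_count # => 1
```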
Question
Does anyone have any suggestions as to what I should be spending my time on?

This is a hard question to answer without seeing the actual user interface, but I would focus on loading exactly as much data as is required to display the initial interface. For example, if the user has to drill down to see some of the data you're presenting, you can load that data on demand rather than as part of the initial payload. You mention that a search can have as many as 100 "concepts"; maybe you don't need to fetch all of those concepts initially?
Bottom line: it doesn't sound like your issue is really on the client side. It sounds like your server-side code is slowing things down, so I'd explore what you can do to fetch less data, or to defer the complex queries until they are definitely required.

I'd recommend separating your JS codebase into modules that are loaded dynamically with an asset loader like RequireJS. This way you won't have so many XHRs firing at load time.
When a specific module is needed, it can load and initialize at the appropriate time instead of on every page load.
At the cost of complicating your code a little, each module should be able to start and stop. So if you have any polling or complex code executing, you can stop the module to increase performance and decrease network load.

Caching a bunch of simple queries in Rails

In my app there are objects that belong to countries, regions, cities, types, groups, companies and other sets. Every set is rather simple: it has an id, a name and sometimes pointers to other sets, and it never changes. Some sets are small, and I load them in a before_filter like this:
@countries = Country.all
@regions = Region.all
But then I call, for example,
offer.country.name
or
region.country.name
and my app performs a separate query by id, although I've already loaded them all. The same happens when I query through :include: the by-id queries generated by eager loading run regardless of whether I've already loaded this data with another query.
So I want some kind of cache. For example, I could build hashes keyed by record id in my before_filter and then call @countries[offer.country_id].name. In that case it seems I don't need eager loading, and it's easy to turn on Rails.cache here. But maybe there's some smart built-in Rails solution that doesn't require rewriting everything?
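The hash-by-id idea from the question is straightforward to build from records that are already loaded; in Rails, ActiveSupport's `index_by(&:id)` does this. Below is a plain-Ruby equivalent (`to_h` with a block needs Ruby 2.6+); `Country`, `Offer` and the data are hypothetical. Since these sets never change, a hash built once per request stays valid for the whole request.

```ruby
# Hypothetical stand-ins for the small, never-changing lookup tables.
Country = Struct.new(:id, :name)
Offer = Struct.new(:id, :country_id)

countries = [Country.new(1, 'France'), Country.new(2, 'Japan')]

# Build the id => record hash once (Rails: countries.index_by(&:id)).
countries_by_id = countries.to_h { |c| [c.id, c] }

offer = Offer.new(10, 2)
# A hash lookup instead of offer.country, which would query by id.
puts countries_by_id.fetch(offer.country_id).name # => Japan
```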

Caching lists of models like that won't cache the individual instances that exist in other models' associations.
The Rails team implemented an Identity Map in Rails 3.1 to solve this exact problem, but it is disabled by default for now. You can enable it and see if it works for your problem.
