What is one way that I can reduce .includes association query?

I have an extremely slow query that looks like this:
people = includes({ project: [{ best_analysis: :main_language }, :logo] }, :name, name_fact: :primary_language)
.where(name_id: limit(limit).unclaimed_people(opts))
Look at the includes method call and notice that it is loading a huge number of associations. In the RailsSpeed book, there is the following quote:
“For example, consider this:
Car.all.includes(:drivers, { parts: :vendors }, :log_messages)
How many ActiveRecord objects might get instantiated here?
The answer is:
# Cars * ( avg # drivers/car + avg log messages/car + average parts/car * ( average parts/vendor) )
Each eager load increases the number of instantiated objects, and in turn slows down the query. If these objects aren't used, you're potentially slowing down the query unnecessarily. Note how nested eager loads (parts and vendors in the example above) can really increase the number of objects instantiated.
Be careful with nesting in your eager loads, and always test with production-like data to see if includes is really speeding up your overall performance.”
The book doesn't mention what a good substitute would be, though. So my question is: what sort of technique could I substitute for includes?

Before I jump to the answer: I don't see you using any pagination or limit on the query, and that alone may help quite a lot.
Unfortunately, there isn't a direct replacement, really. And if you actually use all of the loaded objects in a view, that's okay. There is one possible substitute for includes, though. It's more complex, but it can still be helpful sometimes: you join all the needed tables, select only the fields you use, alias them, and access them as a flat structure.
Something like this (NOTE: it uses Arel helpers; you need to include ArelHelpers::ArelTable in the models where you use syntax like NameFact[:id]):
relation.joins(name_fact: :primary_language).select(
  NameFact[:id].as('name_fact_id'),
  PrimaryLanguage[:language].as('primary_language')
)
I'm not sure it will work for your case, but that's the only alternative I know.
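If you'd rather not pull in the Arel helpers, the same idea also works with plain string aliases in select. A rough sketch, assuming the relation from the question and conventional table names (name_facts, primary_languages):
# Join only what you need and select aliased columns, so ActiveRecord
# builds one flat row object per record instead of a graph of models.
rows = relation
  .joins(name_fact: :primary_language)
  .select(
    'name_facts.id AS name_fact_id',
    'primary_languages.language AS primary_language'
  )
rows.each { |row| puts row.primary_language }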

“I have an extremely slow query that looks like this”
There are a couple of potential causes:
Too many unnecessary objects fetched and created. From your comment, it looks like that is not the case and you need all the data that is being fetched.
DB indexes not optimised. Check the time taken by the query. EXPLAIN the generated query (check the logs or use .to_sql to get it) and make sure it is not doing a table scan or other costly operations; see the sketch below.
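For example, a quick way to get at the SQL that the relation in the question (people) generates:
# Print the SQL so you can run EXPLAIN against it in your database console,
# or use ActiveRecord's built-in explain (available since Rails 3.2).
puts people.to_sql
puts people.explain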

Related

How can I optimise this method in Ruby using preload, includes, or eager_load?

I want to reduce allocations and speed up a Ruby worker. I've been reading about eager loading, but I don't fully understand it yet. Here's the method:
def perform(study_id, timestamp)
  study = Study.includes(:questions, :participants).find(study_id)
  questions = study.questions.not_random.not_paused
  participants = study.participants
  return unless questions && participants
  end_timestamp = timestamp_window(timestamp)
  participants.each do |participant|
    process_participant(participant, questions, timestamp, end_timestamp, study)
  end
end
I was hoping that Study.includes() would reduce the number of database queries, but looking at Skylight, it doesn't seem to have changed anything.
Am I using includes incorrectly, or should I be using something else?
The example you've given doesn't seem like it's benefiting much from eager loading. Its utility is to avoid N+1 queries; something like this:
User.first(100).each do |user|
  comments = user.comments
end
This will make 1 query for the 100 users, and 100 queries for the comments. That's why it's called N+1 (N being 100 here).
To prevent this from happening, you'd use eager loading:
User.includes(:comments).first(100).each do |user|
  comments = user.comments
end
Now it makes two queries - one for the users and one for the comments. The fact that it makes 2 queries instead of 1 isn't a problem. Part of optimization (big O) is to find bottlenecks at different 'scales'. I'm not going to explain all that, but this is a good tutorial: https://samurails.com/interview/big-o-notation-complexity-ruby/
In the example without eager loading, the number of queries is O(N), which means 'linear': the queries required increase linearly with the value of N. If you use eager loading, though, you can increase N without adding additional queries, and the query count is O(1) - constant.
In your case, you have a method that makes three queries:
Study (find one)
associated questions
associated participants
An easy way to determine if you should use eager loading is to check your code for any SQL fetching that happens inside a loop. That's not happening here, so the eager loading won't do much. For example, it'd be good to use includes if you were instead fetching associated data for a list of studies.
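For instance, a hedged sketch of the kind of loop where includes does pay off (the query here is hypothetical; the point is that the associations are read inside a loop over many studies):
# Without includes, every iteration fires its own questions/participants
# queries (the N+1 problem); with includes, Rails loads them up front.
Study.includes(:questions, :participants).limit(50).each do |study|
  study.questions.each { |question| puts question.id }
  puts study.participants.size
end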
It might technically be possible to write a SQL query that gets all three tables' data in a single request, but I don't think ActiveRecord has anything built in to do it for you. It's probably unnecessary, though. If you're not convinced, you can try writing that SQL yourself and measure the performance difference.
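If you do want to experiment with that, a hand-written version might look roughly like the sketch below, using the study_id passed to perform. The table and column names are assumptions (in particular, participants are assumed to carry a study_id), and joining two has_many associations in one SELECT multiplies rows (one per question/participant pair), which is largely why Rails sticks to separate queries:
rows = Study.find_by_sql([<<~SQL, study_id])
  SELECT s.id AS study_id,
         q.id AS question_id,
         p.id AS participant_id
  FROM studies s
  LEFT JOIN questions    q ON q.study_id = s.id
  LEFT JOIN participants p ON p.study_id = s.id
  WHERE s.id = ?
SQL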

Performing multiple queries on the same model efficiently

I've been going round in circles for a few days trying to solve a problem which I've also struggled with in the past. Essentially it's an issue of understanding the best (or an efficient) way to perform multiple queries on a model, as I'm regularly finding my pages are very slow to load.
Consider the situation where you have a model called Everything. Initially you perform a query which finds those records in Everything which match certain criteria:
@chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC')
I want to remember the contents of @chosenrecords as I will present them to the user as a list; however, I would also like to understand more of the attributes of @chosenrecords, for instance:
@minimumprice = @chosenrecords.first
@numberofrecords = @chosenrecords.count
When I use the above code in my controller and inspect the query log on the local server, I am surprised to find that each of the three statements triggers an SQL query against the original Everything model, rather than remembering the records returned in @chosenrecords and working on those. This seems very inefficient to me, and indeed each of the three queries takes the same amount of time to process, making the page perform slowly.
I am more experienced in writing code in software like MATLAB, where once you've calculated the value of a variable it is stored locally and can be quickly interrogated, rather than recalculated each time you want more information about it. Please could you guide me as to whether I am just on the wrong track completely and the issues I've identified are just "how it is in Rails", or whether there is something I can do to improve it? I've looked into concepts like using a scope, defining a different variable type, and caching, but I'm not quite sure what I'm doing in each case and keep ending up in a similar hole.
Thanks for your time
You are partially on the wrong track. Rails 3 comes with Arel, which defers the query until the data is actually required. In your case, you have built an Arel query but are executing it once with .first and then again with .count. What I have done below is run the query once, load all the results into an array, and work on that array in the next two lines.
Perform the queries like this:
@chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC').all
@minimumprice = @chosenrecords.first
@numberofrecords = @chosenrecords.size
It will solve your issue.
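One caveat: on Rails 4 and newer, calling .all on a relation just returns the relation and no longer forces the query. A minimal sketch of the modern equivalent, using .load (or .to_a) to execute the query once and then reuse the loaded records:
@chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC').load
@minimumprice = @chosenrecords.first     # reads from the loaded records, no extra query
@numberofrecords = @chosenrecords.size   # counts the loaded records, no extra query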

In Rails, does eager loading apply to the where condition on the query? How do I improve the performance of the app, apart from using page caching?

I have three questions:
1) file_or_folder and dataset each have many metainstances. Given the following query:
p = Metainstance.find(:first, :conditions => ["file_or_folder_id = ? AND dataset_id = ?", some.id, dataset_id], :include => [:file_or_folder, :dataset])
Does eager loading apply to file_or_folder and dataset? Also, what is the best way of writing this query?
2) If I need to retrieve a huge amount of data, is it more efficient to write queries using the joins or includes options, or by using scopes?
3) I cannot use page caching, as I have dynamic content that keeps on changing. How else can I improve the performance of a Rails app?
1) First of all, find(:first) has been deprecated for a long time. It's actually finally going away in Rails 4. Here's how this query would look in the modern era (shamelessly copied from meagar's comment):
Metainstance.
  where(:file_or_folder_id => some.id, :dataset_id => dataset_id).
  includes(:file_or_folder, :dataset)
So, on to the question: eager loading in this way means that the following will happen:
First, Rails will load the Metainstances that match the conditions of the query.
Second, it will load all of the FileOrFolders that are associated with the Metainstances fetched in the first query (not any others).
Finally, it will load all of the Datasets associated with those Metainstances.
I think this means that the answer to your question is "Yes, eager loading takes the contents of the where clause into account: only the associations of the matching records are loaded."
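If you want to confirm that behaviour, a quick sketch reusing the query above: print the base SQL, then watch the log when the relation is loaded and you'll see the follow-up IN-list queries restricted to ids coming from the matching Metainstances.
relation = Metainstance.
  where(:file_or_folder_id => some.id, :dataset_id => dataset_id).
  includes(:file_or_folder, :dataset)
puts relation.to_sql  # the first query, with the where conditions applied
relation.to_a         # executes it plus the file_or_folders / datasets queries (visible in the log)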
2) I think we covered this with the above discussion of finder methods. I don't think the old finders are actually less efficient, per se; just uglier and deprecated. The above code is the correct way to run a query like this.
3) There are literally entire books on improving Rails app performance. You're going to have to be much more specific about the query you're running and how you're using the results from it before anyone can give you meaningful advice on this.
a) Yes, it does perform eager loading. I would write it like this:
p = Metainstance.where(:file_or_folder_id => some.id, :dataset_id => dataset_id).includes(:file_or_folder, :dataset).first
This also does eager loading.
b) If you are using file_or_folder and dataset later on, then it is best to use includes (you also avoid the N+1 problem). If you are not using them and just need to join the tables, then joins is the faster way; see the sketch after this answer.
c) There are many ways to improve the performance of your application, and you can find some of them in the Scaling Rails screencast series.
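A hedged sketch of the difference, reusing the model names from the question (table names are assumed from Rails conventions):
# joins: a single INNER JOIN query; the joined rows are not instantiated as
# ActiveRecord objects. Good when you only filter or sort on the association.
Metainstance.joins(:file_or_folder).where(:file_or_folders => { :id => some.id })
# includes: the associated records are loaded and instantiated as well,
# which avoids N+1 queries when you read them afterwards.
Metainstance.includes(:file_or_folder, :dataset).where(:dataset_id => dataset_id).each do |m|
  puts m.file_or_folder.id
end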

Can a Grails dynamic finder be broken by application code?

There is some code in the project I'm working on where a dynamic finder behaves differently in one code branch than it does in another.
This line of code returns all my advertisers (there are 8 of them), regardless of which branch I'm in.
Advertiser.findAllByOwner(ownerInstance)
But when I start adding conditions, things get weird. In branch A, the following code returns all of my advertisers:
Advertiser.findAllByOwner(ownerInstance, [max: 25])
But in branch B, that code only returns 1 advertiser.
It doesn't seem possible that changes in application code could affect how a dynamic finder works. Am I wrong? Is there anything else that might cause this not to work?
Edit
I've been asked to post my class definitions. Instead of posting all of it, I'm going to post what I think is the important part:
static mapping = {
  owner fetch: 'join'
  category fetch: 'join'
  subcategory fetch: 'join'
}
static fetchMode = [
  grades: 'eager',
  advertiserKeywords: 'eager',
  advertiserConnections: 'eager'
]
This code was present in branch B but absent from branch A. When I pull it out, things now work as expected.
I decided to do some more digging with this code present to see what I could observe. I found something interesting when I used withCriteria instead of the dynamic finder:
Advertiser.withCriteria{owner{idEq(ownerInstance.id)}}
What I found was that this returned thousands of duplicates! So I tried using listDistinct:
Advertiser.createCriteria().listDistinct{owner{idEq(ownerInstance.id)}}
Now this returns all 8 of my advertisers with no duplicates. But what if I try to limit the results?
Advertiser.createCriteria().listDistinct{
  owner{idEq(ownerInstance.id)}
  maxResults 25
}
Now this returns a single result, just like my dynamic finder does. When I cranked maxResults up to 100K, I got all 8 of my results.
So what's happening? It seems that the joins or the eager fetching (or both) generated SQL that returned thousands of duplicates. Grails dynamic finders must return distinct results by default, so when I wasn't limiting them I didn't notice anything strange. But once I set a limit, since the records were ordered by ID, the first 25 rows would all be duplicates of a single record, meaning that only one distinct record would be returned.
As for the joins and eager fetching, I don't know what problem that code was trying to solve, so I can't say whether or not it's necessary; the question is, why does having this code in my class generate so many duplicates?
I found out that the eager fetching was added (many levels deep) in order to speed up the rendering of certain reports, because hundreds of queries were being made. Attempts were made to eager fetch on demand, but other developers had difficulty going more than one level deep using finders or Grails criteria.
So the general answer to the question above is: instead of eager fetching by default, which can cause huge nightmares in other places, we need to find a way to do eager fetching on a single query that can go more than one level down the tree.
The next question is, how? It's not very well supported in Grails, but it can be achieved by simply using Hibernate's Criteria class. Here's the gist of it:
def advertiser = Advertiser.createCriteria()
  .add(Restrictions.eq('id', advertiserId))
  .createCriteria('advertiserConnections', CriteriaSpecification.INNER_JOIN)
  .setFetchMode('serpEntries', FetchMode.JOIN)
  .uniqueResult()
Now the advertiser's advertiserConnections, will be eager fetched, and the advertiserConnections' serpEntries will also be eager fetched. You can go as far down the tree as you need to. Then you can leave your classes lazy by default - which they definitely should be for hasMany scenarios.
Since your query is retrieving duplicates, there's a chance that the limit of 25 rows returns the same data over and over, so your distinct reduces them to a single record.
Try defining equals() and hashCode() in your classes, especially any that have a composite primary key or are used as the target of a hasMany.
I also suggest eliminating the possibilities: remove the fetch joins and the eager fetch modes one by one to see how each affects your result data (without the limit).

Modeling associations between ActiveRecord objects with Redis: avoiding multiple queries

I've been reading about and playing around with the idea of using Redis to complement my ActiveRecord models, in particular as a way of modeling relationships. I've also watched a few screencasts like this one: http://www.youtube.com/watch?v=dH6VYRMRQFw
It seems like a good idea in cases where you want to fetch one object at a time, but it seems like the approach breaks down when you need to show a list of objects along with each of their associations (e.g. in a View or in a JSON response in the case of an API).
Whereas in the case of using purely ActiveRecord, you can use includes and eager loading to avoid running N more queries, I can't seem to think of how to do so when depending purely on Redis to model relationships.
For instance, suppose you have the following (taken from the very helpful redis_on_rails project):
class Conference < ActiveRecord::Base
  def attendees
    # Attendee.find(rdb[:attendee_ids])
    Attendee.find_all_by_id(rdb[:attendee_ids].smembers)
  end

  def register(attendee)
    Redis.current.multi do
      rdb[:attendee_ids].sadd(attendee.id)
      attendee.rdb[:conference_ids].sadd id
    end
  end

  def unregister(attendee)
    Redis.current.multi do
      rdb[:attendee_ids].srem(attendee.id)
      attendee.rdb[:conference_ids].srem id
    end
  end
end
If I did something like
conferences = Conference.first(20)
conferences.each {|c|
  c.attendees.each {|a| puts a.name}
}
I'm simply getting the first 20 conferences, fetching the attendees of each, and printing them out, but you can imagine a case where I am rendering the conferences along with a list of their attendees in a view. In the above case, I would be running into the classic N+1 query problem.
If I had modeled the relationship in SQL along with has_many, I would have been able to use the includes function to avoid the same problem.
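(For reference, the SQL-backed version I have in mind is something like this, assuming a conventional has_many :attendees association:)
conferences = Conference.includes(:attendees).first(20)
conferences.each {|c|
  c.attendees.each {|a| puts a.name}  # attendees are already loaded, so no extra queries
}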
Ideas, links, questions welcome.
Redis can provide major benefits to your application's infrastructure, but I've found that, due to the specific operations you can perform on the various data types, you really need to put some thought ahead of time into how you're going to access your data. In this example, if you are very often iterating over a bunch of conferences and outputting the attendees, and are not otherwise benefiting from Redis' ability to do rich set operations (such as intersections, unions, etc.), maybe it's not a good fit for that data model.
On the other hand, if you are benefiting from Redis in performance-intensive parts of your application, it may be worth eating the occasional N+1 GET on Redis in order to reap those benefits. You have to do profiling on the parts of the app that you care about to see if the tradeoffs are worth it.
You may also be able to structure your data in Redis/your application in such a way that you can avoid the N+1 GETs; for example, if you know all the keys up front, you can use MGET to fetch their values in a single round trip, which is a fast O(N) operation, or use pipelining to avoid network latency on multiple lookups.
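For example, with a recent redis-rb client and the models from the question, batching might look like the sketch below. The raw key format is an assumption standing in for whatever rdb[:attendee_ids] resolves to, and the attendees are hydrated with one SQL query at the end:
conferences = Conference.first(20)
# One network round trip for all of the attendee-id sets instead of one per conference.
id_sets = Redis.current.pipelined do |pipeline|
  conferences.each { |c| pipeline.smembers("conference:#{c.id}:attendee_ids") }
end
# One SQL query for every attendee that appears in any of the sets.
attendees_by_id = Attendee.where(:id => id_sets.flatten.uniq).index_by(&:id)
conferences.zip(id_sets).each do |conference, ids|
  ids.each { |id| puts attendees_by_id[id.to_i].name }
end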
In an application I work on, we've built a caching layer that caches the foreign key IDs for has_many relationships so that we can do fast lookups on cached versions of a large set of models that have complex relationships with each other; while fetching these by SQL, we generate very large, relatively slow SQL queries, but by using Redis and the cached foreign keys, we can do a few MGETs without hitting the database at all. However, we only arrived at that solution by investigating where our bottlenecks were and discussing how we might avoid them.
