Rails Active Record: performance when chaining where clauses

I was just hoping to get some feedback to ensure that I'm understanding correctly. From what I've read, my understanding is that multiple chained where calls shouldn't hurt the performance of an application, since ActiveRecord relation objects don't execute any SQL until the chain is actually evaluated.
In my case, say I have a Game model and a Player model. Games have many players, and players are involved in many games. I want to allow the user to type a name into a search box and filter the games by player name. If the user hasn't entered anything, it should return the list of all games where the attribute prospective == false.
This code works fine for me functionally. I just want to ensure that from a performance standpoint, I'm not doing anything suboptimal.
def controller_name
  # Base relation: all non-prospective games, highest score first. No SQL runs yet.
  games = Game.where(prospective: false).order(score: :desc)
  if search_text.present?
    games = games.joins(:players).where("lower(players.name) LIKE ?", "%#{search_text.downcase}%")
  end
  games
end
In my case, I'm actually using GraphQL, but I don't think that should make a difference. I just wanted to get confirmation that this isn't going to execute two separate queries in the case that search_text.present? == true.
I also welcome any other suggestions for optimizing or writing this more concisely. I'm fairly new to Rails.
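A quick way to sanity-check the laziness assumption (a minimal sketch; the "%smith%" search term is just an illustration):

games = Game.where(prospective: false).order(score: :desc)
games = games.joins(:players).where("lower(players.name) LIKE ?", "%smith%")
puts games.to_sql  # prints the single combined SELECT; nothing has hit the database yet
games.load         # the one query executes here

Either way, only one SELECT is sent to the database, whether or not the joins branch runs.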

Related

Include vs Join

I have 3 models
User - has many debits and has many credits
Debit - belongs to User
Credit - belongs to User
Debit and credit are very similar. The fields are basically the same.
I'm trying to run a query on my models to return all fields from debits and credits where the user is current_user.
User.left_outer_joins(:debits, :credits).where("users.id = ?", @user.id)
As expected, this returned all fields from User repeated as many times as there were records in credits and debits.
User.includes(:credits, :debits).order(created_at: :asc).where("users.id = ?", @user.id)
It ran 3 queries and I thought it should be done in one.
The second part of this question: how could I add the record type into the query? As in, records from credits would have an extra field identifying them as credits, and the same for debits.
I have looked into the ActiveRecordUnion gem, but I did not see how it would solve the problem here.
includes can't magically retrieve everything you want it to in one query; typically it will run one query per model that you need to hit. Instead, it eliminates future unnecessary queries. Take the following examples:
Bad
users = User.first(5)
users.each do |user|
  p user.debits.first
end
There will be six queries in total here: one to load the five users, then one for each .debits call in the loop.
Good!
users = User.includes(:debits).first(5)
users.each do |user|
  p user.debits.first
end
You'll only make two queries here: one for the users and one for their associated debits. This is how includes speeds up your application, by eagerly loading things you know you'll need.
As for your comment, yes, it seems to make sense to combine them into one table. Depending on your situation, I'd recommend looking into Single Table Inheritance (STI). If you don't go this route, be careful about adding a column called type; Rails won't like that!
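A minimal STI sketch of that route (the Transaction name and its columns are assumptions, not from the question):

# transactions table: id, user_id, amount, created_at, plus a string "type" column reserved for STI
class Transaction < ApplicationRecord
  belongs_to :user
end

class Credit < Transaction; end
class Debit  < Transaction; end

# One query now returns both kinds of record, each instantiated as the right subclass:
Transaction.where(user_id: @user.id).order(created_at: :asc)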
First of all, in the first query, by calling the query on the User class you are asking for records of type User; if you do not want user objects, you are performing an extra join, which could be costly (COULD BE, not necessarily WILL BE).
If you want credit and debit records, simply run the queries on the Credit and Debit models. If you load the user object somewhere prior to this point, use includes, preload, or eager_load to load the linked credit and debit records all at once.
There are two ways of pre-loading records in Rails. In the first, Rails performs a separate query for each type of record; in the second, Rails performs only one query (joining the tables) and loads objects of different types from the data returned.
includes is a smart pre-loader that uses either of the two strategies, depending on which one it thinks will be faster.
If you want to force Rails to use one query no matter what, eager_load is what you are looking for.
Please read all about includes, eager_load and preload in the article here.
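To make the distinction concrete, a small sketch against the User/Credit/Debit models above (user_id is just a placeholder):

# preload: always separate queries, one per table (three in total here)
User.preload(:credits, :debits).find(user_id)

# eager_load: always a single query built with LEFT OUTER JOINs
User.eager_load(:credits, :debits).find(user_id)

# includes: picks one of the two strategies above for you
User.includes(:credits, :debits).find(user_id)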

Optimizing has many record association query

I have this query that I've built using Enumerable#select. The purpose is to find records that have no has_many associated records or, if they do have those records, to select only those where the preview attribute is set to true. The code below works perfectly for that use case. However, this query does not scale well. When I test against thousands of records it takes several hundred seconds to complete. How can this query be improved upon?
# User has many enrollments.
# Enrollment belongs to user.
users_with_no_courses = User.includes(:enrollments).select { |user|
  user.enrollments.empty? || user.enrollments.where(preview: false).empty?
}
So first, make sure enrollments.user_id has an index.
Second, you can speed this up by not loading all the enrollments, and doing your filtering in SQL:
User.where(<<-EOQ)
  NOT EXISTS (SELECT 1
              FROM enrollments e
              WHERE e.user_id = users.id
                AND NOT e.preview)
EOQ
By the way here I'm simplifying your two conditions into one: "no enrollments or no real enrollments" is the same as "no real enrollments".
If you want you can put this condition into a scope so it is more reusable.
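For example, a minimal sketch of such a scope (the name is mine):

class User < ApplicationRecord
  has_many :enrollments

  # Users with no "real" (non-preview) enrollments:
  scope :without_real_enrollments, -> {
    where(<<-EOQ)
      NOT EXISTS (SELECT 1
                  FROM enrollments e
                  WHERE e.user_id = users.id
                    AND NOT e.preview)
    EOQ
  }
end

users_with_no_courses = User.without_real_enrollments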
Third, this is still going to be slow if you're instantiating thousands of User objects. So I would look into paginating if that makes sense, or find_each if this is an offline script. Or use raw SQL to avoid all the object instances.
Oh by the way: even though you are saying includes(:enrollments), this will still go back to the database, giving you an n+1 problem:
user.enrollments.where(preview: false)
That is because the where call means ActiveRecord can't use the already-loaded association. You can avoid that by using Ruby's select instead of where. But not loading the enrollments in the first place is even better.
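A small sketch of the difference:

# Enumerable#select filters the records already loaded by includes (no new query):
user.enrollments.select { |e| !e.preview }

# where builds a fresh relation and goes back to the database (one extra query per user):
user.enrollments.where(preview: false)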

Duplicating logic in methods and scopes (and sql)

Named scopes have made this problem easier, but it is far from solved. The common situation is to have logic redefined in both named scopes and model methods.
I'll try to demonstrate the edge case of this by using a somewhat complex example. Let's say that we have a Message model that has many Recipients. Each recipient is able to mark the message as read for himself.
If you want to get the list of unread messages for a given user, you would say something like this:
Message.unread_for(user)
That would use the named scope unread_for, which generates the SQL that returns the unread messages for the given user. This SQL is probably going to join two tables together and filter messages by those recipients that haven't already read them.
On the other hand, when we are using the Message model in our code, we are using the following:
message.unread_by?(user)
This method is defined in the Message class, and even though it is doing basically the same thing, it now has a different implementation.
For simpler projects, this is really not a big thing. Implementing the same simple logic in both SQL and Ruby in this case is not a problem.
But when the application starts to get really complex, it becomes a problem. If we have a permission system implemented that checks who is able to access which message based on dozens of criteria defined in dozens of tables, this starts to get very complex. Soon it comes to the point where you need to join five tables and write really complex SQL by hand in order to define the scope.
The only "clean" solution to the problem is to make the scopes use the actual ruby code. They would fetch ALL messages, and then filter them with ruby. However, this causes two major problems:
Performance
Pagination
Performance: we are issuing many more queries to the database. I am not sure about the internals of DBMSs, but how much harder is it for the database to execute five queries each on a single table versus one query that joins five tables at once?
Pagination: we want to keep fetching records until a specified number of records has been retrieved. We fetch them one by one and check whether each is accepted by the Ruby logic. Once 10 of them are accepted, the process stops.
Curious to hear your thoughts on this. I have no experience with NoSQL DBMSs; can they tackle the issue in a different way?
UPDATE:
I was only speaking hypothetically, but here is one real-life example. Let's say that we want to display all transactions on one page (both payments and expenses).
I have created a SQL UNION query to get them both, then go through each record, check whether it can be :read by the current user, and finally paginate the result as an array.
def form_transaction_log
  sql1 = @project.payments
                 .select("'Payment' AS record_type, id, created_at")
                 .where('expense_id IS NULL')
                 .to_sql
  sql2 = @project.expenses
                 .select("'Expense' AS record_type, id, created_at")
                 .to_sql

  # Combine both relations in a single UNION, newest first.
  result = ActiveRecord::Base.connection.execute %{
    (#{sql1} UNION #{sql2})
    ORDER BY created_at DESC
  }

  # Re-instantiate each row as its original model, then filter by permissions.
  result = result.map do |record|
    klass = Object.const_get record["record_type"]
    klass.find record["id"]
  end.select do |record|
    can? :read, record
  end

  @transactions = Kaminari.paginate_array(result).page(params[:page]).per(7)
end
Both payments and expenses need to be displayed within the same table, ordered by creation date, and paginated.
Both payments and expenses have completely different :read permissions (defined in the ability class, CanCan gem). These permissions are quite complex, and they require querying several other tables.
The "ideal" thing would be to write one HUGE sql query that would do return what I need. It would made pagination and everything else a lot easier. But that is going to duplicate my logic defined in ability.rb class.
I'm aware that CanCan provides a way of defining the SQL query for an ability, but the abilities are so complex that they couldn't be defined in that way.
What I did is working, but I'm loading ALL transactions and then checking which ones I can read. I consider it a big performance issue. Pagination here seems pointless because I'm already loading all records (it only saves bandwidth). The alternative is to write really complex SQL that is going to be hard to maintain.
Sounds like you should remove some duplication and perhaps push more logic into the DB. There's no reason you can't share code between named scopes and other methods.
Can you post some problematic code for review?
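As a minimal sketch of that kind of sharing (the recipients columns user_id and read are assumptions, not from the question):

class Message < ApplicationRecord
  has_many :recipients

  # Single definition of "unread", used by both the scope and the predicate:
  scope :unread_for, ->(user) {
    joins(:recipients).where(recipients: { user_id: user.id, read: false })
  }

  def unread_by?(user)
    Message.unread_for(user).exists?(id: id)
  end
end

The SQL and Ruby paths now share one definition, so the permission logic can't drift apart.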

Rails Minimizing Database Load

I am relatively new to Rails. I understand that Rails lets you manipulate your database values with great ease, but I am a little in the dark about which kinds of approaches are more efficient for the database and which are not.
Here is a case in point. I have a model Appointment which belongs_to a User. In my code I can sometimes say process_user @appointment.user. When I write that, does it run a separate SELECT query on the database to retrieve that user? Is it more efficient to write process_user @appointment.user_id, where user_id is an attribute on the appointment, and then use the user_id value for my evaluation tasks, as long as I don't need the whole user object @appointment.user?
Frankly, from a peace-of-mind point of view, I just love being able to use process_user @appointment.user because it reads better, looks nicer, and works better when preparing logic. Is it a performance-efficient way?
You are perfectly fine using code like process_user @appointment.user, as ActiveRecord tries its best to minimize the number of database queries. Of course it does not handle all situations perfectly, but your example is a very basic one. There would probably be no immediate database query, and the object would only be loaded when its attributes are accessed.
If you notice performance problems in a running large-scale application and you can track them down to ActiveRecord using profiling, it is probably time to optimize. Trying to pre-optimize from the very beginning would be against Rails' philosophy and will only result in ugly (and possibly even slower) code. Remember that the real performance bottlenecks are often at places where you would never expect them.
EDIT: As Winfield pointed out, optimizing the number of queries does not usually mean managing foreign keys or similar internals by yourself. There are quite a number of flags and options for DB access methods that let you control how your database is queried.
You can eagerly load your associated users with your Appointment models:
Appointment.includes(:user)  # Rails 3+; older versions used Appointment.all(:include => :user)
...which will join in the users or do a separate lookup for all the associated users in a single query.
This will then load the user association in advance (eagerly) so the user attribute is already populated with the object when you reference it, instead of having to stop and execute a separate query to look it up one by one (N+1 queries).
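Using the appointment example from this question, that might look like the sketch below (the status filter is purely illustrative):

appointments = Appointment.includes(:user).where(status: "open")
appointments.each do |appointment|
  process_user appointment.user  # already loaded, no extra SELECT per row
end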

Rails optimization Question

In Rails, while using ActiveRecord, why are join queries considered bad?
For example
Here I'm trying to find the number of companies that belong to a certain category.
class Company < ActiveRecord::Base
  has_one :company_profile
end
Finding the number of companies for a particular category_id:
number_of_companies = Company.find(:all, :joins=>:company_profile, :conditions=>["(company_profiles.category_id = #{c_id}) AND is_published = true"])
How could this be better or is it just poor design?
company_profiles = CompanyProfile.find_all_by_category_id(c_id)
companies = []
company_profiles.each { |c_profile| companies.push(c_profile.company) }
Isn't it better that the first approach creates a single query, while I'd be running several queries in the second case?
Could someone explain why joins are considered bad practice in Rails?
Thanks in advance
To my knowledge, there is no such rule. The rule is to hit the database as little as possible, and Rails gives you the right tools for that, using joins.
The example Sam gives above is exemplary: simple code, but behind the scenes Rails has to run multiple queries (one per profile), instead of only one using a join.
If there is one related rule that comes to mind, it is to avoid raw SQL where possible and use the Rails way as much as possible. This keeps your code database agnostic (as Rails handles the differences for you). But sometimes even that is unavoidable.
It comes down to good database design, creating the correct indexes (which you need to define manually in migrations), and sometimes big nested structures/joins are needed.
Join queries are not bad, in fact, they are good, and ActiveRecord has them at its very heart. You don't need to break into find_by_sql to use them, options like :include will handle it for you. You can stay within the ORM, which gives the readability and ease of use, whilst still, for the most part, creating very efficient SQL (providing you have your indexes right!)
Bottom line - you need to do the bare minimum of database operations. Joins are a good way of letting the database do the heavy lifting for you, and lowering the number of queries that you execute.
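As a sketch of that, the category count from this question could be pushed entirely into the database with the modern joins syntax (assuming the is_published flag lives on companies):

Company.joins(:company_profile)
       .where(company_profiles: { category_id: c_id })
       .where(is_published: true)
       .count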
By the by, DataMapper and Arel (the query engine in Rails 3) feature a lot of lazy loading - this means that code such as:
@category = Category.find(params[:id])
@category.companies.size
Would most likely result in a join query that only did a COUNT operation, as the first line wouldn't result in a query being sent to the db.
If you just want to find the number of companies in a category, all you need to do is find the category and then call the association name and .size on it, since the association returns a collection.
@category = Category.find(params[:id])
@category.companies.size
