Eager-load an association but ONLY a single column - ruby-on-rails

I have a self-referential association like this:
class User
belongs_to :parent_user, class_name: "User"
has_many :child_users, class_name: "User", foreign_key: :parent_user_id
end
When I get a list of users, I would like to be able to eager-load the child_users association, but I only need the id column from it. I need to be able to do something like this without causing n+1 queries, and preferably without having to load all of the other data in this model:
users = User.preload(:child_users)
users.map(&:child_user_ids)
The above limits the work to two queries instead of n+1, but it loads a lot of full objects into memory, which I would prefer to avoid when I only care about the id of those child associations.

You don't want eager loading, which is for loading models. You want to select an aggregate. This isn't something the ActiveRecord query interface handles, so you'll need to use SQL strings or Arel and the correct function for your DB.
For example:
# This is for Postgres, use JSON_ARRAYAGG on MySQL
User.left_joins(:child_users)
.select(
"users.*",
'json_agg("child_users_users".id) AS child_ids'
)
.group(:id)
child_users_users is the wonky alias that .left_joins(:child_users) creates.
At least on Postgres the driver (the PG gem) will automatically cast the result to an array, but YMMV on other databases.
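If a database-specific aggregate is not an option, a portable alternative is to fetch only the id pairs in a second query and group them in Ruby. The sketch below is plain Ruby with made-up data: the `rows` array stands in for what a call such as `User.where(parent_user_id: parent_ids).pluck(:parent_user_id, :id)` would return, and `parent_ids` and `child_ids_by_parent` are names chosen for illustration.

```ruby
# Assumed result of:
#   User.where(parent_user_id: parent_ids).pluck(:parent_user_id, :id)
# Each row is [parent_user_id, id]; no User objects are instantiated.
parent_ids = [1, 2, 3]
rows = [[1, 10], [1, 11], [2, 12]]

# Start every parent with an empty list so childless users still appear.
child_ids_by_parent = parent_ids.each_with_object({}) { |id, acc| acc[id] = [] }

# Append each child id under its parent.
rows.each { |parent_id, child_id| child_ids_by_parent[parent_id] << child_id }
```

This keeps the work at two queries while only arrays of integers are held in memory for the children.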

Related

ActiveRecord WHERE with namespaced models

I've two models within the same namespace/module:
module ReverseAuction
class Demand < ApplicationRecord
belongs_to :purchase_order, inverse_of: :demands, counter_cache: true
end
end
module ReverseAuction
class PurchaseOrder < ApplicationRecord
has_many :demands
end
end
Note that I don't have to specify the class_name for the models because they're in the same module, and the relations work well this way.
When I try to query a includes with the name of the relation by itself, it works fine, like:
ReverseAuction::PurchaseOrder.all.includes(:demands) # all right .. AR is able to figure out that *:demands* correspond to the 'reverse_auction_demands' table
But when I try to use a where in this query, AR seems to be unable to figure out the (namespaced) table name by itself, so:
ReverseAuction::PurchaseOrder.includes(:demands).where(demands: {user_id: 1}) # gives me error: 'ERROR: missing FROM-clause entry for table "demands"'
But if I specify the full resolved (namespaced) model name, then where goes well:
ReverseAuction::PurchaseOrder.includes(:demands).where(reverse_auction_demands: {user_id: 1}) # works pretty well
Is it normal that AR can infer the table name of namespaced models from relations in includes but can't in where, or am I missing the point?
Is it normal that AR can infer the table name of namespaced models from
relations in includes but can't in where?
Yes. This is an example of a leaky abstraction.
Associations are an object-oriented abstraction around SQL joins that lets you do the fun stuff while AR worries about writing the SQL to join them and maintaining the in-memory couplings between the records. .joins, .left_joins, .includes and .eager_load are "aware" of your associations and go through that abstraction. Because you have this object-oriented abstraction, .includes is smart enough to figure out how the module nesting should affect the class names and table names when writing joins.
.where and all the other parts of the ActiveRecord query interface are not as smart. This is just an API that generates SQL queries programmatically.
When you do .where(foo: 'bar') it's smart enough to translate that into WHERE table_name.foo = 'bar' because the class is aware of its own table name.
When you do .where(demands: {user_id: 1}) the method is not actually aware of your associations, other model classes or the schema and just generates WHERE demands.user_id = 1 because that's how it translates a nested hash into SQL.
And note that this really has nothing to do with namespaces. When you do:
.where(reverse_auction_demands: {user_id: 1})
It works because you're using the right table name. If you were using a non-conventional table name that didn't line up with the model, you would have the exact same issue.
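The translation rule described above can be mimicked with a few lines of plain Ruby. This is only a toy model of the idea, not ActiveRecord's implementation; `toy_where` and the table name passed to it are hypothetical.

```ruby
# Toy model only; this is NOT how ActiveRecord is implemented.
# A flat key is qualified with the model's own table name, while the key of
# a nested hash is used verbatim as the table name.
def toy_where(own_table, conditions)
  clauses = conditions.flat_map do |key, value|
    if value.is_a?(Hash)
      value.map { |column, v| "#{key}.#{column} = #{v.inspect}" }
    else
      ["#{own_table}.#{key} = #{value.inspect}"]
    end
  end
  "WHERE #{clauses.join(' AND ')}"
end

own_table = "reverse_auction_purchase_orders"
flat   = toy_where(own_table, { id: 1 })
nested = toy_where(own_table, { demands: { user_id: 1 } })
fixed  = toy_where(own_table, { reverse_auction_demands: { user_id: 1 } })
```

Here `nested` refers to a `demands` table that the join never brought into the query, which is the same shape as the "missing FROM-clause entry" error, while `fixed` names the table that actually exists.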
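If you want to build the condition from the class without hardcoding the table name, merge a scope from the other model (where itself does not accept a relation, and the table has to be joined, e.g. with .joins or .eager_load):
.merge(
ReverseAuction::Demand.where(user_id: 1)
)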
or use Arel:
.where(
ReverseAuction::Demand.arel_table[:user_id].eq(1)
)

Issue with polymorphic ActiveRecord query

I have three models with the following associations:
User has_many :owns, has_many :owned_books, :through => :owns, source: :book
Book has_many :owns
Own belongs_to :user, :counter_cache => true, belongs_to :book
I also have a page that tracks the top users by owns with the following query:
User.all.order('owns_count desc').limit(25)
I would now like to add a new page which can track top users by owns as above, but with a condition:
Book.where(:publisher => "Publisher #1")
What would be the most efficient way to do this?
I'd be interested to know if there is something special for this case, but my shot would be the following.
First, I don't see how a polymorphic association can be applied here. You have just one object (user) that a book can belong to. As I understand it, polymorphic associations are for connecting a book to several different objects (e.g. to User, Library, Shelf, etc.) (edit - the initial text of the question mentioned polymorphic associations, now it doesn't)
Second, I don't believe there is a way to cache counters here, as long as "Publisher #1" is a varying input parameter, and not a set of few pre-defined and known publishers (few constants).
Third, I would assume that the number of books by a single publisher is relatively limited. So even if you have millions of books in your table, the number of books per publisher should be in the hundreds at most.
Then you can first query for all Publisher's books ids, e.g.
book_ids = Book.where(:publisher => "Publisher #1").pluck(:id)
And then query in owns table for top users ids:
Own.select("user_id, count(book_id) as total_owns").where(book_id: book_ids).group(:user_id).order("total_owns DESC").limit(25)
Disclaimer - I didn't try the statement in the Rails console, as I don't have your objects defined. I'm basing this on the group call in the ActiveRecord docs.
Edit. In order to make things more efficient, you can try the following:
0) Just in case, ensure you have indexes on Owns table for both foreign keys.
1) Use pluck for the second query as well, so that no Own objects are created, although it should not make a big difference because of limit(25). Something like this:
users_ids = Own.where(book_id: book_ids).group(:user_id).order(Arel.sql("count(*) DESC")).limit(25).pluck(:user_id)
See this question for reference.
2) Load all result users in one subsequent query and not N queries for each user
top_users = User.where(:id => users_ids)
3) Try joining User table in the first order:
owns_res = Own.includes(:user).select("user_id, count(book_id) as total_owns").where(book_id: book_ids).group(:user_id).order("total_owns DESC").limit(25)
And then use owns_res.first.user
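To make the steps above concrete, here is a plain-Ruby sketch of what the grouped query computes, using made-up in-memory data instead of the database. It also shows one detail worth watching in step 2: a WHERE id IN (...) lookup does not promise to return rows in ranking order, so the loaded users are re-sorted by their position in the id list. All data and variable names are illustrative assumptions.

```ruby
# Made-up sample data: each own is [user_id, book_id].
owns = [[1, 100], [2, 100], [2, 101], [3, 102], [2, 103], [1, 102]]
book_ids = [100, 101, 103] # ids of the publisher's books (first query)

# Equivalent of WHERE book_id IN (...) GROUP BY user_id with COUNT(*).
counts = Hash.new(0)
owns.each { |user_id, book_id| counts[user_id] += 1 if book_ids.include?(book_id) }

# Equivalent of ORDER BY count DESC LIMIT 25 (ties broken by user_id here).
top_user_ids = counts.sort_by { |user_id, count| [-count, user_id] }
                     .first(25)
                     .map(&:first)

# Stand-in for User.where(id: top_user_ids), returned in arbitrary order.
loaded_users = [{ id: 1, name: "a" }, { id: 2, name: "b" }]

# Restore the ranking order after loading.
ranked_users = loaded_users.sort_by { |user| top_user_ids.index(user[:id]) }
```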

Rails 4 Eager Load has_many Associations for single object

I get the benefits of using eager loading to avoid N+1 when fetching an array of objects with their associated records, but is it important to use eager loading when fetching a single record?
In my case
user has_many :addresses
user has_many :skills
user has_many :devices
user has_many :authentications
In the show action, I am using rack-mini-profiler to see whether eager loading is worthwhile:
User.includes(:skills, :addresses, :devices, :authentications).find(params[:id])
But I seem to get the same number of SQL queries.
Any advice on using eager loading for such case or not?
is it important to use eager loading when fetching a single record?
For associations one level deep, no.
If you have nested associations, yes.
# class User
has_many :skills
# class Skill
belongs_to :category
# this code will fire N + 1 queries for skill->category
user.skills.each do |skill|
puts skill.category
end
In this case, it is better to eager load skills: :category
User.includes(skills: :category).find(id)
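The difference in query counts can be illustrated without a database. The following is a toy in-memory stand-in (the `FakeDB` class is hypothetical and has nothing to do with ActiveRecord internals) that counts lookups: one lookup per skill mirrors the N+1 pattern, while a single batched lookup mirrors what preloading does.

```ruby
# Toy in-memory "database" that counts lookups; not ActiveRecord.
class FakeDB
  attr_reader :query_count

  def initialize(categories)
    @categories = categories # { id => name }
    @query_count = 0
  end

  # One "query" per call, like skill.category without eager loading.
  def find_category(id)
    @query_count += 1
    @categories[id]
  end

  # One "query" for many ids, like a preload's WHERE id IN (...).
  def find_categories(ids)
    @query_count += 1
    @categories.select { |id, _name| ids.include?(id) }
  end
end

categories = { 1 => "languages", 2 => "databases" }
skills = [
  { name: "ruby", category_id: 1 },
  { name: "sql",  category_id: 2 },
  { name: "go",   category_id: 1 }
]

# N+1 style: one lookup per skill.
lazy_db = FakeDB.new(categories)
lazy_names = skills.map { |skill| lazy_db.find_category(skill[:category_id]) }

# Preload style: one batched lookup, then in-memory matching.
eager_db = FakeDB.new(categories)
by_id = eager_db.find_categories(skills.map { |skill| skill[:category_id] }.uniq)
eager_names = skills.map { |skill| by_id[skill[:category_id]] }
```

Both approaches produce the same names; only the number of round trips differs, and it grows with the number of skills in the first case.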
Edit
Rails provides two ways to avoid N+1 queries, which it refers to as preloading and eager_loading.
Preload fires individual SQL queries for each collection.
Eager load attempts to construct one massive left-joined SELECT to retrieve all collections in 1 query.
The short version is that includes lets Rails pick which one to use. But you can force one way or the other.
User.eager_load(:skills, :addresses, :devices, :authentications).find(params[:id])
Should retrieve all records in 1 query.
Further reading:
What's the difference between “includes” and “preload” in an ActiveRecord query?
http://blog.bigbinary.com/2013/07/01/preload-vs-eager-load-vs-joins-vs-includes.html
http://blog.arkency.com/2013/12/rails4-preloading/
Try using the Bullet gem for detecting unused or missing eager loading. It's designed to tell you if there are wasted include statements, or inefficient N+1 queries, where includes would help.
If there's a problem, it can be configured to output to the Rails logger to let you know. Or you can have it show you a notification in the browser on pages that need optimising.

ORDER BY and DISTINCT ON (...) in Rails

I am trying to ORDER by created_at and then get a DISTINCT set based on a foreign key.
The other part is to somehow use this in ActiveModelSerializer. Specifically I want to be able to declare:
has_many :somethings
In the serializer. Let me explain further...
I am able to get the results I need with this custom sql:
def latest_product_levels
sql = "SELECT DISTINCT ON (product_id) client_product_levels.product_id,
client_product_levels.* FROM client_product_levels WHERE client_product_levels.client_id = #{id} ORDER BY product_id,
client_product_levels.created_at DESC";
results = ActiveRecord::Base.connection.execute(sql)
end
Is there any possible way to get this result but as a condition on a has_many relationship so that I can use it in AMS?
In pseudo code: @client.products_levels
Would do something like: @client.order(created_at: :desc).select(:product_id).distinct
That of course fails for reasons that are beyond me.
Any help would be great.
Thank you.
A good way to structure this is to split your query into two parts: the first part manages the filtering of rows so that you get only your latest client product levels. The second part uses a standard has_many association to connect Client with ClientProductLevel.
Starting with your ClientProductLevel model, you can create a scope to do the latest filtering:
class ClientProductLevel < ActiveRecord::Base
scope :latest, -> {
select("distinct on(product_id) client_product_levels.product_id,
client_product_levels.*").
order("product_id, created_at desc")
}
end
You can use this scope anywhere that you have a query that returns a list of ClientProductLevel objects, e.g., ClientProductLevel.latest or ClientProductLevel.where("created_at < ?", 1.week.ago).latest, etc.
If you haven't already done so, set up your Client class with a has_many relationship:
class Client < ActiveRecord::Base
has_many :client_product_levels
end
Then in your ActiveModelSerializer try this:
class ClientSerializer < ActiveModel::Serializer
has_many :client_product_levels
def client_product_levels
object.client_product_levels.latest
end
end
When you invoke the ClientSerializer to serialize a Client object, the serializer sees the has_many declaration, which it would ordinarily forward to your Client object, but since we've got a locally defined method by that name, it invokes that method instead. (Note that this has_many declaration is not the same as an ActiveRecord has_many, which specifies a relationship between tables: in this case, it's just saying that the serializer should present an array of serialized objects under the key `client_product_levels'.)
The ClientSerializer#client_product_levels method in turn invokes the has_many association from the client object, and then applies the latest scope to it. The most powerful thing about ActiveRecord is the way it allows you to chain together disparate components into a single query. Here, the has_many generates the `where client_id = $X' portion, and the scope generates the rest of the query. Et voila!
In terms of simplification: ActiveRecord doesn't have native support for distinct on, so you're stuck with that part of the custom sql. I don't know whether you need to include client_product_levels.product_id explicitly in your select clause, as it's already being included by the *. You might try dumping it.
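For readers unfamiliar with DISTINCT ON, this plain-Ruby sketch shows what the `latest` scope is expected to return: one row per product_id, namely the one with the newest created_at. The sample rows are invented, and the in-memory grouping is only an illustration of the result, not of how Postgres executes the query.

```ruby
# Invented sample rows standing in for client_product_levels of one client.
levels = [
  { id: 1, product_id: 7, created_at: Time.utc(2024, 1, 1) },
  { id: 2, product_id: 7, created_at: Time.utc(2024, 3, 1) },
  { id: 3, product_id: 9, created_at: Time.utc(2024, 2, 1) },
  { id: 4, product_id: 8, created_at: Time.utc(2024, 1, 5) },
  { id: 5, product_id: 8, created_at: Time.utc(2024, 1, 2) }
]

# DISTINCT ON (product_id) ... ORDER BY product_id, created_at DESC
# keeps the first row of each product_id group, i.e. the newest one.
latest = levels.group_by { |row| row[:product_id] }
               .map { |_product_id, rows| rows.max_by { |row| row[:created_at] } }
               .sort_by { |row| row[:product_id] }

latest_ids = latest.map { |row| row[:id] }
```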

Eager loading of polymorphic associations in ActiveRecord

This is my first time using Rails and I was wondering if it's possible to load a has one polymorphic association in one SQL query? The models and associations between them are basic enough: An Asset model/table can refer to a content (either an Image, Text, or Audio) through a polymorphic association, i.e.
class Asset < ActiveRecord::Base
belongs_to :content, :polymorphic => true
end
and the Image, Text, Audio are defined like this:
class Image < ActiveRecord::Base
has_one :asset, :as => :content
end
When I try to load an Image, say like this:
Image.first(
:conditions => {:id => id},
:include => :asset
)
It specifies two queries, one to retrieve the Image and another to retrieve the Asset (FYI, this happens also if I specify a :joins). Based on my understanding, ActiveRecord does this because it doesn't know there's a one-to-one association between Image and Asset. Is there a way to force a join and retrieve the 2 objects in one go? I've also tried using join with a custom select, but I end up having to create the ActiveRecord models manually and their associations.
Does ActiveRecord provide a way to do this?
After digging through the Rails source I've discovered that you can force a join by referencing a table other than the current model in either the select, conditions or order clauses.
So, if I specify an order on the Asset table:
Image.first(
:conditions => {:id => id},
:include => :asset,
:order => "assets.id"
)
The resulting SQL will use a left outer join and everything in one statement. I would have preferred an inner join, but I guess this will do for now.
I ran up against this issue myself. ActiveRecord leans more toward the end of making it easy for Rubyists (who may not even be all too familiar with SQL) to interface with the database, than it does with optimized database calls. You might have to interact with the database at a lower level (e.g. DBI) to improve your performance. Using ActiveRecord will definitely affect how you design your schema.
The desire for SQL efficiency got me thinking about using other ORMs. I haven't found one to suit my needs. Even those that move more toward transact SQL itself (e.g. Sequel) have a heavy Ruby API. I would be content without a Ruby-like API and just manually writing my T-SQL. The real benefit I'm after with an ORM is the M, mapping the result set (of one or more tables) into the objects.
(This is for Rails 3 syntax.)
MetaWhere is an awesome gem for making complex queries that are outside of ActiveRecord's usual domain easy and ruby-like. (@wayne: It supports outer joins as well.)
https://github.com/ernie/meta_where
Polymorphic joins are a little trickier. Here's how I did mine with MetaWhere:
Image.joins(:asset.type(AssetModel)).includes(:asset)
In fact, I made a convenience method in my polymorphic-association-having class:
def self.join_to(type)
joins(:asset.type(type)).includes(:asset)
end
So it will always query & eager load a particular type of asset. I haven't tried mixing it with multiple types of join in one query. Because polymorphism works using the same foreign key to reference different tables, on the SQL level you have to specify it as separate joins because one join only joins to one table.
This only works with ActiveRecord 3 because you have access to the amazing power of AREL.
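The point about one foreign key referencing several tables can be seen in a small plain-Ruby sketch. The data and the `tables` hash are invented for illustration; the idea is that content_id alone is ambiguous, so records have to be resolved per content_type, which at the SQL level means one join (or one query) per type.

```ruby
# Invented rows of a polymorphic assets table.
assets = [
  { id: 1, content_type: "Image", content_id: 5 },
  { id: 2, content_type: "Text",  content_id: 5 },
  { id: 3, content_type: "Image", content_id: 6 }
]

# Stand-ins for the images and texts tables, keyed by primary key.
tables = {
  "Image" => { 5 => "cat.png", 6 => "dog.png" },
  "Text"  => { 5 => "hello" }
}

# content_id 5 exists in both tables, so lookups must be grouped by type.
ids_by_type = assets.group_by { |asset| asset[:content_type] }
                    .transform_values { |rows| rows.map { |asset| asset[:content_id] } }

# Resolve each asset against the table named by its content_type.
resolved = assets.map do |asset|
  [asset[:id], tables.fetch(asset[:content_type]).fetch(asset[:content_id])]
end.to_h
```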
