Rails ActiveRecord/Arel query aggregate column with select()

I've got two basic models with a join table. I've added a scope that computes a count through the relation and exposes it as an attribute/pseudo-column. Everything works fine, but I'd now like to query a subset of columns together with the count column, and I don't know how to reference it.
TL;DR: How can I include an aggregate such as a count in my Arel query while also selecting a subset of columns?
Models are Employer and Employee, joined through Job. Here's the relevant code from Employer:
class Employer < ApplicationRecord
  belongs_to :user
  has_many :jobs
  has_many :employees, through: :jobs
  scope :include_counts, -> do
    left_outer_joins(:employees).
      group("employers.id").
      select("employers.*, count(employees.*) as employees_count")
  end
end
This allows me to load an employer with counts:
employers = Employer.include_counts.where(id: 1)
And then reference the count:
count = employers[0].employees_count
I'm loading the record in my controller, which then renders it. I don't want to render more fields than I need to, though. Prior to adding the count, I could do this:
employers = Employer.where(id: 1).select(:id, :name)
When I add my include_counts scope, it basically ignores the select(). It doesn't fail, but it ends up including ALL the columns, because of this line in my scope:
select("employers.*, count(employees.*) as employees_count")
If I remove employers.* from the scope, then I don't get ANY columns in my result, with or without a select() clause.
I tried this:
employers = Employer.include_counts.where(id: 1).select(:id, :name, :employees_count)
...but that produces the following SQL:
SELECT employers.*, count(employees.*) as employees_count, id, name, employees_count FROM
...and an SQL error because column employees_count doesn't exist and id and name are ambiguous.
The only thing that sort of works is this:
employers = Employer.include_counts.where(id: 1).select("employers.id, employers.name, count(employees.*) as employees_count")
...but that actually selects ALL the columns in employers, due to the scope clause again.
I also don't want that raw SQL leaking into my controller if I can avoid it. Is there a more idiomatic way to do this with Rails/Arel?
If I can't find another way to do the query, I'll probably create another scope or custom finder in the model, so that the controller code is cleaner. I'm open to suggestions for doing that as well, but I'd like to know if there's a simple way to reference computed aggregate columns like this as though they were any other column.
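One way to keep the raw SQL inside the model, along the lines of the "another scope" idea above, is to let the scope receive the columns to select. The sketch below is plain Ruby and only builds the SELECT list. The helper name `employer_select_list` and the scope name `with_employee_count` in the trailing comment are hypothetical, and the column names are assumed to come from trusted code rather than user input.

```ruby
# Hypothetical helper: builds a SELECT list made of the requested employers
# columns (or employers.* when none are given) plus the aggregate count.
# Assumes the column names come from trusted code, not from user input.
def employer_select_list(columns = [])
  selected =
    if columns.empty?
      ["employers.*"]
    else
      columns.map { |column| "employers.#{column}" }
    end

  (selected + ["count(employees.*) as employees_count"]).join(", ")
end

puts employer_select_list(%i[id name])

# Inside Employer, a scope could wrap the helper (untested sketch, assuming
# the helper is defined as a class method on the model):
#   scope :with_employee_count, ->(*columns) {
#     left_outer_joins(:employees)
#       .group("employers.id")
#       .select(employer_select_list(columns))
#   }
```

With something like this, the controller would call `Employer.with_employee_count(:id, :name).where(id: 1)` and never see the SQL fragment.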

Related

Optimizing queries in a has_many_through association with three models

Trying to avoid n+1 query
I'm working on a web based double entry accounting application that has the following basic models;
class Account < ApplicationRecord
  has_many :splits
  has_many :entries, through: :splits
end
class Entry < ApplicationRecord
  has_many :splits, -> { order(:account_id) }, dependent: :destroy, inverse_of: :entry
  attribute :amount, :integer
  attribute :reconciled
end
class Split < ApplicationRecord
  belongs_to :entry, inverse_of: :splits
  belongs_to :account
  attribute :debit, :integer
  attribute :credit, :integer
  attribute :transfer, :string
end
This is a fairly classic accounting model (at least, it is patterned after GnuCash), but it leads to somewhat complex queries. (From ancient history, this is pretty much a 3rd normal form structure!)
First, Account is a hierarchical tree structure: an Account belongs to a parent (except ROOT) and may have many children, and those children may have children of their own; I call the whole subtree a family. Most of these relations are covered in the Account model and optimized as much as a recursive structure allows.
An Account has many Entries (transactions), and each Entry must have at least two Splits whose Amount attributes (or Debits/Credits) sum to 0.
The primary use of this structure is to produce Ledgers, which are just lists of Entries and their associated Splits, usually filtered by a date range. This is fairly simple if the account has no Family/Children:
# self = a single Account
entries = self.entries.where(post_date: @bom..@eom).includes(:splits).order(:post_date, :numb)
It gets more complex if you want a ledger of an account that has many children (say, a Ledger of all Current Assets):
def self.scoped_acct_range(family, range)
  # family is a single account_id or array of account_ids
  Entry.where(post_date: range).joins(:splits).
    where(splits: { account_id: family }).
    order(:post_date, :numb).distinct
end
While this works, I suspect I have an n+1 query: if I use includes instead of joins, I won't get all the splits for an Entry, only those in the family, and I want all splits. That means the splits are reloaded (queried again) in the view. Also, distinct is needed because an Entry can have more than one split referencing an account in the family.
My question: is there a better way to handle this three-model query?
I threw together a few hacks, one going backwards from splits:
def self.scoped_split_acct_range(family, range)
  # family is a single account_id or array of account_ids
  # get filtered Entry ids
  entry_ids = Split.where(account_id: family).
    joins(:entry).
    where(entries: { post_date: range }).
    pluck(:entry_id).uniq
  # use ids to get entries and eager loaded splits
  Entry.where(id: entry_ids).includes(:splits).order(:post_date, :numb)
end
This also works and, going by the ms reported in the log, may even be faster. Normal use of either would be looking at 50 or so Entries for a month, but you can also filter a year's worth of transactions, and then you get what you asked for. For normal use, a ledger for a month takes about 70ms; even a quarter is around 100ms.
I've used a few attributes in both Splits and Accounts that got rid of a few view-level queries. Transfer is basically the concatenated Account names going up the tree.
Again, just looking to see if I'm missing something and there is a better way.
Using a nested select is the proper option IMO.
You can optimize your code with the nested select to use the following:
entry_ids = Entry.where(post_date: range)
                 .joins(:splits)
                 .where(splits: { account_id: family })
                 .select('entries.id')
                 .distinct
Entry.where(id: entry_ids).includes(:splits).order(:post_date, :numb)
This will generate a single SQL statement with a nested select, instead of having 2 SQL queries: 1 to get the Entry ids and pass it to Rails and 1 other query to select entries based on those ids.
The following gem, developed by an ex-colleague, can help you deal with this kind of stuff: https://github.com/MaxLap/activerecord_where_assoc
In your case, it would enable you to do the following:
Entry.where_assoc_exists(:splits, account_id: 123)
     .where(post_date: range)
     .includes(:splits)
     .order(:post_date, :numb)
This does the same thing as I suggested, but behind the scenes.
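To see why the entry ids are filtered first and the splits loaded afterwards, here is a small plain-Ruby model of the same logic that uses in-memory structs instead of tables. The data and the `ledger_for` helper are made up for illustration; this is not ActiveRecord code.

```ruby
# In-memory stand-ins for the entries and splits tables (made-up data).
Entry = Struct.new(:id, :post_date)
Split = Struct.new(:entry_id, :account_id, :amount)

ENTRIES = [
  Entry.new(1, "2019-01-05"), Entry.new(2, "2019-01-09"), Entry.new(3, "2019-01-20")
].freeze
SPLITS = [
  Split.new(1, 10, 500), Split.new(1, 99, -500),
  Split.new(2, 98, 250), Split.new(2, 99, -250),
  Split.new(3, 11, 75), Split.new(3, 10, 25), Split.new(3, 99, -100)
].freeze

# Step 1 mirrors the nested select: ids of entries having a split in the family.
# Step 2 mirrors includes(:splits): every split of those entries is kept,
# not only the ones that matched the family filter.
def ledger_for(family)
  entry_ids = SPLITS.select { |s| family.include?(s.account_id) }.map(&:entry_id).uniq
  ENTRIES.select { |e| entry_ids.include?(e.id) }
         .sort_by(&:post_date)
         .map { |e| [e, SPLITS.select { |s| s.entry_id == e.id }] }
end

ledger_for([10, 11]).each do |entry, splits|
  puts "#{entry.post_date} entry #{entry.id}: #{splits.map(&:amount).inspect}"
end
```

Entry 3 has two splits in the family, which is why the ids need `uniq` (the `distinct` in the SQL version), and each listed entry still carries its balancing split on account 99.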

Selecting records based upon conditions on multiple associations that use the same table

I have what is actually a common situation in Rails where I have a model that has two associations. I want to be able to search for specific records defined by the model based upon conditions on either one or both of these associations.
The twist is that these two associations use the same table.
Here's the main model:
class Resource < ActiveRecord::Base
  belongs_to :primary_item, class_name: 'Item', foreign_key: 'primary_item_Id'
  belongs_to :secondary_item, class_name: 'Item', foreign_key: 'secondary_item_Id'
  ...
And the Item model:
class Item < ActiveRecord::Base
  has_many :res_primary, class_name: 'Resource', foreign_key: 'primary_item_Id'
  has_many :res_secondary, class_name: 'Resource', foreign_key: 'secondary_item_Id'
  ...
Let's suppose an Item has a string field called name. It's just the name of the item. I want to be able to find all of the resources that have a primary and/or secondary item that is like a given name. Here's what that looks like (I think) if I am just filtering for a primary item:
@resources.joins(:primary_item)
          .where("#{Item.table_name}.name like #{primary_search_string}")
The primary_search_string is just the string I want to match in the name. This works fine. A similar search works for the secondary items.
Now suppose I'd like the user to be able to search for resources that have either a given primary item name, a given secondary item by name, or both (each with its own name). I would have a primary_search_string and a secondary_search_string with independent values. One of them could be nil which would mean I don't want to narrow the search based upon that string. What I want is to filter the resources by either or both strings depending upon whether they are nil. Here is what I ended up with which works but seems awkward:
@resources = <some query that obtains an active record relation of resources>
if primary_search_string then
  @resources = @resources.joins(:primary_item)
                         .where("#{Item.table_name}.name like #{primary_search_string}")
  if secondary_search_string then
    @resources = @resources.joins(:secondary_item)
                           .where("secondary_items_#{Item.table_name}.name like #{secondary_search_string}")
  end
elsif secondary_search_string then
  @resources = @resources.joins(:secondary_item)
                         .where("#{Item.table_name}.name like #{secondary_search_string}")
end
Note that if I am only joining one table, the table's name is known from Item.table_name. However, if I have to join both tables, then Rails must distinguish the second instance of the table by qualifying the name with the association name: secondary_items_#{Item.table_name}.
My question is whether there's a somewhat simpler way of handling this that doesn't involve having to reference the table names of the associations in the where clauses. Since I am referencing the table names of the associations, and the table name may be different depending upon whether I'm joining one of them or both of them, I end up checking the search string for nil, which is a way to determine whether I'm joining the Item table more than once. In this particular example, I can check for it and live with it being awkward. But what if I didn't know whether @resources was previously joined to the primary items and I wanted to filter based upon the secondary items? In that case I wouldn't know which table name to use for the association in the where clause, since I wouldn't know whether it was already joined or not.
I suspect there may be a better way here, which is the essence of my question.
If you have two separate associations, but they are both to the same type of child object (eg. Resource), then instead of joins/includes, you could just focus on finding the Resource records that match either the primary_item_id or secondary_item_id of the parent Item record you want.
Rails 3/4 doesn't natively support OR queries, but one way to brute-force it is by finding IDs of Resources that belong to the Item as a primary or secondary association. (This is obviously inefficient because you are doing multiple database queries.)
ids = Resource.where(primary_item_id: @item.id).map(&:id)
ids += Resource.where(secondary_item_id: @item.id).map(&:id)
@special_resources = Resource.where(id: ids, name: 'Some Special Name')
Rails 5 supports 'or' queries, so it's much simpler (note that or takes another relation, not a hash of conditions):
resources = Resource.where(primary_item_id: @item.id).or(Resource.where(secondary_item_id: @item.id))
@special_resources = resources.where(name: 'Some Special Name')

has_one association not working with includes

I've been trying to figure out some odd behavior when combining a has_one association and includes.
class Post < ApplicationRecord
  has_many :comments
  has_one :latest_comment, -> { order('comments.id DESC').limit(1) }, class_name: 'Comment'
end
class Comment < ApplicationRecord
  belongs_to :post
end
To test this I created two posts with two comments each. Here are some rails console commands that show the odd behavior. When we use includes then it ignores the order of the latest_comment association.
posts = Post.includes(:latest_comment).references(:latest_comment)
posts.map {|p| p.latest_comment.id}
=> [1, 3]
posts.map {|p| p.comments.last.id}
=> [2, 4]
I would expect these commands to have the same output. posts.map {|p| p.latest_comment.id} should return [2, 4]. I can't use the second command because of n+1 query problems.
If you call the latest comment individually (similar to comments.last above) then things work as expected.
[Post.first.latest_comment.id, Post.last.latest_comment.id]
=> [2, 4]
If you have another way of achieving this behavior I'd welcome the input. This one is baffling me.
I think the cleanest way to make this work with PostgreSQL is to use a database view to back your has_one :latest_comment association. A database view is, more or less, a named query that acts like a read-only table.
There are three broad choices here:
1. Use lots of queries: one to get the posts and then one for each post to get its latest comment.
2. Denormalize the latest comment into the post or its own table.
3. Use a window function to peel off the latest comments from the comments table.
(1) is what we're trying to avoid. (2) tends to lead to a cascade of over-complications and bugs. (3) is nice because it lets the database do what it does well (manage and query data) but ActiveRecord has a limited understanding of SQL so a little extra machinery is needed to make it behave.
We can use the row_number window function to find the latest comment per-post:
select *
from (
  select comments.*,
         row_number() over (partition by post_id order by created_at desc) as rn
  from comments
) dt
where dt.rn = 1
Play with the inner query in psql and you should see what row_number() is doing.
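If psql isn't at hand, the effect of row_number() can also be modelled in plain Ruby. This is only an illustration of what the query computes, using made-up comments and a hypothetical `latest_comments` helper; it is not a replacement for the SQL.

```ruby
# Made-up comments; created_at is an integer timestamp for brevity.
Comment = Struct.new(:id, :post_id, :created_at)

# Plain-Ruby model of the query above (hypothetical helper, not ActiveRecord).
def latest_comments(comments)
  # partition by post_id, order by created_at desc, number the rows from 1
  numbered = comments.group_by(&:post_id).flat_map do |_post_id, rows|
    rows.sort_by { |c| -c.created_at }.each_with_index.map { |c, i| [c, i + 1] }
  end

  # where dt.rn = 1
  numbered.select { |_comment, rn| rn == 1 }.map(&:first)
end

comments = [
  Comment.new(1, 1, 100), Comment.new(2, 1, 200),
  Comment.new(3, 2, 150), Comment.new(4, 2, 300)
]
puts latest_comments(comments).map(&:id).inspect # prints [2, 4]
```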
If we wrap that query in a latest_comments view and stick a LatestComment model in front of it, you can has_one :latest_comment and things will work. Of course, it isn't quite that easy:
ActiveRecord doesn't understand views in migrations so you can try to use something like scenic or switch from schema.rb to structure.sql.
Create the view:
class CreateLatestComments < ActiveRecord::Migration[5.2]
  def up
    connection.execute(%q(
      create view latest_comments (id, post_id, created_at, ...) as
      select id, post_id, created_at, ...
      from (
        select id, post_id, created_at, ...,
               row_number() over (partition by post_id order by created_at desc) as rn
        from comments
      ) dt
      where dt.rn = 1
    ))
  end
  def down
    connection.execute('drop view latest_comments')
  end
end
That will look more like a normal Rails migration if you're using scenic. I don't know the structure of your comments table, hence all the ...s in there; you can use select * if you prefer and don't mind the stray rn column in your LatestComment. You might want to review your indexes on comments to make this query more efficient but you'd be doing that sooner or later anyway.
Create the model and don't forget to manually set the primary key or includes and references won't preload anything (but preload will):
class LatestComment < ApplicationRecord
  self.primary_key = :id
  belongs_to :post
end
Simplify your existing has_one to just:
has_one :latest_comment
Maybe add a quick test to your test suite to make sure that Comment and LatestComment have the same columns. The view won't automatically update itself as the comments table changes but a simple test will serve as a reminder.
When someone complains about "logic in the database", tell them to take their dogma elsewhere as you have work to do.
Just so it doesn't get lost in the comments, your main problem is that you're abusing the scope argument in the has_one association. When you say something like this:
Post.includes(:latest_comment).references(:latest_comment)
the scope argument to has_one ends up in the join condition of the LEFT JOIN that includes and references add to the query. ORDER BY doesn't make sense in a join condition, so ActiveRecord doesn't include it and your association falls apart. You can't make the scope instance-dependent (i.e. ->(post) { some_query_with_post_in_a_where... }) to get a WHERE clause into the join condition either: ActiveRecord will raise an ArgumentError because it doesn't know how to use an instance-dependent scope with includes and references.

Rails has_and_belongs_to_many query for all records

Given the following 2 models
class PropertyApplication
  has_and_belongs_to_many :applicant_profiles
end
class ApplicantProfile
  has_and_belongs_to_many :property_applications
end
I have a query that lists all property_applications and gets the collection of applicant_profiles for each property_application.
The query is as follows and it is very inefficient.
applications = PropertyApplication.includes(:applicant_profiles).select do |property_application|
  property_application.applicant_profile_ids.include?(@current_users_applicant_profile_id)
end
Assume @current_users_applicant_profile_id is already defined.
How can I perform one (or few) queries to achieve this?
I want to achieve something like this
PropertyApplication.includes(:applicant_profiles).where('property_application.applicant_profiles IN (@current_users_applicant_profile)')

ORDER BY and DISTINCT ON (...) in Rails

I am trying to ORDER by created_at and then get a DISTINCT set based on a foreign key.
The other part is to somehow use this in ActiveModelSerializer. Specifically, I want to be able to declare:
has_many :somethings
In the serializer. Let me explain further...
I am able to get the results I need with this custom sql:
def latest_product_levels
  sql = "SELECT DISTINCT ON (product_id) client_product_levels.product_id,
    client_product_levels.* FROM client_product_levels WHERE client_product_levels.client_id = #{id} ORDER BY product_id,
    client_product_levels.created_at DESC"
  results = ActiveRecord::Base.connection.execute(sql)
end
Is there any possible way to get this result but as a condition on a has_many relationship so that I can use it in AMS?
In pseudo code: @client.products_levels
Would do something like: @client.order(created_at: :desc).select(:product_id).distinct
That of course fails for reasons that are beyond me.
Any help would be great.
Thank you.
A good way to structure this is to split your query into two parts: the first part manages the filtering of rows so that you get only your latest client product levels. The second part uses a standard has_many association to connect Client with ClientProductLevel.
Starting with your ClientProductLevel model, you can create a scope to do the latest filtering:
class ClientProductLevel < ActiveRecord::Base
  scope :latest, -> {
    select("distinct on(product_id) client_product_levels.product_id,
      client_product_levels.*").
      order("product_id, created_at desc")
  }
end
You can use this scope anywhere that you have a query that returns a list of ClientProductLevel objects, e.g., ClientProductLevel.latest or ClientProductLevel.where("created_at < ?", 1.week.ago).latest, etc.
If you haven't already done so, set up your Client class with a has_many relationship:
class Client < ActiveRecord::Base
  has_many :client_product_levels
end
Then in your ActiveModelSerializer try this:
class ClientSerializer < ActiveModel::Serializer
  has_many :client_product_levels
  def client_product_levels
    object.client_product_levels.latest
  end
end
When you invoke the ClientSerializer to serialize a Client object, the serializer sees the has_many declaration, which it would ordinarily forward to your Client object, but since we've got a locally defined method by that name, it invokes that method instead. (Note that this has_many declaration is not the same as an ActiveRecord has_many, which specifies a relationship between tables: in this case, it's just saying that the serializer should present an array of serialized objects under the key 'client_product_levels'.)
The ClientSerializer#client_product_levels method in turn invokes the has_many association from the client object, and then applies the latest scope to it. The most powerful thing about ActiveRecord is the way it allows you to chain together disparate components into a single query. Here, the has_many generates the 'where client_id = $X' portion, and the scope generates the rest of the query. Et voila!
In terms of simplification: ActiveRecord doesn't have native support for distinct on, so you're stuck with that part of the custom SQL. I don't know whether you need to include client_product_levels.product_id explicitly in your select clause, as it's already being included by the *. You might try dumping it.
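For readers unfamiliar with distinct on, the row selection that the latest scope asks PostgreSQL for can be modelled in plain Ruby: sort by product, newest first, and keep the first row seen per product. The `Level` struct, the data, and the `latest_levels` helper are made up for illustration and are not part of the models above.

```ruby
# Made-up rows; created_at is an integer timestamp for brevity.
Level = Struct.new(:id, :product_id, :created_at)

# Plain-Ruby model of:
#   SELECT DISTINCT ON (product_id) ... ORDER BY product_id, created_at DESC
# Sort by product, newest first, then keep the first row seen per product.
def latest_levels(levels)
  levels.sort_by { |level| [level.product_id, -level.created_at] }
        .uniq(&:product_id)
end

levels = [
  Level.new(1, 7, 100), Level.new(2, 7, 300),
  Level.new(3, 9, 200), Level.new(4, 7, 200)
]
puts latest_levels(levels).map(&:id).inspect # prints [2, 3]
```

The ordering matters as much as the distinct on: without created_at desc as the second sort key, which row survives for each product would be arbitrary.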
