Optimizing queries in a has_many_through association with three models - ruby-on-rails

Trying to avoid an N+1 query.
I'm working on a web-based double-entry accounting application that has the following basic models:
ruby
class Account < ApplicationRecord
  has_many :splits
  has_many :entries, through: :splits
end
class Entry < ApplicationRecord
  has_many :splits, -> { order(:account_id) }, dependent: :destroy, inverse_of: :entry
  attribute :amount, :integer
  attribute :reconciled
end
class Split < ApplicationRecord
  belongs_to :entry, inverse_of: :splits
  belongs_to :account
  attribute :debit, :integer
  attribute :credit, :integer
  attribute :transfer, :string
end
This is a fairly classic accounting model (at least, it is patterned after GnuCash), but it leads to somewhat complex queries. (From ancient history, this is pretty much a third-normal-form structure!)
First, Account is a hierarchical tree structure: an Account belongs to a parent (except ROOT) and may have many children, and those children may have children of their own. I call an account together with its descendants a family. Most of these relations are handled in the Account model and optimized as much as a recursive structure can be.
An Account has many Entries (transactions), and each Entry must have at least two Splits whose amounts (debits/credits) sum to 0.
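To make the balancing rule concrete, here is a minimal plain-Ruby sketch with no database involved. The names SplitSketch, EntrySketch and balanced? are made up for illustration, and treating debits as positive and credits as negative is an assumed sign convention, not something taken from the models above.

```ruby
# Plain-Ruby stand-ins for Split and Entry (hypothetical names, no ActiveRecord).
SplitSketch = Struct.new(:account_id, :debit, :credit) do
  # Assumed convention: debits count as positive, credits as negative.
  def amount
    debit.to_i - credit.to_i
  end
end

EntrySketch = Struct.new(:description, :splits) do
  # Balanced means at least two splits whose signed amounts sum to zero.
  def balanced?
    splits.size >= 2 && splits.sum(&:amount).zero?
  end
end

rent = EntrySketch.new("Rent", [
  SplitSketch.new(10, 120_000, 0),  # debit an expense account
  SplitSketch.new(20, 0, 120_000)   # credit a checking account
])
lopsided = EntrySketch.new("Typo", [
  SplitSketch.new(10, 5_000, 0),
  SplitSketch.new(20, 0, 4_000)
])

puts rent.balanced?      # true
puts lopsided.balanced?  # false
```

In the real application this check would more likely live in a validation on Entry, but the arithmetic would be the same.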
The primary use of this structure is to produce ledgers. A ledger is just a list of Entries and their associated Splits, usually filtered by a date range. This is fairly simple if the account has no family/children:
ruby
# self = a single Account
entries = self.entries.where(post_date: @bom..@eom).includes(:splits).order(:post_date, :numb)
It gets more complex if you want a ledger for an account that has many children (say, a ledger of all Current Assets):
ruby
def self.scoped_acct_range(family, range)
  # family is a single account_id or an array of account_ids
  Entry.where(post_date: range).joins(:splits).
    where(splits: { account_id: family }).
    order(:post_date, :numb).distinct
end
While this works, I believe I have an N+1 query: if I use includes instead of joins, I won't get all the splits for an Entry, only those in the family, and I want all splits. That means the view reloads (re-queries) the splits for each entry. Also, distinct is needed because an entry can have more than one split that references an account in the family.
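The duplicate rows and the "only family splits" effect can be seen without a database. The following is a plain-Ruby sketch using in-memory rows; SplitRow and the sample ids are invented for illustration and only mimic what the join and filter return.

```ruby
# In-memory stand-in for rows of the splits table (hypothetical data).
SplitRow = Struct.new(:entry_id, :account_id, :amount)

split_rows = [
  SplitRow.new(1, 101, 500),   # entry 1 touches account 101 ...
  SplitRow.new(1, 101, 250),   # ... twice
  SplitRow.new(1, 900, -750),  # plus an account outside the family
  SplitRow.new(2, 102, 300),
  SplitRow.new(2, 900, -300)
]
family = [101, 102]

# The join plus the family filter yields one row per matching split,
# so entry 1 shows up twice.
joined_entry_ids = split_rows.select { |s| family.include?(s.account_id) }.map(&:entry_id)

# DISTINCT collapses those duplicates.
entry_ids = joined_entry_ids.uniq

# Loading splits by entry id in a separate step returns every split of each
# entry, including the ones outside the family.
splits_by_entry = split_rows.select { |s| entry_ids.include?(s.entry_id) }.group_by(&:entry_id)

p joined_entry_ids             # [1, 1, 2]
p entry_ids                    # [1, 2]
p splits_by_entry[1].size      # 3
```

This is why filtering and loading need to be two separate steps if every split of each matching entry is wanted.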
My question: is there a better way to handle this three-model query?
I threw together a few hacks, one going backwards from splits:
ruby
def self.scoped_split_acct_range(family, range)
  # family is a single account_id or an array of account_ids
  # get the ids of entries in the range that have a split in the family
  entry_ids = Split.where(account_id: family).
    joins(:entry).
    where(entries: { post_date: range }).
    pluck(:entry_id).uniq
  # use the ids to get the entries with eager-loaded splits
  Entry.where(id: entry_ids).includes(:splits).order(:post_date, :numb)
end
This also works and, judging by the times reported in the log, may even be faster. Normal use of either would be looking at 50 or so Entries for a month, but you can also filter a year's worth of transactions, in which case you get what you asked for. For normal use, a ledger for a month takes about 70ms; even a quarter is around 100ms.
I've added a few attributes to both Splits and Accounts that got rid of a few view-level queries. Transfer is basically the concatenated Account names going up the tree.
Again, I'm just looking to see if I'm missing something and whether there is a better way.

Using a nested select is the proper option IMO.
You can optimize your code to use a nested select as follows:
entry_ids = Entry.where(post_date: range)
                 .joins(:splits)
                 .where(splits: { account_id: family })
                 .select('entries.id')
                 .distinct
Entry.where(id: entry_ids).includes(:splits).order(:post_date, :numb)
This will generate a single SQL statement with a nested select, instead of having 2 SQL queries: 1 to get the Entry ids and pass it to Rails and 1 other query to select entries based on those ids.
The following gem, developed by an ex-colleague, can help you deal with this kind of stuff: https://github.com/MaxLap/activerecord_where_assoc
In your case, it would enable you to do the following:
Entry.where_assoc_exists(:splits, account_id: 123)
.where(post_date: range)
.includes(:splits)
.order(:post_date, :numb)
This does the same thing as my suggestion above, but behind the scenes.

Related

Selecting records based upon conditions on multiple associations that use the same table

I have what is actually a common situation in Rails where I have a model that has two associations. I want to be able to search for specific records defined by the model based upon conditions on either one or both of these associations.
The twist is that these two associations use the same table.
Here's the main model:
class Resource < ActiveRecord::Base
belongs_to :primary_item, class_name: 'Item', foreign_key: 'primary_item_Id'
belongs_to :secondary_item, class_name: 'Item', foreign_key: 'secondary_item_Id'
...
And the Item model:
class Item < ActiveRecord::Base
has_many :res_primary, class_name: 'Resource', foreign_key: 'primary_item_Id'
has_many :res_secondary, class_name: 'Resource', foreign_key: 'secondary_item_Id'
...
Let's suppose an Item has a string field called name. It's just the name of the item. I want to be able to find all of the resources that have a primary and/or secondary item that is like a given name. Here's what that looks like (I think) if I am just filtering for a primary item:
@resources.joins(:primary_item)
          .where("#{Item.table_name}.name like #{primary_search_string}")
The primary_search_string is just the string I want to match in the name. This works fine. A similar search works for the secondary items.
Now suppose I'd like the user to be able to search for resources that have either a given primary item name, a given secondary item by name, or both (each with its own name). I would have a primary_search_string and a secondary_search_string with independent values. One of them could be nil which would mean I don't want to narrow the search based upon that string. What I want is to filter the resources by either or both strings depending upon whether they are nil. Here is what I ended up with which works but seems awkward:
@resources = <some query that obtains an active record relation of resources>
if primary_search_string then
  @resources = @resources.joins(:primary_item)
                         .where("#{Item.table_name}.name like #{primary_search_string}")
  if secondary_search_string then
    @resources = @resources.joins(:secondary_item)
                           .where("secondary_items_#{Item.table_name}.name like #{secondary_search_string}")
  end
elsif secondary_search_string then
  @resources = @resources.joins(:secondary_item)
                         .where("#{Item.table_name}.name like #{secondary_search_string}")
end
Note that if I am only joining one table, the table's name is given by Item.table_name. However, if I have to join both tables, Rails must distinguish the second instance of the table by qualifying the name with the association name: secondary_items_#{Item.table_name}.
My question is whether there's a somewhat simpler way of handling this that doesn't involve referencing the table names of the associations in the where clauses. Since I am referencing the table names of the associations, and the table name may differ depending on whether I'm joining one of them or both, I end up checking the search string for nil, which is a way to determine whether I'm joining the Item table more than once. In this particular example, I can check for it and live with it being awkward. But what if I didn't know whether @resources had previously been joined to the primary items and I wanted to filter based on the secondary items? In that case I wouldn't know which table name to use in the where clause, since I wouldn't know whether the table had already been joined.
I suspect there may be a better way here, which is the essence of my question.
If you have two separate associations, but they are both to the same type of child object (eg. Resource), then instead of joins/includes, you could just focus on finding the Resource records that match either the primary_item_id or secondary_item_id of the parent Item record you want.
Rails 3/4 doesn't natively support OR queries, but one way to brute-force it is by finding IDs of Resources that belong to the Item as a primary or secondary association. (This is obviously inefficient because you are doing multiple database queries.)
ids = Resource.where(primary_item_id: @item.id).map(&:id)
ids += Resource.where(secondary_item_id: @item.id).map(&:id)
@special_resources = Resource.where(id: ids, name: 'Some Special Name')
Rails 5 supports 'or' queries, so it's much simpler:
resources = Resource.where(primary_item_id: @item.id).or(Resource.where(secondary_item_id: @item.id))
@special_resources = resources.where(name: 'Some Special Name')

Modelling a collection of records that act as a group

I'm looking for guidance on either a concrete or an abstract approach in Ruby (Rails 4/5) to model the following requirement or user story:
Given a model, let's call it PurchaseOrder with the following attributes:
amount_to_produce
amount_taken_from_stock
placement_date
delivery_date
product_id
client_id
As a user, I want to be able to see a table listing these PurchaseOrders and, when necessary, group them.
Detailed info: when a collection of PurchaseOrders is grouped, the grouped collection should behave exactly like a PurchaseOrder. It must be displayed as a record in the table, and filtering, pagination, and sorting should work on the grouped record as they do on single PurchaseOrder instances. Moreover, the group must cache (or at least that is how I'm thinking of it) the sum of amount_to_produce, the sum of amount_taken_from_stock, the minimum placement_date among all placement dates and, last but not least, the minimum delivery_date among them all.
I'm thinking of modelling this implicitly in PurchaseOrder like this:
class PurchaseOrder < ApplicationRecord
  belongs_to :group, class_name: PurchaseOrder.model_name.to_s, inverse_of: :purchase_orders
  # a purchase order can represent a "group" of purchase orders
  has_many :purchase_orders, inverse_of: :group, foreign_key: :group_id
end
This way it would be displayed in the table view easily; filtering, pagination, and sorting would work out of the box; and just by scoping to records whose group_id is nil, the grouped records can be left out of the table.
However, I foresee immediate drawbacks:
1. When updating a group member's attribute, say amount_to_produce, the parent's cached amount_to_produce should be updated too, and the same goes for the other three attributes. This would probably lead to before_update model callbacks, which I tend not to use unless they concern the behaviour of the single instance itself.
2. When ungrouping a member, same story.
3. Same when destroying a member of the group (it can and will happen).
For 1, we could argue that there's no need to cache the amounts or date attributes in the parent PurchaseOrder, since we can override the getters for those attributes and return the sum/min of the children if purchase_orders.size.nonzero?; however, this smells wrong.
To sum up, I would like, if not the best, then at least a good approach to modelling this scenario. Regarding the methods to group and ungroup members to/from a group, I'd like ideas on the best place in the domain to implement them; I'm thinking of a concern like Groupable.
PS: For each group, the client_id of the group will be a default seeded client called "Multiple Customers", and the product_id will be the same as the product_id of the children, since only PurchaseOrders with the same product_id can be grouped.
Thanks.
I would split this into two models, a PurchaseOrderGroup, and a PurchaseOrder.
class PurchaseOrderGroup < ApplicationRecord
  has_many :purchase_orders
  belongs_to :product

  def aggregate_pos
    PurchaseOrder.where(purchase_order_group_id: self.id).
      group(:purchase_order_group_id).
      pluck('sum(amount_to_produce), min(delivery_date), ...')
  end
end
class PurchaseOrder < ApplicationRecord
  belongs_to :purchase_order_group
end
I would create a PurchaseOrderGroup for each PurchaseOrder even if there is only one, which maintains the same interface. You can then define delegate methods on the PurchaseOrderGroup which grab the appropriate sum, min, max etc of the children - aggregate queries should make short work of that. See above aggregate_pos() method. Easy enough to cache the results of this in the PurchaseOrderGroup class. Deleting or adding PurchaseOrder objects is easy then, just call aggregate_pos() again.
This also cleans up the product_id dilemma, just put that attribute on the group rather than the PurchaseOrder. That way it is impossible for two PurchaseOrders in the same group to have different product_ids.
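As a rough illustration of the "cache the aggregate, recompute when membership changes" idea, here is a plain-Ruby sketch with no database. OrderSketch and OrderGroupSketch are hypothetical stand-ins for PurchaseOrder and PurchaseOrderGroup, and the add/remove methods are assumptions made for the example rather than part of the answer above.

```ruby
require "date"

# Plain-Ruby stand-in for a PurchaseOrder row (hypothetical, no ActiveRecord).
OrderSketch = Struct.new(:amount_to_produce, :amount_taken_from_stock,
                         :placement_date, :delivery_date)

# Stand-in for PurchaseOrderGroup: memoizes the aggregate and drops the cached
# value whenever an order is added or removed.
class OrderGroupSketch
  def initialize
    @orders = []
    @aggregate = nil
  end

  def add(order)
    @orders << order
    @aggregate = nil
    self
  end

  def remove(order)
    # Remove by object identity so an equal-looking order is not dropped too.
    @orders.reject! { |o| o.equal?(order) }
    @aggregate = nil
    self
  end

  # Sums for the amounts, minimums for the dates, computed once per change.
  def aggregate
    @aggregate ||= {
      amount_to_produce: @orders.sum(&:amount_to_produce),
      amount_taken_from_stock: @orders.sum(&:amount_taken_from_stock),
      placement_date: @orders.map(&:placement_date).min,
      delivery_date: @orders.map(&:delivery_date).min
    }
  end
end

first_order = OrderSketch.new(10, 2, Date.new(2024, 1, 5), Date.new(2024, 2, 1))
second_order = OrderSketch.new(5, 0, Date.new(2024, 1, 3), Date.new(2024, 2, 10))

group = OrderGroupSketch.new.add(first_order).add(second_order)
before_removal = group.aggregate
group.remove(first_order)
after_removal = group.aggregate

p before_removal[:amount_to_produce]  # 15
p after_removal[:amount_to_produce]   # 5
```

In the Rails version the cached values would presumably be columns on the group, refreshed by calling aggregate_pos after a member is added, removed, or destroyed.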

Issue with polymorphic ActiveRecord query

I have three models with the following associations:
User has_many :owns, has_many :owned_books, :through => :owns, source: :book
Book has_many :owns
Own belongs_to :user, :counter_cache => true, belongs_to :book
I also have a page that tracks the top users by owns with the following query:
User.all.order('owns_count desc').limit(25)
I would now like to add a new page which can track top users by owns as above, but with a condition:
Book.where(:publisher => "Publisher #1")
What would be the most efficient way to do this?
I'd be interested to know if there is something special for this case, but my attempt would be the following.
First, I don't see how a polymorphic association can be applied here. You have just one kind of object (user) that a book can belong to. As I understand it, polymorphic associations are for connecting a book to several different kinds of objects (e.g. User, Library, Shelf, etc.). (Edit: the initial text of the question mentioned polymorphic associations; now it doesn't.)
Second, I don't believe there is a way to cache counters here, as long as "Publisher #1" is a varying input parameter and not one of a few pre-defined, known publishers (a few constants).
Third, I would assume that the number of books by a single publisher is relatively limited. So even if you have millions of books in your table, the number of books per publisher should be in the hundreds at most.
Then you can first query for all Publisher's books ids, e.g.
book_ids = Book.where(:publisher => "Publisher #1").pluck(:id)
And then query the owns table for the top users' ids:
Own.select("user_id, count(book_id) as total_owns").where(book_id: book_ids).group(:user_id).order("total_owns DESC").limit(25)
Disclaimer: I didn't try the statement in the Rails console, as I don't have your objects defined. I'm basing this on the group call in the ActiveRecord docs.
Edit. In order to make things more efficient, you can try the following:
0) Just in case, ensure you have indexes on the owns table for both foreign keys.
1) Use pluck for the second query as well, so as not to create Own objects, although it should not make a big difference because of limit(25). Something like this:
users_ids = Own.where(book_id: book_ids).group(:user_id).order("count(*) DESC").limit(25).pluck("user_id")
See this question for reference.
2) Load all result users in one subsequent query and not N queries for each user
top_users = User.where(:id => users_ids)
3) Try eager-loading the users in the first query:
owns_res = Own.includes(:user).select("user_id, count(book_id) as total_owns").where(book_id: book_ids).group(:user_id).order("total_owns DESC").limit(25)
And then use owns_res.first.user
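To check what the grouped query is meant to compute, here is the same logic over in-memory rows in plain Ruby. BookRow, OwnRow and the sample data are invented for illustration; the real work would of course be done by the database.

```ruby
# In-memory stand-ins for the books and owns tables (hypothetical data).
BookRow = Struct.new(:id, :publisher)
OwnRow  = Struct.new(:user_id, :book_id)

book_rows = [
  BookRow.new(1, "Publisher #1"),
  BookRow.new(2, "Publisher #1"),
  BookRow.new(3, "Publisher #2")
]
own_rows = [
  OwnRow.new(1, 1), OwnRow.new(1, 2),  # user 1 owns two Publisher #1 books
  OwnRow.new(2, 1), OwnRow.new(2, 3),  # user 2 owns one of them
  OwnRow.new(3, 3)                     # user 3 owns none
]

# Step 1: ids of the publisher's books (the pluck(:id) query).
book_ids = book_rows.select { |b| b.publisher == "Publisher #1" }.map(&:id)

# Step 2: count owns per user for those books (WHERE + GROUP BY + COUNT).
counts = own_rows.select { |o| book_ids.include?(o.book_id) }
                 .group_by(&:user_id)
                 .transform_values(&:size)

# Step 3: highest counts first, capped at 25 (ORDER BY ... DESC LIMIT 25).
top_users = counts.sort_by { |user_id, count| [-count, user_id] }.first(25)

p top_users  # [[1, 2], [2, 1]]
```

Note that users with no matching owns simply do not appear, which matches what the grouped query returns.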

Get all records where the number of associated records is smaller than a certain number

I have a model Occurrence that has many Cleaners through the join table Assignments. The Occurrence model has a field number_of_cleaners.
How can I find all Occurrences using Active Record (or SQL, Postgres) where the number of assigned Cleaners is smaller than the number specified in occurrences.number_of_cleaners?
This query is to identify the Occurrences where we need to find more Cleaners to assign to the Occurrence.
class Occurrence < ActiveRecord::Base
belongs_to :job
has_many :assignments
has_many :cleaners, through: :assignments
end
class Assignment < ActiveRecord::Base
belongs_to :cleaner
belongs_to :occurrence
end
Just as a side note, previously we just queried for each Occurrence that had no Assignment regardless of occurrences.number_of_cleaners. The query looked like this:
# Select future occurrences which do not have an assignment (and therefore no cleaners) and select one per job ordering by booking time
# The subquery fetches the IDs of all these occurrences
# Then, it runs another query where it gets all the IDs from the subquery and orders the occurrences by booking time
# See http://stackoverflow.com/a/8708460/1076279 for more information on how to perform subqueries
subquery = Occurrence.future.where.not(id: Assignment.select(:occurrence_id).uniq).select('distinct on (job_id) id').order('occurrences.job_id', 'booking_time')
@occurrences = Occurrence.includes(job: :customer).where("occurrences.id IN (#{subquery.to_sql})").where.not(bundle_first_id: nil).select{ |occurrence| @current_cleaner.unassigned_during?(occurrence.booking_time, occurrence.end_time) }
Instead of joining the tables and querying, you should implement counter_cache on your models, as it's more efficient and meant exactly for this purpose.
For more details, check out these links:
Counter Cache in Rails
Three Easy Steps to Using Counter Caches in Rails
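Roughly, the idea is that each Occurrence keeps a running count of its assignments, so the filter becomes a comparison between two values on the same record. The sketch below shows that bookkeeping in plain Ruby; OccurrenceSketch, assign! and needs_cleaners? are made-up names, and assignments_count is the column name Rails would conventionally expect (an assumption here, since the question does not define it). With such a column in place, the query would presumably be something like Occurrence.where("assignments_count < number_of_cleaners").

```ruby
# Plain-Ruby stand-in for an Occurrence with a counter cache (hypothetical).
class OccurrenceSketch
  attr_reader :name, :number_of_cleaners, :assignments_count

  def initialize(name, number_of_cleaners)
    @name = name
    @number_of_cleaners = number_of_cleaners
    @assignments_count = 0  # the value a counter cache column would hold
  end

  # Creating an assignment bumps the cached count, as counter_cache would.
  def assign!
    @assignments_count += 1
    self
  end

  # True when fewer cleaners are assigned than the occurrence requires.
  def needs_cleaners?
    assignments_count < number_of_cleaners
  end
end

occurrences = [
  OccurrenceSketch.new("Mon", 2).assign!,  # 1 of 2 assigned
  OccurrenceSketch.new("Tue", 1).assign!,  # fully staffed
  OccurrenceSketch.new("Wed", 3)           # nobody assigned yet
]

understaffed = occurrences.select(&:needs_cleaners?).map(&:name)
p understaffed  # ["Mon", "Wed"]
```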

Ruby on Rails - Counting goals of a team in many matches

I've got a Match model and a Team model.
I want to count how many goals a Team scores during the league (so I have to sum all the scores of that team, in both home_matches and away_matches).
How can I do that? What columns should I put into the matches and teams database tables?
I'd assume your Match model looks something like this:
belongs_to :home_team, class_name:"Team"
belongs_to :away_team, class_name:"Team"
attr_accessible :home_goal_count, :away_goal_count
If so, you could add a method to extract the number of goals:
def goal_count
home_matches.sum(:home_goal_count) + away_matches.sum(:away_goal_count)
end
Since this could be expensive (especially if you do it often), you might just cache this value into the team model and use an after_save hook on the Match model (and, if matches ever get deleted, then an after_destroy hook as well):
after_save :update_team_goals
def update_team_goals
home_team.update_attribute(:goal_count_cache, home_team.goal_count)
away_team.update_attribute(:goal_count_cache, away_team.goal_count)
end
Since you want to do this for leagues, you probably want to add a belongs_to :league on the Match model, a league parameter to the goal_count method (and its query), and a goal_count_cache_league column if you want to cache the value (only cache the most recently changed with my suggested implementation, but tweak as needed).
You don't put that in any table. There's a rule for databases: don't ever store data in your database that could be calculated from other fields.
You can calculate that easily using this function:
def total_goals
  home_matches.collect(&:home_goals).inject(0, :+) + away_matches.collect(&:away_goals).inject(0, :+)
end
That should do it for you. If you want the matches filtered for a league, you can use a scope for that.
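For a quick sanity check of the arithmetic, here is the same calculation over in-memory matches in plain Ruby. MatchRow, the team names and the goals_for helper are invented for the example; in the app the two lists would come from the home_matches and away_matches associations.

```ruby
# In-memory stand-in for rows of the matches table (hypothetical data).
MatchRow = Struct.new(:home_team, :away_team, :home_goals, :away_goals)

match_rows = [
  MatchRow.new("A", "B", 2, 1),
  MatchRow.new("B", "A", 0, 3),
  MatchRow.new("C", "B", 1, 1)
]

# Goals scored by a team: its home goals in home matches plus its away goals
# in away matches. Array#sum returns 0 for an empty list, so a team with no
# matches gets 0 rather than an error.
def goals_for(team, matches)
  home = matches.select { |m| m.home_team == team }.sum(&:home_goals)
  away = matches.select { |m| m.away_team == team }.sum(&:away_goals)
  home + away
end

p goals_for("A", match_rows)  # 5
p goals_for("B", match_rows)  # 2
p goals_for("D", match_rows)  # 0
```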

Resources