I have three models with the following associations:
User has_many :owns, has_many :owned_books, :through => :owns, source: :book
Book has_many :owns
Own belongs_to :user, :counter_cache => true, belongs_to :book
I also have a page that tracks the top users by owns with the following query:
User.all.order('owns_count desc').limit(25)
I would now like to add a new page which can track top users by owns as above, but with a condition:
Book.where(:publisher => "Publisher #1")
What would be the most efficient way to do this?
I'd be interested to know whether there is something special for this case, but my approach would be the following.
First, I don't see how a polymorphic association applies here. A book can belong to only one kind of object (a user). As I understand it, polymorphic associations are for connecting a book to several different kinds of objects (e.g. User, Library, Shelf). (Edit: the initial text of the question mentioned polymorphic associations; now it doesn't.)
Second, I don't believe there is a way to cache counters here, as long as "Publisher #1" is a varying input parameter rather than one of a few predefined, known publishers (a few constants).
Third, I would assume the number of books by a single publisher is relatively limited. Even if you have millions of books in your table, the number per publisher should be in the hundreds at most.
Then you can first query for all Publisher's books ids, e.g.
book_ids = Book.where(:publisher => "Publisher #1").pluck(:id)
And then query in owns table for top users ids:
Own.select("user_id, count(book_id) as total_owns").where(book_id: book_ids).group(:user_id).order("total_owns DESC").limit(25)
Disclaimer: I didn't try the statement in the Rails console, as I don't have your objects defined. I'm basing it on the group call in the ActiveRecord docs.
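To show what the two queries are meant to compute, here is a plain-Ruby sketch over in-memory rows. It is not ActiveRecord: BookRow, OwnRow and the sample data are made up for illustration, and the array operations only mimic the WHERE / GROUP BY / ORDER BY / LIMIT steps.

```ruby
# Hypothetical stand-ins for rows of the books and owns tables.
BookRow = Struct.new(:id, :publisher)
OwnRow  = Struct.new(:user_id, :book_id)

books = [
  BookRow.new(1, "Publisher #1"),
  BookRow.new(2, "Publisher #1"),
  BookRow.new(3, "Publisher #2")
]
owns = [
  OwnRow.new(10, 1), OwnRow.new(10, 2), OwnRow.new(10, 3),
  OwnRow.new(20, 1),
  OwnRow.new(30, 3)
]

# Step 1: ids of the publisher's books (the pluck(:id) step).
book_ids = books.select { |b| b.publisher == "Publisher #1" }.map(&:id)

# Step 2: count owns per user, restricted to those books.
totals = owns.select { |o| book_ids.include?(o.book_id) }
             .group_by(&:user_id)
             .transform_values(&:size)

# Step 3: highest counts first, capped at 25.
top_user_ids = totals.sort_by { |_user_id, total| -total }.first(25).map(&:first)
```

User 30 drops out entirely because none of their owns are for this publisher's books, which is the behaviour the page needs.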
Edit. In order to make things more efficient, you can try the following:
0) Just in case, ensure you have indexes on Owns table for both foreign keys.
1) Use pluck for the second query as well, to avoid instantiating Own objects, although it should not make a big difference because of limit(25). Something like this:
users_ids = Own.where(book_id: book_ids).group(:user_id).order("count(*) DESC").limit(25).pluck(:user_id)
See this question for reference.
2) Load all result users in one subsequent query and not N queries for each user
top_users = User.where(:id => users_ids)
3) Try eager-loading the User table in the first query:
owns_res = Own.includes(:user).select("user_id, count(book_id) as total_owns").where(book_id: book_ids).group(:user_id).order("total_owns DESC").limit(25)
And then use owns_res.first.user
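One caveat with option 2: a WHERE id IN (...) lookup generally does not return rows in the order of the id list, so the ranking has to be restored afterwards. A minimal plain-Ruby sketch of that re-sort, with UserRow and the sample data as hypothetical stand-ins for loaded User records:

```ruby
UserRow = Struct.new(:id, :name)

# Ranked ids as they might come back from the grouped owns query (made up).
users_ids = [7, 3, 9]

# Rows as a database might return them for the IN lookup: arbitrary order.
loaded_users = [
  UserRow.new(3, "carol"),
  UserRow.new(9, "dave"),
  UserRow.new(7, "alice")
]

# Map each id to its rank, then sort the loaded rows by that rank.
rank = users_ids.each_with_index.to_h
top_users = loaded_users.sort_by { |u| rank.fetch(u.id) }
```

Building the rank hash once keeps the sort cheap; with only 25 rows the difference is negligible either way.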
Trying to avoid n+1 query
I'm working on a web based double entry accounting application that has the following basic models;
class Account < ApplicationRecord
has_many :splits
has_many :entries, through: :splits
end
class Entry < ApplicationRecord
has_many :splits, -> {order(:account_id)}, dependent: :destroy, inverse_of: :entry
attribute :amount, :integer
attribute :reconciled
end
class Split < ApplicationRecord
belongs_to :entry, inverse_of: :splits
belongs_to :account
attribute :debit, :integer
attribute :credit, :integer
attribute :transfer, :string
end
This is a fairly classic Accounting model, at least it is patterned after GnuCash, but it leads to somewhat complex queries. (From ancient history this is pretty much a 3rd normal form structure!)
First, Account is a hierarchical tree structure (an Account belongs to a parent (except ROOT) and may have many children; children may also have many children, which I call a family). Most of these relations are covered in the Account model and optimized as much as you can optimize a recursive structure.
An Account has many Entries (transactions), and each Entry must have at least two Splits whose Amount attributes (or Debits/Credits) sum to 0.
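As a rough illustration of that balancing rule, here is a plain-Ruby sketch. SplitRow and balanced? are hypothetical names, and it assumes each split carries a single signed integer amount rather than separate debit/credit columns.

```ruby
# Hypothetical stand-in for a split row with one signed integer amount.
SplitRow = Struct.new(:account_id, :amount)

# An entry is balanced when it has at least two splits summing to zero.
def balanced?(splits)
  splits.size >= 2 && splits.sum(&:amount).zero?
end

ok  = [SplitRow.new(1, 5000), SplitRow.new(2, -5000)]
bad = [SplitRow.new(1, 5000), SplitRow.new(2, -4000)]
```

Here balanced?(ok) holds and balanced?(bad) does not; a lone zero-amount split is also rejected by the size check.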
The primary use of this structure is to produce Ledgers, which is just a list of Entries and their associated Splits usually filtered by a date range. This is fairly simple if the account has no Family/Children
# self = a single Account
entries = self.entries.where(post_date: @bom..@eom).includes(:splits).order(:post_date, :numb)
It gets more complex if you want a ledger of an account that has many children (I want a Ledger of all Current Assets)
def self.scoped_acct_range(family,range)
# family is a single account_id or array of account_ids
Entry.where(post_date:range).joins(:splits).
where(splits: {account_id:family}).
order(:post_date,:numb).distinct
end
While this works, I guess I have an n+1 query, because if I use includes instead of joins I won't get all the splits for an Entry, only those in the family, and I want all splits. That means it reloads (queries) the splits in the view. Also, distinct is needed because an Entry's splits could reference an account multiple times.
My question: is there a better way to handle this three-model query?
I threw together a few hacks, one going backwards from splits:
def self.scoped_split_acct_range(family,range)
# family is a single account_id or array of account_ids
# get filtered Entry ids
entry_ids = Split.where(account_id:family).
joins(:entry).
where(entries:{post_date:range}).
pluck(:entry_id).uniq
# use ids to get entries and eager loaded splits
Entry.where(id: entry_ids).includes(:splits).order(:post_date, :numb)
end
This also works and, by the ms reported in the log, may even be faster. Normal use of either would be looking at 50 or so Entries for a month, but then you can filter a year's worth of transactions, and you get what you asked for. For normal use, a ledger for a month is about 70ms; even a quarter is around 100ms.
I've used a few attributes in both Splits and Accounts that got rid of a few view-level queries. Transfer is basically concatenated Account names going up the tree.
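To pin down what the two-step version is supposed to return, here is a plain-Ruby sketch over in-memory rows. SplitRec, EntryRec and the data are invented for illustration; the array operations stand in for the SQL, they are not ActiveRecord calls.

```ruby
SplitRec = Struct.new(:entry_id, :account_id)
EntryRec = Struct.new(:id, :splits)

splits = [
  SplitRec.new(1, 100), SplitRec.new(1, 200),                       # touches the family
  SplitRec.new(2, 300), SplitRec.new(2, 400),                       # outside the family
  SplitRec.new(3, 100), SplitRec.new(3, 100), SplitRec.new(3, 500)  # account 100 twice
]
family = [100]

# Step 1: entry ids with at least one split in the family.
# uniq plays the role of DISTINCT (entry 3 matches twice).
entry_ids = splits.select { |s| family.include?(s.account_id) }
                  .map(&:entry_id)
                  .uniq

# Step 2: load those entries with all of their splits,
# not only the ones that matched the family filter.
entries = entry_ids.map do |id|
  EntryRec.new(id, splits.select { |s| s.entry_id == id })
end
```

Entry 3 appears once even though two of its splits hit account 100, and each returned entry keeps its full set of splits.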
Again, just looking to see if I'm missing something and there is a better way.
Using a nested select is the proper option IMO.
You can optimize your code with the nested select to use the following:
entry_ids = Entry.where(post_date: range)
.joins(:splits)
.where(splits: { account_id: family })
.select('entries.id')
.distinct
Entry.where(id: entry_ids).includes(:splits).order(:post_date,:numb)
This will generate a single SQL statement with a nested select, instead of having 2 SQL queries: 1 to get the Entry ids and pass it to Rails and 1 other query to select entries based on those ids.
The following gem, developed by an ex-colleague, can help you deal with this kind of stuff: https://github.com/MaxLap/activerecord_where_assoc
In your case, it would enable you to do the following:
Entry.where_assoc_exists(:splits, account_id: 123)
.where(post_date: range)
.includes(:splits)
.order(:post_date, :numb)
This does the same thing as I suggested, but behind the scenes.
I have what is actually a common situation in Rails where I have a model that has two associations. I want to be able to search for specific records defined by the model based upon conditions on either one or both of these associations.
The twist is that these two associations use the same table.
Here's the main model:
class Resource < ActiveRecord::Base
belongs_to :primary_item, class_name: 'Item', foreign_key: 'primary_item_Id'
belongs_to :secondary_item, class_name: 'Item', foreign_key: 'secondary_item_Id'
...
And the Item model:
class Item < ActiveRecord::Base
has_many :res_primary, class_name: 'Resource', foreign_key: 'primary_item_Id'
has_many :res_secondary, class_name: 'Resource', foreign_key: 'secondary_item_Id'
...
Let's suppose an Item has a string field called name. It's just the name of the item. I want to be able to find all of the resources that have a primary and/or secondary item that is like a given name. Here's what that looks like (I think) if I am just filtering for a primary item:
@resources.joins(:primary_item)
  .where("#{Item.table_name}.name LIKE ?", primary_search_string)
The primary_search_string is just the string I want to match in the name. This works fine. A similar search works for the secondary items.
Now suppose I'd like the user to be able to search for resources that have either a given primary item name, a given secondary item by name, or both (each with its own name). I would have a primary_search_string and a secondary_search_string with independent values. One of them could be nil which would mean I don't want to narrow the search based upon that string. What I want is to filter the resources by either or both strings depending upon whether they are nil. Here is what I ended up with which works but seems awkward:
@resources = <some query that obtains an active record relation of resources>
if primary_search_string
  @resources = @resources.joins(:primary_item)
    .where("#{Item.table_name}.name LIKE ?", primary_search_string)
  if secondary_search_string
    @resources = @resources.joins(:secondary_item)
      .where("secondary_items_#{Item.table_name}.name LIKE ?", secondary_search_string)
  end
elsif secondary_search_string
  @resources = @resources.joins(:secondary_item)
    .where("#{Item.table_name}.name LIKE ?", secondary_search_string)
end
Note that if I am only joining one table, the table's name is known by Item.table_name. However, if I have to join both tables, then Rails must distinguish the second instance of the table by specifying the name further with the association name: secondary_items_#{Item.table_name}.
My question is whether there's a somewhat simpler way of handling this that doesn't involve having to reference the table names of the associations in the where clauses? Since I am referencing the table names of the associations, and the table name may be different depending upon whether I'm joining one of them or both of them, I end up checking the search string for nil, which is a way to determine whether I'm joining the Item table more than once. In this particular program example, I can check for it and live with it being awkward. But what if I didn't know whether @resources was previously joined to the primary items and I wanted to filter based upon the secondary items? I wouldn't know, in that case, what the association's table name would be to use in the where clause, since I wouldn't know if it were already joined or not.
I suspect there may be a better way here, which is the essence of my question.
If you have two separate associations, but they are both to the same type of child object (eg. Resource), then instead of joins/includes, you could just focus on finding the Resource records that match either the primary_item_id or secondary_item_id of the parent Item record you want.
Rails 3/4 doesn't natively support OR queries, but one way to brute-force it is by finding IDs of Resources that belong to the Item as a primary or secondary association. (This is obviously inefficient because you are doing multiple database queries.)
ids = Resource.where(primary_item_id: @item.id).map(&:id)
ids += Resource.where(secondary_item_id: @item.id).map(&:id)
@special_resources = Resource.where(id: ids, name: 'Some Special Name')
Rails 5 supports 'or' queries, so it's much simpler:
resources = Resource.where(primary_item_id: @item.id).or(Resource.where(secondary_item_id: @item.id))
@special_resources = resources.where(name: 'Some Special Name')
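For the optional-filter part of the question (a nil search string meaning "don't narrow on this side"), it can help to pin down the intended behaviour in plain Ruby before translating it into joins. The sketch below is not ActiveRecord: ItemRec, ResourceRec and filter_resources are hypothetical, and a substring check stands in for a LIKE '%...%' match.

```ruby
ItemRec     = Struct.new(:name)
ResourceRec = Struct.new(:id, :primary_item, :secondary_item)

# Hypothetical helper: a nil search string means "no filter on that side".
def filter_resources(resources, primary: nil, secondary: nil)
  resources.select do |r|
    (primary.nil? || r.primary_item.name.include?(primary)) &&
      (secondary.nil? || r.secondary_item.name.include?(secondary))
  end
end

resources = [
  ResourceRec.new(1, ItemRec.new("red bolt"),  ItemRec.new("blue nut")),
  ResourceRec.new(2, ItemRec.new("red screw"), ItemRec.new("green nut")),
  ResourceRec.new(3, ItemRec.new("grey bolt"), ItemRec.new("blue washer"))
]

red_only     = filter_resources(resources, primary: "red").map(&:id)
blue_only    = filter_resources(resources, secondary: "blue").map(&:id)
red_and_blue = filter_resources(resources, primary: "red", secondary: "blue").map(&:id)
```

Each condition is applied independently, so the caller never needs to know whether the other side was filtered.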
I've got two basic models with a join table. I've added a scope to compute a count through the relation and expose it as an attribute/pseudo-column. Everything works fine, but I'd now like to query a subset of columns and include the count column, but I don't know how to reference it.
tl;dr: How can I include an aggregate such as a count in my Arel query while also selecting a subset of columns?
Models are Employer and Employee, joined through Job. Here's the relevant code from Employer:
class Employer < ApplicationRecord
belongs_to :user
has_many :jobs
has_many :employees, through: :jobs
scope :include_counts, -> do
left_outer_joins(:employees).
group("employers.id").
select("employers.*, count(employees.*) as employees_count")
end
end
This allows me to load an employer with counts:
employers = Employer.include_counts.where(id: 1)
And then reference the count:
count = employers[0].employees_count
I'm loading the record in my controller, which then renders it. I don't want to render more fields than I need to, though. Prior to adding the count, I could do this:
employers = Employer.where(id: 1).select(:id, :name)
When I add my include_counts scope, it basically ignores the select(). It doesn't fail, but it ends up including ALL the columns, because of this line in my scope:
select("employers.*, count(employees.*) as employees_count")
If I remove employers.* from the scope, then I don't get ANY columns in my result, with or without a select() clause.
I tried this:
employers = Employer.include_counts.where(id: 1).select(:id, :name, :employees_count)
...but that produces the following SQL:
SELECT employers.*, count(employees.*) as employees_count, id, name, employees_count FROM
...and an SQL error because column employees_count doesn't exist and id and name are ambiguous.
The only thing that sort of works is this:
employers = Employer.include_counts.where(id: 1).select("employers.id, employers.name, count(employees.*) as employees_count")
...but that actually selects ALL the columns in employers, due to the scope clause again.
I also don't want that raw SQL leaking into my controller if I can avoid it. Is there a more idiomatic way to do this with Rails/Arel?
If I can't find another way to do the query, I'll probably create another scope or custom finder in the model, so that the controller code is cleaner. I'm open to suggestions for doing that as well, but I'd like to know if there's a simple way to reference computed aggregate columns like this as though they were any other column.
I've got an application in which businesses can file taxes and request tax extensions. (An extension is a request saying "I need more time to file.")
I have the following relationships:
Business has_many :tax_filings (one per year)
TaxFiling belongs_to :business
Business has_many :tax_extensions (one per year)
TaxExtension belongs_to :business
When I show a list of tax filings, I want each filing to show whether there is a corresponding extension for it. But I'm not sure how to do that without an N+1 query.
Right now I have this method on TaxFiling:
def extension
TaxExtension.where(:business_id => business_id, :year => year).first
end
So every time I call TaxFiling#extension, it does another database query.
I added a scope for TaxFiling that joins extensions on business_id and year, but I'm not sure how to get TaxFiling#extension to use that without having a declared relationship between the two models.
How can I do this?
I think what you want is the .includes method, in order to do eager loading, when you initially load the TaxFiling models. If you do something like this:
TaxFiling.includes(:business => [:tax_extensions])
Rails will load the associated businesses and extensions into memory, using three queries (one per model), instead of N queries.
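Note that eager loading only pays off if TaxFiling#extension reads from the preloaded collection; calling where inside the method would still issue a query per filing. A plain-Ruby sketch of that lookup, with Structs as hypothetical stand-ins for the three models:

```ruby
ExtensionRec = Struct.new(:business_id, :year)
BusinessRec  = Struct.new(:id, :tax_extensions)

FilingRec = Struct.new(:business, :year) do
  # Search the already-loaded collection instead of running a new query.
  def extension
    business.tax_extensions.detect { |e| e.year == year }
  end
end

business = BusinessRec.new(1, [ExtensionRec.new(1, 2019), ExtensionRec.new(1, 2021)])
filings  = [2019, 2020, 2021].map { |y| FilingRec.new(business, y) }

# One flag per filing: does a matching extension exist for that year?
has_extension = filings.map { |f| !f.extension.nil? }
```

In the real model the same detect call would run against business.tax_extensions once that association has been loaded by includes.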
I have a slightly complicated query that I'd like to have as a natural ActiveRecord relationship. Currently I call @calendar.events and @calendar.shared_events then join the arrays and sort them. The consolidation can be done with this SQL:
SELECT calendar_events.* FROM calendar_events left outer join calendar_events_calendars on calendar_events_calendars.calendar_event_id = calendar_events.id where calendar_events.calendar_id = 2 or calendar_events_calendars.calendar_id = 2
but I'm not sure how to represent this as an ActiveRecord relationship. I know I could use a custom sql finder but I'd like to be able to use scopes on the results.
Currently an event belongs to one calendar and also belongs to many other calendars via a habtm join table:
has_and_belongs_to_many :shared_events, :class_name => 'CalendarEvent', :order => 'beginning DESC, name'
has_many :events, :class_name => 'CalendarEvent', :dependent => :destroy, :order => 'beginning DESC, name'
Any pointers would be greatly appreciated :)
Can you help us understand your data structure a bit more? What are you trying to achieve with the two relationships (the has_many/belongs_to and the HABTM) that you can't achieve through the HABTM or a has_many, :through? It looks to me like a simplification of your data model is likely to yield the results you're after.
(sorry - not enough points or would have added as a comment)
--UPDATED AFTER COMMENT
I think that what you've suggested in your comment is an infinitely better solution. It is possible to do it how you've started implementing it, but it's unnecessarily complex - and in my experience, you can often tell when you're going down the wrong path with ActiveRecord when you start having crazy complex and duplicate relationship names.
Why not
a) have it all in a has_many, through relationship (in both directions)
b) have an additional field on the join table to specify that this is the main/primary attachment (or vice versa).
You can then have named scopes on the Event model for shared events, etc. which do the magic by including the condition on the specified join. This gives you:
@calendar.events         # returns all events across the join
@calendar.shared_events  # where the 'shared' flag on the join is set
@calendar.main_events    # without the flag