Left outer join using the model of the joined table in Rails

I need to do a left outer join in rails, but I need the model objects to be for the joined table.
What I want is a list of the days, with the metrics for each day. I need to have all days regardless of whether or not there were metrics, but I don't want to make a bunch of round trips to the database.
This works, but causes problems because it thinks I have PeriodDay objects when I really want Metric objects:
PeriodDay.select("metrics.*").joins('LEFT OUTER JOIN metrics ON period_days.date = metrics.date').where('period_id = ?', current_period)
I can use find_by_sql on the Metric object, but the query building is more complicated (and conditional) than this simplified example, so I would rather figure out the "rails way" for this problem.

My current workaround is to loop through the records and create Metric objects from the attributes of the PeriodDay object. It doesn't feel efficient, but it is better than making multiple database calls.
metrics = []
recs = PeriodDay.select("metrics.*").joins('LEFT OUTER JOIN metrics ON period_days.date = metrics.date').where('period_id = ?', current_period)
recs.each do |rec|
  metrics << Metric.new(rec.attributes)
end

Assuming that a Period has many PeriodDays, that a PeriodDay has many Metrics, and that period_id is an attribute of your PeriodDay model, your workaround should be equivalent to something like this:
Metric.includes(:period_day).where(:period_days => {:period_id => @current_period})
This doesn't get you a list of days with their respective Metric objects as you mentioned in the original question, but it gets you a list of all Metric objects for a particular period. (unless I'm missing something...)
If you want a list of PeriodDay objects with their included Metric objects, you can use includes instead of joins.
PeriodDay.includes(:metrics).where(:period_id => @current_period)
This will execute two queries (one to get period days and the other to get metrics) but it is a lot more readable.
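To make the two-query behaviour concrete, here is a minimal plain-Ruby sketch (no Rails involved) of how the rows from the two queries can be stitched together in memory so that every day appears, with or without metrics. DayRow, MetricRow and days_with_metrics are hypothetical names used only for illustration; they are not part of ActiveRecord.

```ruby
require "date"

# Hypothetical stand-ins for rows of period_days and metrics (not ActiveRecord models).
DayRow    = Struct.new(:date)
MetricRow = Struct.new(:date, :value)

# Pairs every day with its metrics; days without metrics get an empty array,
# which is the in-memory equivalent of a left outer join.
def days_with_metrics(days, metrics)
  by_date = metrics.group_by(&:date)
  days.map { |day| [day, by_date.fetch(day.date, [])] }
end

days = [DayRow.new(Date.new(2024, 1, 1)), DayRow.new(Date.new(2024, 1, 2))]
metrics = [MetricRow.new(Date.new(2024, 1, 1), 10), MetricRow.new(Date.new(2024, 1, 1), 12)]

days_with_metrics(days, metrics).each do |day, day_metrics|
  puts "#{day.date}: #{day_metrics.map(&:value).inspect}"
end
```

The second day still shows up, just with an empty list of metrics.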

Related

Rails (Activerecord) - Can't make query with joins and global sum without duplicates

I'm using a query with multiple user-set filters in order to show a list of invoices in a Rails app. One of the filters adds a where condition on a column of a separate table, which needs a double join in order to be accessible (estimates -through projects-).
scope :by_seller, lambda { |user_id|
  joins(project: :estimates)
    .where(estimates: {:user_id => user_id}) unless user_id.blank?
}
Additionally, I use Rails' aggregate method "sum" in order to find out the total amount of the invoices, @invoices.sum(:total_cache), where total_cache is a cached column in the database specifically designed to perform this kind of sum in a performant way.
@invoices.sum(:total_cache)
My problem is, given the fact that I need a double join in order to access Estimates through Projects, and that each Invoice belongs to a Project, BUT a project can have many Estimates, the join operation results in duplicate records, so my Invoices table shows some of the invoices many times (as many as the number of estimates its project has). This results in an invoices table with duplicate records, and in an incorrect sum value, as it sums some of the invoice totals N times.
The filtering behaviour is just fine, as my intention is to filter by the user who made ANY of the estimates in the invoice project. However, the issue is that when I try to avoid the duplicates by adding a group('invoices.id') -the way I always solved such situations-, the final sum operation won't return the total sum of the invoices' total, but a grouped sum of each one of them (totally useless).
The only workaround I've found is to include the group clause and perform the sum in pure ruby code, treating the collection as an array, which IMHO is terribly inefficient, as there are tons of invoices:
@invoices.map(&:total_cache).inject(0, &:+)
Is there a way I can obtain a unique ActiveRecord collection of Invoices without duplicates in a way I can then call the aggregate sum method and obtain a total calculated by Postgres?
Of course, if there is something wrong in my base idea I'm completely open to hearing it! It's quite a complex query (I simplified it for the sake of the question here) and there can be many approaches I'm sure!
Thank you everyone!
I'm not sure how much "slower" or "faster" this is than doing the sum in ruby code. But if you want to still retain an ActiveRecord::Relation object, then you can do something like below. I reproduced your setup environment in a local Rails project.
user = User.first
Invoice.where(
  id: Invoice.by_seller(user.id).select(:id)
).sum(:total_cache)
# (1.2 ms) SELECT SUM("invoices"."total_cache") FROM "invoices" WHERE "invoices"."id" IN (SELECT "invoices"."id" FROM "invoices" INNER JOIN "projects" ON "projects"."id" = "invoices"."project_id" INNER JOIN "estimates" ON "estimates"."project_id" = "projects"."id" WHERE "estimates"."user_id" = $1) [["user_id", 1]]
# => 5
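If it helps to see why the subquery avoids the double counting, here is a rough plain-Ruby simulation (no database, no Rails). InvoiceRow, EstimateRow and the three helper methods are hypothetical names made up for this sketch; they only mimic what the SQL does.

```ruby
# Hypothetical in-memory rows standing in for the invoices and estimates tables.
InvoiceRow  = Struct.new(:id, :project_id, :total_cache)
EstimateRow = Struct.new(:project_id, :user_id)

# Mimics the inner join: one output row per (invoice, matching estimate) pair.
def joined_rows(invoices, estimates, user_id)
  invoices.flat_map do |invoice|
    estimates
      .select { |e| e.project_id == invoice.project_id && e.user_id == user_id }
      .map { invoice }
  end
end

# Summing the joined rows counts an invoice once per matching estimate.
def naive_sum(invoices, estimates, user_id)
  joined_rows(invoices, estimates, user_id).sum(&:total_cache)
end

# Mimics WHERE id IN (subquery): each invoice is counted at most once.
def subquery_sum(invoices, estimates, user_id)
  ids = joined_rows(invoices, estimates, user_id).map(&:id)
  invoices.select { |invoice| ids.include?(invoice.id) }.sum(&:total_cache)
end

invoices  = [InvoiceRow.new(1, 10, 100), InvoiceRow.new(2, 20, 50)]
estimates = [EstimateRow.new(10, 7), EstimateRow.new(10, 7), EstimateRow.new(20, 8)]

puts naive_sum(invoices, estimates, 7)    # 200: invoice 1 is counted twice
puts subquery_sum(invoices, estimates, 7) # 100: invoice 1 is counted once
```

The join is still used for filtering, but the sum runs over the invoices table itself, so duplicates in the subquery result do not matter.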

Optimizing has many record association query

I have this query that I've built using Enumerable#select. The purpose is to find records that have no has-many associated records or, if they do have such records, to select only those whose associated records all have the preview attribute set to true. The code below works perfectly for that use case. However, this query does not scale well. When I test against thousands of records it takes several hundred seconds to complete. How can this query be improved upon?
# User has many enrollments
# Enrollment belongs to user.
users_with_no_courses = User.includes(:enrollments).select {|user| user.enrollments.empty? || user.enrollments.where(preview: false).empty?}
So first, make sure enrollments.user_id has an index.
Second, you can speed this up by not loading all the enrollments, and doing your filtering in SQL:
User.where(<<-EOQ)
  NOT EXISTS (SELECT 1
              FROM enrollments e
              WHERE e.user_id = users.id
              AND NOT e.preview)
EOQ
By the way here I'm simplifying your two conditions into one: "no enrollments or no real enrollments" is the same as "no real enrollments".
If you want you can put this condition into a scope so it is more reusable.
Third, this is still going to be slow if you're instantiating thousands of User objects. So I would look into paginating if that makes sense, or find_each if this is an offline script. Or use raw SQL to avoid all the object instances.
Oh by the way: even though you are saying includes(:enrollments), this will still go back to the database, giving you an n+1 problem:
user.enrollments.where(preview: false)
That is because the where means ActiveRecord can't use the already-loaded association. You can avoid that by using select instead of where. But not loading the enrollments in the first place is even better.
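For what it's worth, the simplification of the two conditions into one can be checked with a few lines of plain Ruby. This is only a sketch: EnrollmentRow and the two predicate methods are hypothetical names, and the arrays stand in for a user's enrollments.

```ruby
# Hypothetical stand-in for an enrollment row; only the preview flag matters here.
EnrollmentRow = Struct.new(:preview)

# The original two-part condition: no enrollments at all, or no non-preview ones.
def original_condition?(enrollments)
  enrollments.empty? || enrollments.reject(&:preview).empty?
end

# The simplified condition expressed by NOT EXISTS: no non-preview enrollment exists.
def simplified_condition?(enrollments)
  enrollments.none? { |e| !e.preview }
end

cases = [
  [],
  [EnrollmentRow.new(true)],
  [EnrollmentRow.new(false)],
  [EnrollmentRow.new(true), EnrollmentRow.new(false)]
]

cases.each do |enrollments|
  puts original_condition?(enrollments) == simplified_condition?(enrollments)
end
```

An empty list trivially has no non-preview enrollments, which is why the first half of the original condition is redundant.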

HABTM association, how can I get pairs of associated models?

Given the following models and association:
(source: rubyonrails.org)
How can I get an array of pairs (physician_name, patient_name) that are appointed for certain day (appointment_date)? You can assume that one patient will never go to the same physician twice. Never.
I already tried things like:
@appointments = Appointment.where(appointment_date: params[:date])
but I have no idea what to do further. Should I iterate through this array and get every pair like this below?
@appointments.each do |appointment|
  @physician = Physician.where(id: appointment.physician_id)
  @patient = Patient.where(id: appointment.patient_id)
end
I believe there's much easier way.
I'm using Rails 4.2.5.1.
I think what you want is approximately this:
Appointment.includes([:physician, :patient]).where(:appointment_date => appointment_date).map{|a| [a.physician.name, a.patient.name]}
Since only the Physician and Patient models have the names, they'll need to be loaded in the query (ok, you could avoid it by doing some fancy SQL trickery, but this is database-agnostic, which is convenient). Hence includes, which eager-loads associated models.
Then use .where to return only the appointments on the day you want (may be more complex if you're actually setting times in those DateTime values).
And finally, iterate over the list and return an Array of Arrays (Ruby not having Tuples) containing the names.

Rails - get objects of objects WITH duplicates

I received some really good help in solving an issue in which I needed to get objects from objects in one query. That worked like a charm.
This is a follow up to that question. I chose to create a new question to this since the last one was answered according to my previous specification. Please do refer to that previous question for details.
Basically, I wanted to get all the objects of multiple objects in one single query. E.g. if a Product has several Categories, which in turn have several Products, I want to get all the Products in that relation. Put more simply (and erroneously):
all_products = @my_product.categories.products
This was solved with this query. It is this query I would (preferably) like to alter:
Product.includes(:categories).where(categories: { id: @my_product.categories.pluck(:id) } )
Now, I realized something I missed using this solution was that I only get a list of unique Products (which one would expect as well). I would however like to get a list with possible duplicates as well.
Basically, if a "Blue, Electric Car" is included in categories ("Blue", "Electric" and "Car") I would like to get three instances of that object returned, instead of one unique.
I guess this does not make Rails-sense but is there a way to alter the query above so that it does not serve me a list of unique objects in the returned list but rather the "complete" list?
The includes method of ActiveRecord chooses between two strategies to make the query: one simply runs two separate queries, and the other uses a LEFT OUTER JOIN.
In both cases the products will be distinct.
You have to do a right outer join manually:
Product.joins('RIGHT JOIN categories ON categories.product_id = products.id').where(categories: { id: @my_product.categories.pluck(:id) } )
Also add .preload(:categories) if you want to keep the eager loading of the categories.
Since you want duplicates, just change includes to joins, (I tested this just now). joins will essentially combine (inner-join) the two tables giving you a list of records that are all unique (per Product and Category). includes does eager loading which just loads the associated tables already but does an outer-join, and therefore, the retrieved records are also unique (but only per Product).
Product.joins(:categories).where(categories: { id: @my_product.categories.pluck(:id) } )
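As a rough illustration of why the join returns one row per product and category pair, here is a plain-Ruby simulation (no Rails). Categorization and joined_products are hypothetical names for this sketch, and the array stands in for the rows linking products to categories.

```ruby
# Hypothetical rows linking a product to one of its categories.
Categorization = Struct.new(:product_name, :category_name)

# Mimics an inner join filtered by category: one result per matching
# (product, category) pair, so a product can appear several times.
def joined_products(categorizations, category_names)
  categorizations
    .select { |c| category_names.include?(c.category_name) }
    .map(&:product_name)
end

rows = [
  Categorization.new("Blue, Electric Car", "Blue"),
  Categorization.new("Blue, Electric Car", "Electric"),
  Categorization.new("Blue, Electric Car", "Car"),
  Categorization.new("Red Bike", "Red")
]

with_duplicates = joined_products(rows, %w[Blue Electric Car])
puts with_duplicates.length      # 3: one row per matching category
puts with_duplicates.uniq.length # 1: what a de-duplicated result would contain
```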

Best SQL indexes for join table

With performance improvements in mind, I was wondering if and which indexes are helpful on a join table (specifically used in a Rails 3 has_and_belongs_to_many context).
Model and Table Setup
My models are Foo and Bar and, per Rails convention, I have a join table called bars_foos. There is no primary key or timestamps, making the only fields in this table bar_id:integer and foo_id:integer. I'm interested in knowing which of the following indexes is best and is without duplication:
1. A compound index: add_index :bars_foos, [:bar_id, :foo_id]
2. Two indexes
   A. add_index :bars_foos, :bar_id
   B. add_index :bars_foos, :foo_id
3. A combination of both 1 and 2-B
Basically, I'm not sure if the compound index is enough assuming it is helpful to begin with. I believe that a compound index can be used as a single index for the first item which is why I am pretty sure that using all three lines would certainly result in unnecessary duplication.
Likely Usage
The most common usage will be given an instance of model Foo, I will be asking for its associated bars using the RoR syntax of foo.bars and vice versa with bar.foos for an instance of the model Bar.
These will generate queries of the type SELECT * FROM bars_foos WHERE foo_id = ? and SELECT * FROM bars_foos WHERE bar_id = ? respectively and then using those resultant IDs to SELECT * FROM bars WHERE ID in (?) and SELECT * FROM foos WHERE ID in (?).
Please correct me in the comments if I am incorrect, but I do not believe that, in the context of the Rails application, it is ever going to try to do a query where it specifies both IDs like SELECT * FROM bars_foos where bar_id = ? AND foo_id = ?.
Databases
In the event there are database specific optimization techniques, I will most likely be using PostgreSQL. However, others using this code may want to use it in MySQL or SQLite depending on their Rails configuration so all answers are appreciated.
The Answer
The oft-repeated answer, which holds more often than not, is "it depends." More specifically, it depends on what your data is and how it will be used.
tl;dr Explanation
The short tl;dr answer for my specific case (and to cover all future bases) is choice #2 which is what I suspected. However, choice #3 would work just fine as, depending on my usage of the data, the extra time and space used creating the compound index could reduce future query lookups.
The Full Explanation
The reason for this is that databases try to be smart and try to do things as fast as possible regardless of programmer input. The most basic item to consider when adding an index is will this object be looked up by this key. If yes, an index can potentially help speed that up. However, whether this index is even used all comes down to selectivity and the cardinality of the field.
Since foreign keys are typically the IDs of another AR class, cardinality usually will be high. But again, this depends on your data. In my example, if there are many Foos but few Bars, many of the entries in my join table will have similar bar_ids. With bar_ids having a low cardinality, an index on bar_id may never be used and may be getting in the way by having the database devote time and resources* to adding to this index every time a new bars_foos entry is created. The same goes with many Bars and few Foos and few of both.
The general lesson is that when considering an index on a table, decide if the entries will be both looked up by this field and if this field has a high cardinality. That is, does this field have many distinct values? In the case of most join tables "it depends" and we must think more carefully about what the data represents and the relationships themselves. In my case, I will have both many Foos and Bars and will be looking up Foos by their associated bars and vice versa.
Another good answer I got at the office was, "why are you worrying about your indexes? Build your app!"
Footnotes
* In a similar question on indexes on STI it was pointed out that the cost of an index is very low so when in doubt, just add it.
Depends on how you are going to query the data.
Assuming you want to search for all of these...
WHERE bar_id = ?
WHERE foo_id = ?
WHERE bar_id = ? AND foo_id = ?
...then you should probably go with an index on {bar_id, foo_id} and an index on {foo_id}.
While you could also create a third index on {bar_id}, the price of maintaining an additional index would probably outweigh the benefit of better clustering in the smaller index.
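The reasoning behind pairing {bar_id, foo_id} with a separate {foo_id} index can be sketched with a toy model: a sorted array of pairs standing in for a composite index. This is a simplification under stated assumptions (real databases use B-trees and a query planner, and may behave differently), and build_index, foo_ids_for_bar and bar_ids_for_foo are hypothetical helper names.

```ruby
# A sorted array of [bar_id, foo_id] pairs stands in for a composite index
# on (bar_id, foo_id). Arrays sort by first element, then by second.
def build_index(pairs)
  pairs.sort
end

# Lookup by the leading column: binary search to the first matching entry,
# then read the contiguous run of entries with that bar_id.
def foo_ids_for_bar(index, bar_id)
  start = index.bsearch_index { |(b, _)| b >= bar_id }
  return [] if start.nil?
  index.drop(start).take_while { |(b, _)| b == bar_id }.map(&:last)
end

# Lookup by the second column alone: matching entries are scattered through
# the sorted order, so every entry has to be examined.
def bar_ids_for_foo(index, foo_id)
  index.select { |(_, f)| f == foo_id }.map(&:first)
end

index = build_index([[2, 9], [1, 5], [1, 3], [3, 5]])
puts foo_ids_for_bar(index, 1).inspect # [3, 5]
puts bar_ids_for_foo(index, 5).inspect # [1, 3]
```

In this toy model the lookup by bar_id can jump straight to the right place, while the lookup by foo_id cannot, which is the gap the extra {foo_id} index is meant to fill.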
Also, how do you plan to cover your queries with indexes? Some of the alternatives, such as...
{foo_id, bar_id} and {bar_id}
{foo_id, bar_id} and {bar_id, foo_id}
...might cover certain kinds of queries better.
Covering is a balancing act - sometimes adding a field to an index just for covering purposes is justified, sometimes it's not. You won't know until you measure on realistic amounts of data.
(Disclaimer: I'm not familiar with Ruby. This answer is purely from the database perspective.)
