Rails: AND operator in a has_many association

My relationship is a Client can have many ClientJobs. I want to be able to find clients that perform both Job a and Job b. I'm using 3 select boxes so I can pick a maximum of three jobs to select from. The select boxes are populated from the database.
I know how to test for 1 job with the query below. But I need a way to use an AND operator to test that both jobs exist for that client.
@clients = Client.includes("client_jobs").where(
client_jobs: { job_name: params[:job1]})
Unfortunately it's easy to do an IN operation like below, but I'm thinking the syntax for AND should be similar....I hope
@clients = Client.includes("client_jobs").where(
client_jobs: { job_name: [params[:job1], params[:job2]]})
EDIT: Posting the sql statement that hits the database from the answer below
Core Load (0.6ms) SELECT `clients`.* FROM `clients`
CoreStatistic Load (1.9ms) SELECT `client_jobs`.* FROM `client_jobs`
WHERE `client_jobs`.`client_id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,........)
The second query runs through every client_job in the database. It's never tested against params[:job1], params[:job2], etc. So @clients returns nil, crashing my view template
(undefined method `map' for nil:NilClass)

In my opinion, a better approach than self-joins is to simply join ClientJobs and then use GROUP BY and HAVING clauses to keep only those records that match all of the given associated records.
performed_jobs = %w(job job2 job3)
Client.joins(:client_jobs).
where(client_jobs: { job_name: performed_jobs }).
group("clients.id").
having("count(*) = #{performed_jobs.count}")
Let's walk through this query:
The first two clauses join ClientJobs to Clients and keep only the rows that match any of the three jobs (this is the IN clause).
Next, we group these joined records by clients.id so that we get one row back per client.
Finally, the having clause ensures we only return clients that had exactly 3 ClientJob records joined in, i.e. only those that had all three client jobs defined.
It is the trick with HAVING(COUNT(*) = ...) that turns the IN clause (which is essentially an OR-ed list of options) into a "must have all these" clause.
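To make the mechanics concrete, here is a minimal plain-Ruby sketch of the same filter, group, and count steps, run over an in-memory array instead of a database. The method name clients_with_all_jobs and the sample job names are made up for illustration. The sketch counts distinct job names per client, which is an assumption on my part: with count(*) a client that has the same job stored twice would be counted twice, so COUNT(DISTINCT client_jobs.job_name) may be the safer SQL if duplicates are possible.

```ruby
# Plain-Ruby sketch of "IN + GROUP BY + HAVING COUNT" (no database involved).
# Each hash stands in for one row of the joined client_jobs table.
def clients_with_all_jobs(rows, wanted_jobs)
  wanted = wanted_jobs.uniq
  rows
    .select { |row| wanted.include?(row[:job_name]) } # WHERE job_name IN (...)
    .group_by { |row| row[:client_id] }               # GROUP BY clients.id
    .select do |_client_id, group|                    # HAVING COUNT(DISTINCT job_name) = n
      group.map { |row| row[:job_name] }.uniq.size == wanted.size
    end
    .keys
end

rows = [
  { client_id: 1, job_name: "plumbing" },
  { client_id: 1, job_name: "wiring" },
  { client_id: 2, job_name: "plumbing" },
  { client_id: 3, job_name: "wiring" },
  { client_id: 3, job_name: "plumbing" },
  { client_id: 3, job_name: "roofing" }
]

puts clients_with_all_jobs(rows, %w[plumbing wiring]).inspect # prints [1, 3]
```

Client 2 matches the IN filter but only contributes one row to its group, so the count check drops it.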

To do this in a single SQL query try the following:
jobs_with_same_user = ClientJob.select(:user_id).where(job_name: "<job_name1>", user_id: ClientJob.select(:user_id).where(job_name: "<job_name2>"))
@clients = Client.where(id: jobs_with_same_user)
Here's what this query is doing:
1. Select the user_ids of all client jobs with [job_name2].
2. Select the user_ids of all client jobs with user_id IN the result set from (1) AND having [job_name1].
3. Select all clients, using (2) as a subquery.
Not many know this but Rails 4+ supports subqueries. Basically this is a self join acting as subquery for the clients:
SELECT *
FROM clients
WHERE id IN <jobs_with_same_user>
Also, I'm not sure if you're referencing the client_jobs association in your view, but if you are, add the includes statement to avoid an N+1 query:
@clients = Client.includes(:client_jobs).where(id: jobs_with_same_user)
EDIT
If you prefer, the same result can be achieved with a self-referencing inner join:
jobs_with_same_user = ClientJob
.select("client_jobs.user_id AS user_id")
.joins("JOIN client_jobs inner_client_jobs ON inner_client_jobs.user_id=client_jobs.user_id")
.where(client_jobs: { job_name: "<first_job_name1>" }, inner_client_jobs: { job_name: "<job_name2>" })
@clients = Client.where(id: jobs_with_same_user)

Related

Rails: How to Eager Load with Left Join Table?

Currently I have a controller query which fetches products & product updates as follows:
products = Product.left_outer_joins(:productupdates).select("products.*, count(productupdates.*) as update_count, max(productupdates.old_price) as highest_price").group(:id)
products = products.paginate(:page => params[:page], :per_page => 20)
This query causes N+1 queries, but I cannot add .includes(:productupdates) since I already have a left outer join.
If possible, can you please help me how to reduce N+1 queries?
EDIT------------------------------
As per Vishal's suggestion; I have changed the controller query as follows,
products = Product.includes(:productupdates).select("products.*, count(productupdates.*) as productupdate_count, max(productupdates.old_price) as highest_price").group("productupdates.product_id")
products = products.paginate(:page => params[:page], :per_page => 20)
Unfortunately, I receive the following error:
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: missing FROM-clause entry for table "productupdates"
LINE 1: SELECT products.*, count(productupdates.*) as productupdate_count, m...
^
: SELECT products.*, count(productupdates.*) as productupdate_count, max(productupdates.old_price) as highest_price FROM "products" WHERE "products"."isopen" = $1 AND (products.year > 2009) AND ("products"."make" IS NOT NULL) GROUP BY productupdates.product_id LIMIT $2 OFFSET $3):
Please advise how this is causing N+1 and how you think this will solve the issue. The only way I can see an N+1 situation here is if you are then calling productupdates on each product later. If this is the case then this will not solve the issue. Please advise so others can formulate appropriate responses
For the time being I am going to assume that somewhere later in the code you are calling productupdates on the individual products. If this is the case then we can solve this without the aggregation as follows
@products = Product.eager_load(:productupdates)
Now when we loop the productupdates are already loaded so to get the count and the max we can do things like
@products.each do |p|
# COUNT
# (don't use the count method or it will execute a query )
p.productupdates.size
# MAX old_price
# older ruby versions use rails `try` instead
# e.g. p.productupdates.max_by(&:old_price).try(:old_price) || 0
p.productupdates.max_by(&:old_price)&.old_price || 0
end
Using these methods will not execute additional queries since the productupdates are already loaded
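As a quick check of the max_by idiom on its own, here is a small plain-Ruby sketch. The helper name highest_old_price and the Struct standing in for a loaded productupdates record are hypothetical. It shows why the safe-navigation operator and the `|| 0` fallback are needed: max_by returns nil for an empty collection.

```ruby
# Hypothetical helper wrapping the idiom from the loop above.
def highest_old_price(updates)
  # max_by returns nil when the collection is empty, so &. avoids
  # calling old_price on nil and || 0 supplies a default.
  updates.max_by(&:old_price)&.old_price || 0
end

# A Struct stands in for an already-loaded productupdates record.
update = Struct.new(:old_price)

puts highest_old_price([update.new(5), update.new(12), update.new(9)]) # prints 12
puts highest_old_price([])                                             # prints 0
```

Note that this sketch assumes old_price is never nil; comparing nil with a number inside max_by would raise an error.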
Side note: The reason includes did not work for you is that includes will use 2 queries to retrieve the data (a pseudo outer join) unless one of the following conditions is met:
The where clause uses a hash finder condition that references the association table (e.g. where(productupdates: {old_price: 12}))
You include the references method (e.g. Product.includes(:productupdates).references(:productupdates))
In both these cases the table will be left joined. I chose to use eager_load in this case, as includes delegates to eager_load in the above cases anyway.
You can use Product.includes(:productupdates) directly. It eager-loads the association, which avoids the N+1 query problem, and it can query the database with a left outer join when the association table is referenced.
So instead of Product.left_outer_joins(:productupdates) in your query, use Product.includes(:productupdates).
After running this query in the console, you can check in the log which SQL includes generated.

Properly format an ActiveRecord query with a subquery in Postgres

I have a working SQL query for Postgres v10.
SELECT *
FROM
(
SELECT DISTINCT ON (title) products.title, products.*
FROM "products"
) subquery
WHERE subquery.active = TRUE AND subquery.product_type_id = 1
ORDER BY created_at DESC
The goal of the query is to do a distinct on the title column, then filter and order the results. (I used the subquery in the first place because there seemed to be no way to combine DISTINCT ON with ORDER BY without one.)
I am trying to express said query in ActiveRecord.
I have been doing
Product.select("*")
.from(Product.select("DISTINCT ON (product.title) product.title, meals.*"))
.where("subquery.active IS true")
.where("subquery.meal_type_id = ?", 1)
.order("created_at DESC")
and, that works! But, it's fairly messy with the string where clauses in there. Is there a better way to express this query with ActiveRecord/Arel, or am I just running into the limits of what ActiveRecord can express?
I think the resulting ActiveRecord call can be improved.
But I would start improving with original SQL query first.
Subquery
SELECT DISTINCT ON (title) products.title, products.* FROM products
(I think that instead of meals there should be products?) has duplicate products.title, which is not necessary there. Worse, it misses ORDER BY clause. As PostgreSQL documentation says:
Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first
I would rewrite sub-query as:
SELECT DISTINCT ON (title) * FROM products ORDER BY title ASC
which gives us a call:
Product.select('DISTINCT ON (title) *').order(title: :asc)
In main query where calls use Rails-generated alias for the subquery. I would not rely on Rails internal convention on aliasing subqueries, as it may change anytime. If you do not take this into account you could merge these conditions in one where call with hash-style argument syntax.
The final result:
Product.select('*')
.from(Product.select('DISTINCT ON (title) *').order(title: :asc))
.where(subquery: { active: true, meal_type_id: 1 })
.order('created_at DESC')
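To see what the subquery-then-filter shape means for the rows you get back, here is a rough plain-Ruby model of it over an in-memory array. The method names and sample data are invented, and created_at is an integer stand-in. The secondary sort on id is my own assumption, added so that the surviving row per title is deterministic; it would correspond to ORDER BY title, id in the subquery.

```ruby
# Models: SELECT DISTINCT ON (title) * FROM products ORDER BY title, id
def distinct_on_title(products)
  products
    .sort_by { |product| [product[:title], product[:id]] } # ORDER BY title, id
    .uniq { |product| product[:title] }                    # first row per title survives
end

# Models the outer query: filter the distinct rows, then order them.
def active_of_type(products, type_id)
  distinct_on_title(products)
    .select { |product| product[:active] && product[:product_type_id] == type_id }
    .sort_by { |product| product[:created_at] }
    .reverse                                               # ORDER BY created_at DESC
end

products = [
  { id: 1, title: "Soup",  active: true,  product_type_id: 1, created_at: 1 },
  { id: 2, title: "Soup",  active: false, product_type_id: 1, created_at: 5 },
  { id: 3, title: "Salad", active: true,  product_type_id: 1, created_at: 3 },
  { id: 4, title: "Bread", active: true,  product_type_id: 2, created_at: 4 }
]

puts active_of_type(products, 1).map { |product| product[:id] }.inspect # prints [3, 1]
```

Because the filter runs after the distinct step, a title whose surviving row is inactive disappears entirely, even if another row with the same title is active. That is worth keeping in mind when choosing the ORDER BY for the subquery.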

Populate an active record collection with different SQLs on the same model in rails

I'm trying to populate an active record collection from several SQLs on the same model. The only thing that differs between the SQLs is the where clause. My models have a type_id. As an example I have
models = Model.where("type_id = ?", 1)
logger.debug 'models.count ' + models.count.to_s
m = Model.where("type_id = ?", 2)
models << m
logger.debug 'models.count ' + models.count.to_s
From that, my logfile shows me
SELECT COUNT(*) FROM "models" WHERE (type_id = 1)
models.count 1
SELECT COUNT(*) FROM "models" WHERE (type_id = 1)
models.count 1
The second SQL is not correct for my situation, I wanted
SELECT COUNT(*) FROM "models" WHERE (type_id = 2)
The only way I've found to get around this is to do Model.all, iterate over each and add the ones I want. This would be very time consuming for a large model. Is there a better way?
From the sounds of it, you're looking for any Model with a type_id of either 1 or 2. In SQL, you would express this as an IN subclause:
SELECT * FROM models WHERE type_id IN (1, 2);
In Rails, you can pass an array of acceptable values to a where call to generate the SQL IN statement:
Model.where(:type_id => [1, 2])
As stated by @ArtOfCode, what you want is to run the query in one pass. That said, what you are trying to do there won't work: when you use << to add the result of your second query to the first one, you are just appending the instance to the first collection. The resulting object is an ActiveRecord_Relation which happens to hold two instances of your custom model (in this case Model), but calling count on it actually executes an ActiveRecord query.
How can you tell the difference? Well, if you do run that code you used and do:
models.count
You'll see that there's SQL executed for whatever the conditions of the query on models you did, however, if you do this:
models.length
You'll notice the result is 2, and the reason is because the length of the collection of your own objects which happens to be inside the ActiveRecord_Relation is indeed two, and that is what happens if you use <<; it'll add object instances to the relation but that does not mean that they are part of the query.
You could even do this:
models << Model.new
And calling models.length would effectively return 3 because that is the amount of instances of your model that are contained within the relation, again, not a part of the query. So as you can see you can even add new object instances which have not even been saved to the database.
TL;DR if you want to query objects that are stored in the database do it on the query itself, or chain conditions at once, but don't try to mix activerecord relation collections.

ActiveRecord query searching for duplicates on a column, but returning associated records

So here's the lay of the land:
I have an Applicant model which has_many Lead records.
I need to group leads by applicant email, i.e. for each specific applicant email (there may be 2+ applicant records with the email) I need to get a combined list of leads.
I already have this working using an in-memory / N+1 solution
I want to do this in a single query, if possible. Right now I'm running one for each lead which is maxing out the CPU.
Here's my attempt right now:
Lead.
all.
select("leads.*, applicants.*").
joins(:applicant).
group("applicants.email").
having("count(*) > 1").
limit(1).
to_a
And the error:
Lead Load (1.2ms) SELECT leads.*, applicants.* FROM "leads" INNER
JOIN "applicants" ON "applicants"."id" = "leads"."applicant_id"
GROUP BY applicants.email HAVING count(*) > 1 LIMIT 1
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column
"leads.id" must appear in the GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT leads.*, applicants.* FROM "leads" INNER JOIN
"appli...
This is a postgres specific issue. "the selected fields must appear in the GROUP BY clause".
must appear in the GROUP BY clause or be used in an aggregate function
You can try this
Lead.joins(:applicant)
.select('leads.*, applicants.email')
.group('applicants.email, leads.id, ...')
You will need to list all the fields in leads table in the group by clause (or all the fields that you are selecting).
I would just get all the records and do the grouping in memory. If you have a lot of records, I would paginate them or batch them.
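group_by_email = Hash.new { |h, k| h[k] = [] }
# find_in_batches yields arrays of records, batch_size rows at a time
Applicant.eager_load(:leads).find_in_batches(batch_size: 10_000) do |batch|
  batch.each do |applicant|
    # key by the applicant's email and append that applicant's leads
    group_by_email[applicant.email].concat(applicant.leads.to_a)
  end
end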
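The grouping step itself can be tried without a database. Below is a minimal plain-Ruby sketch in which hashes stand in for Applicant records and integers stand in for Lead records; the method name leads_by_email and the sample emails are made up for illustration.

```ruby
# Combine the leads of all applicants that share an email address.
def leads_by_email(applicants)
  # The default block creates an empty list the first time an email is seen.
  grouped = Hash.new { |hash, email| hash[email] = [] }
  applicants.each do |applicant|
    # concat appends the individual leads; << would nest a whole array instead.
    grouped[applicant[:email]].concat(applicant[:leads])
  end
  grouped
end

applicants = [
  { email: "a@example.test", leads: [1, 2] },
  { email: "b@example.test", leads: [3] },
  { email: "a@example.test", leads: [4] }
]

puts leads_by_email(applicants).inspect
```

Using concat rather than << keeps each value a flat list of leads, which is probably what a "combined list of leads" per email calls for.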
You need to use a .where rather than using Lead.all. The reason it is maxing out the CPU is you are trying to load every lead into memory at once. That said I guess I am still missing what you actually want back from the query so it would be tough for me to help you write the query. Can you give more info about your associations and the expected result of the query?

Is there anyway to make a lesser impact on my database with this request?

For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,0000 records.. Which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
# Quiz Load (138.9ms)
The operation returns a 503 meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe by better joins syntax? I'm curious how sites with larger datasets accomplish what seems like such trivial DB calls.
The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema,
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizzes: section_id
As someone else posted definitely look into adding indexes if needed. Some of how to refactor depends on what exactly you are trying to do with all these records. For the #members query, what are you using the #members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest only getting the attributes that you actually use for something, .pluck usage could be warranted. 3rd and 4th queries, look fishy. I assume you've run the queries in a console? Again not sure what the queries are being used for but I'll toss in that it is often useful to write raw sql first and query on the db first. Then, you can apply your findings to rewriting activerecord queries.
What is the .completed tagged on the end? Is it supposed to be there? The only thing I found close in the Rails API is .completed? If it is a custom method, definitely look into it. You potentially also have a use case for scopes.
THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
join users u on u.id = em.user_id
left outer join quizzes q on q.enterprise_member_id = em.id
and q.section_id in (?)
where q.enterprise_member_id is null
Based on my understanding, this will allow postgres to sort both the enterprise_members table and the quizzes table and do a hash join. This is better than what it does now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
Thanks everyone for your ideas. I basically did what everyone said. I added indexes, resorted how I called everything, but the major difference was using the pluck method.. Here's my new stats :
@alt_members = list.members.pluck :id # 23ms
if list.course.sections.tests.present? && @sections = list.course.sections.tests
@quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
@invited = list.members.where(user_id: nil).count # 12.5ms
@not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
@in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
@completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
@question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map{|quiz|quiz.answers.count}.max # 3.5ms
end
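For anyone wanting to see the array arithmetic in isolation, here is a plain-Ruby sketch of counting the four states from plucked id lists. The method name member_state_counts and its keyword arguments are hypothetical, and making the four buckets disjoint (for example, excluding completed members from in_progress) is my own assumption rather than what the code above does.

```ruby
# Count member states from arrays of ids, as returned by pluck.
# Array#& is intersection (and removes duplicates); Array#- is difference.
def member_state_counts(member_ids:, invited_ids:, quiz_member_ids:, completed_member_ids:)
  invited   = member_ids & invited_ids          # members without a user yet
  started   = member_ids & quiz_member_ids      # members with at least one quiz
  completed = member_ids & completed_member_ids # members with a completed quiz
  {
    invited: invited.size,
    not_started: (member_ids - invited - started).size,
    in_progress: (started - completed).size,
    completed: completed.size
  }
end

counts = member_state_counts(
  member_ids: [1, 2, 3, 4, 5, 6],
  invited_ids: [1],
  quiz_member_ids: [3, 4, 5, 5], # pluck can return the same member id twice
  completed_member_ids: [5]
)
puts counts.inspect
```

All of this runs in Ruby memory after a handful of cheap pluck queries, which is likely where most of the speedup comes from.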
