I'm using Rails, I'm doing something like this - which is more efficient?
post = current_user.posts.find(29)
OR
post = Post.where("user_id = ? AND id = ?", user.id, 29).first
I'm guessing the first statement would do something like SELECT * FROM posts WHERE user_id = x (current_user is a preset User instance) and then find post #29 amongst the returned array/rows, whereas the second one might do something like SELECT * FROM posts WHERE user_id = x AND id = 29 LIMIT 0,1. Is it quicker to fetch all rows without any criteria and let Ruby search within the returned array/rows? Or are criteria and a limit quicker? Or does it depend on the length/width of the table and countless other things? Thanks
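The SQL query is essentially the same in both cases: each one filters on both user_id and id in the database and fetches at most one row, so there is no meaningful difference in execution time. The first statement is more idiomatic and should be preferred. Note that find raises ActiveRecord::RecordNotFound when no row matches, whereas where(...).first returns nil.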
Assuming this simplified schema:
users has_many discount_codes
discount_codes has_many orders
I want to grab all users, and if they happen to have any orders, only include the orders that were created between two dates. But if they don't have orders, or have orders only outside of those two dates, still return the users and do not exclude any users ever.
What I'm doing now:
users = User.all.includes(discount_codes: :orders)
users = users.where("orders.created_at BETWEEN ? AND ?", date1, date2).
or(users.where(orders: { id: nil }))
I believe my OR clause allows me to retain users who do not have any orders whatsoever, but what happens is if I have a user who only has orders outside of date1 and date2, then my query will exclude that user.
For what it's worth, I want to use this orders where clause here specifically so I can avoid n + 1 issues later in determining orders per user.
Thanks in advance!
It doesn't make sense to try and control the orders that are loaded as part of the where clause for users. If you were to control that it'd have to be part of the includes (which I think means it'd have to be a part of the association).
Although it can technically combine them into a single query in some cases, ActiveRecord will typically run this as two queries.
The first query will be executed when you go to iterate over the users and will use that where clause to limit the users found.
It will then run a second query behind the scenes based on that includes statement. This will simply be a query to get all orders which are associated with the users that were found by the previous query. As such the only way to control the orders that are found through the user's where clause is to omit users from the result set.
If I were you I would create an instance method in User model for what you are looking for but instead of using where use a select block:
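# `end` is a reserved word in Ruby, so the bounds need different names.
def orders_in_timespan(start_time, end_time)
  # Filters the already-loaded orders in memory by their created_at timestamp.
  orders.select { |o| o.created_at.between?(start_time, end_time) }
end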
Because ActiveRecord caches the orders loaded by includes on each user instance, I believe that starting from a users query with includes means this method will not result in n additional queries.
Something like:
render json: User.includes(:orders), methods: :orders_in_timespan
Of course, the easiest way to confirm the number of queries is to look at the logs. I believe this approach should have two queries regardless of the number of users being rendered (as likely does your code in the question).
Also, I'm not sure how familiar you are with SQL, but you can call .to_sql on a relation such as your users variable to see the SQL that would be generated, which might help shed some light on the discrepancies between what you're getting and what you're looking for.
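As a rough, database-free sketch of the in-memory filtering idea above: `Order` and `User` here are plain Structs standing in for the ActiveRecord models (an assumption for illustration only), and `orders` plays the role of the already-preloaded association.

```ruby
# Plain-Ruby stand-ins for the models (hypothetical; not ActiveRecord).
Order = Struct.new(:id, :created_at)

User = Struct.new(:name, :orders) do
  # Filters the already-loaded orders in memory, so no extra query is needed.
  def orders_in_timespan(start_time, end_time)
    orders.select { |o| o.created_at.between?(start_time, end_time) }
  end
end

date1 = Time.utc(2020, 1, 1)
date2 = Time.utc(2020, 1, 31)

alice = User.new("alice", [Order.new(1, Time.utc(2020, 1, 15)), Order.new(2, Time.utc(2020, 3, 1))])
bob   = User.new("bob", [Order.new(3, Time.utc(2019, 12, 1))]) # only orders outside the range
carol = User.new("carol", [])                                   # no orders at all

users = [alice, bob, carol]

# Every user is kept; only the list of orders per user is narrowed.
in_range_ids = users.to_h { |u| [u.name, u.orders_in_timespan(date1, date2).map(&:id)] }
```

The point of the design is that the set of users is never filtered; the date range only affects which orders are reported for each user.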
Option 1: Write a custom query in SQL (ugly).
Option 2: Create 2 separate queries like below...
@users = User.limit(10)
@orders = Order.joins(:discount_code)
               .preload(:discount_code) # avoids one discount_code query per order in group_by
               .where(created_at: 10.days.ago..1.day.ago, discount_codes: { user_id: @users.select(:id) })
               .group_by { |order| order.discount_code.user_id }
Now you can use it like this ...
@users.each do |user|
  orders = @orders.fetch(user.id, []) # users with no orders in range get an empty list
  puts user.name
  puts user.id
  puts orders.count
end
I hope this will solve your problem.
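To illustrate the lookup step without a database, here is a minimal plain-Ruby sketch. The hashes below are made-up stand-ins for the rows the two queries would return; the key point is that `fetch` with a default keeps users who have no orders in the range.

```ruby
# Hypothetical rows standing in for the results of the two queries.
users = [{ id: 1, name: "alice" }, { id: 2, name: "bob" }, { id: 3, name: "carol" }]

# Only orders inside the date range come back from the second query.
orders = [
  { id: 10, user_id: 1 },
  { id: 11, user_id: 1 },
  { id: 12, user_id: 3 }
]

orders_by_user = orders.group_by { |order| order[:user_id] }

# fetch with a default keeps users that have no matching orders.
counts = users.to_h { |user| [user[:name], orders_by_user.fetch(user[:id], []).size] }
```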
You need to use joins instead of includes. Rails joins use inner joins and will reject all the records which don't have associations.
User.joins(discount_codes: :orders).where(orders: {created_at: [10.days.ago..1.day.ago]}).distinct
This will give you all distinct users who placed orders in a given period of time.
users = User.joins(discount_codes: :orders).where("orders.created_at BETWEEN ? AND ?", date1, date2).distinct +
User.left_joins(discount_codes: :orders).group("users.id").having("count(orders.id) = 0")
I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event, and the "sent" event. Our database currently includes several million webhooks, and adds at least 100k per day. Already this process takes so long that running this index method is practically useless.
I was hoping that Rails could break down the enormous model into smaller lists using a line like this:
@today_hooks = @m_webhooks.where(:date_sent => this_date)
I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:
SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent')
It appears that the "date_sent" attribute is included in all of the queries, which implies that the SQL is searching through all 1M records with every single query.
I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.
m_webhooks_controller.rb
def index
  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"
  @m_list_of_ips = [] # list of three IP addresses
  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = {:this_date => this_date}
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip) # Stands for "Today Hooks ip"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end

# Each .size call below runs its own COUNT query against the database.
def set_sub_count_hash(thip)
  {
    gmail_hooks: {opened: a = thip.gmail.send(@event).size, total_sent: b = thip.gmail.sent.size, perc_opened: find_perc(a, b)},
    hotmail_hooks: {opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b)},
    yahoo_hooks: {opened: a = thip.yahoo.send(@event).size, total_sent: b = thip.yahoo.sent.size, perc_opened: find_perc(a, b)},
    other_hooks: {opened: a = thip.other.send(@event).size, total_sent: b = thip.other.sent.size, perc_opened: find_perc(a, b)},
  }
end
Well, your problem is actually fairly simple. Remember that when you call where(condition), the query is not executed in the DB immediately.
Rails defers execution until you need a concrete result (a list, an object, or a count or #size as in your case) and keeps chaining conditions until then. In your code, you keep chaining conditions onto the main query inside a loop (date_range). Worse, you start another loop inside that one, adding conditions to each query created in the first loop.
Then you pass the query (still not concrete: it has not been executed and has no results yet!) to the method set_sub_count_hash, which goes on to execute it many times.
Therefore you have something like:
10 (date_range) * 3 (IP list) * 8 (times the query is materialized in set_sub_count_hash) = 240 queries
and then you have a problem.
What you want is to run the whole query at once and group it by date, IP and email provider. You should get a hash structure back, which you would pass to set_sub_count_hash and then do some Ruby gymnastics to get the counts you're looking for.
I imagine the query something like:
main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)
                        .where(sending_ip: @m_list_of_ips)
OK, now you have one query, which is nice, but I think you should split it into 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, main_query, will not be executed until you ask for materialized results, don't forget). Still, that's something like 100 times faster.
I think this is the result that should be grouped, mapped and passed to set_sub_count_hash, instead of passing the raw query and calling methods on it over and over. It will take a little work to do the grouping, mapping and counting for sure, but hey, it's faster. =)
In case this helps anybody else, I learned how to fill a hash with counts in a much simpler way. More importantly, this approach runs a single query (as opposed to the 240 queries that I was running before).
@count_array[esp_index][j] = MWebhook.where('date_sent > ?', start_date.to_date)
                                     .group('date_sent', 'sending_ip', 'event', 'esp').count
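For anyone wondering what to do with the result: a grouped count on several columns returns a hash keyed by arrays of the grouped values. Below is a minimal plain-Ruby sketch of reshaping such a hash into per-cell counts. The sample data is made up (real keys would hold Date objects rather than strings), and the body of `find_perc` is an assumption, since the original helper was not shown.

```ruby
# Hypothetical sample shaped like the result of
# .group('date_sent', 'sending_ip', 'event', 'esp').count:
# keys are [date_sent, sending_ip, event, esp], values are counts.
grouped_counts = {
  ["2020-01-01", "10.0.0.1", "sent",   "gmail"]   => 200,
  ["2020-01-01", "10.0.0.1", "opened", "gmail"]   => 50,
  ["2020-01-01", "10.0.0.1", "sent",   "hotmail"] => 80,
  ["2020-01-02", "10.0.0.2", "sent",   "gmail"]   => 10
}

# Assumed implementation of the find_perc helper (not shown in the question).
def find_perc(part, total)
  total.zero? ? 0.0 : (part.to_f / total * 100).round(1)
end

# table[date][ip][esp] => { opened:, total_sent:, perc_opened: }
table = Hash.new { |h, date| h[date] = Hash.new { |h2, ip| h2[ip] = {} } }

grouped_counts.each do |(date, ip, event, esp), count|
  cell = (table[date][ip][esp] ||= { opened: 0, total_sent: 0 })
  cell[:opened] = count if event == "opened"
  cell[:total_sent] = count if event == "sent"
end

# Fill in the percentage once both counts are known for each cell.
table.each_value do |ips|
  ips.each_value do |esps|
    esps.each_value { |cell| cell[:perc_opened] = find_perc(cell[:opened], cell[:total_sent]) }
  end
end
```

All of the reshaping happens in Ruby on a small hash, so the database is only hit once regardless of how many dates, IPs, or providers are displayed.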
I have users, problems, and attempts which is a join table between users and problems. I'm looking to show an index of all the problems along with the current user's most recent attempt for each, if they have one.
I've tried four things to get a left join with conditions and none of them have worked.
The naive approach is something like...
@problems = Problem.enabled
@problems.each do |prob|
  prob.last_attempt = prob.attempts
                          .where(user_id: current_user.id)
                          .last
end
This gets all the problems and the attempts I want but is N+1 queries. So...
@problems = Problem.enabled
                   .includes(:attempts)
This does the left join (or the equivalent two queries) getting all the problems but also all the attempts, not just those for the current user. So...
@problems = Problem.enabled
                   .includes(:attempts)
                   .where(attempts: {user_id: current_user.id})
This gets only those problems that the current user has already attempted.
So...
# problem.rb
has_many :user_attempts,
         -> (user) { where(user_id: user.id) },
         class_name: 'Attempt'
# problem_controller.index
@problems = Problem.enabled
                   .includes(:user_attempts, current_user)
And this gives an error message from Rails saying joins with instance arguments are not supported.
So I'm stuck. What is the best way to do this? Is Arel the right tool? Can I skip active record and just get back a JSON blob? Am I just being dumb?
This question is quite similar to this one, but I'd need an argument to the joined scope, which isn't supported. I'm hoping Rails has added something in the last couple of years.
Thanks so much for your help.
The way I solved this was to use raw sql. It's ugly and a security risk but I didn't find better.
results = Problem.connection.exec_query(%(
SELECT *
FROM problems
LEFT JOIN (
SELECT *
//etc.
)
))
And then manipulating the results array in memory.
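A minimal sketch of that in-memory step, assuming the rows come back as hashes with string keys in the shape of a LEFT JOIN result. The column names (`problem_id`, `attempt_id`, `attempted_at`) are hypothetical, since the actual query is elided above.

```ruby
# Hypothetical rows shaped like a LEFT JOIN result: a problem with no
# attempt has nil in the attempt columns.
rows = [
  { "problem_id" => 1, "title" => "FizzBuzz", "attempt_id" => 7,   "attempted_at" => "2020-01-01 10:00:00" },
  { "problem_id" => 1, "title" => "FizzBuzz", "attempt_id" => 9,   "attempted_at" => "2020-01-03 09:30:00" },
  { "problem_id" => 2, "title" => "Primes",   "attempt_id" => nil, "attempted_at" => nil }
]

# One entry per problem: its most recent attempt row, or nil if none exists.
latest_by_problem = rows.group_by { |row| row["problem_id"] }.transform_values do |problem_rows|
  attempts = problem_rows.reject { |row| row["attempt_id"].nil? }
  # Timestamps in this fixed format sort correctly as strings.
  attempts.max_by { |row| row["attempted_at"] }
end
```

Problems without attempts stay in the result with a nil value, which matches the "if they have one" requirement in the question.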
I have User model that is related to a Friend model (has_many / belongs_to)
After joining, I would like to be able to check if a certain friend object exists in the friends that were joined to users:
users = User.joins(:friends).where("some condition") # subset of total friends
fs = Friend.all
fs.each do |f|
if users.friends.includes?(f) # match!
...
else # no match
...
end
The code as-is does not work and I am having difficulties getting this functionality in code.
Try something like this:
users.friends.where(id: u.id).exists?
That should generate a query like so:
SELECT 1 AS one FROM `users` WHERE `users`.`friend_id` = 42 AND `users`.`id` = 1 LIMIT 1
exists? returns true if a matching row is found and false otherwise.
Side note: Unless you need to use your u variable later, you can probably get away with simply placing some_id directly in the where clause, and not do the second User lookup.
Edit
Just noticed a problem in your loop that might be the cause of your original problem. When you load the list of users, unless you add a limit clause or call .first, you get back a collection of users rather than a single User. So I'm guessing your application is failing on this line:
users.friends.includes?(f)
Because .friends is a method of a User object, not of an array.
So you'll have to do a nested loop instead like so:
fs.each do |f|
  users.each do |u|
    u.friends.include?(f) # the method is include?, not includes?
  end
end
Note that this method might be very slow, depending on the number of friends and users. It is a very inefficient algorithm, which is why I'm trying to understand your situation better in the comments, because I'm certain there's a more efficient way to accomplish your task.
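One possible way to avoid the nested loop, sketched in plain Ruby: collect the ids of all friends belonging to the matched users once, put them in a Set, and test each friend against it. The ids below are made up; in Rails they might come from a single query such as Friend.where(user_id: users.select(:id)).pluck(:id), assuming friends has a user_id column.

```ruby
require "set"

# Hypothetical ids of the friends that belong to the matched users,
# gathered once up front instead of per user.
matched_friend_ids = Set.new([2, 3, 5])

# Hypothetical ids of every friend (the equivalent of Friend.all).
all_friend_ids = [1, 2, 3, 4, 5]

# Set#include? is a constant-time lookup, so this is a single pass.
matches, non_matches = all_friend_ids.partition { |id| matched_friend_ids.include?(id) }
```

This trades the users-times-friends comparison for one pass over the friends plus one lookup each.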
The answer to this question should be simple, but I haven't found one through the active record querying guide, other questions here on SO, or through messing around in the Rails console.
I simply want to query the database through the active record querying interface and return the value of a single column of the first or last entry, without having to traverse through the entire table (will explain in a moment).
There is a way to do this with pluck, however the structure of the query messages are as follows:
initial = Message.where("id = ?", some_id).pluck(:value).first
final = Message.where("id = ?", some_id).pluck(:value).last
Unfortunately, this is an extremely inefficient operation as it plucks the value attribute out of every record where there is a match on the provided id, before returning either just the first or last entry. I would like to basically reorder the statements to be something along the lines of:
initial = Message.where("id = ?", some_id).first.pluck(:value)
final = Message.where("id = ?", some_id).last.pluck(:value)
However, I get a NoMethodError explaining there is no method pluck for Message. I've tried to do this various ways:
initial = Message.where("id = ?", some_id).first(:value)
initial = Message.where("id = ?", some_id).first.select(:value)
...
But all return some sort of error. I know returning the
Oops
Somehow part of my question got cut off (including the answer I had at the end). I'll see if I can find it. I explored using the select() method, but found that select only takes one argument, meaning a query string must be built, since it cannot take bind arguments like id = ?, some_id. Then I found that just appending .value (where value is the column attribute you are trying to get) works, so I switched back to the where method as shown in the answer below.
Answer is in the question, but if you're trying to do something like this:
initial = Message.where("id = ?", some_id).pluck(:value).first
final = Message.where("id = ?", some_id).pluck(:value).last
Change it to this (just reference the column name, in this example it is value, but it could be amount or something):
initial = Message.where("id = ?", some_id).first.value
final = Message.where("id = ?", some_id).last.value