Does Rails automagically optimize queries - ruby-on-rails

After running two similar queries like
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(2)
I was expecting to see two SQL statements in my console being executed by the server. However, the first query is missing and only the second one is being run. Similarly, after executing the following two queries:
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(@articles.size - 2)
the first query is completely ignored as well. These two queries generate the SQL:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "articles"
WHERE "articles"."magazine_id" = $1 LIMIT 2 OFFSET 0)
subquery_for_count [["magazine_id", 1]]
SELECT "articles".* FROM "articles"
WHERE "articles"."magazine_id" = $1
LIMIT 2 OFFSET 2 [["magazine_id", 1]]
Interestingly enough, if I change @articles.size to @articles.length, both queries are run as expected. I would think that since length requires the collection in memory, the first statement is forced to run. Can anyone describe what's happening here and, if it's too broad a topic, point me to a good resource?

It's not so much optimising as deferring execution of the query until it really needs to run.
In both cases you're storing the result of building up a query in @articles. Active Record (or, more accurately, Arel) defers execution of the query until you call a method that needs the results. I suspect you're actually seeing the query executed against the database when you call something like @articles.each or @articles.count or some such.
You could build the query up in a series of steps and it won't actually get executed:
a = @magazine.articles
a = a.limit(2)
a = a.offset(0)
It also means you can leave a query clause that drastically reduces the result size until the end of the process:
a = a.where('created_at > ?', Time.now.at_beginning_of_day)
Still no query has been sent to the database.
The thing to watch out for is testing this logic in the Rails console. If you run these steps in the console itself, it tries to display the last return value (by calling .inspect, I think), and inspecting the return value causes the query to be executed. So if you put a = Magazine.find(1).articles into the console you'll see a query executed immediately, which wouldn't have happened if the code were run in the context of a controller action, for example. If you then call a.limit(2) you'll see another query, and so on.


How can I speed up a simple rails database query?

I am using Rails 4.2.11.1 on Ruby 2.6.3.
I have had extremely slow requests using Rails, so I benchmarked my code and found the main culprit. The majority of the slowdown happens at the database call, where I select a single row from a table in the database. I have tried a few different versions of the same idea.
Using this version
Rails.logger.info Benchmark.measure{
result = Record.find_by_sql(['SELECT column FROM table WHERE condition']).first.column
}
the Rails log says that the SQL takes 54.5 ms, but the benchmark prints out 0.043427 0.006294 0.049721 ( 1.795859), and the total request takes 1.81 seconds. When I run the same SQL directly in my Postgres terminal, it takes 42 ms.
Obviously the problem is not that my SQL is slow. 42 milliseconds is not noticeable. But 1.79 seconds is way too slow and creates a horrible user experience.
I did some reading and came to the conclusion that the slowdown was caused by Rails' object creation (which seems weird, but apparently that can be super slow), so I tried using pluck to minimize the number of objects created:
Rails.logger.info Benchmark.measure{
result = Record.where(condition).pluck(column).first
}
Now Rails says that the SQL took 29.3 ms, and the benchmark gives 0.017989 0.006119 0.024108 ( 0.713973).
The whole request takes 0.731 seconds. This is a huge improvement, but 0.7 seconds is still a bad slowdown and still undermines the usability of my application.
What am I doing wrong? It seems insane to me that something so simple should have such a huge slowdown. If this is just how Rails works, I can't imagine that anyone uses it for serious applications!
find_by_sql executes a custom SQL query against your database and returns all the results.
That means every row matched by your query is returned and instantiated. Only then do you pick the first one from that array by calling first on the results.
When you call first on an ActiveRecord::Relation, it adds a LIMIT to your query and fetches only that row, which is the behavior you want.
That means you should be limiting the query yourself:
result = Record.find_by_sql(['SELECT column FROM table WHERE condition LIMIT 1']).first.column
I'm pretty sure your request will be fast then, as Ruby doesn't need to instantiate all the result rows.
As mentioned above, I'm not sure why you ask for all the matches if you just want the first one.
If I do:
Rails.logger.info Benchmark.measure{
result = User.where(email: 'foo@bar.com').pluck(:email).first
}
(9.6ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 [["email", "foo@bar.com"]]
#<Benchmark::Tms:0x00007fc2ce4b7998 @label="", @real=0.6364280000561848, @cstime=0.00364, @cutime=0.000661, @stime=0.1469640000000001, @utime=0.1646029999999996, @total=0.3158679999999997>
Rails.logger.info Benchmark.measure{
result = User.where(email: 'foo@bar.com').limit(1).pluck(:email)
}
(1.8ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 LIMIT $2 [["email", "foo@bar.com"], ["LIMIT", 1]]
#<Benchmark::Tms:0x00007fc2ce4cd838 @label="", @real=0.004004000045824796, @cstime=0.0, @cutime=0.0, @stime=0.0005539999999997214, @utime=0.0013550000000002171, @total=0.0019089999999999385>
Rails also does caching. If you run your query again, it should be faster the second time. How complex is your WHERE condition? That might be part of it.

Rails: How to Eager Load with Left Join Table?

Currently I have a controller query which fetches products & product updates as follows:
products = Product.left_outer_joins(:productupdates).select("products.*, count(productupdates.*) as update_count, max(productupdates.old_price) as highest_price").group(:id)
products = products.paginate(:page => params[:page], :per_page => 20)
This query creates an N+1 query, but I can not use .includes(:productupdates) since I have a left outer join as well.
If possible, can you please help me how to reduce N+1 queries?
EDIT------------------------------
As per Vishal's suggestion; I have changed the controller query as follows,
products = Product.includes(:productupdates).select("products.*, count(productupdates.*) as productupdate_count, max(productupdates.old_price) as highest_price").group("productupdates.product_id")
products = products.paginate(:page => params[:page], :per_page => 20)
Unfortunately, I receive the following error:
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: missing FROM-clause entry for table "productupdates"
LINE 1: SELECT products.*, count(productupdates.*) as productupdate_count, m...
^
: SELECT products.*, count(productupdates.*) as productupdate_count, max(productupdates.old_price) as highest_price FROM "products" WHERE "products"."isopen" = $1 AND (products.year > 2009) AND ("products"."make" IS NOT NULL) GROUP BY productupdates.product_id LIMIT $2 OFFSET $3):
Please explain how this is causing N+1 queries and how you think this will solve the issue. The only way I can see an N+1 situation here is if you are calling productupdates on each product later. If that's the case, then this alone will not solve the issue. Please clarify so others can formulate appropriate responses.
For the time being I am going to assume that somewhere later in the code you are calling productupdates on the individual products. If this is the case, then we can solve it without the aggregation, as follows:
@products = Product.eager_load(:productupdates)
Now when we loop the productupdates are already loaded so to get the count and the max we can do things like
@products.each do |p|
  # COUNT
  # (don't use the count method or it will execute a query)
  p.productupdates.size
  # MAX old_price
  # older Ruby versions: use Rails `try` instead,
  # e.g. p.productupdates.max_by(&:old_price).try(:old_price) || 0
  p.productupdates.max_by(&:old_price)&.old_price || 0
end
Using these methods will not execute additional queries since the productupdates are already loaded
Side note: the reason includes did not work for you is that includes will use 2 queries to retrieve the data (a pseudo outer join) unless one of the following conditions is met:
The where clause uses a hash finder condition that references the association table (e.g. where(productupdates: {old_price: 12}))
You include the references method (e.g. Product.includes(:productupdates).references(:productupdates))
In both these cases the table will be left joined. I chose eager_load here because includes delegates to eager_load in the above cases anyway.
You can directly do Product.includes(:productupdates); this will query the database with a left outer join and overcome the N+1 query problem.
So instead of Product.left_outer_joins(:productupdates) in your query, use Product.includes(:productupdates).
After firing this query in the console you can see that includes fires a left outer join query on the table.

ActiveRecord query searching for duplicates on a column, but returning associated records

So here's the lay of the land:
I have a Applicant model which has_many Lead records.
I need to group leads by applicant email, i.e. for each specific applicant email (there may be 2+ applicant records with the email) i need to get a combined list of leads.
I already have this working using an in-memory / N+1 solution
I want to do this in a single query, if possible. Right now I'm running one for each lead which is maxing out the CPU.
Here's my attempt right now:
Lead.
all.
select("leads.*, applicants.*").
joins(:applicant).
group("applicants.email").
having("count(*) > 1").
limit(1).
to_a
And the error:
Lead Load (1.2ms) SELECT leads.*, applicants.* FROM "leads" INNER
JOIN "applicants" ON "applicants"."id" = "leads"."applicant_id"
GROUP BY applicants.email HAVING count(*) > 1 LIMIT 1
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column
"leads.id" must appear in the GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT leads.*, applicants.* FROM "leads" INNER JOIN
"appli...
This is a Postgres-specific issue: the selected fields must appear in the GROUP BY clause or be used in an aggregate function.
You can try this
Lead.joins(:applicant)
    .select('leads.*, applicants.email')
    .group('applicants.email, leads.id, ...')
You will need to list all the fields in the leads table in the GROUP BY clause (or all the fields that you are selecting).
I would just get all the records and do the grouping in memory. If you have a lot of records, I would paginate them or batch them.
group_by_email = Hash.new { |h, k| h[k] = [] }
Applicant.eager_load(:leads).find_in_batches(batch_size: 10_000) do |batch|
  batch.each do |applicant|
    group_by_email[applicant.email].concat(applicant.leads)
  end
end
You need to use a .where rather than Lead.all. The reason it is maxing out the CPU is that you are trying to load every lead into memory at once. That said, I'm still missing what you actually want back from the query, so it's tough to help you write it. Can you give more info about your associations and the expected result of the query?

Rails Postgres Error GROUP BY clause or be used in an aggregate function

In SQLite (development) I don't have any errors, but in production with Postgres I get the following error. I don't really understand the error.
PG::Error: ERROR: column "commits.updated_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...mmits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at...
^
: SELECT COUNT(*) AS count_all, mission_id AS mission_id FROM "commits" WHERE "commits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at DESC
My controller method:
def show
  @user = User.find(params[:id])
  @commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
  @missions_commits = @commits.group("mission_id").count.length
end
UPDATE:
So I dug further into this PostgreSQL-specific annoyance, and I am surprised that this exception is not mentioned in the Ruby on Rails Guides.
I am using psql (PostgreSQL) 9.1.11
So from what I understand, I need to specify which columns should be used whenever I use the GROUP BY clause. I thought using SELECT would help, which can be annoying if you need to SELECT a lot of columns.
Interesting discussion here
Anyway, when I look at the error, the cursor always points to updated_at. In the SQL query, Rails will always ORDER BY updated_at. So I have tried this horrible query:
@commits.group("mission_id, date(updated_at)")
        .select("date(updated_at), count(mission_id)")
        .having("count(mission_id) > 0")
        .order("count(mission_id)").length
which gives me the following SQL
SELECT date(updated_at), count(mission_id)
FROM "commits"
WHERE "commits"."user_id" = 1
GROUP BY mission_id, date(updated_at)
HAVING count(mission_id) > 0
ORDER BY updated_at DESC, count(mission_id)
LIMIT 25 OFFSET 0
the error is the same.
Note that no matter what it will ORDER BY updated_at, even if I wanted to order by something else.
Also, I don't want to group the records by updated_at, just by mission_id.
This PostgreSQL error is misleading and comes with little explanation of how to solve it. I have tried many formulas from the Stack Overflow sidebar; nothing works, and it's always the same error.
UPDATE 2:
So I got it to work, but it needs to group by updated_at because of the automatic ORDER BY updated_at. How do I count only by mission_id?
@missions_commits = @commits.group("mission_id, updated_at").count("mission_id").size
I guess you want to show the overall number of distinct Missions related to Commits; anyway, it won't be the number on the page.
Try this:
@commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
@missions_commits = @user.commits.distinct.count(:mission_id)
However, if you want to get the number of distinct Missions on the page, I suppose it should be:
@missions_commits = @commits.collect(&:mission_id).uniq.count
Update
In Rails 3, distinct did not exist; distinct counting should be done this way:
@missions_commits = @user.commits.count(:mission_id, distinct: true)
See the docs for PostgreSQL GROUP BY here:
http://www.postgresql.org/docs/9.3/interactive/sql-select.html#SQL-GROUPBY
Basically, unlike SQLite (and MySQL), Postgres requires that any column selected or ordered on must appear in an aggregate function or the GROUP BY clause.
If you think it through, you'll see that this actually makes sense: SQLite/MySQL cheat under the hood and silently pick an arbitrary value for such fields (not sure that's technically what happens).
Or, thinking about it another way: if you are grouping by a field, what's the point of ordering on another? How would that even make sense unless you also had an aggregate function on the ordered field?

No Method Error 'map' for #<Arel::Nodes::SqlLiteral>

I have the following example query:
source = "(SELECT DISTINCT source.* FROM (SELECT * FROM items) AS source) AS items"
items = Item.select("items.*").from(source).includes([:images])
p items # [#<Item id: 1>, #<Item id:2>]
However running:
p items.count
Results in NoMethodError: undefined method `map' for Arel::Nodes::SqlLiteral
I appreciate the query is silly; however, the non-simplified query is a bit too complicated to copy, and this was the smallest crashing version I could create. Any ideas?
Can you call all on that object to essentially cast it to an Array?
Item.select("items.*").from(source).includes([:images]).all.count
Or perhaps in that case, size would be more appropriate. In any case, this will execute the query and load all the objects into memory, which may not be desirable.
It looks like the problem is with your includes([:images]). On a similar application, I can execute this from the console:
> Category.select('categories.*').from('(SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories').count
(0.5ms) SELECT COUNT(*) FROM (SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories
(Notice that the count overrides the SELECT clause, even though I explicitly specified categories.*. But they're still equivalent queries.)
As soon as I add an includes scope, it fails:
> Category.select('categories.*').from('(SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories').includes(:projects).count
NoMethodError: undefined method `left' for #<Arel::Nodes::SqlLiteral:0x131d35248>
I tried a few different means of acquiring the count, like select('COUNT(categories.*)'), but they all failed in various ways. ActiveRecord seems to be falling back on a basic LEFT OUTER JOIN to perform the eager loading, possibly because it thinks you're using some kind of condition or external table to perform the join, and this seems to confuse its normal methods of performing the count. See the end of the section on Eager Loading in the ActiveRecord::Associations docs.
My Suggestion
If the join doesn't affect the number of rows returned in the outer query, I'd say your best bet is to execute one query to get the count and one query to get the actual results. We have to do something similar in our application for paging: one query returns the current page of results, and one returns the total number of records matching the filter criteria.
The issue is Rails #24193 (https://github.com/rails/rails/issues/24193) and has to do with from combined with eager loading. The workaround is to use the form: Item.select("items.*").from([Arel.sql(source)]).includes([:images])
