I find my query is taking too long to load so I'm wondering if the position of the includes matters.
Example A:
people = Person.where(name: 'guillaume').includes(:jobs)
Example B:
people = Person.includes(:jobs).where(name: 'guillaume')
Is example A faster because I should have fewer people's jobs to load?
Short answer: no.
ActiveRecord builds your query lazily: as long as you don't need the records, it won't send the final SQL to the database to fetch them. The two relations you pasted generate identical SQL.
Whenever you're in doubt, you can open a Rails console, run your queries there, and observe the SQL printed out. In your example it would be something like:
SELECT "people".* FROM "people" WHERE "people"."name" = $1 LIMIT $2 [["name", "guillaume"], ["LIMIT", 11]]
SELECT "jobs".* FROM "jobs" WHERE "jobs"."person_id" = 1
in both cases.
I know that this is basic, but I cannot figure it out for whatever reason.
I'm trying to return some model data and the majority of the time I return it with pluck, but pluck is returning an array. How do I return the value without the array?
<dd><%= Goal.where(id: post.attachid).pluck(:title) %></dd>
This is the code that I have and it is returning, for example, ["Computer"]
How do I just make it return Computer?
Depending on your needs, you can do
= Goal.where(id: post.attachid).pluck(:title).to_s
or
= Goal.where(id: post.attachid).pluck(:title).join(", ")
This would produce a result like
#> Computer, Laptop, Mouse...
In cases in which you expect the database to only return one record you might want to use find_by instead of where.
<dd><%= Goal.find_by(id: post.attachid).title %></dd>
find_by returns a single record (or nil when nothing matches), while where always returns an array-like ActiveRecord::Relation.
It's interesting to check the SQL generated by ActiveRecord (you can verify it in your console):
Goal.where(id: post.attachid).pluck(:title)
Will produce something like:
SELECT "goals"."title" FROM "goals" WHERE "goals"."id" = $1 [["id", 1]]
I guess a cheaper alternative for what you want to achieve would be:
Goal.select(:title).find(post.attachid).title
That would produce:
SELECT "goals"."title" FROM "goals" WHERE "goals"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
There are many ways you could achieve the same thing; I would advise you to experiment, observe the SQL output, and find the one that suits you best.
I am using Rails 4.2.11.1 on Ruby 2.6.3.
I have had extremely slow requests using rails, so I benchmarked my code and found the main culprit. The majority of the slowdown happens at the database call, where I select a single row from a table in the database. I have tried a few different versions of the same idea
Using this version
Rails.logger.info Benchmark.measure{
result = Record.find_by_sql(['SELECT column FROM table WHERE condition']).first.column
}
the rails output says that the sql takes 54.5ms, but the benchmark prints out 0.043427 0.006294 0.049721 ( 1.795859), and the total request takes 1.81 seconds. When I run the above sql directly in my postgres terminal, it takes 42ms.
Obviously the problem is not that my sql is slow. 42 milliseconds is not noticeable. But 1.79 seconds is way too slow, and creates a horrible user experience.
I did some reading and came to the conclusion that the slowdown was caused by Rails' object creation (which seems weird, but apparently that can be super slow), so I tried using pluck to minimize the number of objects created:
Rails.logger.info Benchmark.measure{
result = Record.where(condition).pluck(column).first
}
Now rails says that the sql took 29.3ms, and the benchmark gives 0.017989 0.006119 0.024108 ( 0.713973)
The whole request takes 0.731 seconds. This is a huge improvement, but 0.7 seconds is still a bad slowdown and still undermines the usability of my application.
What am I doing wrong? It seems insane to me that something so simple should have such a huge slowdown. If this is just how rails works I can't imagine that anyone uses it for serious applications!
find_by_sql executes a custom SQL query against your database and returns all the results.
That means every row matching your query is returned and instantiated as a model object. Only then do you pick the first one from that array by calling first on the results.
When you call first on an ActiveRecord::Relation instead, it adds a LIMIT to the query and fetches only that one record, which is the behavior you want.
That means you should be limiting the query yourself:
result = Record.find_by_sql(['SELECT column FROM table WHERE condition LIMIT 1']).first.column
I'm pretty sure your request will be fast then, as Ruby doesn't need to instantiate all the result rows.
As mentioned above, I'm not sure why you'd ask for all the matches if you just want the first one.
If I do:
Rails.logger.info Benchmark.measure{
result = User.where(email: 'foo@bar.com').pluck(:email).first
}
(9.6ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 [["email", "foo@bar.com"]]
#<Benchmark::Tms:0x00007fc2ce4b7998 @label="", @real=0.6364280000561848, @cstime=0.00364, @cutime=0.000661, @stime=0.1469640000000001, @utime=0.1646029999999996, @total=0.3158679999999997>
Rails.logger.info Benchmark.measure{
result = User.where(email: 'foo@bar.com').limit(1).pluck(:email)
}
(1.8ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 LIMIT $2 [["email", "foo@bar.com"], ["LIMIT", 1]]
#<Benchmark::Tms:0x00007fc2ce4cd838 @label="", @real=0.004004000045824796, @cstime=0.0, @cutime=0.0, @stime=0.0005539999999997214, @utime=0.0013550000000002171, @total=0.0019089999999999385>
Rails also does caching. If you run your query again, it should be faster the second time. How complex is your WHERE condition? That might be part of it.
I have a rule builder that ultimately builds up ActiveRecord queries by chaining multiple where calls, like so:
Track.where("tracks.popularity < ?", 1).where("(audio_features ->> 'valence')::numeric between ? and ?", 2, 5)
Then, if someone wants to sort the results randomly, it would append order("random()").
However, given the table size, random() is extremely inefficient for ordering, so I need to use Postgres TABLESAMPLE-ing.
In a raw SQL query, that looks like this:
SELECT * FROM "tracks" TABLESAMPLE SYSTEM(0.1) LIMIT 250;
Is there some way to add that TABLESAMPLE SYSTEM(0.1) to the existing chain of ActiveRecord calls? Putting it inside a where() or order() doesn't work since it's not a WHERE or ORDER BY function.
You can use from to replace the FROM clause:
irb(main):004:0> Track.from('"tracks" TABLESAMPLE SYSTEM(0.1)')
Track Load (0.7ms) SELECT "tracks".* FROM "tracks" TABLESAMPLE SYSTEM(0.1) LIMIT $1 [["LIMIT", 11]]
I run simple query:
History.where(channel_id: 1).order('histories.id DESC').first
Result:
History Load (808.8ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY histories.id DESC LIMIT 1 [["channel_id", 1]]
808.8ms for 1 of 7 records with channel_id = 1. Total histories count is 2,110,443.
If I select all histories for channel_id = 1:
History.where(channel_id: 1)
History Load (0.5ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 [["channel_id", 1]]
It took only 0.5ms
And if we try to take one record with help of ruby Array:
History.where(channel_id: 1).order('histories.id DESC').to_a.first
History Load (0.5ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY id DESC [["channel_id", 1]]
Where I should find the problem?
PS: I already have an index on channel_id field.
UPD:
History.where(channel_id: 1).order('histories.id DESC').limit(1).explain
History Load (848.9ms) SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY histories.id DESC LIMIT 1 [["channel_id", 1]]
=> EXPLAIN for: SELECT "histories".* FROM "histories" WHERE "histories"."channel_id" = 1 ORDER BY histories.id DESC LIMIT 1 [["channel_id", 1]]
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Limit (cost=0.43..13.52 rows=1 width=42)
-> Index Scan Backward using histories_pkey on histories (cost=0.43..76590.07 rows=5849 width=42)
Filter: (channel_id = 1)
(3 rows)
There are two ways PostgreSQL can handle your query (with the ORDER BY and LIMIT clauses):
It can scan the table, order the matching tuples, then apply the limit. PostgreSQL chooses this plan if it thinks your table has a really low number of tuples, or if it thinks the index will be of no use;
It can use the index.
It seems that PostgreSQL chose the first option, which can occur for only two reasons in my humble opinion:
Your table statistics are not accurate. I recommend you run ANALYZE (or VACUUM ANALYZE) on the table to refresh them, then try the query again;
The channel_id values are unevenly distributed (for example, almost all of the tuples have channel_id = 2), which is why PostgreSQL thinks the index is of no use. Here I recommend the use of a partial index.
After running two similar queries like
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(2)
I was expecting to see two SQL statements in my console being executed by the server. However, the first query is missing and only the second one is being run. Similarly, after executing the following two queries:
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(@articles.size - 2)
the first query is completely ignored as well. These two queries generate the SQL:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "articles"
WHERE "articles"."magazine_id" = $1 LIMIT 2 OFFSET 0)
subquery_for_count [["magazine_id", 1]]
SELECT "articles".* FROM "articles"
WHERE "articles"."magazine_id" = $1
LIMIT 2 OFFSET 2 [["magazine_id", 1]]
Interestingly enough, if I change @articles.size to @articles.length, both queries run as expected. I would think that since length requires the collection in memory, the first statement is forced to run. Can anyone describe what's happening here? If it's too broad a topic, point me to a good resource.
It's not so much optimising as deferring execution of the query until something really needs the results.
In both cases you're storing the result of building up a query in @articles. Active Record, or more accurately Arel, defers execution of the query until you call a method that needs the results. I suspect you're actually seeing the query executed against the database when you call something like @articles.each or @articles.count or some such.
You could build the query up in a series of steps and it won't actually get executed:
a = @magazine.articles
a = a.limit(2)
a = a.offset(0)
It also means you can leave a query clause that drastically reduces the result size until the very end of the process:
a = a.where('created_at > ?', Time.now.at_beginning_of_day)
Still no query has been sent to the database.
The thing to watch out for is testing this logic in the Rails console. If you run these steps in the console, it tries to display the last return value (by calling .inspect, I think), and inspecting the return value causes the query to be executed. So if you put a = Magazine.find(1).articles into the console, you'll see a query executed immediately, which wouldn't have happened if the code was run in the context of a controller action, for example. If you then call a.limit(2) you'll see another query, and so on.