I am using Rails 4.2.11.1 on Ruby 2.6.3.
I have been seeing extremely slow requests in Rails, so I benchmarked my code and found the main culprit: the majority of the slowdown happens at the database call, where I select a single row from a table. I have tried a few different versions of the same idea.
Using this version:
Rails.logger.info Benchmark.measure{
result = Record.find_by_sql(['SELECT column FROM table WHERE condition']).first.column
}
the Rails output says the SQL takes 54.5 ms, but the benchmark prints 0.043427 0.006294 0.049721 ( 1.795859), and the total request takes 1.81 seconds. When I run the same SQL directly in my Postgres terminal, it takes 42 ms.
Obviously the problem is not that my SQL is slow. 42 milliseconds is not noticeable, but 1.79 seconds is way too slow and creates a horrible user experience.
I did some reading and came to the conclusion that the slowdown was caused by Rails' object creation (which seems weird, but apparently that can be very slow), so I tried using pluck to minimize the number of objects created:
Rails.logger.info Benchmark.measure{
result = Record.where(condition).pluck(column).first
}
Now Rails says the SQL took 29.3 ms, and the benchmark gives 0.017989 0.006119 0.024108 ( 0.713973).
The whole request takes 0.731 seconds. This is a huge improvement, but 0.7 seconds is still a bad slowdown and still undermines the usability of my application.
What am I doing wrong? It seems insane to me that something so simple should cause such a huge slowdown. If this is just how Rails works, I can't imagine that anyone uses it for serious applications!
find_by_sql executes a custom SQL query against your database and returns all the results.
That means every row matched by your query is returned and instantiated as a model object. Only then do you pick the first one from that array by calling first on the results.
When you call first on an ActiveRecord::Relation, it adds a LIMIT to the query and fetches only that row, which is the behavior you want.
That means you should be limiting the query yourself:
result = Record.find_by_sql(['SELECT column FROM table WHERE condition LIMIT 1']).first.column
I'm pretty sure your request will be fast then, as Ruby doesn't need to instantiate all the result rows.
As mentioned above, I'm not sure why you ask for all the matches if you just want the first one.
If I do:
Rails.logger.info Benchmark.measure{
result = User.where(email: 'foo#bar.com').pluck(:email).first
}
(9.6ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 [["email", "foo#bar.com"]]
#<Benchmark::Tms:0x00007fc2ce4b7998 @label="", @real=0.6364280000561848, @cstime=0.00364, @cutime=0.000661, @stime=0.1469640000000001, @utime=0.1646029999999996, @total=0.3158679999999997>
Rails.logger.info Benchmark.measure{
result = User.where(email: 'foo#bar.com').limit(1).pluck(:email)
}
(1.8ms) SELECT "users"."email" FROM "users" WHERE "users"."email" = $1 LIMIT $2 [["email", "foo#bar.com"], ["LIMIT", 1]]
#<Benchmark::Tms:0x00007fc2ce4cd838 @label="", @real=0.004004000045824796, @cstime=0.0, @cutime=0.0, @stime=0.0005539999999997214, @utime=0.0013550000000002171, @total=0.0019089999999999385>
Rails also does caching. If you run your query again, it should be faster the second time. How complex is your WHERE condition? That might be part of it.
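In case it helps, here is a minimal sketch of the SQL query cache in action (using the placeholder Record/condition names from the question). Inside a normal request the cache is enabled automatically; in the console you can wrap a block yourself:
ActiveRecord::Base.cache do
  Record.where(condition).first  # hits the database
  Record.where(condition).first  # identical SQL, answered from the query cache (logged as CACHE)
end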
I have a rule builder that ultimately builds up ActiveRecord queries by chaining multiple where calls, like so:
Track.where("tracks.popularity < ?", 1).where("(audio_features ->> 'valence')::numeric between ? and ?", 2, 5)
Then, if someone wants to sort the results randomly, it would append order("random()").
However, given the table size, random() is extremely inefficient for ordering, so I need to use Postgres TABLESAMPLE-ing.
In a raw SQL query, that looks like this:
SELECT * FROM "tracks" TABLESAMPLE SYSTEM(0.1) LIMIT 250;
Is there some way to add that TABLESAMPLE SYSTEM(0.1) to the existing chain of ActiveRecord calls? Putting it inside a where() or order() doesn't work, since it's not a WHERE or ORDER BY clause.
irb(main):004:0> Track.from('"tracks" TABLESAMPLE SYSTEM(0.1)')
Track Load (0.7ms) SELECT "tracks".* FROM "tracks" TABLESAMPLE SYSTEM(0.1) LIMIT $1 [["LIMIT", 11]]
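Since from is an ordinary relation method, it should chain with the conditions your rule builder has already accumulated; a sketch using the columns from your example:
Track.where("tracks.popularity < ?", 1)
     .where("(audio_features ->> 'valence')::numeric between ? and ?", 2, 5)
     .from('"tracks" TABLESAMPLE SYSTEM(0.1)')
     .limit(250)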
I find my query is taking too long to load, so I'm wondering if the position of includes matters.
Example A:
people = Person.where(name: 'guillaume').includes(:jobs)
Example B:
people = Person.includes(:jobs).where(name: 'guillaume')
Is example A faster because I should have fewer people's jobs to load?
Short answer: no.
ActiveRecord builds your query lazily, and as long as you don't need the records it won't send the final SQL to the database to fetch them. The two relations you pasted generate identical SQL.
Whenever in doubt, you can always open up the Rails console, write your queries there, and observe the SQL printed out. In your example it would be something like:
SELECT "people".* FROM "people" WHERE "people"."name" = $1 LIMIT $2 [["name", "guillaume"], ["LIMIT", 11]]
SELECT "jobs".* FROM "jobs" WHERE "jobs"."person_id" = 1
in both of the cases.
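If you want to convince yourself, comparing the generated SQL of the two relations in the console should show they are identical (includes triggers its own separate preload query, which is also the same in both cases):
Person.where(name: 'guillaume').includes(:jobs).to_sql ==
  Person.includes(:jobs).where(name: 'guillaume').to_sql
# => true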
So far, the "common" way to get a random record from the database has been:
# PostgreSQL
Model.order("RANDOM()").first
# MySQL
Model.order("RAND()").first
But, when doing this in Rails 5.2, it shows the following Deprecation Warning:
DEPRECATION WARNING: Dangerous query method (method whose arguments are used as raw SQL) called with non-attribute argument(s): "RANDOM()". Non-attribute arguments will be disallowed in Rails 6.0. This method should not be called with user-provided values, such as request parameters or model attributes. Known-safe values can be passed by wrapping them in Arel.sql().
I am not really familiar with Arel, so I am not sure what would be the correct way to fix this.
If you want to continue using order by random() then just declare it safe by wrapping it in Arel.sql like the deprecation warning suggests:
Model.order(Arel.sql('random()')).first # PostgreSQL
Model.order(Arel.sql('rand()')).first # MySQL
There are lots of ways of selecting a random row, and they all have advantages and disadvantages. But there are times when you absolutely must use a snippet of SQL in an ORDER BY, such as when you need the order to match a Ruby array and have to get a big CASE WHEN ... END expression down to the database. Using Arel.sql to get around this "attributes only" restriction is a tool we all need to know about.
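For example, here is a sketch of that array-ordering case (Model and the ids are made up for illustration); because only integers are interpolated, the snippet is safe to mark with Arel.sql:
ids = [42, 7, 19]
cases = ids.each_with_index.map { |id, i| "WHEN #{id} THEN #{i}" }.join(' ')
Model.where(id: ids).order(Arel.sql("CASE id #{cases} END"))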
I'm a fan of this solution:
Model.offset(rand(Model.count)).first
With many records and not many deleted records, the approach below may be more efficient. In my case I have to use .unscoped because the default scope uses a join. If your model doesn't use such a default scope, you can omit the .unscoped wherever it appears.
Patient.unscoped.count #=> 134049
class Patient
  def self.random
    return nil unless Patient.unscoped.any?
    patient = nil
    until patient
      # find_by returns nil (instead of raising) when the random id has been deleted,
      # so the loop simply retries with another id
      patient = Patient.unscoped.find_by(id: rand(1..Patient.unscoped.last.id))
    end
    patient
  end
end
#Compare with other solutions offered here in my use case
puts Benchmark.measure{10.times{Patient.unscoped.order(Arel.sql('RANDOM()')).first }}
#=>0.010000 0.000000 0.010000 ( 1.222340)
Patient.unscoped.order(Arel.sql('RANDOM()')).first
Patient Load (121.1ms) SELECT "patients".* FROM "patients" ORDER BY RANDOM() LIMIT 1
puts Benchmark.measure {10.times {Patient.unscoped.offset(rand(Patient.unscoped.count)).first }}
#=>0.020000 0.000000 0.020000 ( 0.318977)
Patient.unscoped.offset(rand(Patient.unscoped.count)).first
(11.7ms) SELECT COUNT(*) FROM "patients"
Patient Load (33.4ms) SELECT "patients".* FROM "patients" ORDER BY "patients"."id" ASC LIMIT 1 OFFSET 106284
puts Benchmark.measure{10.times{Patient.random}}
#=>0.010000 0.000000 0.010000 ( 0.148306)
Patient.random
(14.8ms) SELECT COUNT(*) FROM "patients"
#also
Patient.unscoped.find rand(Patient.unscoped.last.id)
Patient Load (0.3ms) SELECT "patients".* FROM "patients" ORDER BY "patients"."id" DESC LIMIT 1
Patient Load (0.4ms) SELECT "patients".* FROM "patients" WHERE "patients"."id" = $1 LIMIT 1 [["id", 4511]]
The reason this is faster is that we use rand() to pick a random id and fetch just that single record. However, the more deleted rows (skipped ids) there are, the more often the loop has to retry. It might be overkill, but it could be worth the 62% increase in performance, and even more if you never delete rows. Test whether it's better for your use case.
After running two similar queries like
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(2)
I was expecting to see two SQL statements in my console being executed by the server. However, the first query is missing and only the second one is being run. Similarly, after executing the following two queries:
@articles = @magazine.articles.limit(2).offset(0)
@articles = @articles.limit(2).offset(@articles.size - 2)
the first query is completely ignored as well. These two queries generate the SQL:
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "articles"
WHERE "articles"."magazine_id" = $1 LIMIT 2 OFFSET 0)
subquery_for_count [["magazine_id", 1]]
SELECT "articles".* FROM "articles"
WHERE "articles"."magazine_id" = $1
LIMIT 2 OFFSET 2 [["magazine_id", 1]]
Interestingly enough, if I change @articles.size to @articles.length, both queries are run as expected. I would think that since length requires the collection in memory, the first statement is forced to run. Can anyone describe what's happening here? If it's too broad a topic, please point me to a good resource.
It's not so much optimising as deferring execution of the query until it really needs to execute it.
In both cases you're storing the result of building up a query in @articles. Active Record, or more accurately Arel, defers execution of the query until you call a method that needs the results. I suspect that you're actually seeing the query executed against the database when you call something like @articles.each or @articles.count or some such.
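On the size versus length difference you noticed: size on an unloaded relation issues a COUNT query, while length loads the records and counts the array in Ruby, which is why switching to length forces the first query to run. A small sketch with your variables:
@articles = @magazine.articles.limit(2).offset(0)
@articles.size    # runs the SELECT COUNT(...) subquery; the relation stays unloaded
@articles.length  # runs SELECT "articles".* ... and counts the loaded array in memory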
You could build the query up in a series of steps and it won't actually get executed:
a = #magazine.articles
a = a.limit(2)
a = a.offset(0)
It also means you can leave a clause that drastically reduces the result size until the end of the process:
a = a.where('created_at > ?', Time.now.at_beginning_of_day)
Still no query has been sent to the database.
The thing to watch out for is testing this logic in the Rails console. If you run these steps in the console itself, it tries to display the last return value (by calling .inspect, I think), and inspecting the return value causes the query to be executed. So if you put a = Magazine.find(1).articles into the console, you'll see a query immediately executed, which wouldn't have happened if the code was run in the context of a controller action, for example. If you then call a.limit(2), you'll see another query, and so on.
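If you want to see what a relation would run without triggering execution in the console, to_sql only builds the SQL string and never sends it:
a = Magazine.find(1).articles.limit(2)  # find(1) runs a query; the articles relation does not
puts a.to_sql                           # prints the SELECT without hitting the database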
> player.records
Record Load (0.5ms) SELECT * FROM `records` WHERE (`records`.player_id = 1)
> player.records.first(:conditions => {:metric_id => "IS NOT NULL"})
Record Load (0.5ms) SELECT * FROM `records` WHERE (`records`.player_id = 1 AND (`records`.`metric_id` = 'IS NOT NULL')) LIMIT 1
Is there a way to make the second query not hit the database, but use the cache instead? It seems a bit excessive for it to hit the database again when the data is already in memory.
I need both results. I'm aware that Ruby can iterate through the values, but I'd prefer to do this through ActiveRecord if possible. I'm coming from a Django background where filter() did this just fine.
I'm using Rails 2.3.
No, simply because the condition is different.
But try to explain the context: why do you need both queries? Can't you use only the second one?
If you need both, why can't you filter the Array with Ruby code instead of making another query?
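For what it's worth, a minimal sketch of the in-memory filtering (metric_id taken from your query); the association caches its records after the first load, so no second query is issued:
all_records = player.records                                 # one query, cached on the association
with_metric = all_records.select { |r| !r.metric_id.nil? }   # filtered in Ruby
first_with_metric = with_metric.first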