Ruby's .where vs. detect - ruby-on-rails

I'm looking for a method that is faster and uses less server processing. In my application, I can use both .where and .detect:
Where:
User.where(id: 1)
# User Load (0.5ms)
Detect:
User.all.detect{ |u| u.id == 1 }
# User Load (0.7ms). Sometimes increases more than .where
I understand that .detect returns the first item in the list for which the block returns TRUE, but how does it compare with .where if I have thousands of Users?
Edited for clarity.
.where is used in this example because I may not query for the id alone. What if I have a table column called "name"?

In this example
User.find(1) # or
User.find_by(id: 1)
will be the fastest solutions, because both queries tell the database to return exactly one record with a matching id. As soon as the database finds a matching record, it stops looking and returns that record immediately.
Whereas
User.where(id: 1)
would return an ActiveRecord::Relation containing all records matching the condition. That means: after a matching record is found, the database may keep looking for further matches (unless a unique index tells it there can be only one), and the result is a collection rather than a single record. In this case, since id is the primary key and therefore unique, the relation would contain exactly one instance.
Contrast that with
User.all.detect { |u| u.id == 1 }
that would load all users from the database. This will result in loading thousands of users into memory, building ActiveRecord instances, iterating over that array and then throwing away all records that do not match the condition. This will be very slow compared to just loading matching records from the database.
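To see the shape of that difference without a database, here is a pure-Ruby sketch in which a Hash plays the role of the primary-key index and Array#detect plays the role of .detect over loaded records (the data and names are made up for illustration):

```ruby
# Build 100k fake "users" and a Hash "index" keyed by id, standing in for
# the database table and its primary-key index respectively.
users = (1..100_000).map { |i| { id: i, name: "user#{i}" } }
by_id = users.each_with_object({}) { |u, h| h[u[:id]] = u }

# Like .detect: a linear scan over every loaded record until one matches.
scan = users.detect { |u| u[:id] == 99_999 }

# Like .find on an indexed column: a single constant-time lookup.
index_hit = by_id[99_999]

# Both return the same record; the cost of getting there is what differs.
```

The detect-style scan touches (in the worst case) every element, while the indexed lookup touches one, which is the same asymmetry the database exploits when you let it do the filtering.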
Database management systems are optimized to run selection queries, and you can improve their ability to do so by designing a useful schema and adding appropriate indexes. Every record loaded from the database must be translated into an ActiveRecord instance and consumes memory; neither operation is free. Therefore the rule of thumb is: whenever possible, run queries directly in the database instead of in Ruby.

NB One should use ActiveRecord#find in this particular case; please refer to the answer by @spickermann instead.
User.where is executed at the DB level, returning just the matching record(s).
User.all.detect will return all the records to the application, and only then iterate through them at the Ruby level.
That said, one should use where here. The former is resilient to the number of records: there might be billions, and the execution time and memory consumption on the application side would stay nearly the same (O(1)). The latter might even fail outright on billions of records.

Here's a general guide:
Use .find(id) whenever you are looking for a unique record. You can use something like .find_by_email(email) or .find_by_name(name) or similar (these finder methods are automatically generated) when searching non-ID fields, as long as there is only one record with that particular value.
Use .where(...).limit(1) if your query is too complex for a .find_by query or you need to use ordering but you are still certain that you only want one record to be returned.
Use .where(...) when retrieving multiple records.
Use .detect only if you cannot avoid it. Typical use cases for .detect are on non-ActiveRecord enumerables, or when you have a set of records but are unable to write the matching condition in SQL (e.g. if it involves a complex function). As .detect is the slowest, make sure that before calling .detect you have used SQL to narrow down the query as much as possible. Ditto for .any? and other enumerable methods. Just because they are available for ActiveRecord objects doesn't mean that they are a good idea to use ;)
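As an illustration of the one legitimate .detect case from the guide above, here is a condition that is awkward to express in SQL (a hypothetical palindrome check; plain Structs stand in for ActiveRecord instances, and all data is invented):

```ruby
# A plain-Ruby stand-in for an ActiveRecord model.
User = Struct.new(:id, :name)

# Imagine this set was already narrowed as far as possible in SQL.
users = [User.new(1, "alice"), User.new(2, "anna"), User.new(3, "bob")]

# The palindrome condition has no simple SQL equivalent, so .detect on the
# already-loaded set is reasonable here: it returns the first match.
match = users.detect { |u| u.name == u.name.reverse }
```

Note that .detect stops at the first match ("anna"), even though "bob" is also a palindrome; it never evaluates the block for the remaining elements.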

Related

Querying cached records in Rails seems slow

I'm doing a query based suggestion API in Rails, with suggestions being returned to the user as they type. In order to avoid hitting the database too often, I decided to cache the records.
def cached_values
  Rails.cache.fetch(:values_cached, expires_in: 1.day) do
    Table.verified.select(:value).all
  end
end
cached_values
=>
[#<Table:0x000056406fc70370 id: nil, value: "xxx">,
#<Table:0x000056406fc77f80 id: nil, value: "xxx">,
#<Table:0x000056406fc77d00 id: nil, value: "xxx">
...
I'm aware it's not a good practice to cache ActiveRecord entries, but the "verified" scope is relatively small (~6k rows) and I want to query it further. So when a call to the API is made, I query the cached values (simplified, the real one is sanitized):
def query_cached(query)
  cached_values.where("value LIKE '%#{query}%'").to_a
end
The issue here is that I have tested both cached and uncached queries, and the latter performs better. Setting Rails.logger.level = 0, I noticed the cached query still logs a database query:
pry(main)> query_cached("a")
Table Load (1.2ms) SELECT "table"."value" FROM "table" WHERE "table"."verified" = TRUE AND (value LIKE '%a%')
My guess is that the cached search is both opening a connection to the database and loading the cached records, taking more time but being effectively useless. Is there any reliable way to check that?
If the cache is just slower, maybe it is still worth keeping it and preventing too many connections to the database.
Benchmark for 10000 queries each:
user system total real
uncached 20.110681 0.369983 20.480664 ( 26.935934)
cached 23.750934 0.753414 24.504348 ( 34.198694)
It is important to note that all doesn't return all records in an array. Instead it returns an ActiveRecord::Relation object. Such a relation represents a database query that might be called later or that can be extended by more conditions like, for example, .where("value LIKE '%#{query}%'"). If all already returned an array of records then you would not be able to add the additional condition with .where("value LIKE '%#{query}%'") to it because where doesn't exist on arrays.
Because you only cached the Relation (a description of a query that doesn't run until a method needs the actual records, like each, to_a, or first), the caching is useless in this case: the query still runs on every call.
Additionally, I would argue that caching is not useful in the context of this example at all because you would need to cache different values for each different user input. That means if the user searched for foo then you can cache that result but if another user then searches for bar you would still need to run another query to the database. Only if two users search for the same string the cache might be useful.
In your example, a full-text index in the database might be the better choice.
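The lazy-Relation pitfall can be sketched in pure Ruby (assumptions: a Hash stands in for Rails.cache, and a lambda stands in for the unevaluated Relation; the values are made up). Caching the lazy object does not stop the "query" from running on every use, while caching the materialized result does:

```ruby
# Minimal fetch-style cache: run the block only on a cache miss.
CACHE = {}

def fetch(key)
  CACHE.key?(key) ? CACHE[key] : (CACHE[key] = yield)
end

query_runs = 0
run_query = -> { query_runs += 1; %w[aluminium brass carbon] }

# Caching the lambda itself (like caching a Relation): every use re-runs it.
lazy = fetch(:lazy) { run_query }
lazy.call   # "query" runs
lazy.call   # ...and runs again

# Caching the materialized result: the "query" runs exactly once.
eager = fetch(:eager, &run_query)
again = fetch(:eager, &run_query)   # served from the cache, no "query"
```

In the real app the fix is to materialize inside the cache block, for example with `Table.verified.pluck(:value)` (or by appending `.to_a`), and then filter the cached array in plain Ruby instead of calling .where on it.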

How can I disable lazy loading of active record queries?

I want to query some objects from the database using a WHERE clause similar to the following:
@monuments = Monument.where("... lots of SQL ...").limit(6)
Later on, in my view I use methods like @monuments.first, then I loop through @monuments, then I display @monuments.count.
When I look at the Rails console, I see that Rails queries the database multiple times, first with a limit of 1 (for @monuments.first), then with a limit of 6 (for looping through all of them), and finally it issues a count() query.
How can I tell ActiveRecord to only execute the query once? Just executing the query once with a limit of 6 should be enough to get all the data I need. Since the query is slow (80ms), repeating it costs a lot of time.
In your situation you'll want to trigger the query before your call to first, because while first is a method on Array, it's also a "finder method" on ActiveRecord relations that'll fetch just the first record.
You can prompt this with any method that requires data to work with. I prefer using to_a since it's clear that we'll be dealing with an array after:
@moments = Moment.where(foo: true).to_a
# SQL Query Executed
@moments.first #=> (Array#first) <Moment #foo=true>
@moments.count #=> (Array#count) 42
In this case, you can also use first(6) in place of limit(6), which will also trigger the query. It may be less obvious to another developer on your team that this is intentional, however.
AFAIK, @monuments.first should not hit the db; I confirmed it in my console. Maybe you have multiple instances with the same variable, or you are doing something else (which you haven't shared here). Share the exact code and query and we might debug.
Since ActiveRecord collections act as arrays, you can use array analogues to avoid querying the db.
Regarding first, you can do:
@monuments[0]
Regarding count: yes, it is a separate query which hits the db; to avoid it you can use length:
@monuments.length

Is there a lazy select_all option in Rails?

I've got a complicated query that I need to run, and it can potentially yield a large result set. I need to iterate linearly through this result set in order to crunch some numbers.
I'm executing the query like so:
ActiveRecord::Base.connection.select_all(query)
find_in_batches won't work for my use case, as it's critical that I get the records in a custom order. Also, my query returns some fields that aren't part of any models, so I need to get the records as hashes.
The problem is, select_all is not lazy (from what I can tell). It loads all of the records into memory. Does Rails have a way to lazily get the results for a custom SQL query? .lazy doesn't seem applicable here, as I need custom ordering of the results.
This is possible in other languages (C#, Haskell, JavaScript), so it seems like it would be possible in Ruby.
Not sure, but maybe you're looking for eager_load or preload.
http://blog.arkency.com/2013/12/rails4-preloading/
Hope this can help you.
You can try the find_each or find_in_batches ActiveRecord methods.
Both query the database in configurable-sized batches.
The difference is that find_each yields objects one-by-one to the block (they are lazily initialized), while find_in_batches yields the whole batch at once.
If you can't use the above methods due to your custom sorting, you can query the database using limit and offset. This way you deal with the data in portions: memory consumption decreases, but the number of queries increases.
Another solution may be to let the database engine perform the arithmetic operations you need and return the calculated result.
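The limit/offset idea can be wrapped in an Enumerator to get the lazy, linear iteration the question asks for. A pure-Ruby sketch (assumptions: the ROWS array and fetch_batch stand in for the table and a connection.select_all call with LIMIT/OFFSET interpolated into the custom SQL):

```ruby
# Fake "table" of hash rows, as select_all would return them.
ROWS = (1..10).map { |i| { "id" => i, "score" => i * 10 } }

# Stand-in for connection.select_all("... LIMIT #{limit} OFFSET #{offset}").
def fetch_batch(limit, offset)
  ROWS.slice(offset, limit) || []
end

# External enumerator: rows are produced on demand, one batch at a time,
# so a consumer that stops early never triggers the remaining batch queries.
def lazy_rows(batch_size)
  Enumerator.new do |y|
    offset = 0
    loop do
      batch = fetch_batch(batch_size, offset)
      break if batch.empty?
      batch.each { |row| y << row }
      offset += batch_size
    end
  end
end

first_four = lazy_rows(3).first(4)              # only two batches fetched
total = lazy_rows(3).sum { |row| row["score"] } # linear pass over all rows
```

The custom ORDER BY lives inside the SQL itself, so the ordering requirement is unaffected; only the delivery to Ruby becomes incremental.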

Sanitizing arrays in select_all SQL queries in Rails 4

I need to pull aggregated columns from my database in a way that's unique to this page, crosses several models, and is expensive to aggregate in memory on a page that already has performance concerns. As a result, I want to make a hand-written SQL query to the database and return a bunch of simple objects (like hashes, or structs) which I can then dole out to the existing objects that consume this information. ActiveRecord::Base.connection.select_all is perfect for what I want.
My SQL string takes an array of ids:
WHERE entry.task_id IN (#{@project.tasks.ids})
but this doesn't work because it returns a wrapped array, and connection.quote returns a bulleted list for some reason. What would be the best way to get the information I want? Should I manually strip off the [ and ]? Is there a handy, accessible function like sanitize_sql_array that'll do the trick? Should I never be calling select_all as part of normal operation?
This appears to have been easier in the past, but most of the methods I would use have been protected (including sanitize_sql_array). Rails really wants me to use ActiveRecord query methods, it seems.
Join your array into a string, then interpolate it into the SQL:
ids_string = ids.join(', ')
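Joining is fine for integer primary keys, but for anything user-influenced you want placeholders rather than raw interpolation. A pure-Ruby sketch of expanding an array into a placeholder list (in Rails itself, `where("task_id IN (?)", ids)` expands an array for you, and newer Rails versions expose sanitize_sql_array as public API):

```ruby
ids = [3, 7, 42]

# One "?" per value, joined into the IN clause, so the driver can bind
# each value safely instead of trusting string interpolation.
placeholders = (["?"] * ids.length).join(", ")
sql = "WHERE entry.task_id IN (#{placeholders})"
# ids would then be passed alongside sql as bind parameters.
```

This is the same expansion the framework performs internally; hand-rolling it is only needed when you are building the full SQL string yourself for select_all.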

What is the difference between these two statements, and why would you choose them?

I'm a beginner at rails. And I've come to understand two different ways to return the same result.
What is the difference between these two? And what situation would require you to choose one from the other?
Example 1:
Object.find(:all).select {|c| c.name == "Foobar" }.size
Example 2:
Object.count(:conditions => ['name = ?', 'Foobar'])
FURTHER NOTE:
I seriously wish I could vote everyone correct answers for this one. Thank you so much. I just had a serious rails affirmation.
Object.count always hits the DB, while the find(...).size call can be optimized. There's a good discussion here:
http://rhnh.net/2007/09/26/counting-activerecord-associations-count-size-or-length
Example 1:
This constructs a query:
SELECT * FROM objects
then turns all the records into a collection of objects in memory, iterates through every object to see if it meets the condition, and counts the number of elements that do.
Example 2:
This constructs a query:
SELECT count(id) FROM objects WHERE name = 'Foobar'
lets SQL do all the hard work and returns just an integer: the number of objects meeting the condition.
Usually you want option 2: faster and less memory.
Example 1 will load all of your records from the DB (assuming Object is an ActiveRecord model), then uses Ruby to reduce the set, and then return the size of that array. So this is potentially memory and CPU heavy - not good.
Example 2 performs the count in SQL, so all the heavy lifting is performed in the database, not in Ruby. Much better :)
In example 1, you are getting all objects from the datastore, iterating over all of them, selecting the objects that have the name Foobar, and then getting the size of that array. Example 1 is the clear loser here.
Example 1 sql:
select * from whatever
# then iterate over entire array
Example 2 executes a WHERE clause in SQL against the datastore.
select count(id) from whatever where name = 'foobar'
# The SQL above is sql-server accurate, but not necessarily mysql or sqlite3
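The same select-then-size versus count trade-off can be shown in plain Ruby (arrays of hashes stand in for the objects table; in real Rails the win of Example 2 is larger still, because the rows never leave the database at all):

```ruby
objects = [{ "name" => "Foobar" }, { "name" => "Baz" }, { "name" => "Foobar" }]

# Example 1 style: build an intermediate filtered array, then take its size.
size_in_ruby = objects.select { |o| o["name"] == "Foobar" }.size

# Example 2 style: count matches directly, no intermediate array.
count_direct = objects.count { |o| o["name"] == "Foobar" }
```

Both return the same number; only the second avoids materializing the filtered collection, which is the in-memory analogue of pushing the count into SQL.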
