I'm building a query-based suggestion API in Rails, with suggestions returned to the user as they type. To avoid hitting the database too often, I decided to cache the records.
def cached_values
  Rails.cache.fetch(:values_cached, expires_in: 1.day) do
    Table.verified.select(:value).all
  end
end
cached_values
=>
[#<Table:0x000056406fc70370 id: nil, value: "xxx">,
#<Table:0x000056406fc77f80 id: nil, value: "xxx">,
#<Table:0x000056406fc77d00 id: nil, value: "xxx">
...
I'm aware it's not a good practice to cache ActiveRecord entries, but the "verified" scope is relatively small (~6k rows) and I want to query it further. So when a call to the API is made, I query the cached values (simplified, the real one is sanitized):
def query_cached(query)
  cached_values.where("value LIKE '%#{query}%'").to_a
end
The issue is that I have benchmarked both the cached and uncached queries, and the latter performs better. Setting Rails.logger.level = 0, I noticed the cached query still logs a database query:
pry(main)> query_cached("a")
Table Load (1.2ms) SELECT "table"."value" FROM "table" WHERE "table"."verified" = TRUE AND (value LIKE '%a%')
My guess is that the cached search is both opening a connection to the database and loading the cached records, taking more time but being effectively useless. Is there any reliable way to check that?
If the cache is just slower, maybe it is still worth keeping it and preventing too many connections to the database.
Benchmark for 10000 queries each:
user system total real
uncached 20.110681 0.369983 20.480664 ( 26.935934)
cached 23.750934 0.753414 24.504348 ( 34.198694)
It is important to note that all doesn't return the records in an array. Instead it returns an ActiveRecord::Relation. A relation represents a database query that may be run later and that can still be extended with more conditions, for example .where("value LIKE '%#{query}%'"). If all returned an array of records, you could not chain .where("value LIKE '%#{query}%'") onto it, because where doesn't exist on arrays.
Because you cached only the relation (a description of a query that runs when a method needs the actual records, such as each, to_a or first, but that hasn't run yet), the caching is useless in this case: every call still builds and executes a fresh database query.
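A minimal sketch of the fix, based on the code in the question: call to_a inside the fetch block so the cache stores the loaded records, then filter the cached array in Ruby rather than chaining where (which would build a new database query):

```ruby
def cached_values
  Rails.cache.fetch(:values_cached, expires_in: 1.day) do
    # to_a runs the query immediately, so the cache stores the records
    # themselves rather than a lazy relation
    Table.verified.select(:value).to_a
  end
end

def query_cached(query)
  # filter the cached array in memory; chaining .where here would build
  # and run a brand-new database query instead
  cached_values.select { |record| record.value.include?(query) }
end
```

Note that include? is case-sensitive, while LIKE may not be depending on your database and collation, so results can differ slightly from the SQL version.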
Additionally, I would argue that caching is not useful in this example at all, because you would need to cache a different result for each different user input. If one user searches for foo you can cache that result, but when another user searches for bar you still need to run another database query. The cache only helps when two users search for the same string.
In your example, a full-text index in the database might be the better choice.
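For the LIKE '%query%' pattern above, PostgreSQL's pg_trgm extension is one concrete option (a trigram index rather than true full-text search). A sketch of such a migration, assuming a recent Rails on PostgreSQL; the table and column names follow the question's Table model and the class name is made up:

```ruby
class AddTrigramIndexToTables < ActiveRecord::Migration[6.1]
  def up
    enable_extension "pg_trgm"
    # GIN + gin_trgm_ops lets the planner use this index for LIKE '%...%'
    add_index :tables, :value, using: :gin, opclass: :gin_trgm_ops,
              name: "index_tables_on_value_trgm"
  end

  def down
    remove_index :tables, name: "index_tables_on_value_trgm"
    disable_extension "pg_trgm"
  end
end
```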
Related
I want to query some objects from the database using a WHERE clause similar to the following:
@monuments = Monument.where("... lots of SQL ...").limit(6)
Later on, in my view I use methods like @monuments.first, then I loop through @monuments, then I display @monuments.count.
When I look at the Rails console, I see that Rails queries the database multiple times: first with a limit of 1 (for @monuments.first), then with a limit of 6 (for looping through all of them), and finally it issues a count() query.
How can I tell ActiveRecord to only execute the query once? Just executing the query once with a limit of 6 should be enough to get all the data I need. Since the query is slow (80ms), repeating it costs a lot of time.
In your situation you'll want to trigger the query before your call to first, because while first is a method on Array, it's also a "finder method" on ActiveRecord relations that fetches just the first record.
You can trigger this with any method that needs the actual data in order to work. I prefer to_a, since it makes clear that we'll be dealing with an array afterwards:
@moments = Moment.where(foo: true).to_a
# SQL query executed
@moments.first #=> (Array#first) <Moment #foo=true>
@moments.count #=> (Array#count) 42
In this case, you can also use first(6) in place of limit(6), which will also trigger the query. It may be less obvious to another developer on your team that this is intentional, however.
AFAIK, @monuments.first should not hit the DB; I confirmed this in my console. Maybe you have multiple instances with the same variable name, or you are doing something else that you haven't shared here. Share the exact code and query and we might be able to debug it.
Since ActiveRecord collections act like arrays, you can use array equivalents to avoid querying the DB.
For first you can do:
@monuments[0]
Regarding count: yes, that is a separate query that hits the DB. To avoid it you can use length:
@monuments.length
I'm looking for a method that is faster and uses less server processing. In my application, I can use both .where and .detect:
Where:
User.where(id: 1)
# User Load (0.5ms)
Detect:
User.all.detect{ |u| u.id == 1 }
# User Load (0.7ms). Sometimes increases more than .where
I understand that .detect returns the first item in the list for which the block returns true, but how does it compare with .where if I have thousands of users?
Edited for clarity.
.where is used in this example because I may not query for the id alone. What if I have a table column called "name"?
In this example
User.find(1) # or
User.find_by(id: 1)
will be the fastest solutions, because both tell the database to return exactly one record with a matching id. As soon as the database finds a matching record, it doesn't look further but returns that one record immediately.
Whereas
User.where(id: 1)
would return a collection of all objects matching the condition. That means that after a matching record is found, the database keeps looking for further matches, and unless a unique index tells it none can exist, it may scan the whole table. In this case, since id is very likely a column with unique values, it would return an array with only one instance.
Compare that with
User.all.detect { |u| u.id == 1 }
which loads all users from the database. That means loading thousands of users into memory, building ActiveRecord instances, iterating over the array, and then throwing away every record that doesn't match the condition. This will be very slow compared to loading only the matching records from the database.
Database management systems are optimized for running selection queries, and you can improve their ability to do so by designing a useful schema and adding appropriate indexes. Every record loaded from the database must be translated into an ActiveRecord instance and consumes memory; neither operation is free. Therefore the rule of thumb is: whenever possible, run queries directly in the database instead of in Ruby.
NB: One should use ActiveRecord#find in this particular case; please refer to the answer by @spickermann instead.
User.where is executed at the database level, returning only the matching records.
User.all.detect loads all the records into the application, and only then iterates through them at the Ruby level.
That said, one should use where. The former is resilient to the number of records: there might be billions, and the execution time and memory consumption would stay nearly the same (roughly O(1) with an index on id). The latter might even fail outright on billions of records.
Here's a general guide:
Use .find(id) whenever you are looking up a unique record by primary key. You can use something like .find_by_email(email) or .find_by_name(name) (these finder methods are generated automatically) when searching non-ID fields, as long as only one record can have that particular value.
Use .where(...).limit(1) if your query is too complex for a .find_by query or you need to use ordering but you are still certain that you only want one record to be returned.
Use .where(...) when retrieving multiple records.
Use .detect only if you cannot avoid it. Typical use cases for .detect are on non-ActiveRecord enumerables, or when you have a set of records but are unable to write the matching condition in SQL (e.g. if it involves a complex function). As .detect is the slowest, make sure that before calling .detect you have used SQL to narrow down the query as much as possible. Ditto for .any? and other enumerable methods. Just because they are available for ActiveRecord objects doesn't mean that they are a good idea to use ;)
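To make the guide concrete, here is a sketch against a hypothetical User model with email, name and city columns (none of these calls come from the question itself; each method illustrates one rule above):

```ruby
def lookup_by_id(id)
  User.find(id)                      # raises ActiveRecord::RecordNotFound if absent
end

def lookup_by_email(email)
  User.find_by(email: email)         # returns nil if absent
end

def best_match_in(city)
  # too complex for find_by: needs ordering, but only one record wanted
  User.where(city: city).order(:name).limit(1).first
end

def users_in(city)
  User.where(city: city)             # multiple records, filtered in SQL
end

def museum_user_in(city)
  # detect runs in Ruby, so narrow the set with SQL first
  User.where(city: city).detect { |u| u.email.end_with?(".museum") }
end
```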
Can default_scope when used to not order records by ID significantly slow down a Rails application?
For example, I have a Rails (currently 3.1) app using PostgreSQL where nearly every Model has a default_scope ordering records by their name:
default_scope order('users.name')
Right now because the default_scope's order records by name rather by ID, I am worried I might be incurring a significant performance penalty when normal queries are run. For example with:
User.find(5563401)
or
User.where('created_at = ?', 2.weeks.ago)
or
User.some_scope_sorted_best_by_id.all
In the above examples, what performance penalty might I incur by having a default_scope by name on my Model? Should I be concerned about this default_scope affecting application performance?
Your question is missing the point. The default scope itself is just a few microseconds of Ruby execution to cause an order by clause to be added to every SQL statement sent to PostgreSQL.
So your question is really asking about the performance difference between unordered queries and ordered ones.
The PostgreSQL documentation is pretty explicit: ordered queries on unindexed fields are much slower than unordered ones because (no surprise) PostgreSQL must sort the results before returning them, first creating a temporary structure to hold the result. This can easily be a factor of 4 in query time, possibly much more.
If you introduce an index just to achieve quick ordering, you are still paying to maintain the index on every insert and update. And unless it's the primary index, sorted access still involves random seeks, which may actually be slower than creating a temporary table. This also is discussed in the Postgres docs.
In a nutshell, NEVER add an order clause to an SQL query that doesn't need it (unless you enjoy waiting for your database).
NB: I doubt a simple find() will have order by attached because it must return exactly one result. You can verify this very quickly by starting rails console, issuing a find, and watching the generated SQL scroll by. However, the where and all definitely will be ordered and consequently definitely be slower than needed.
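You can also check this without running the queries: a relation's to_sql shows the SQL it would generate. A console sketch (exact quoting and output vary by Rails version):

```
pry(main)> User.where('created_at > ?', 2.weeks.ago).to_sql
=> "SELECT \"users\".* FROM \"users\" WHERE (created_at > '...') ORDER BY users.name"
pry(main)> User.unscoped.where('created_at > ?', 2.weeks.ago).to_sql
=> "SELECT \"users\".* FROM \"users\" WHERE (created_at > '...')"
```

User.unscoped drops the default scope, so comparing the two strings shows exactly what the default_scope adds.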
I have a CSV file that contains data like the id of the user, unit and size.
I want to update member_id for 500,000 products:
500_000.times do |i|
  user = User.find_by(id: tmp[i])
  hash = {
    unit: tmp[UNIT],
    size: tmp[SIZE]
  }
  hash.merge!(user_id: user.id) if user.present?
  Product.create(hash)
end
How do I optimize that procedure to not find each User object but maybe get an array of related hashes?
There are two things here that are massively holding back performance. First, you're doing 500,000 individual User.find calls. Second, you're creating records one at a time instead of doing a mass insert, so each create runs inside its own tiny transaction.
Generally these sorts of bulk operations are better done purely in the SQL domain. You can insert a very large number of rows at the same time, often only limited by the size of the query you can submit, and that parameter is usually adjustable.
While a gigantic query may lock or block your database for a period of time, it will be the fastest way to do your updates. If you need to keep your system running during mass inserts, you'll need to break it up into a series of smaller commits.
Remember that Product.connection is a lower-level access layer that lets you manipulate the data directly with SQL queries.
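A hedged sketch of that approach, assuming Rails 6+ for insert_all; the method name, the CSV headers (user_id, unit, size) and the batch size are illustrative assumptions, and insert_all skips validations and callbacks:

```ruby
require "csv"
require "set"

def import_products(csv_path)
  rows = CSV.foreach(csv_path, headers: true).map do |row|
    { user_id: row["user_id"]&.to_i, unit: row["unit"], size: row["size"] }
  end

  # one query instead of 500,000: which of the referenced users exist?
  existing_ids = User.where(id: rows.map { |r| r[:user_id] }).pluck(:id).to_set

  # insert_all needs uniform keys, so blank out user_id instead of deleting it
  rows.each { |r| r[:user_id] = nil unless existing_ids.include?(r[:user_id]) }

  # mass-insert in chunks instead of one INSERT (and transaction) per product
  rows.each_slice(10_000) { |batch| Product.insert_all(batch) }
end
```

The 10,000-row chunking keeps each statement reasonably sized, so the table isn't locked for the whole import at once.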
I am caching the results of a model like so (with Memcached):
Rails.cache.fetch('Store.all') { Store.all }
Later, I am wanting to retrieve a subset of Store.all, such as stores in a certain city. Is there an easy way to query the already cached set of Stores, or do I need to hit the database again for each query?
Thanks!
Remember, the database is optimized for running queries that match arbitrary conditions, while the cache store is just a fast lookup for a key you already know. You should cache only things that you've already filtered or prepared.
Assuming a cache key like cities/1/stores for all the stores in city 1, you could cache this collection and fetch it again later.
If you have a large number of stores, it would be an anti-optimization to cache Store.all under one cache key and then try to filter it with Ruby for a given city or any other criterion. Your program would be forced to iterate over all the stores, since arrays don't have indexes on city_id. You'd be much better off letting the database do this work with a WHERE clause and the indexes the database provides.
You can do it with Ruby: first fetch your results from the cache, then iterate through the collection to find the stores in a given city. Assuming you are using Memcached, there is no way to query the cache directly, since it's a simple key-value store.
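If you do go that route, the in-memory filtering is straightforward. A sketch, assuming the cached value is an array of loaded store records responding to city_id (the method name is made up):

```ruby
def stores_in_city(city_id)
  all_stores = Rails.cache.fetch('Store.all') { Store.all.to_a }
  # linear scan in Ruby; there is no index here, so this is O(n) per lookup
  all_stores.select { |store| store.city_id == city_id }
end
```

As the other answer notes, a database WHERE clause with an index on city_id will usually beat this once the number of stores is large.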