Confusion caching Active Record queries with Rails.cache.fetch

My version is:
Rails: 3.2.6
dalli: 2.1.0
My env is:
config.action_controller.perform_caching = true
config.cache_store = :dalli_store, 'localhost:11211', {:namespace => 'MyNameSpace'}
When I write:
Rails.cache.fetch(key) do
User.where('status = 1').limit(1000)
end
The user model can't be cached. If I use
Rails.cache.fetch(key) do
User.all
end
it can be cached. How can I cache the query result?

The reason is that
User.where('status = 1').limit(1000)
returns an ActiveRecord::Relation which is actually a scope, not a query. Rails caches the scope.
If you want to cache the query, you need to use a query method at the end, such as #all.
Rails.cache.fetch(key) do
User.where('status = 1').limit(1000).all
end
Please note that it's never a good idea to cache ActiveRecord objects. Caching an object may result in inconsistent states and values. You should always cache primitive objects, when applicable. In this case, consider caching the ids.
ids = Rails.cache.fetch(key) do
User.where('status = 1').limit(1000).pluck(:id)
end
User.find(ids)
You may argue that in this case the call to User.find is always executed. That's true, but a query by primary key is fast, and you get around the problem described before. Moreover, caching Active Record objects can be expensive, and you might quickly fill all of Memcached's memory with a single cache entry. Caching ids prevents this problem as well.

In addition to the selected answer: for Rails 4+ you should use load instead of all to force execution of the scope.
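The difference is easy to demonstrate without Rails at all. In this plain-Ruby sketch, FakeRelation and the CACHE hash are hypothetical stand-ins for an ActiveRecord::Relation and Rails.cache: caching the lazy object stores only the recipe for a query, while calling a loading method first stores the actual rows.

```ruby
# FakeRelation stands in for an ActiveRecord::Relation: it holds a deferred
# "query" and only runs it when #load is called.
class FakeRelation
  def initialize(&query)
    @query = query
  end

  def load
    @query.call # executes the deferred "query"
  end
end

CACHE = {} # stands in for Rails.cache

def cache_fetch(key)
  return CACHE[key] if CACHE.key?(key)
  CACHE[key] = yield
end

relation = FakeRelation.new { [1, 2, 3] }

cache_fetch(:lazy)   { relation }      # caches the relation object itself
cache_fetch(:loaded) { relation.load } # caches the materialized rows

CACHE[:lazy]   # => a FakeRelation, not rows
CACHE[:loaded] # => [1, 2, 3]
```

Only the :loaded entry would survive usefully across processes or restarts; the :lazy entry is just an unexecuted query object.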

Rails.cache.fetch caches exactly what the block evaluates to.
User.where('status = 1').limit(1000)
is just a scope, so what gets cached is the ActiveRecord::Relation object, i.e. the query, not its results (because the query has not been executed yet).
If you want something useful to be cached, you need to force execution of the query inside the block, for example by doing
User.where('status = 1').limit(1000).all
Note that on Rails 4, all doesn't force loading of the relation; use to_a instead.

Using
User.where("status = 1").limit(1000).all
should work.

Related

Ruby on Rails #find_each and #each. Is this bad practice?

I know that find_each has been designed to consume less memory than each.
I found some code that someone wrote long ago, and I think it's wrong.
Consider this code:
users = User.where(:active => false) # What does this line actually do? Nothing?
users.find_each do |user|
  # update or do something..
  user.update(:do_something => "yes")
end
In this case, it will store all user objects in the users variable, so we've already filled the full amount of memory. There is no point in using find_each later on.
Am I correct?
So, in other words, if you want to use find_each, you always need to call it on an ActiveRecord::Relation, like this:
User.where(:active => false).find_each do |user|
  # do something...
end
What do you think, guys?
Update
Regarding the users = User.where(:active => false) line:
some developers insist that Rails never executes the query unless we do something with that variable.
What if we have a class with initialize method that has query?
class Test
  def initialize
    @users = User.where(:active => true)
  end

  def do_something
    @users.find_each do |user|
      # do something really..
    end
  end
end
If we call Test.new, what would happen? Nothing will happen?
users = User.where(:active => false) doesn't run a query against the database and it doesn't return an array with all inactive users. Instead, where returns an ActiveRecord::Relation. Such a relation basically describes a database query that hasn't run yet. The defined query is only run against the database when the actual records are needed. This happens for example when you run one of the following methods on that relation: find, to_a, count, each, and many others.
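The same deferral can be sketched in plain Ruby. Here QueryDescription is a hypothetical stand-in for ActiveRecord::Relation: each chained call returns a new description of the query, and nothing resembling database work happens until a terminal method asks for results.

```ruby
# Each #where returns a NEW description; no "database" work happens here.
class QueryDescription
  attr_reader :conditions

  def initialize(conditions = [])
    @conditions = conditions
  end

  def where(condition)
    QueryDescription.new(@conditions + [condition]) # still deferred
  end

  def to_a
    # only a terminal method like this would actually hit the database;
    # here we just render the SQL the chain describes
    "SELECT * FROM users WHERE #{@conditions.join(' AND ')}"
  end
end

q = QueryDescription.new.where("active = 'f'") # no query yet
q = q.where("favorite_color = 'green'")        # still no query
q.to_a
# => "SELECT * FROM users WHERE active = 'f' AND favorite_color = 'green'"
```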
That means the change you made isn't a huge improvement, because it doesn't change when and how the database is queried.
But IMHO your code is still slightly better: when you don't plan to reuse the relation, why assign it to a variable in the first place?
users = User.where(:active => false)
users.find_each do |user|
User.where(:active => false).find_each do |user|
Those do the same thing.
The only difference is the first one stores the ActiveRecord::Relation object in users before calling #find_each on it.
This isn't a Rails thing, it applies to all of Ruby. It's method chaining common to most object-oriented languages.
array = Call.some_method
array.each{ |item| do_something(item) }
Call.some_method.each{ |item| do_something(item) }
Again, same thing. The only difference is in the first the intermediate array will persist, whereas in the second the array will be built and then eventually deallocated.
If we call Test.new, what would happen? Nothing will happen?
Exactly. Rails will make an ActiveRecord::Relation and it will defer actually contacting the database until you actually do a query.
This lets you chain queries together.
@inactive_users = User.where(active: false).order(name: :asc)
Later you can extend the query:
# Inactive users whose favorite color is green, ordered by name.
@inactive_users.where(favorite_color: :green).find_each do |user|
...
end
No query is made until find_each is called.
In general, pass around relations rather than arrays of records. Relations are more flexible and if it's never used there's no cost.
find_each is special in that it works in batches to avoid consuming too much memory on large tables.
A common mistake is to write this:
User.where(:active => false).each do |user|
Or worse:
User.all.each do |user|
Calling each on an ActiveRecord::Relation will pull all the results into memory before iterating. This is bad for large tables.
find_each will load the results in batches of 1000 to avoid using too much memory. It hides this batching from you.
There are other methods which work in batches, see ActiveRecord::Batches.
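The batching idea itself can be sketched with plain Ruby's each_slice (the 1000 below matches find_each's default batch size; the ids array is a hypothetical stand-in for a table's primary keys):

```ruby
# Only one batch of ids is held at a time, so memory use stays flat
# no matter how many rows the "table" has.
ids = (1..10_000).to_a # stands in for the table's primary keys
peak_batch_size = 0

ids.each_slice(1000) do |batch| # find_each's default batch size is 1000
  peak_batch_size = [peak_batch_size, batch.size].max
  # in Rails: the records for `batch` are loaded, yielded one by one,
  # then become eligible for garbage collection before the next batch
end

peak_batch_size # => 1000
```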
For more, see the Rails Style Guide, and use rubocop-rails to scan your code for issues and suggest corrections.

Mark as persistent an ActiveRecord item recovered from serialisation

I have a situation where a set of items, known to be in the database, is recovered from a serialised JSON array, including their ids, so they can be reshaped into ActiveRecord model instances by just calling new, without access to the database:
for itm in a do
  item = Item.new(itm)
  itmlist << item
end
Now, the problem is: how do I tell ActiveRecord that these elements are already persisted and not new? If item.new_record? is true, an item.save will fail because Rails will INSERT instead of UPDATE.
The goal is to make sure that Rails does update, without any extra queries to the database. The closest thing I have got is
item = Item.new(itm)
item.instance_variable_set(:@new_record, false)
which plays with ActiveRecord internals.
Not sure I completely understand the question, but if you just want to update all the items, the following will work:
a.each do |item_hash|
  Item.find(item_hash["id"]).update(item_hash.except("id"))
end
If the Item may or may not exist, then:
a.each do |item_hash|
  # find_by returns nil instead of raising, so || Item.new can kick in
  item = Item.find_by(id: item_hash["id"]) || Item.new
  item.update(item_hash.except("id"))
end
Neither of these options will handle validation failures. Depending on your usage, the following could be useful:
all_items = a.map do |item_hash|
  # find_by returns nil instead of raising, so || Item.new can kick in
  item = Item.find_by(id: item_hash["id"]) || Item.new
  item.assign_attributes(item_hash.except("id"))
  item # return the record itself, not assign_attributes' return value
end
pass, fail = all_items.partition(&:save)
If you only care about the failures you can change this to: fail = all_items.reject(&:save)
If there are a substantial number of items, there are more performant alternatives that avoid so many queries, e.g. Item.where(id: a.map { |i| i["id"] }).
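A sketch of that single-query approach, using plain Ruby hashes as hypothetical stand-ins for records (in Rails the lookup table would come from something like Item.where(id: ids).index_by(&:id)):

```ruby
# One up-front "query" fetches every existing record; after that, each
# incoming attributes hash is matched in memory rather than with its own find.
existing_by_id = {
  1 => { "id" => 1, "name" => "old" }, # pretend this row came from the database
}
incoming = [
  { "id" => 1, "name" => "updated" },
  { "id" => 2, "name" => "brand new" },
]

results = incoming.map do |attrs|
  record = existing_by_id[attrs["id"]] || {} # find-or-build, no extra query
  record.merge(attrs)                        # assign_attributes analogue
end

results # => [{"id"=>1, "name"=>"updated"}, {"id"=>2, "name"=>"brand new"}]
```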
Apparently, reload-ing works:
thing = Thing.last
thing_attributes = thing.attributes
same_thing = Thing.new(thing_attributes)
same_thing.new_record? # => true
same_thing.reload
same_thing.new_record? # => false
From the question, I see that your concern is only about ActiveRecord performing an INSERT query instead of the intended UPDATE, so reloading shouldn't be a problem. But, if my guess is wrong and you don't even want to reload, then it might be difficult without fiddling with the internals of ActiveRecord since it doesn't provide any way to instantiate already persisted records.
Possible alternate solution
Pardon me if the solution won't work in your case, but instead of serialising the entire objects, just serialise an array of IDs, so that you can re-fetch them in one go:
Thing.where(id: the_array_of_ids)

Rails 4 - Low-level Caching Still Querying Database [duplicate]


Ruby on Rails instance variable caching based on parameters

I am storing users' miscellaneous data in a user_data table, retrieving it through the association I defined, and caching it with a Ruby instance variable like this:
def user_data(user_id)
  @user_data ||= User.find(user_id).data
end
But the instance variable @user_data is assigned only the first time, when it's nil. Once it holds data for one user (say, user_id 1), passing user_id 2 to this method still returns the data for user_id 1, because no new value is assigned. So my question is: how can I cache the value based on the method's parameters?
Take a look at the Caching with Rails guide. Rails can cache data at multiple levels, from full-page caching to fragment caching; I strongly advise you to read that whole page so you can make an informed choice.
For the low-level caching you can do this:
@user_data = Rails.cache.fetch("user_#{user_id}", expires_in: 1.hour) do
  User.find(user_id).data
end
By default Rails stores the cache on disk, but you can set it up to use Memcached, an in-memory store, etc.
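The store is chosen in the environment config; for example (hostnames, namespaces, and sizes here are placeholders):

```ruby
# config/environments/production.rb

# in-process memory store, capped at 64 MB:
config.cache_store = :memory_store, { size: 64.megabytes }

# or a Memcached-backed store:
config.cache_store = :mem_cache_store, "cache.example.com", { namespace: "myapp" }
```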
You can use a hash for a key-based instance-variable cache. I think that does what you want.
def user_data(user_id)
  @user_data ||= {}
  @user_data[user_id.to_i] ||= User.find(user_id).data
end

Rails cache & ActiveRecord eager fetching - Fetch only if the fragment hasn't been cached

I have a controller method which currently looks like:
@people = Person.where(conditions).includes(eager_fetch).all
I'm now trying to make the controller cache-aware. Since the eager fetch is rather expensive, I want to avoid loading as much data as possible. If it's relevant, the output is XML from an RPC style endpoint. I've arrived at:
@people = Person.where(conditions).all
@fragments = {}
@people.dup.each do |person|
  cache_key = "fragment-for-#{person.id}-#{person.updated_at.to_i}"
  fragment = Rails.cache.fetch(cache_key)
  unless fragment.nil?
    @fragments[person.id] = fragment
    @people.delete person
  end
end
@people = Person.where(:id => @people.collect(&:id)).includes(eager_fetch).all
There's another possibility, which is very much the same, except instead of re-querying on the last line,
Person.send :preload_associations, @people, eager_fetch
Am I missing an important piece of API for handling this correctly? Currently on Rails 3.0.12, but will be upgrading to 3.2.x, so a solution that only works with 3.2.x would be fine. Neither of my solutions seem elegant to me.
(I've anonymized and simplified this code, apologies if I've left out anything important)
Don't rely on ActiveRecord's eager loading. It will load everything that isn't in the ActiveRecord per-request query cache.
Instead, query for your primary object, then use your own method to fetch the cached fragments and query the slower datastore only for the missed IDs.
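A sketch of that pattern with a plain Hash standing in for the fragment cache (in Rails, Rails.cache.read_multi can do the first step in one round trip):

```ruby
CACHE = { 1 => "<person 1 fragment>" } # pretend fragment store
wanted_ids = [1, 2, 3]

# 1. pull whatever the cache already has
hits = wanted_ids.each_with_object({}) do |id, found|
  found[id] = CACHE[id] if CACHE.key?(id)
end

# 2. only the misses go to the slower datastore
misses = wanted_ids - hits.keys
fresh = misses.map { |id| "<person #{id} fragment>" } # stands in for the eager-loaded query

# 3. backfill the cache for next time
misses.zip(fresh) { |id, fragment| CACHE[id] = fragment }

hits.keys # => [1]
misses    # => [2, 3]
```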
