I have a Rails 4.1 app that, on a particular page, retrieves a list of orders and lists them out in a table. It's important to note that the list is different depending on the logged-in user.
To improve the performance of this, I am looking to cache the partials for each order row. I am considering doing it like this:
_order_list.html.erb
<% cache(@orders) do %>
  <%= render @orders %>
<% end %>
_order.html.erb
<% cache(order) do %>
  ... view code for order here
<% end %>
However, I'm unsure about the caching of the collection (@orders). Will all users then be served the same set of cached @orders (which is not desired)?
In other words, how can I ensure that the entire collection of @orders is cached for each user individually?
Will all users then be served the same set of cached @orders (which is not desired)?
Actually, cache_digests does not cache @orders itself. It caches the HTML part of the page for a particular object or set of objects (e.g. @orders). Each time a user requests the page, the @orders variable is set in the controller action and its digest is compared to the cached digest.
So, assuming we retrieve @orders like this:
def index
  @orders = Order.where(:id => [1, 20, 34]).all
end
What we get is a cached view with a key like this:
views/orders/1-20131202075718784548000/orders/20-20131220073309890261000/orders/34-20131223112753448151000/6da080fdcd3e2af29fab811488a953d0
Note that the ids of the retrieved orders are embedded in that key, so each user with their own unique set of orders should get their own individual cached view.
But here come some significant downsides of this approach:
With the default file store, cache fragments are written to disk. That means the cache key can't be arbitrarily long: as soon as you retrieve a sizable batch of orders at once, you exceed your OS's limit for filenames (e.g. 255 bytes on Linux) and end up with a runtime error.
Orders are dynamic content. As soon as at least one of them is updated, your cache becomes invalid. Generating a cache fragment and saving it to disk is a fairly expensive operation, so it is better to cache each order individually. That way you only re-generate the cache for a single order instead of re-generating the whole collection's cache.
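For example, instead of keying the outer fragment on @orders itself, you could build a short, user-scoped key from a couple of aggregates. A minimal sketch, assuming current_user is available in the view:

<% cache ["orders-for-#{current_user.id}", @orders.maximum(:updated_at), @orders.size] do %>
  <%= render @orders %>
<% end %>

This keeps the key a fixed length no matter how many orders are on the page, while the per-order cache in _order.html.erb handles invalidating individual rows.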
Related
I have just started using caching in a production application to speed things up. I've read the primary Rails guide, various blogs, the source itself, etc. But my head is still not clear on one simple thing when it comes to fragment caching:
When you expire the cache after updating an object, are you only expiring the single object's fragment, or every fragment for the class? I think just the single object's.
Here's an example:
<% @jobs.each do |job| %>
  <% cache("jobs_index_table_environment_#{session[:merchant_id]}_job_#{job.id}") do %>
    stuff
  <% end %>
<% end %>
I use the code above in my jobs index page. Each row is rendered with some information the user wants, some CSS, clickable to view the individual job, etc.
I wrote this in my Job class (model)
after_save do
  Rails.cache.delete("jobs_index_table_environment_#{merchant_id}_job_#{id}")
end

after_destroy do
  Rails.cache.delete("jobs_index_table_environment_#{merchant_id}_job_#{id}")
end
I want the individual job objects destroyed from the cache if they are updated or destroyed, and of course newly created jobs get their own cache key the first time they pop on the page.
I don't do the Russian doll thing with @jobs because this is my "god" object and is changing all the time. The cache would almost never be helpful, as the collection probably morphs by the minute.
Is my understanding correct that, in the above view, if I rendered, say, 25 jobs on the first page, I would get 25 entries in my cache under those keys, and that if I then change only the first job, its cached value would be expired, so the next time the jobs page loads it would be re-cached while the other 24 are simply read from the cache?
I'm a novice to fragment caching as well, and I just encountered a very similar use-case so I feel my (limited) knowledge is fresh enough to be of help.
Trosborn is correct: your log will show when the cache READs and when it WRITEs, which tells you how many hits you got on your cache. It should only WRITE when you've changed an object. And based on what I see above, your delete is only deleting individual records.
However, I think there is a potentially simpler way to accomplish this, which is passing the ActiveRecord object to cache, such as:
<% @jobs.each do |job| %>
  <% cache(job) do %>
    stuff
  <% end %>
<% end %>
Read this post from DHH on how this works. In short, when an AR object is passed to cache, the key is generated not just from the model name, but also from the id and updated_at fields.
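For example (a sketch; the exact timestamp resolution depends on your cache_timestamp_format setting):

job = Job.find(15)
job.cache_key # => "jobs/15-20140124164356774568000"

Because updated_at is part of the key, saving the record automatically makes the old fragment unreachable, with no explicit delete needed.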
Obsolete fragments eventually get pushed out of the cache when memory runs out, so you don't need to worry about deleting old cache objects.
I have a model named Employee that has over a dozen fields, making it impractical to display all of them at once. I would like to let users choose which columns are displayed, using either a multiple select box or a list of checkboxes. The result would ideally be stored in memory rather than in the model, since nothing will be saved long term, and be accessible to a loop that displays the appropriate columns.
A sample of the view might be like so:
<% @employees.each do |employee| %>
  <tr>
    <% col_list.each do |col| %>
      <td><%= employee.public_send(col) %></td>
    <% end %>
  </tr>
<% end %>
where col_list is the list of columns selected by the user.
A better approach might be to render all the columns server side and do the filtering client side with JavaScript. There are several libraries for this, such as jQuery DataTables.
You can combine this with a preferences model that is persisted to the session or Redis instead of the main RDBMS if you want to remember the user's preferences.
(Yes, you can use models for objects not stored in the DB. It gives you all the Rails awesomeness of validations, form and param binding, etc.)
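For instance, a minimal sketch of such a tableless model using ActiveModel::Model (Rails 4+); the class and column names here are illustrative, not from the question:

class ColumnPreference
  include ActiveModel::Model

  # Whitelist of columns users may display (assumed names)
  ALLOWED = %w[name email department hired_on].freeze

  attr_accessor :columns

  validate do
    unknown = Array(columns) - ALLOWED
    errors.add(:columns, "contains unknown columns: #{unknown.join(', ')}") if unknown.any?
  end
end

prefs = ColumnPreference.new(:columns => %w[name email])
prefs.valid? # => true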
As described, there is nothing particularly complicated about this feature.
If your users are logged in, you can store a serialized list of columns in a per-user preference. The list of columns should be sanitized to avoid exposing private columns.
If the user is not logged in, or you want a less persistent approach, simply store the list in a cookie.
If the list is not set (i.e. the cookie is empty), set it to a default or just render a default list of attributes.
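Putting those pieces together, a sketch of the cookie-based variant (the controller and whitelist names are assumptions):

class EmployeesController < ApplicationController
  ALLOWED_COLUMNS = %w[name email department].freeze

  def index
    if params[:columns].present?
      # Sanitize against the whitelist before persisting to the cookie
      cookies[:employee_columns] = (params[:columns] & ALLOWED_COLUMNS).join(",")
    end
    @col_list = (cookies[:employee_columns] || "name,email").split(",")
    @employees = Employee.all
  end
end

The view from the question can then call employee.public_send(col) for each column in @col_list.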
Just doing some research on the best way to cache a paginated collection of items. I'm currently using jbuilder to output JSON and have been playing with various cache_key options.
The best example I've seen uses the latest record's updated_at plus the number of items in the collection:
def cache_key
  pluck("COUNT(*)", "MAX(updated_at)").flatten.map(&:to_i).join("-")
end
defined here: https://gist.github.com/aaronjensen/6062912
However, this won't work for paginated items, where I always have 10 items in my collection.
Are there any workarounds for this?
With a paginated collection, you're just getting an array. Any attempt to monkey-patch Array to include a cache key would be a bit convoluted. Your best bet is just to use the cache method to generate a key on a collection-by-collection basis.
You can pass plenty of things to the cache method to generate a key. If you always have 10 items per page, I don't think the count is very valuable. However, the page number and the last updated item would be.
cache ["v1/items_list/page-#{params[:page]}", #items.maximum('updated_at')] do
would generate a cache key like
v1/items_list/page-3/20140124164356774568000
With Russian doll caching, you should also cache each item in the list:
# index.html.erb
<% cache ["v1/items_list/page-#{params[:page]}", @items.maximum('updated_at')] do %>
  <!-- v1/items_list/page-3/20140124164356774568000 -->
  <%= render @items %>
<% end %>

# _item.html.erb
<% cache ['v1', item] do %>
  <!-- v1/items/15-20140124164356774568000 -->
  <!-- render item -->
<% end %>
Caching paginated collections is tricky. The usual trick of using the collection count and max updated_at mostly does not apply!
As you said, the collection count is a given, so it's kind of useless, unless you allow dynamic per_page values.
The latest updated_at is totally dependent on the sorting of your collection.
Imagine that a new record is added and ends up on page one. This means that one record, previously on page 1, now moves to page 2, and one previous page 2 record now becomes page 3. If the new page 2 record is not updated more recently than the previous max, the cache key stays the same even though the collection does not! The same happens when a record is deleted.
Only if you can guarantee that new records always end up on the last page, and that no records will ever be deleted, is the max updated_at a solid way to go.
As a solution, you could include the total record count and the overall max updated_at in the cache key, in addition to the page number and the per-page value. This requires extra queries, but can be worth it, depending on your database configuration and record count.
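A sketch of such a key (the Item model name and per_page param are assumptions):

<% cache ["v1/items_list", params[:page], params[:per_page], Item.count, Item.maximum(:updated_at)] do %>
  <%= render @items %>
<% end %>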
Another solution is to use a key that takes into account some reduced form of the actual collection content, for example all the record ids.
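For instance, you could digest the ids and timestamps of exactly the records on the current page; a sketch of a helper, not taken from the gem below:

require 'digest/md5'

def page_cache_key(items)
  Digest::MD5.hexdigest(items.map { |i| "#{i.id}-#{i.updated_at.to_i}" }.join("/"))
end

# <% cache [page_cache_key(@items)] do %> ... <% end %>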
If you are using Postgres as your database, this gem might help you, though I've never used it myself.
https://github.com/cmer/scope_cache_key
And the Rails 4 fork:
https://github.com/joshblour/scope_cache_key
In my User model I have a friends method that returns a hash of all the user's Facebook friends. In my view I iterate through the entire list to paginate it alphabetically. I can't tell if my server is running really slowly or if this approach is extremely inefficient. How can I make this process faster? Would it be better to create a Friend model? Please let me know if my method is inefficient, why, and how I might be able to make it faster. Thanks!
In my Home.html.erb view I have <% letter = 'a' %>, which changes when the user selects a different letter and the page refreshes.
<% current_user.friends.each do |user| %>
  <% if user['name'].downcase.start_with? letter %>
    do something
  <% end %>
<% end %>
User Model
def facebook
  @facebook ||= Koala::Facebook::API.new(token)
  block_given? ? yield(@facebook) : @facebook
rescue Koala::Facebook::APIError => e
  logger.info e.to_s
  nil
end

def friends
  # guard against a nil result when the API call fails and is rescued
  (facebook { |fb| fb.get_connections("me", "friends") } || []).sort { |a, b| a['name'] <=> b['name'] }
end
You are making an external API call on every request. Plus, a user may have a large number of friends, e.g. 500 or 1,000.
In my Facebook app I process this data in a background job (Delayed Job). You could use Resque, Sidekiq, or some other background library to process the user data.
I would suggest you create a Friend model and associate it with the User model. Then, if you run into an N+1 query problem, you can use includes, and instead of sort use order, which runs in the database and is much faster. Moreover, instead of each use find_each, which processes the records in batches (you can look up the difference between each and find_each). Hope this helps.
One thing that is certainly slowing down each request is the fact that you're making an external API call in the middle of it. The second thing to note is that you're potentially bringing back a large amount of data, easily getting into the hundreds, if not thousands, of records.
A more appropriate way to handle this would be to create a Friend model where each friend belongs to a User. In a background processor (e.g. Delayed Job, Resque, Sidekiq), iterate through your users and update their friends at whatever interval your server can tolerate. It will cause some lag before a user's new friends show up. You'll have to be the judge of how much lag is tolerable; it depends largely on your number of users and your hardware budget.
This is effectively a caching mechanism, and you may want to account for the fact that the data will change: friends may be removed, and so on. You could delete all the friends and recreate the whole list on each refresh. Doing so inside a transaction will keep the deletes from showing up until it is committed.
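A sketch of that approach; the model, worker, and column names below are my own, and it assumes User has_many :friends plus the facebook method from the question:

class Friend < ActiveRecord::Base
  belongs_to :user # assumed columns: user_id, name, facebook_id
end

# e.g. with Sidekiq
class SyncFriendsWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    data = user.facebook { |fb| fb.get_connections("me", "friends") }
    return if data.nil? # the API call failed and was rescued

    # Replace the list inside a transaction so readers never see it half-built
    Friend.transaction do
      user.friends.delete_all
      data.each do |f|
        user.friends.create!(:name => f['name'], :facebook_id => f['id'])
      end
    end
  end
end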
I'm hoping to get advice on the proper use of caching to speed up a timeline query in Rails. Here's the background:
I'm developing an iPhone app with a Rails backend. It's a social app, and like other social apps, its primary view is a timeline (i.e., newsfeed) of messages. This works very much like Twitter, where the timeline is made up of messages of the user and of his/her followers. The main query in the API request to retrieve the timeline is the following:
@messages = Message.where("user_id in (?) OR user_id = ?", current_user.followed_users.map(&:id), current_user)
Now this query gets quite inefficient, particularly at scale, so I'm looking into caching. Here are the two things I'm planning to do:
1) Use Redis to cache timelines as lists of message ids
Part of what makes this query so expensive is figuring out which messages to display on the fly. My plan here is to maintain a Redis list of message ids for each user. Assuming I build this correctly, when a timeline API request comes in I can ask Redis for a pre-processed, ordered list of the ids of the messages to display. For example, I might get something like this: "[21, 18, 15, 14, 8, 5]"
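A minimal sketch of that fan-out with the redis gem (the $redis connection, key format, and followers variable are assumptions):

# When a message is created, push its id onto each recipient's list
followers.each do |follower|
  $redis.lpush("timeline:#{follower.id}", message.id)
  $redis.ltrim("timeline:#{follower.id}", 0, 999) # cap the list length
end

# When a timeline request comes in, read the first page of ids
ids_from_redis = $redis.lrange("timeline:#{current_user.id}", 0, 24).map(&:to_i)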
2) Use Memcached to cache individual message objects
While I believe the first point will help a great deal, there's still the potential problem of retrieving the individual message objects from the database. The message objects can get quite big. With them, I return related objects like comments, likes, the user, etc. Ideally, I would cache these individual message objects as well. This is where I'm confused.
Without caching, I would simply make a query call like this to retrieve the message objects:
@messages = Message.where("id in (?)", ids_from_redis)
Then I would return the timeline:
respond_with(:messages => @messages.as_json) # includes related likes, comments, user, etc.
Now, given my desire to use Memcached for retrieving individual message objects, it seems I need to fetch the messages one at a time. In pseudo-code, I'm thinking something like this:
@messages = []
ids_from_redis.each do |m|
  message = Rails.cache.fetch("message_#{m}") do
    Message.find(m).as_json
  end
  @messages << message
end
Here are my two specific questions (sorry for the lengthy build-up):
1) Does this approach generally make sense (redis for lists, memcached for objects)?
2) Specifically, regarding the pseudo-code above: is this the only way to do this? It feels inefficient grabbing the messages one by one, but I'm not sure how else to do it given my intention to do object-level caching.
Appreciate any feedback as this is my first time attempting something like this.
On the face of it, this seems reasonable. Redis is well suited to storing lists, can be made persistent, etc., and memcached will be very fast at retrieving individual messages, even if you call it sequentially like that.
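As an aside, if the sequential fetches ever do become a bottleneck, memcached supports multi-get; a sketch using Rails.cache.read_multi, reusing the key format from the question:

keys = ids_from_redis.map { |id| "message_#{id}" }
cached = Rails.cache.read_multi(*keys) # one round trip for all cache hits

# Load the misses from the database in a single query and backfill the cache
missing_ids = ids_from_redis.reject { |id| cached.key?("message_#{id}") }
Message.where(:id => missing_ids).each do |msg|
  json = msg.as_json
  Rails.cache.write("message_#{msg.id}", json)
  cached["message_#{msg.id}"] = json
end

@messages = ids_from_redis.map { |id| cached["message_#{id}"] }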
The issue here is that you're going to need to clear/supplement that Redis cache each time a message is posted. It seems a bit of a waste just to clear the cache in this circumstance, because you'll already have gone to the trouble of identifying every recipient of the message.
So, without wishing to answer the wrong question, have you thought about 'rendering' the visibility of messages into the database (or Redis, for that matter) when each message is posted? Something like this:
class Message < ActiveRecord::Base
  belongs_to :sender
  has_many :visibilities
  before_create :render_visibility

  def render_visibility
    sender.followers.each do |follower|
      visibilities.build(:user => follower)
    end
  end
end
You could then render the list of messages quite simply:
class User < ActiveRecord::Base
  has_many :visibilities
  has_many :messages, :through => :visibilities
end

# in your timeline view:
<%= render current_user.messages %>
I would then add caching of individual messages like this:
# In your message partial, caching individual rendered messages:
<% cache(message) do %>
  <!-- render your message here -->
<% end %>
I would also then add caching of entire timelines like this:
# In your timeline view
<%= cache("timeline-for-#{current_user}-#{current_user.messages.last.cache_key}") do %>
<%= current_user.messages.each { |message| render message } %>
<% end %>
What this should achieve (I've not tested it) is that the entire timeline HTML will be cached until a new message is posted. When that happens, the timeline will be re-rendered, but all the individual messages will come from the cache rather than being rendered again (with the possible exception of any new ones that haven't been viewed by anyone else!)
Note that this assumes that the message rendering is the same for every user. If it isn't, you'll need to cache the messages per user too, which would be a bit of a shame, so try not to do this if you can!
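If you do end up needing per-user rendering, the usual workaround is to add the user to the fragment key; a sketch:

<% cache([current_user.id, message]) do %>
  <!-- user-specific rendering of the message -->
<% end %>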
FWIW, I believe this is vaguely (and I mean vaguely) what Twitter does. They take a 'big data' approach to it, though, where the tweets are exploded and inserted into follower timelines across a large cluster of machines. What I've described here will struggle to scale in a write-heavy environment with lots of followers, although you could improve it somewhat by using Resque or similar.
P.S. I've been a bit lazy with the code here - you should look to refactor this to move e.g. the timeline cache key generation into a helper and/or the User model.