I'm hoping to get advice on the proper use of caching to speed up a timeline query in Rails. Here's the background:
I'm developing an iPhone app with a Rails backend. It's a social app, and like other social apps, its primary view is a timeline (i.e., newsfeed) of messages. This works very much like Twitter, where the timeline is made up of messages of the user and of his/her followers. The main query in the API request to retrieve the timeline is the following:
@messages = Message.where("user_id in (?) OR user_id = ?", current_user.followed_users.map(&:id), current_user)
Now this query gets quite inefficient, particularly at scale, so I'm looking into caching. Here are the two things I'm planning to do:
1) Use Redis to cache timelines as lists of message ids
Part of what makes this query so expensive is figuring out which messages to display on the fly. My plan here is to keep a Redis list of message ids for each user. Assuming I build this correctly, when a timeline API request comes in I can call Redis to get a pre-processed, ordered list of the ids of the messages to display. For example, I might get something like this: "[21, 18, 15, 14, 8, 5]"
2) Use Memcached to cache individual message objects
While I believe the first point will help a great deal, there's still the potential problem of retrieving the individual message objects from the database. The message objects can get quite big. With them, I return related objects like comments, likes, the user, etc. Ideally, I would cache these individual message objects as well. This is where I'm confused.
Without caching, I would simply make a query call like this to retrieve the message objects:
@messages = Message.where("id in (?)", ids_from_redis)
Then I would return the timeline:
respond_with(:messages => @messages.as_json) # includes related likes, comments, user, etc.
Now, given my desire to use Memcached to retrieve individual message objects, it seems like I need to retrieve the messages one at a time. In pseudo-code, I'm thinking of something like this:
@messages = []
ids_from_redis.each do |m|
  message = Rails.cache.fetch("message_#{m}") do
    Message.find(m).as_json
  end
  @messages << message
end
Here are my two specific questions (sorry for the lengthy build):
1) Does this approach generally make sense (redis for lists, memcached for objects)?
2) Specifically, regarding the pseudo-code above, is this the only way to do this? It feels inefficient grabbing the messages one by one, but I'm not sure how else to do it given my intention to do object-level caching.
Appreciate any feedback as this is my first time attempting something like this.
On the face of it, this seems reasonable. Redis is well suited to storing lists and can be made persistent, and Memcached will be very fast for retrieving individual messages, even if you call it sequentially like that.
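On the one-by-one concern: Rails cache stores also offer read_multi (and fetch_multi in newer Rails versions), which read many keys in a single round trip. A minimal sketch of that pattern follows. TinyCache and MESSAGES_DB are hypothetical stand-ins for Rails.cache and the messages table, so it runs without Rails or Memcached.

```ruby
# Sketch of "read many keys at once, then load only the misses".
# TinyCache and MESSAGES_DB are hypothetical stand-ins, not Rails APIs.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns a hash containing only the keys that were found.
  def read_multi(*keys)
    keys.each_with_object({}) do |key, found|
      found[key] = @store[key] if @store.key?(key)
    end
  end

  def write(key, value)
    @store[key] = value
  end
end

MESSAGES_DB = {
  21 => { "id" => 21, "body" => "newest" },
  18 => { "id" => 18, "body" => "older" },
  15 => { "id" => 15, "body" => "oldest" }
}

# Returns the messages in the order of `ids`, plus the ids that missed the cache.
def fetch_messages(cache, ids)
  keys = ids.map { |id| "message_#{id}" }
  cached = cache.read_multi(*keys) # one cache round trip
  missing_ids = ids.reject { |id| cached.key?("message_#{id}") }

  # In Rails this loop would be one query, e.g. Message.where(:id => missing_ids).
  missing_ids.each do |id|
    record = MESSAGES_DB.fetch(id)
    cache.write("message_#{id}", record)
    cached["message_#{id}"] = record
  end

  [keys.map { |key| cached[key] }, missing_ids]
end

CACHE = TinyCache.new
CACHE.write("message_18", MESSAGES_DB[18])
TIMELINE, LOADED_FROM_DB = fetch_messages(CACHE, [21, 18, 15])
```

Here only messages 21 and 15 are loaded from the stand-in database, while 18 comes from the cache, and the result keeps the order of the id list.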
The issue here is that you're going to need to clear/supplement that redis cache each time a message is posted. It seems a bit of a waste just to clear the cache in this circumstance, because you'll already have gone to the trouble of identifying every recipient of the message.
So, without wishing to answer the wrong question, have you thought about 'rendering' the visibility of messages into the database (or redis, for that matter) when each message is posted? Something like this:
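class Message < ActiveRecord::Base
  belongs_to :sender
  has_many :visibilities
  before_create :render_visibility

  private

  # Build one visibility row per follower of the sender
  def render_visibility
    sender.followers.each do |follower|
      visibilities.build(:user => follower)
    end
  end
end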
You could then render the list of messages quite simply:
class User < ActiveRecord::Base
  has_many :visibilities
  has_many :messages, :through => :visibilities
end
# in your timeline view (renders the message partial once per message):
<%= render current_user.messages %>
I would then add caching of individual messages like this:
# In your message partial, caching individual rendered messages:
<% cache(message) do %>
  <!-- render your message here -->
<% end %>
I would also then add caching of entire timelines like this:
# In your timeline view
<% cache("timeline-for-#{current_user.id}-#{current_user.messages.last.cache_key}") do %>
  <%= render current_user.messages %>
<% end %>
What this should achieve (I've not tested it) is that the entire timeline HTML will be cached until a new message is posted. When that happens, the timeline will be re-rendered, but all the individual messages will come from the cache rather than being rendered again (with the possible exception of any new ones that haven't been viewed by anyone else!)
Note that this assumes that the message rendering is the same for every user. If it isn't, you'll need to cache the messages per user too, which would be a bit of a shame, so try not to do this if you can!
FWIW, I believe this is vaguely (and I mean vaguely) what Twitter does. They take a 'big data' approach to it though, where the tweets are exploded and inserted into follower timelines across a large cluster of machines. What I've described here will struggle to scale in a write-heavy environment with lots of followers, although you could improve this somewhat by moving the fan-out into a background job with Resque or similar.
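As a rough illustration of that fan-out on write, here is a sketch that pushes each new message id onto every recipient's timeline and caps the list length. TIMELINES is a plain Hash standing in for per-user Redis lists (the Redis equivalents would be LPUSH followed by LTRIM), and the cap of 3 is an arbitrary assumption chosen so the trimming is visible.

```ruby
# Sketch of fan-out on write with a Hash standing in for per-user Redis lists.
TIMELINES = Hash.new { |hash, user_id| hash[user_id] = [] }
MAX_TIMELINE_LENGTH = 3 # hypothetical cap

# Pushes message_id onto the sender's and each follower's timeline, newest first.
def deliver(message_id, sender_id, follower_ids)
  ([sender_id] + follower_ids).uniq.each do |user_id|
    timeline = TIMELINES[user_id]
    timeline.unshift(message_id)             # like LPUSH
    timeline.slice!(MAX_TIMELINE_LENGTH..-1) # like LTRIM to the cap
  end
end

deliver(5, 1, [2, 3])
deliver(8, 2, [3])
deliver(14, 1, [2, 3])
deliver(15, 3, [])
```

Reading a timeline is then a single list read per user, at the cost of one write per follower when a message is posted.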
P.S. I've been a bit lazy with the code here - you should look to refactor this to move e.g. the timeline cache key generation into a helper and/or the person model.
For performance reasons, I use the only() method as often as possible when writing a Mongoid query, in order to specify the fields I want to load.
The usual case is, for instance, when I want the emails of all my admins for display purposes only.
I would write:
User.where(:groups => :admins).only(:email).each do |u|
  puts u.email
end
I do this because my user model is full of data that I can gladly ignore when listing a bunch of emails.
However, now let's imagine that my users are referenced via a Project model, so that for each project I can do: project.user. Thanks to Mongoid's lazy loading, my user object will only get instantiated (and queried from the DB) when I call upon the reference.
But what if I want to list the emails of the owners of all admin projects, for instance?
I would write this:
Project.where(:admin_type => true).each do |p|
  puts p.user.email
end
The major problem here is that, doing this, I load the entire user object for each project, and if there are lots of projects matching the query, that could get pretty heavy. So how do I load only the emails?
I could do this:
User.where(:_id => p.user_id).only(:email).first.email
But this obviously defeats the purpose of the nice syntax of simply doing:
p.user.email
I wish I could write something like p.user.only(:email).email, but I can't. Any ideas?
Alex
Answer from the creator of Mongoid: it's not possible yet. It's been added as a feature request.
I think you need to denormalize here. First of all, read A Note on Denormalization.
You can implement denormalization yourself using Mongoid callbacks, or use the great mongoid_denormalize gem. It's pretty straightforward, and after implementing it you could use p.user_email or something similar in your queries.
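To make the idea concrete, here is a plain-Ruby sketch of denormalization, with no Mongoid involved. The project keeps its own copy of the owner's email, so listing emails never touches the full user record. The class layout and the sync_owner method name are illustrative assumptions. In Mongoid the copy would be an ordinary field on Project, refreshed from a callback.

```ruby
# Sketch of denormalization: Project stores a copy of the owner's email.
# Names here are illustrative, not Mongoid or mongoid_denormalize APIs.
User = Struct.new(:id, :email, :profile)

class Project
  attr_reader :name, :user_id, :user_email

  def initialize(name, user)
    @name = name
    @user_id = user.id
    sync_owner(user)
  end

  # Call this whenever the owner's email changes (e.g. from a save callback).
  def sync_owner(user)
    @user_email = user.email
  end
end

alice = User.new(1, "alice@example.com", "a very large profile document")
PROJECTS = [Project.new("site", alice), Project.new("api", alice)]
EMAILS = PROJECTS.map(&:user_email) # no user lookup needed

alice.email = "alice@new.example.com"
PROJECTS.each { |project| project.sync_owner(alice) }
UPDATED_EMAILS = PROJECTS.map(&:user_email)
```

The trade-off is that every change to a user's email has to be propagated to that user's projects, which is the work the gem automates.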
I have a RoR application which contains an API to manage applications, each of which contains recipes (and groups, ingredients, measurements).
Once the user has finished managing the recipes, they download a JSON file of the entire application. Because each application could have hundreds of recipes, the files can be large. It also means there are a lot of DB calls to get all the data required for the export.
Because of this, the request to download the application can take upwards of 30 seconds, sometimes more.
My current code looks something like this:
application.categories.each do |c|
  c.recipes.each do |r|
    r.groups.each do |g|
      g.ingredients.each do |i| # collect the data into the hash here
      end
    end
  end
end
Within each loop I'm storing the data in a hash and then giving it to the user.
My question is: where do I go from here?
Is there a way to grab all the data I require from the DB in one query? From looking at the log, I can see it is running hundreds of queries.
If the above solution is still slow, is this something I should put into a background process, and then email the user a link (or similar)?
There are of course ways to grab more data at once. This is done with Rails includes or joins, depending on your needs. See this article for some detailed information.
The basic idea is that you load the associated tables up front so that new queries aren't generated each time. When you do application.categories, that's one query. For each of those categories you then do another query, c.recipes, which creates N+1 queries, where N is the number of categories you have. Instead, you can include them from the get-go, which takes 1 or 2 queries (depending on what Rails does).
The basic syntax is easy:
Application.includes(:categories => :recipes).each do |application| ...
This generates 1 (or 2; again, see the article) query that grabs all applications, their categories, and each category's recipes at once. You can tack on the groups and ingredients too.
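To see why this matters, here is a plain-Ruby sketch contrasting the two access patterns. The in-memory arrays stand in for tables, and QueryCounter is a hypothetical stand-in for the SQL log. It illustrates the idea behind includes, not how Rails implements it.

```ruby
# Sketch contrasting N+1 lookups with one batched lookup.
CATEGORIES = [{ id: 1, name: "Soups" }, { id: 2, name: "Desserts" }, { id: 3, name: "Salads" }]
RECIPES = [
  { id: 10, category_id: 1, name: "Minestrone" },
  { id: 11, category_id: 2, name: "Brownies" },
  { id: 12, category_id: 2, name: "Flan" }
]

class QueryCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # Every call represents one round trip to the database.
  def recipes_where(category_ids)
    @count += 1
    RECIPES.select { |recipe| category_ids.include?(recipe[:category_id]) }
  end
end

# N+1 style: one recipes query per category.
def export_naive(db)
  CATEGORIES.map do |category|
    [category[:name], db.recipes_where([category[:id]]).map { |r| r[:name] }]
  end.to_h
end

# Eager-loading style: one recipes query for all categories, grouped in Ruby.
def export_batched(db)
  ids = CATEGORIES.map { |c| c[:id] }
  by_category = db.recipes_where(ids).group_by { |r| r[:category_id] }
  CATEGORIES.map do |category|
    [category[:name], by_category.fetch(category[:id], []).map { |r| r[:name] }]
  end.to_h
end

NAIVE_DB = QueryCounter.new
BATCHED_DB = QueryCounter.new
NAIVE_EXPORT = export_naive(NAIVE_DB)
BATCHED_EXPORT = export_batched(BATCHED_DB)
```

Both exports contain the same data, but the naive version issues one recipes lookup per category while the batched version issues one in total.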
As for putting the work in the background, my suggestion would be to just have a loading image, or get fancy by using a progress bar.
First of all I have to assume that the required has_many and belongs_to associations exist.
Generally you can do something like
c.recipes.includes(:groups)
or even
c.recipes.includes(:groups => :ingredients)
which will fetch recipes and groups (and ingredients) at once.
But since you have quite a big data set, IMO it would be better to limit that technique to the deepest levels.
The most useful approach would be to use find_each and includes together.
(find_each fetches the items in batches in order to keep the memory usage low)
perhaps something like
application.categories.each do |c|
  c.recipes.find_each do |r|
    r.groups.includes(:ingredients).each do |g|
      g.ingredients.each do |i|
        ...
      end
    end
  end
end
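The batching idea behind find_each can be sketched in plain Ruby with each_slice: records are handled in fixed-size slices so that only one slice is held at a time. The range below stands in for a large table, and the batch size of 4 is illustrative (find_each itself defaults to batches of 1000).

```ruby
# Sketch of batch processing in the spirit of find_each.
# Yields each record to the block and returns the size of every batch seen.
def each_in_batches(records, batch_size)
  batch_sizes = []
  records.each_slice(batch_size) do |batch|
    batch_sizes << batch.size
    batch.each { |record| yield record }
  end
  batch_sizes
end

EXPORTED = []
BATCH_SIZES = each_in_batches(1..10, 4) { |id| EXPORTED << "recipe-#{id}" }
```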
Now even that can take quite a long time (for an http request) so you can consider using some async processing where the client will generate a request that is going to be processed by the server as a background job, and when that is ready, you can provide a download link (or send an email) to the client.
Resque is one possible solution for handling the async part.
In my user model I have a friends method that returns the list of all the user's Facebook friends (as hashes). In my view I iterate through the entire list to paginate that list alphabetically. I can't tell if my server is running really slowly or if this is extremely inefficient. How can I make this process faster? Would it be better to create a Friend model? Please let me know if my method is inefficient, why, and how I might be able to make it faster. Thanks!
In my Home.html.erb view I have <% letter = 'a' %>, which changes when the user selects a different letter and the page refreshes.
<% current_user.friends.each do |user| %>
  <% if user['name'].downcase.start_with? letter %>
    do something
  <% end %>
<% end %>
User Model
def facebook
  @facebook ||= Koala::Facebook::API.new(token)
  block_given? ? yield(@facebook) : @facebook
rescue Koala::Facebook::APIError => e
  logger.info e.to_s
  nil
end

def friends
  facebook { |fb| fb.get_connections("me", "friends") }.sort { |a, b| a['name'] <=> b['name'] }
end
You are making an external API call on every request. On top of that, a user may have a large number of friends, say 500 or 1,000.
In my Facebook app I process this data in a background job (Delayed Job). You could also use Resque, Sidekiq, or another background processor.
I would suggest creating a Friend model associated with the User model. Then, if you run into an N+1 query problem, you can use includes. Instead of sorting in Ruby with sort, use order so that the database does the sorting, which should be much faster. Also consider find_each instead of each, since it processes the records in batches. Hope this helps.
One thing that will be slowing down each request for sure is the fact that you're making an external API call in the middle of the request. The second thing to note is that you're potentially bringing back a large amount of data, easily getting into the hundreds, if not thousands, of friends.
A more appropriate way to handle this would be to create a Friend model where each friend has a belongs_to relationship to the User. In a background processor (e.g. Delayed Job, Resque, Sidekiq), iterate through your users and update their friends at some interval that your server can tolerate. This introduces some lag before a user's friends show up. You'll have to be the judge of how much lag is tolerable, and it depends largely on your number of users and your hardware budget.
This is effectively a caching mechanism, and you may want to account for the fact that the data will change: friends may be removed and so on. You could delete all the friends and recreate the whole list on each refresh. Doing so inside a transaction will keep the deletes from showing up until it is committed.
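Once the friends are available locally, the letter filter also no longer needs to scan the whole list on every page view. A small sketch, using made-up sample data shaped like the hashes in the question: sort once, then index by first letter. With a Friend model, the equivalent would be an ordered query filtered by the name prefix.

```ruby
# Sketch: index a friend list by first letter once, instead of scanning the
# whole list for every letter. FRIENDS is made-up sample data.
FRIENDS = [
  { "name" => "alice Zhang" },
  { "name" => "Bob Ito" },
  { "name" => "Aaron Diaz" },
  { "name" => "brenda Okafor" }
]

# Returns a hash of lowercase first letter => friends sorted by name.
def index_by_letter(friends)
  friends
    .sort_by { |friend| friend["name"].downcase }
    .group_by { |friend| friend["name"][0].downcase }
end

FRIENDS_BY_LETTER = index_by_letter(FRIENDS)
A_NAMES = FRIENDS_BY_LETTER.fetch("a", []).map { |friend| friend["name"] }
```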
I'm having a very hard time trying to figure out how to do this the MVC way. I have a Comment model which holds a body attribute. This attribute may contain mentions such as the following:
Hi! This is me mentioning @someone.
Every time someone posts a comment, an accessor method in the model converts every @mention to something like #user:231#, where 231 would be the user's id. This way, if the mentioned user changes their username, I can still link to and mention them without problems in older comments.
Now, I want to be able to access the body attribute and get the mentions already converted to links. It appears that doing this the MVC way, from within the model is not possible from what I have investigated.
Is there any easy way to do this? I don't wanna have to convert all the mentions on the controller because I think it could lead to repeated code and non-testable code.
Could anyone give me some advice on this?
Thanks!
Parsing the message into a particular format and then re-saving it in the database where it can then be edited at a later date is silly. I'm sorry to be so blunt, but doing it this way is fundamentally broken for one major reason: when a user goes to edit the message later on, they'll see the formatted text unless you format it back. Do you really want to be responsible for doing this?
I would hope not. Because you're a programmer, you're naturally lazy and would like to do things in as few steps as possible.
What I would recommend doing to solve this problem is to parse the message when you display it on the page. Before you go screaming at me that this is computationally intensive if you've got a large amount of hits, hear me out. When it's displayed on the page, you can then cache it like this:
<% cache comment do %>
# code goes here
<% end %>
This will store the final output in whatever cache you've set up with Rails, possibly Memcached or Redis, using a cache key which includes the comment's updated_at timestamp. Pay attention to this, it'll be useful later.
Retrieval from this cache will be faster than parsing it, and will be easier for you than to convert the message back and forth between its versions.
When a comment is updated, the updated_at timestamp will be different, so the new version of the comment will be rendered first and then cached under a new key. Memcached (so I'm told) evicts the least recently used keys when it needs more memory, thereby cleaning out the entries for the older versions.
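For the stored-token variant described in the question, the display-time conversion could live in a helper that is called inside the cache block. A minimal sketch, assuming tokens of the form #user:231#. The USERNAMES lookup and the /users/<id> path are hypothetical stand-ins for the users table and the app's routes.

```ruby
require "cgi"

# Sketch: turn stored mention tokens into links at display time.
# USERNAMES and the /users/<id> path are assumptions for illustration.
USERNAMES = { 231 => "someone", 7 => "fred" }

def render_mentions(body)
  # Escape the raw text first, then substitute tokens with trusted markup.
  CGI.escapeHTML(body).gsub(/#user:(\d+)#/) do
    id = Regexp.last_match(1).to_i
    name = USERNAMES[id]
    name ? %(<a href="/users/#{id}">@#{CGI.escapeHTML(name)}</a>) : "@unknown"
  end
end

RENDERED = render_mentions("Hi! This is me mentioning #user:231# & <b>friends</b>.")
```

Because the username is looked up when the comment is rendered, a renamed user shows up under the new name the next time the cached fragment is regenerated.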
Wouldn't you end up mangling the original message? Let's say I originally posted:
"Hi! This is me mentioning @bob."
From what I understand, you want to store this as:
"Hi! This is me mentioning #user:1#"
Now, if bob were to change his username to "fred", my message would now look like this:
"Hi! This is me mentioning @fred"
It may be easier to simply store a many-to-many relation between messages and users it mentions. That way, you still can easily see which messages mention a specific user, but you don't need to mangle the original message to do so.
If you need to convert each mention into a link, you could order the entries in the relationship table in the same order that they appear in the message.
Maybe this gem will help you: https://github.com/twitter/twitter-text-rb
First, include the Twitter::Autolink module in your class or helper:
module ApplicationHelper
  include Twitter::Autolink
end
From your views, you can then call it like this:
<%= auto_link("Hi @john_doe, welcome to #ruby") %>
It will generate links to the Twitter user john_doe and to the ruby hashtag.
I'm still pretty new to Rails and need your help: I have been creating a social fitness analytics site (zednine.com) that has an activity stream that lists workouts posted on the site. Several pages currently show the 10 most recently updated workouts. I'd like to add a link at the bottom to "Older workouts." On click, this should show the next 10 workouts in the page, immediately below the first 10, with a new link to Older below (now 20 displayed in the page) -- just like the news stream on Facebook and several other social networks.
What I've tried so far:
I'm currently using a find with :limit to get the first N results
I can set up a unique find with :limit and :offset for each set of N results with hidden divs, but that's lame and does not extend well
I also looked at:
pagination, including will_paginate, but it's not clear whether this can help with chunking within the same page
collections...?
What is the right/a good way to do this?
Also, how can I include records from multiple tables in this sort of stream? E.g., list could include workouts from one table, journal entries from another, comments from a third, all intermixed and sorted by date?
Thank you!
Will_paginate will do the job, just pass in the page you want:
<%= link_to "Older Posts", model_route_path(:page => next_page) %>
As for the second question, simply create a Feed model (or tack it onto an existing one). Then have a method which fetches recent entries from the various other models and sorts them by created_at date. I would probably implement a recent class method on each of the models and call that in your Feed object.
Models:
class Feed
  def index
    entries = []
    entries.concat(Journal.recent.to_a)
    entries.concat(Comment.recent.to_a)
    # etc
    # newest first
    entries.sort_by { |entry| entry['created_at'] }.reverse
  end
end
class Journal < ActiveRecord::Base
  # The 10 most recently created journals
  def self.recent
    order("created_at DESC").limit(10)
  end
end
Or something like that. You will have to be very careful about scalability here.
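Here is a plain-Ruby sketch of the same idea combined with the "Older" link: entries from several sources are merged, sorted newest first, and served one page at a time. The arrays stand in for the per-model recent queries, and PER_PAGE and the sample data are made up. In a real app each source would need to fetch at least page * PER_PAGE rows, which is where the scalability concern comes from.

```ruby
# Sketch: merge entries from several sources into one stream, newest first,
# and serve it in pages. All data and names here are illustrative.
WORKOUTS = [
  { type: "workout", title: "5k run", created_at: Time.utc(2011, 5, 3) },
  { type: "workout", title: "Swim", created_at: Time.utc(2011, 5, 1) }
]
JOURNALS = [{ type: "journal", title: "Rest day", created_at: Time.utc(2011, 5, 2) }]
COMMENTS = [{ type: "comment", title: "Nice pace!", created_at: Time.utc(2011, 5, 4) }]
PER_PAGE = 2

# Returns the entries for the given 1-based page, or [] past the end.
def feed_page(page)
  merged = (WORKOUTS + JOURNALS + COMMENTS).sort_by { |entry| entry[:created_at] }.reverse
  merged.slice((page - 1) * PER_PAGE, PER_PAGE) || []
end

PAGE_ONE = feed_page(1).map { |entry| entry[:title] }
PAGE_TWO = feed_page(2).map { |entry| entry[:title] }
```

The "Older" link would then request the next page number and append the returned entries below the ones already shown.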