Rails caching techniques for a personalized news feed - ruby-on-rails

Consider a scenario where users have posts, each user has a view representing a news feed (much like a logged-in Tumblr account), and each post overview links to its comments with a per-post comment counter. What is the best caching strategy here (on a Rails 4 stack)?
Assume 5 users, A B C D E, each subscribed to the 2 users on their right (A is subscribed to B and C, B is subscribed to C and D, etc.), with only the users they've subscribed to showing up on their news feed view.
Edit:
Assume a fan-out-on-write approach is taken, where each user has a unique set (of post ids) in Redis, and on every post create, the id of the new post is added to the set of each of the post creator's friends. The Redis sets act as an index, and a user's feed is fetched via a single SQL query.
Bearing this in mind, caching each feed should work like this:
First hit:
Check the set in Redis
Write @feed_array to memcached
Fetch the posts with a single SQL query and save them to @feed
Write @feed to memcached
Second hit:
Check the set in Redis
If the set values match @feed_array, return @feed from memcached. Otherwise run a new SQL query and overwrite @feed in memcached
This approach would mean easy cache use for the views when iterating through the #post divs, but how would one handle the comment counts?
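The flow above can be sketched in plain Ruby. This is only an illustration under stated assumptions: two hashes stand in for Redis and memcached, FeedCache and its methods are hypothetical names, and fetch_posts is a placeholder for the single SQL query.

```ruby
require 'set'

# Plain-Ruby sketch: hashes stand in for Redis and memcached (assumption).
class FeedCache
  attr_reader :sql_calls

  def initialize
    @redis     = Hash.new { |hash, key| hash[key] = Set.new } # user id => post ids
    @memcached = {}                                           # cache key => value
    @sql_calls = 0                                            # counts simulated SQL queries
  end

  # Fan-out step: record a new post id in a subscriber's index.
  def push(user_id, post_id)
    @redis[user_id] << post_id
  end

  def feed_for(user_id)
    current_ids = @redis[user_id].to_a.sort
    # Cache hit: the index has not changed since the feed was cached.
    return @memcached["feed:#{user_id}"] if @memcached["feed_array:#{user_id}"] == current_ids

    # Miss or stale feed: run the single query again and overwrite both entries.
    @memcached["feed_array:#{user_id}"] = current_ids
    @memcached["feed:#{user_id}"] = fetch_posts(current_ids)
  end

  private

  # Hypothetical placeholder for one "SELECT ... WHERE id IN (...)" query.
  def fetch_posts(post_ids)
    @sql_calls += 1
    post_ids.map { |id| { id: id, title: "Post #{id}" } }
  end
end

cache = FeedCache.new
cache.push(:a, 1)
cache.push(:a, 2)
first_feed = cache.feed_for(:a)   # runs the simulated query
cache.feed_for(:a)                # served from the cache
```

The sketch leaves the comment counts open, as the question does; it only shows that the id comparison decides between a cache hit and a fresh query.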

Regardless of the application stack you are using, I don't think a caching approach scales in your situation. Twitter-like functionality is often handled by denormalization.
In your situation, this could mean implementing a feed model for each user and appending new posts from the users they follow, so that a user's 'timeline' loads quickly from their own feed instead of joining across all of their (possibly thousands of) friends.
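A minimal sketch of such a per-user feed in plain Ruby, using the five users from the question. FeedStore, publish and timeline are hypothetical names, the feeds live in memory rather than in a table, and the wrap-around subscriptions for D and E are an assumption.

```ruby
# Plain-Ruby sketch of a denormalized, per-user feed (names are illustrative).
class FeedStore
  def initialize(subscriptions)
    @subscriptions = subscriptions                      # reader => authors they follow
    @feeds = Hash.new { |hash, key| hash[key] = [] }    # reader => post ids, newest first
  end

  # On write: push the new post id onto the feed of every subscriber.
  def publish(author, post_id)
    @subscriptions.each do |reader, authors|
      @feeds[reader].unshift(post_id) if authors.include?(author)
    end
  end

  # On read: one lookup, no join across the followed users.
  def timeline(reader, limit = 20)
    @feeds[reader].first(limit)
  end
end

subscriptions = {
  'A' => %w[B C], 'B' => %w[C D], 'C' => %w[D E],
  'D' => %w[E A], 'E' => %w[A B]   # wrap-around is assumed
}
store = FeedStore.new(subscriptions)
store.publish('C', 101)
store.publish('D', 102)
```

The cost moves to write time: a post by a user with many subscribers touches many feeds, which is the usual trade-off of this layout.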

Related

Include vs Join

I have 3 models
User - has many debits and has many credits
Debit - belongs to User
Credit - belongs to User
Debit and credit are very similar. The fields are basically the same.
I'm trying to run a query on my models to return all fields from debit and credit where the user is current_user:
User.left_outer_joins(:debits, :credits).where("users.id = ?", @user.id)
As expected, this returned all fields from User as many times as there were records in credits and debits.
User.includes(:credits, :debits).order(created_at: :asc).where("users.id = ?", @user.id)
This ran 3 queries, and I thought it should be done in one.
The second part of this question is: how could I add the record type into the query? That is, records from credits would have an extra field marking them as credits, and the same for debits.
I have looked into the ActiveRecordUnion gem, but I did not see how it would solve the problem here.
includes can't magically retrieve everything you want it to in one query; it will run one query per model (typically) that you need to hit. Instead, it eliminates future unnecessary queries. Take the following examples:
Bad
    users = User.first(5)
    users.each do |user|
      p user.debits.first
    end
There will be 6 queries in total here, one to User retrieving all the users, then one for each .debits call in the loop.
Good!
    users = User.includes(:debits).first(5)
    users.each do |user|
      p user.debits.first
    end
You'll only make two queries here: one for the users and one for their associated debits. This is how includes speeds up your application, by eagerly loading things you know you'll need.
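To make the query counts concrete without a database, the following plain-Ruby sketch counts lookups against an in-memory table. It only simulates the idea behind includes; it is not ActiveRecord code, and FakeDb with its methods is a hypothetical stand-in.

```ruby
# In-memory stand-in for a database that counts every "query" (hypothetical).
class FakeDb
  attr_reader :queries

  def initialize(debits)
    @debits  = debits   # array of { user_id:, amount: } rows
    @queries = 0
  end

  # One query per call, like user.debits inside a loop.
  def debits_for(user_id)
    @queries += 1
    @debits.select { |row| row[:user_id] == user_id }
  end

  # One query for many users, like the second query issued by includes(:debits).
  def debits_for_all(user_ids)
    @queries += 1
    @debits.select { |row| user_ids.include?(row[:user_id]) }
           .group_by { |row| row[:user_id] }
  end
end

user_ids = [1, 2, 3, 4, 5]
rows = user_ids.map { |id| { user_id: id, amount: id * 10 } }

lazy = FakeDb.new(rows)
user_ids.each { |id| lazy.debits_for(id).first }   # one debit query per user

eager = FakeDb.new(rows)
by_user = eager.debits_for_all(user_ids)           # one debit query in total
user_ids.each { |id| by_user[id].first }           # served from memory
```

In both cases the query that loads the users themselves comes on top, which gives the 6 versus 2 totals described above.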
As for your comment, yes, it seems to make sense to combine them into one table. Depending on your situation, I'd recommend looking into Single Table Inheritance (STI). If you don't go this route, be careful about adding a column called type: Rails reserves that column name for STI by default.
First of all, by calling the first query on the User class you are asking for records of type User. If you do not want User objects, you are performing an extra join, which could be costly (could be, not will be).
If you want credit and debit records, simply call queries on the Credit and Debit models. If you have already loaded the user object somewhere before this point, use includes, preload, or eager_load to load the linked credit and debit records all at once.
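For the second part of the question (marking each row as a credit or a debit), one option is to tag the rows in Ruby after the two queries have run. A minimal sketch, assuming plain hashes stand in for the rows the Credit and Debit models return; Entry and ledger are hypothetical names.

```ruby
# Struct for a combined row; kind records which model the row came from.
Entry = Struct.new(:kind, :amount, :created_at)

# Tag each row with its origin, then merge into one chronological list.
def ledger(credits, debits)
  tagged = credits.map { |row| Entry.new('credit', row[:amount], row[:created_at]) } +
           debits.map  { |row| Entry.new('debit',  row[:amount], row[:created_at]) }
  tagged.sort_by(&:created_at)
end

credits = [{ amount: 50, created_at: Time.utc(2020, 1, 3) }]
debits  = [{ amount: 20, created_at: Time.utc(2020, 1, 1) },
           { amount: 5,  created_at: Time.utc(2020, 1, 5) }]

entries = ledger(credits, debits)
```

This keeps two queries and does the merge in memory, which is reasonable for one user's records; doing the same in a single SQL statement would need a UNION and is not shown here.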
There are two ways of pre-loading records in Rails. In the first, Rails performs a separate query for each type of record; in the second, Rails performs a single query and builds the objects of the different types from the data returned.
includes is a smart pre-loader that picks one of the two: it uses separate queries by default and switches to a single query when the conditions reference the included tables.
If you want to force Rails to use one query no matter what, eager_load is what you are looking for.
Please read all about includes, eager_load and preload in the article here.

how to model a parse class to include isliked, isfollowing fields along with the PFQuery results

I can't find an efficient way to query Posts(PFObject) or Users(PFUser) classes and also have the isPostLiked(boolean) and isUserFollowed(boolean) included in the results array respectively.
Let's say I have queried and received 25 Posts from the server. I want to fill in the like heart button with red if I have previously liked a Post. It would be very inefficient to query all the likes of these Posts and check whether the current user is contained in the results.
Is it possible to write a cloud code function to insert an 'isLiked' field to the query results and return it to the User for instance?
I am open to new strategies since I am stuck here. It is obvious that most of the social apps are having this need as a standard so there must be an effective solution. Thanks
Your best course of action is to rid yourself of relational database thinking. It seems to me you have a separate Likes class that tracks which user likes which post.
In the NoSQL space you should focus on your queries when you plan your data model. Ask yourself this question:
How do I want to query my data?
In this use case, I'm thinking you might want to
Show how many likes a Post has
Maybe show which users did like the Post
Track whether the current user has liked a certain post
Maybe find all the Posts the current user has liked?
To solve this, I would do the following:
On the Post class, add a column likedby.
On the User class, add a column likedposts.
Both these columns are Array columns
Every time a user likes a post, you add a Pointer to the current user to the likedby array column for the Post AND a pointer to the post to the likedposts array column for the User.
This makes it very easy to
find how many likes a post has (number of elements in likedby)
list all the users that liked the post (using query.includeKey("likedby") on the Post)
check if the current user has already liked the post (if likedby array contains currentuser)
list all the posts a user has liked (using query.includeKey("likedposts") on the User).
Use the same logic for followings.
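The two array columns can be modelled in a few lines of plain Ruby. This is a sketch of the data layout only, not Parse SDK code: hashes stand in for the Post and User objects, ids stand in for Pointers, and like! and liked_by? are hypothetical helper names.

```ruby
# Hashes stand in for Parse objects; likedby / likedposts are the array columns.
def like!(user, post)
  post[:likedby]    |= [user[:id]]   # add the user without creating duplicates
  user[:likedposts] |= [post[:id]]   # mirror the like on the user side
end

# The check needed to colour the heart button: no extra query, just the array.
def liked_by?(post, user)
  post[:likedby].include?(user[:id])
end

alice = { id: 'u1', likedposts: [] }
post  = { id: 'p1', likedby: [] }

like!(alice, post)
like!(alice, post)   # liking twice leaves a single entry
```

The like count is then simply the size of the likedby array, as described above.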

PFObject once visited should not get retrieved again

I am using parse.com for my survey application. In it I am implementing a like mechanism: users see sets of two images and have to like one image from each set, as part of the survey.
I download 20 sets per query, and when the user clicks More I download the next 20 sets, and so on.
When I query for 20 sets, the sets the user has already voted on get downloaded again. How do I stop that, so the same sets are not repeated again and again?
Have a look at the Anypic tutorial on parse.com to see how it uses the Activity class to track likes, comments, etc. Use this as a template for planning your data model, as opposed to relational principles.
One possible solution is to store all voted photos in an array on, e.g., a voting object, or even the user object, and query for photos that are NOT in this array.
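The "not in this array" filter can be illustrated in plain Ruby. This is a sketch of the logic only: all_sets and voted_ids are in-memory stand-ins for the Parse classes, and next_unvoted is a hypothetical helper. On Parse the exclusion would be expressed as a "not contained in" constraint on the query, so that it runs on the server.

```ruby
# Returns the next page of sets the user has not voted on yet.
def next_unvoted(all_sets, voted_ids, page_size = 20)
  all_sets.reject { |set| voted_ids.include?(set[:id]) }.first(page_size)
end

all_sets  = (1..50).map { |n| { id: n } }
voted_ids = [1, 2, 3]   # ids stored on the voting (or user) object

page = next_unvoted(all_sets, voted_ids)
```

After the user votes on a set, its id is appended to voted_ids, so the same call returns the next batch without any page offset bookkeeping.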
You should store your voting operations somewhere.
So let me analyze the two possible places to store them:
On the device (locally): if you store them locally, you will retrieve some objects (sets) from Parse and then filter for the unvoted ones on the device. You will probably discard some of the retrieved sets, maybe all of them, so this is not feasible.
In Parse: for the reason above, you should store them in Parse.
You can do it using:
Relations. You can take a look at:
https://www.parse.com/docs/relations_guide#top
When retrieving new sets, you should get sets which the current user hasn't voted yet. You can do this with Relational Queries in Parse. You can take a look at the documentation about it:
https://www.parse.com/docs/ios_guide#queries-relational/iOS
Or creating a Join Table.
This would be a custom implementation of a new class where you join your user with a set. Maybe you can store additional info about voting operations, like the time of voting.

Querying Mongodb collection based on parent's attribute

I've got Posts documents that belong to Users, and Users have an :approved attribute. How can I query my Posts using MongoDB such that I only get those whose User has :approved => true?
I could write a loop that creates a new array, but that seems inefficient.
MongoDB does not have any notion of joins in its ordinary queries.
You've stated in the comments that Posts and Users are separate collections, but your query clearly involves data from both collections, which would imply a join.
> I could write a loop that creates a new array, but that seems inefficient.
A join operation in SQL is basically a loop that happens on the server. With no join support on the server side, you'll have to make your own.
Note that many of the libraries (like Morphia) actually have some of this functionality built-in. You are using Mongoid which may have some of this support, but you'll have to do some hunting.
The easiest way to think about it would be to query for unique user ids of users who are approved and then query for post documents where the poster's user_id is in that set.
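The two-step lookup looks like this in plain Ruby. It is a sketch of the logic, not Mongoid or driver code: arrays of hashes stand in for the two collections, and the second step corresponds to an $in condition on user_id.

```ruby
# Arrays of hashes stand in for the users and posts collections (assumption).
users = [{ _id: 1, approved: true }, { _id: 2, approved: false }, { _id: 3, approved: true }]
posts = [{ _id: 10, user_id: 1 }, { _id: 11, user_id: 2 }, { _id: 12, user_id: 3 }]

# Query 1: ids of the approved users.
approved_ids = users.select { |user| user[:approved] }.map { |user| user[:_id] }

# Query 2: posts whose user_id is in that set (an "$in" style condition).
approved_posts = posts.select { |post| approved_ids.include?(post[:user_id]) }
```

With real collections the first query should return only the ids, so that the list passed to the second query stays small.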
As Rubish said, you could de-normalize by adding an approved field to the post document. When a user's approval status is toggled (they become approved or unapproved) do an update on the posts collection where, for all of that user's posts, you toggle the denormalized approval field.
Using the denormalized method lets you do one query instead of two (simplifying the logic for the most common case) and isn't too much of a pain to maintain.
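A sketch of the denormalized variant in plain Ruby, assuming in-memory data in place of the collections. ApprovalStore and the user_approved field are hypothetical names; the loop in set_approved plays the role of the multi-document update on the posts collection.

```ruby
# In-memory sketch: each post carries a copy of its author's approved flag.
class ApprovalStore
  def initialize(users, posts)
    @users = users   # user id => { approved: true/false }
    # Copy the flag onto every post once, at load time.
    @posts = posts.map { |post| post.merge(user_approved: @users[post[:user_id]][:approved]) }
  end

  # Toggling a user updates all of that user's posts.
  def set_approved(user_id, approved)
    @users[user_id][:approved] = approved
    @posts.each { |post| post[:user_approved] = approved if post[:user_id] == user_id }
  end

  # The common read is now a single query on posts alone.
  def approved_posts
    @posts.select { |post| post[:user_approved] }
  end
end

store = ApprovalStore.new(
  { 1 => { approved: true }, 2 => { approved: false } },
  [{ _id: 10, user_id: 1 }, { _id: 11, user_id: 2 }]
)
store.set_approved(2, true)
```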
Let me know if that makes sense.

Minimizing calls to database in rails

I am familiar with memcached and eager loading, but neither seems to solve the problem I am facing.
My main performance lag comes from hundreds of data retrieval calls to the database. The tricky thing is that I do not know which set of users I need to retrieve until I have done several steps of computation.
I can refactor my code, but I was wondering how you experts handle this situation. I think it should be a fairly common one.
def newsfeed
  - find out which users i need
  - retrieve those users via DB
  - find out which events happened for these users
  - for each of those events
    - retrieve new set of users
    - find out which groups are relevant
    - for each of those groups
      - retrieve new set of users
      - etc, etc
end
Denormalization is the magic password for your situation.
There are several ways to do this:
For example, store the ids of the last 10 users in the event and group.
Or create a new model NewsFeedItem (belongs_to :parent, :polymorphic => true). When a user attends an event, create a NewsFeedItem with denormalized information such as the user's name, profile pic, etc. This saves you the second round of queries to user_events and users.
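The NewsFeedItem idea can be sketched without a database. Here a Struct stands in for the model, and the column names (parent_type, parent_id, user_name, user_pic) as well as record_attendance and render are assumptions made for illustration.

```ruby
# Struct standing in for a NewsFeedItem row (hypothetical columns).
NewsFeedItem = Struct.new(:parent_type, :parent_id, :user_name, :user_pic)

# At write time, copy the user fields the feed will display.
def record_attendance(feed, user, event)
  feed << NewsFeedItem.new('Event', event[:id], user[:name], user[:pic])
end

# At read time, rendering needs only the feed rows themselves.
def render(feed)
  feed.map { |item| "#{item.user_name} attended #{item.parent_type} #{item.parent_id}" }
end

feed = []
record_attendance(feed, { name: 'Ann', pic: 'ann.png' }, { id: 7 })
lines = render(feed)
```

The copied name and picture go stale if the user changes them later, so a real implementation needs a way to refresh existing items.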
You should be able to do this with only one query per Event / Group loop. What you'll want to do is: inside your for loop add user ids to a Set, then after the for loop, retrieve all the User records with those ids. Rinse and Repeat. Here is an example:
def newsfeed
  user_ids = Set.new
  # find out which users i need
  # ... add ids to user_ids
  # retrieve those users via DB
  users = User.find(user_ids.to_a)
  # find out which events happened for these users;
  # you might want to add a condition that limits
  # the events returned to only recent ones
  events = Event.where(user_id: user_ids.to_a)
  user_ids = Set.new
  events.each do |event|
    # merge adds each id; << would add the whole array as one element
    user_ids.merge(discover_user_ids_for_event(event))
  end
  # retrieve the new set of users with one query, after the loop
  users = User.find(user_ids.to_a)
  # ... and so on
end
I'm not sure what your method is supposed to return, but you can likely figure out how to use the idea of grouping finds together by working with collections of IDs to minimize DB queries.
Do you want to show all the details at once? (That is, when the page loads, do you really need to load all of that information?) If not, you can load the details on demand, as follows:
def newsfeed
  find out which users i need
  retrieve those users via DB
  find out which events happened for these users
end
Once you show the events, give the user a button or similar to drill down into the other details (on demand), then load them using AJAX (so that the page will not refresh).
Use this technique repeatedly whenever users want to go deeper into the details.
By doing this, you will save a lot of processing power and fetch only the details the user needs.
I don't know if this is applicable to your situation.
If not, then you have to find a more optimized way of loading the details.
cheers,
sameera
I understand that you are trying to perform some kind of algorithm on the basis of your data to do some kind of recommendation or similar sort of thing.
I have two suggestions:
1) You reevaluate your algorithm / design on the basis of what you actually want to achieve. For instance, in cases where an application has users who can potentially have lots of posts and the app wants to perform some algorithm on the basis of the number of posts then it will be quite expensive to count their posts every time. To optimise this, a post_count column can be added on the user model and increase that count whenever a user successfully does a post. Similarly, if you can establish some kind of relation like this between your user, events, groups etc, then think of something on those lines.
2) If first solution is not feasible, then for anything like this you must avoid doing multiple queries and then using ruby for crunching data which would obviously be very expensive and is never advisable if you have large data set. So what you need here is to make one sql query using join and get all data in just one go. Also pick only those field names from the database that you need. It really helps in case of large data sets. For instance, if you need user id and event_id from user and events table and nothing else then do something like so
# user_ids is your array of user ids
User.joins('JOIN events ON users.id = events.user_id')
    .where(users: { id: user_ids })
    .select('users.id, events.id AS event_id')
I hope this will point you in the right direction.
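The post_count idea from the first suggestion can be shown in a few lines of plain Ruby. Author and create_post are hypothetical names and the posts live in memory; in Rails, the counter_cache option on belongs_to maintains such a counter column automatically.

```ruby
# Plain-Ruby stand-in for a user row with a post_count column (sketch).
class Author
  attr_reader :post_count

  def initialize
    @posts = []
    @post_count = 0
  end

  # Increment the stored counter only after the post has been saved.
  def create_post(body)
    @posts << body
    @post_count += 1
  end
end

author = Author.new
3.times { |n| author.create_post("post #{n}") }
```

Reading author.post_count is then a plain attribute read instead of a COUNT over the posts.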
