I have a Rails app that keeps count of each User's Comments across many MovieCategories. The models look like this:
class User < ActiveRecord::Base
has_many :comments
end
class Movie < ActiveRecord::Base
belongs_to :movie_category
has_many :comments
end
class MovieCategory < ActiveRecord::Base
has_many :movies
end
class Comment < ActiveRecord::Base
belongs_to :movie
belongs_to :user
end
I can of course find a User's Comment count for a MovieCategory by running something like this for each MovieCategory:
@user.comments.where("movie_category_id = ?", movie_category_id).count
However, this puts a lot of load on the server if I make the call too frequently, so I was thinking about doing the calculation once per hour (in a background job) for all Users across all MovieCategories and storing the counts in the users table, with one column per MovieCategory. That way I don't have to run the calculation for each user as often and can just read the counts from the users table.
What I'm wondering is whether there is a DRYer way to do this, since I'm not sure when my MovieCategories will stop growing (and each new category means a new table column). I also thought about caching the User show views (where these counts appear), but even so, without these columns in the users table, it seems that each time a User page is loaded for the first time (or its cache expires) all of the User's comment counts will have to be calculated again.
Is there a better approach that doesn't put too much burden on the server?
Given your comments about being in development at the moment, I would say don't worry about it until you have to worry about it! However, if you want to plan ahead, my suggestion would be to go with fragment caching and indexes on the foreign keys.
If your site grows to the size you're talking about, running migrations to add additional fields to your users table could take significant time.
I note you've referred to caching in your question, so I assume you're broadly familiar with it, but given a view such as:
<ul>
<li>Action Movies: 23 Comments</li>
<li>Comedy Movies: 14 Comments</li>
</ul>
You would wrap this in a cache block:
<% cache ["user-#{@user.id}-comments", @user.comments.maximum(:created_at).to_i] do %>
...
<% end %>
This will cache the fragment displaying the counts and automatically expire it each time that user posts a new comment. You could get more granular by caching each <li> and expiring it only when a comment is posted in that category, but that might be overkill at an early stage.
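To see why the fragment expires on its own, here is a minimal plain-Ruby sketch of the key-building idea. The helper name comments_cache_key is hypothetical and not a Rails API; it only shows that the key changes whenever the newest comment timestamp changes, so the old fragment is simply never looked up again.

```ruby
# Hypothetical helper (not part of Rails): builds a fragment key from the
# user's id and the timestamp of that user's most recent comment.
def comments_cache_key(user_id, comment_times)
  latest = comment_times.max # nil when the user has no comments yet
  "user-#{user_id}-comments-#{latest.to_i}" # nil.to_i is 0
end

# Sample timestamps are made up for illustration.
BEFORE_KEY = comments_cache_key(7, [Time.at(1_000), Time.at(2_000)])
AFTER_KEY  = comments_cache_key(7, [Time.at(1_000), Time.at(2_000), Time.at(3_000)])

puts BEFORE_KEY
puts AFTER_KEY
```

Posting a comment produces a different key, while reloading the page without new comments reuses the same one.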
For the index on the foreign keys, you can add this in a migration using the syntax:
add_index :comments, :movie_category_id
I don't think the query you run is that bad, but you never know quite what effect it will have until you hit production and scale.
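If the per-category calls do become a problem, one option is to fetch all of a user's counts in a single pass instead of once per category. Below is a rough plain-Ruby sketch of that idea; CommentRow and the sample rows are hypothetical stand-ins for the comments table, and it assumes (as the query in the question does) that each comment carries a movie_category_id. In ActiveRecord the same shape would come from a single grouped count, something like @user.comments.group(:movie_category_id).count.

```ruby
# Hypothetical stand-in for rows of the comments table.
CommentRow = Struct.new(:user_id, :movie_category_id)

# Counts one user's comments per category in a single pass over the rows.
def comment_counts_by_category(rows, user_id)
  rows.each_with_object(Hash.new(0)) do |row, counts|
    counts[row.movie_category_id] += 1 if row.user_id == user_id
  end
end

# Made-up sample data: user 1 has three comments, user 2 has one.
SAMPLE_COMMENTS = [
  CommentRow.new(1, 10), CommentRow.new(1, 10),
  CommentRow.new(1, 20), CommentRow.new(2, 10)
].freeze

p comment_counts_by_category(SAMPLE_COMMENTS, 1)
```

The result is a hash keyed by category id, so new categories need no new columns.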
Related
Many content aggregators, like reddit or hackernews, sort their stories with an algorithm based on a combination of the number of upvotes and the time since submission. The simplest way to implement such sorting would be to write a database function that calculates the ranking for every item and sort on that, but this would quickly become inefficient, since it would require calculating the ranking for all the items on each query.
Another way would be to save the ranking for every item in the database. But when would I recalculate it? If I did it only when a submission was voted on, then those which were not voted on would stay with the same ranking even though they should fall because of passing time. So, what's the best way to implement such sorting? I'm not asking what's the best algorithm for that, but how it would be applied.
You should store the number of votes with Rails' standard counter_cache:
class Vote < ActiveRecord::Base
belongs_to :post, counter_cache: true
end
class Post < ActiveRecord::Base
# Migration for this model should create 'votes_count' column:
# t.integer :votes_count
has_many :votes, dependent: :destroy
end
As for applying a complex algorithm, especially one that has time as one of its parameters, I doubt there is a better way in general than recalculating the rating each midnight (although you could come up with a simpler algorithm to save effort).
In Rails, you can use the whenever gem:
# config/schedule.rb
every 1.day, at: '12am' do
runner "Post.update_rating"
end
# post.rb
class Post < ActiveRecord::Base
# Add 'rating' column in migration
# t.integer :rating
def self.update_rating
# Ruby-way
# self.find_in_batches do |batch|
# batch.each do |post|
# post.update(rating: (post.votes_count.to_f / [(Date.today - post.created_at.to_date).to_i, 1].max).to_i)
# end
# end
# SQL-way (probably a quicker one, but you should think on how
# not to lock your database for a long period of time)
# SOME_DATE_FUNCTION will depend on DB engine that you use
self.update_all("rating = votes_count / SOME_DATE_FUNCTION(created_at, NOW())")
end
end
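As a rough illustration of the votes-divided-by-age idea, here is a plain-Ruby sketch. PostRow, the sample posts, and the clamp to one day are my own assumptions for the example, not part of any particular ranking algorithm; the point is only how the stored rating would be computed and then used for sorting.

```ruby
require "date"

# Hypothetical stand-in for a row of the posts table.
PostRow = Struct.new(:title, :votes_count, :created_on)

# Votes divided by age in days. The age is clamped to at least 1 so that
# a post created today does not divide by zero.
def rating(post, today)
  age_in_days = [(today - post.created_on).to_i, 1].max
  post.votes_count / age_in_days
end

# Made-up sample data.
TODAY = Date.new(2024, 1, 11)
POSTS = [
  PostRow.new("old and popular", 100, Date.new(2024, 1, 1)), # 100 / 10
  PostRow.new("new and modest", 30, Date.new(2024, 1, 11)),  # 30 / 1
  PostRow.new("week old", 35, Date.new(2024, 1, 4))          # 35 / 7
].freeze

# Highest rating first.
RANKED = POSTS.sort_by { |post| -rating(post, TODAY) }
puts RANKED.map(&:title)
```

A fresh post with fewer votes can outrank an older one with more, which is the effect the nightly recalculation is meant to produce.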
UPD. Response to the points made in comments.
If we're dealing with a site that is as dynamic as reddit, we should refresh the ratings more often.
As far as I understand, you consider Reddit's main page a reference for posts sorted by rating (which is calculated using vote count and an age penalty). I seriously doubt that Reddit recalculates the 'age penalty' more often than once per day. Of course, it could calculate vote counts in real time (and that's what you can do easily with counter_cache in Rails).
So when a new record suddenly gets 1000 upvotes, it goes to the main page immediately. But then each midnight it will receive a penalty equal to, say, 20% of the votes it received that day.
Well, you said that we won't discuss the specific algorithm, which could be really complex :)
Also, do you think the ranking should in addition to that be recalculated each time a post is upvoted?
Of course, nothing should stop you from updating the 'vote count' part of the rating in real time (and showing records based on the resulting rating). Then you can calculate the 'age penalty' for each record once per day, sinking old posts a bit.
I'm not sure the best way to ask this question, and I realize it may be out of the scope of this. I am attempting to learn rails by making simple apps that I could potentially use one day. I realize these aren't robust or secure apps, but making something I could use gives me some motivation. I've begun to get out of scope of the simple "create a blog/twitter" phase and can't find much help.
I'm attempting to make an app to book outdoor trips.
Models: Leaders, Groups, Trips, Activities, Locations and Plans
The idea is to create a "plan" to send to someone that is a publicly viewable grouping of trips. I've got everything in place to manipulate everything by the plans. They are all straightforward models and relations.
I'm getting hung up on the best way to create a plan, and add multiple, existing trips to it. Each trip has a plan_id which can be set and the plan can simply pull that collection, but I don't know how to best (and most simply - without javascript if possible) show a list of trips and be able to select multiple and add them to a plan.
Does this make sense? I think the easiest way to begin to unravel it would be to check out the git repo: https://github.com/ryanmccrary/cabra
The https://github.com/ryanmccrary/cabra/tree/trip-plan-add branch is a half-baked attempt at one method, but I think I went about it the wrong way.
I'm not looking for the "solution" as much as the best way to do something like this and possibly some hints to get me started...
Have you considered using simple_form?
Creating complex forms at the beginning of learning Rails can be a major setback. With simple_form, you can just do this, and trips that have already been created will appear.
<%= simple_form_for @plan do |f| %>
<%= f.input :name %>
<%= f.association :trips %>
<%= f.button :submit %>
<% end %>
simple_form will push you to do certain things, and to structure your HTML markup, in a certain way, which makes it less customizable. But if you are just trying to set up simple has_many relationships, this would be the fastest way to go.
Hope it helps, I know how frustrating it can be to learn a framework and get stuck on something that seems to be trivial at first but turns out to be something that ruins the whole week.
You should try looking at some Railscasts episodes. There haven't been updates for a while, but they cover very specific topics and come with source code.
Added:
If you want to associate existing trips with your plan, you would also need to make the association between them many-to-many (has_many :through). You would need to create another table that holds the association between the two. Let's call this an "agenda" (sorry, I can't think of a better word for it). For more about associations, consult the Rails Guides.
models/trip.rb
class Trip < ActiveRecord::Base
has_many :agendas
has_many :plans, through: :agendas
end
models/plan.rb
class Plan < ActiveRecord::Base
has_many :agendas
has_many :trips, through: :agendas
end
models/agenda.rb
class Agenda < ActiveRecord::Base
belongs_to :plan
belongs_to :trip
end
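To make the join-table idea concrete, here is a small plain-Ruby sketch of what "adding existing trips to a plan" amounts to: one agenda row per plan and trip pair. AgendaRow and both helper methods are hypothetical stand-ins for illustration, not Rails APIs.

```ruby
# Hypothetical in-memory stand-in for the agendas join table.
AgendaRow = Struct.new(:plan_id, :trip_id)

# Adds existing trips to a plan by creating one join row per trip,
# skipping pairs that are already linked.
def add_trips_to_plan(agendas, plan_id, trip_ids)
  trip_ids.each do |trip_id|
    row = AgendaRow.new(plan_id, trip_id)
    agendas << row unless agendas.include?(row)
  end
  agendas
end

# Looks up the trips on a plan by reading the join rows.
def trip_ids_for_plan(agendas, plan_id)
  agendas.select { |row| row.plan_id == plan_id }.map(&:trip_id)
end

AGENDAS = []
add_trips_to_plan(AGENDAS, 1, [3, 5])
add_trips_to_plan(AGENDAS, 2, [5])    # trip 5 now belongs to two plans
add_trips_to_plan(AGENDAS, 1, [5, 8]) # trip 5 is already linked to plan 1

p trip_ids_for_plan(AGENDAS, 1)
p trip_ids_for_plan(AGENDAS, 2)
```

Because the link lives in its own table, the same trip can appear on several plans, which a single plan_id column on trips cannot express.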
I'm seeking brainstorming input for a Rails design issue I've run across.
I have simple Book reviews feature. There's a Book class, a User class, and a UserBook class (a.k.a., reviews and ratings).
class User < ActiveRecord::Base
has_many :user_books
end
# (book_id, user_id, review data...)
class UserBook < ActiveRecord::Base
belongs_to :user
belongs_to :book
end
In the corresponding book controller for the "show" book action, I need to load the book data along with the set of book reviews. I also need to find out whether the current user (if there is one) has contributed to those reviews.
I'm currently running two queries, Book.where(...) and UserBook.where(...), and placing the results into two separate objects passed on to the view. Now, while I could run a third query to find whether the user is among those reviews (on UserBook), I'd prefer to pull that from the @reviews result set. But do I do that in the controller, or in the view?
Also worth noting is that in the view I have to draw Add vs Update review buttons accordingly, with their corresponding ajax URLs. So I'd prefer to know it before I start looping through a result set.
If I detect this in the controller though, I'll need three instance variables passed in, which I understand is considered distasteful in Rails land. Not sure how to avoid this.
Suggestions appreciated.
This smells like a case for has_many :through, which is designed for cases where you want to access the data of a third table through an intermediate table (in this case, UserBook).
Great explanation of has_many :through here
Might look something like this:
class User < ActiveRecord::Base
has_many :user_books
has_many :books, through: :user_books
end
Then you can simply call
@user = User.find(x)
@user.user_books # (perhaps aliased as `User.find(x).reviews`)
and
@user.books
to get a list of all books associated with the User.
This way, you can gain access to all of the information you need for a particular user with a single @user instance variable.
PS - You'll want to take a look at the concept of Eager loading, which will prevent you from making extraneous database calls while fetching all of this information.
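On the question of detecting the current user's review without a third query, one option is to look it up in the reviews that are already loaded. Here is a minimal plain-Ruby sketch; ReviewRow, review_by, and review_button_label are hypothetical names for illustration, and the button labels are guesses at what the view needs.

```ruby
# Hypothetical stand-in for a loaded UserBook (review) record.
ReviewRow = Struct.new(:user_id, :book_id, :rating)

# Returns the current user's review from an already-loaded list,
# or nil when nobody is signed in or the user has not reviewed the book.
def review_by(reviews, current_user_id)
  return nil if current_user_id.nil?

  reviews.find { |review| review.user_id == current_user_id }
end

# Decides which button to draw before looping over the reviews.
def review_button_label(reviews, current_user_id)
  review_by(reviews, current_user_id) ? "Update review" : "Add review"
end

# Made-up sample data: two reviews of book 42.
REVIEWS = [ReviewRow.new(1, 42, 5), ReviewRow.new(2, 42, 3)].freeze

puts review_button_label(REVIEWS, 2)
puts review_button_label(REVIEWS, 9)
```

A lookup like this could live in a helper or on the model, so the controller does not need to pass an extra instance variable.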
I know what eager loading is and how it works, and I've searched Google and StackOverflow for other eager loading questions. Some of them were enlightening, but none solved my particular problem, so I'm going to ask one myself.
Here is my schema:
class Organization < ActiveRecord::Base
has_many :activities
has_many :users_logged, source: :user, through: :activities
end
class Activity < ActiveRecord::Base
belongs_to :user
belongs_to :organization
end
class User < ActiveRecord::Base
has_many :activities
end
This project was inherited and I can't do any major schema redesign, so if this isn't possible with the schema then so be it. I just want to speed up the generation of these reports, which involves loading all the Organizations, something like:
Organization.includes(:users_logged).joins(:activities)
Now, I realize that includes(:users_logged) will load all the activities, but joins(:activities) is still necessary for the fields I order and group by, which I've left out as they are not important to the question.
The question is: I would like to eager load all the Activities for all users selected by the includes(:users_logged) directive. I need the Organization details, its associated activities, and all users that have logged activities for the organization. In addition, I need to load all the activities for each user (ideally only those associated with the organization, but I can work that out once I figure out how to eager load).
My current implementation:
@orgs = Organization.includes(:users_logged).joins(:activities).all
@orgs.each do |org|
org.users_logged.each do |user|
# Just work with user.activities
end
end
This results in a query per user that averages 0.3 ms with the current amount of test data which adds up rather quickly. Is there a way to eager load all activities for a collection of users?
I'm posting this as an answer to make it clear that this has been resolved. Once I'm able to, I will accept it as the correct answer.
Baldrick's comment got me to thinking about how exactly I was filtering out activities for the user so that I only had the activities logged to the active organization. I was performing:
user.activities.where("organization_id = ?", org.id)
Which would force a query every time, regardless of what had been eager loaded. So I changed it to:
user.activities.select { |activity| activity.organization_id == org.id }
Which now does what I wanted (after @Baldrick's recommendation as well).
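For anyone following along, here is a minimal plain-Ruby sketch of the in-memory filter, assuming each user's activities are already loaded (for example through a nested includes such as includes(users_logged: :activities)). ActivityRow and activities_for_org are hypothetical names for illustration; the comparison is on each activity's organization_id.

```ruby
# Hypothetical stand-in for a loaded Activity record.
ActivityRow = Struct.new(:id, :user_id, :organization_id)

# Filters an already-loaded list in Ruby, so no extra query is issued.
def activities_for_org(activities, organization_id)
  activities.select { |activity| activity.organization_id == organization_id }
end

# Made-up sample data: one user's activities across two organizations.
USER_ACTIVITIES = [
  ActivityRow.new(1, 7, 100),
  ActivityRow.new(2, 7, 200),
  ActivityRow.new(3, 7, 100)
].freeze

p activities_for_org(USER_ACTIVITIES, 100).map(&:id)
```

The trade-off is that every activity for the user is held in memory, which is fine for a report but worth watching if users accumulate a lot of activity.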
I have models UserVote, Comment, Edit, etc, all of which have a user_id attribute. I'm trying to create a sort of timeline of recent activity, and this has me querying all 5 of my models separately and sorting by datetime. However, with accounts that have a lot of activity, these 5 queries take a very long time to execute. I'd like to find a way to optimize the performance, and I figured combining the 5 queries might work.
I haven't been able to come up with any working query to achieve what I'd like.
Thanks for any help!
I think the best suggestion in the comments is from Steve Jorgensen, with "I have generally seen this done by adding records to an activity log, and then querying that.".
If you want to take this idea to the next level, check out Sphinx (a search engine designed for indexing database content). You can integrate it easily with Rails using Thinking Sphinx - http://freelancing-god.github.com/ts/en/.
Also, as Tim Peters brings up, you really should have indexes on all of your foreign keys, regardless of how you solve this - http://apidock.com/rails/ActiveRecord/ConnectionAdapters/SchemaStatements/add_index.
I think it is a good idea to use polymorphic associations for this problem - http://guides.rubyonrails.org/association_basics.html#polymorphic-associations
class TimeLine < ActiveRecord::Base
belongs_to :timelineable, :polymorphic => true
end
class UserVote < ActiveRecord::Base
has_many :time_lines, :as => :timelineable
end
class Comment < ActiveRecord::Base
has_many :time_lines, :as => :timelineable
end
Now you can sort the time_lines records by creation time and access the associated resources.
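As a rough picture of what the single timeline table buys you, here is a plain-Ruby sketch: every vote, comment, or edit leaves one entry, and the recent-activity feed is one sort over those entries instead of five separate queries merged afterwards. TimelineEntry, recent_activity, and the sample rows are hypothetical.

```ruby
# Hypothetical stand-in for a row of the time_lines table: the type and id
# of the associated record plus when it happened.
TimelineEntry = Struct.new(:timelineable_type, :timelineable_id, :created_at)

# Newest entries first, limited to the requested number.
def recent_activity(entries, limit)
  entries.sort_by(&:created_at).reverse.first(limit)
end

# Made-up sample data with distinct timestamps.
ENTRIES = [
  TimelineEntry.new("Comment", 1, Time.at(100)),
  TimelineEntry.new("UserVote", 9, Time.at(300)),
  TimelineEntry.new("Edit", 4, Time.at(200)),
  TimelineEntry.new("Comment", 2, Time.at(50))
].freeze

recent_activity(ENTRIES, 3).each do |entry|
  puts "#{entry.timelineable_type} ##{entry.timelineable_id}"
end
```

In the database the equivalent would be an ORDER BY on created_at with a LIMIT, which is where an index on that column starts to matter.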