The Challenge:
I need to find the most popular discussion in a forum.
Background Information:
A forum has many discussions.
A discussion belongs to a forum.
A discussion has an attribute called views, which stores how many times users have viewed it.
I'm using a Postgres database.
My Solution:
Create an instance method in the Forum model that loops through every single discussion and checks how many views each one has:
def most_popular_discussion
  record_views = 0
  self.discussions.each do |d|
    record_views = d.views if d.views > record_views
  end
  record_views
end
Why I've Asked This Question:
My solution appears to be disastrously inefficient, as it loads every single entry in the discussions table. This method will get slower and slower as the database gets bigger. I wouldn't mind too much, but the most_popular_discussion method is also going to be requested a lot (on every user's profile page), and it will really slow things down.
So how should I find the largest integer in a column? Or (and I think this is probably the better way) should I actually save the record number of views, rather than working it out every time?
Maybe have another table called statistics for my application to use, with just two columns, name:string and information:string, and use it to store miscellaneous statistics?
Then, every time someone views a discussion, I'd do something like this:
def iterate_views(ip)
  current_views = self.views + 1
  self.views = current_views
  self.save
  record_views_statistic = Statistic.find_by(name: 'record_views')
  # I convert current_views to a string before saving (and back to an integer
  # for the comparison) because the statistics table's `information` column
  # holds strings in order to keep the table open and miscellaneous.
  record_views_statistic.update_attributes(information: current_views.to_s) if current_views > record_views_statistic.information.to_i
end
What do you think about that approach? Both interact with the database a fair bit, but this second approach wouldn't slow down proportionally to the amount of data in the database.
This approach will give you the most popular discussion, and is much simpler than your two solutions.
def most_popular_discussion
  self.discussions.order(views: :desc).first
end
To get the highest number of views, you could either use most_popular_discussion.views or use a function like:
def record_views
  self.discussions.maximum(:views)
end
Note that I've included ways to find both the most viewed discussion and the highest number of views, because your challenge says you'd like to find the most popular discussion, but both of your solutions seem to find only the record number of views among a forum's discussions.
As for your solutions, your second one is closer to a good solution, but why not just cache the most popular discussion's view count in the Forum model? Say we add a record_views column to the forums table.
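For reference, a minimal migration for that column might look like this (a sketch; the zero default is an assumption, chosen so the comparison below never runs against nil):

class AddRecordViewsToForums < ActiveRecord::Migration
  def change
    # Default to 0 so new forums have a sane starting point for comparisons
    add_column :forums, :record_views, :integer, default: 0
  end
end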
class Discussion < ActiveRecord::Base
  belongs_to :forum

  def iterate_views
    self.views += 1
    if forum.present? && views > forum.record_views
      forum.record_views = views
      forum.save  # persist the new record count
    end
    save  # persist the incremented view count
  end
end
Then, to find the most popular discussion in the Forum model (assuming ties don't matter):
def most_popular_discussion
  self.discussions.where(views: self.record_views).first
end
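If these lookups happen on every profile page, a composite index will keep both the order(views: :desc) query and this where query fast as the table grows (a sketch; the names follow from the associations above):

class AddForumViewsIndexToDiscussions < ActiveRecord::Migration
  def change
    # Serves both "top discussion per forum" and "discussions at the record count"
    add_index :discussions, [:forum_id, :views]
  end
end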
Related
I have the following problem: I need to run a large query on a table named professionals, and I need to optimize it because for each professional I also load data from the associated tables.
But I have a problem with two associated tables: comments and tariffs.
Comments:
I need to fetch 3 comments for each professional. I tried:
@professionals.includes(:comments).where(:comments => { type: 0 }).last(3)
The problem is that this query only returns 3 professionals. What I need is all the professionals, each with only three comments whose type equals zero.
And when I try:
@professionals.includes(:comments).where(:comments => { type: 0 })
the result only includes professionals that have comments (each with all of their comments), when I need all the professionals, with or without comments. And if a professional has comments, I only need the last three whose type equals zero.
Tariffs:
With tariffs I have a similar problem; in this case I need the last 4 tariffs for each professional. I tried:
@professionals.includes(:tariffs).last(4)
But that only returns the last 4 professionals.
Models:
class Comment < ActiveRecord::Base
  belongs_to :client
  belongs_to :professional
end

class Professional < ActiveRecord::Base
  has_many :comments
end
You can't use limit on the joined table in ActiveRecord. The limit is applied to the first relation, which in this case happens to be @professionals.
You have a few choices:
Preload all comments for each professional and limit them on output (reduces the number of queries needed but increases memory consumption since you are potentially preloading a lot of AR objects).
Lazy load the required number of comments (increases the number of queries by n+1, but reduces the potential memory consumption).
Write a custom query with raw SQL.
If you preload everything, then you don't have to change much. Just limit the number of comments while iterating through each professional.
@professionals = Professional.includes(:comments)
@professionals.each do |professional|
  # first(3) slices the already-loaded array, so it doesn't issue another query
  professional.comments.first(3)
end
If you lazy load only what you need, then you would apply the limit scope to the comments relation.
@professionals = Professional.all
@professionals.each do |professional|
  # one extra query per professional, but only three comments are loaded each time
  professional.comments.where(type: 0).limit(3)
end
Writing a custom query is a bit more complex, and you might find it less performant depending on the number of joins you have to make and how well indexed your tables are.
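For illustration, on Postgres 9.3+ such a query could use a LATERAL join to pull the last three type-0 comments per professional. This is a sketch under assumed column names, not a drop-in solution:

professionals = Professional.find_by_sql(<<-SQL)
  SELECT professionals.*, recent.id AS comment_id
  FROM professionals
  LEFT JOIN LATERAL (
    SELECT * FROM comments
    WHERE comments.professional_id = professionals.id
      AND comments.type = 0
    ORDER BY comments.created_at DESC
    LIMIT 3
  ) recent ON true
SQL

Note that a join like this returns one row per professional/comment pair, so you would still need to group the rows back together in Ruby.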
I suggest you take approach two, and use query and fragment caching to improve performance. For example:
- cache @professionals do
  - @professionals.each do |professional|
    - cache professional do
      = professional.name
This approach will hit the database the first time, but on subsequent loads the comments will be read from the cache, avoiding the DB hit. You can read more about caching in the Rails Guides.
In my current application, I need the ability to track points on a weekly basis so that the point totals for the user reset back to zero each week. I was planning on using the gem merit: https://github.com/tute/merit to track points.
In my user profile I have a field that stores the points. What I have been unable to locate is how I can have Rails automatically clear this field for all users.
I have come across some information (Rails reset single column) that I think may be the answer in terms of resetting it every Sunday at a set time, but I am uncertain about this last part, and also about where the code would go (model or controller).
Also, I would welcome any suggestions if there is a better method.
You'd be better off making a Point model, which belongs_to :user.
This will allow you to add any points you want, and you can then query the table based on the created_at column to get a .count of the points for the timespan you want.
I can give you more info if you think it appropriate.
Models
One principle we live by is to extend our models as much as possible
You want each model to hold only its data, thus ensuring more efficient db calls. I'm not super experienced with databases, but it's my opinion that having a lot of smaller models is more efficient than one huge model
So in your question, you want to assign some points to a user. The "right" way to do this is to store all the points perpetually, which can only be done with a dedicated model.
Points
#app/models/point.rb
class Point < ActiveRecord::Base
  belongs_to :user
end

#app/models/user.rb
class User < ActiveRecord::Base
  has_many :points
end
Points table could look like this:
points
id | user_id | value | created_at | updated_at
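A migration for that table might look like this (a sketch; the default of 1 on value anticipates the note below):

class CreatePoints < ActiveRecord::Migration
  def change
    create_table :points do |t|
      t.references :user
      t.integer :value, default: 1
      t.timestamps
    end
    add_index :points, :user_id
  end
end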
Saving
To save the points, you will literally just have to add extra records to the points table. The simplest way to achieve this will be to merge the params, like this:
#app/controllers/points_controller.rb
class PointsController < ApplicationController
  def new
    @points = Point.new
  end

  def create
    @points = Point.new(points_params)
    @points.save
  end

  private

  def points_params
    params.require(:points).permit(:value).merge(:user_id => current_user.id)
  end
end
You can define the "number" of points by setting the value column (which I'd give a default of 1). This is how StackOverflow grants different numbers of points: by setting the value column differently ;)
Counting
To get weekly counts, you'll have to create some sort of function which will allow you to split the points by week, like this:
#app/models/point.rb
def self.weekly
  where(:created_at => Time.now.beginning_of_week..Time.now.end_of_week)
end
As written, that scope only covers the current week; it needs more work to handle arbitrary weeks.
I'll sort out the function properly for you if you let me know a little more about how you'd like to record / display the weekly stats. Is it going to be operated via a cron job or something?
Based on your description, you might want to simply track the user's points and the time they got them. Then you can query for any one-week period (or different periods, if you decide you want all-time, annual, etc.) and you won't lose historical data.
I have a "Vote" table in my database which is growing in size everyday, currently at around 100 million rows. For internal analytics / insights I used to have a rake task which would compute a few basic metrics, like the number of votes made daily in the past few days. It's just a COUNT with a where clause on the date "created_at".
This rake task was doing fine until I deleted the index on "created_at", because it seemed to have a negative impact on app performance for all the other user-facing queries that didn't need the index, especially when inserting a new row.
Currently I don't have a lot of insight into what is going on in my app and in this table. However, I don't really want to add indexes on such a large table if it's only for my own use.
What else can I try?
Alternately, you could sidestep the Vote table altogether and keep an external tally.
Every time a vote is cast, a separate tally class that keeps a running count of votes cast will be invoked. There will be one tally record per day. A tally record will have an integer representing the number of votes cast on that day.
Each increment call to the tally class will find a tally record for the current date (today), increment the vote count, and save the record. If no record exists, one will be created and incremented accordingly.
For example, let's have a class called VoteTally with two attributes: a date (date), and a vote count (integer), no timestamps, no associations. Here's what the model will look like:
class VoteTally < ActiveRecord::Base
  def self.tally_up!
    find_or_create_by_date(Date.today).increment!(:votes)
  end

  def self.tally_down!
    find_or_create_by_date(Date.today).decrement!(:votes)
  end

  def self.votes_on(date)
    # try avoids a NoMethodError on days with no tally record
    find_by_date(date).try(:votes) || 0
  end
end
Then, in the Vote model:
class Vote < ActiveRecord::Base
  after_create :tally_up
  after_destroy :tally_down

  # ...

  private

  def tally_up   ; VoteTally.tally_up!   ; end
  def tally_down ; VoteTally.tally_down! ; end
end
These methods will get vote counts:
VoteTally.votes_on Date.today
VoteTally.votes_on Date.yesterday
VoteTally.votes_on 3.days.ago
VoteTally.votes_on Date.parse("2013-05-28")
Of course, this is a simple example and you will have to adapt it to suit. This will result in an extra query during vote casting, but it's a hell of a lot faster than a where clause on 100M records with no index. Minor inaccuracies are possible with this solution, but I assume that's acceptable given the anecdotal nature of daily vote counts.
It's just a COUNT with a where clause on the date "created_at".
In that case the only credible index you can use is the one on created_at...
If write performance is an issue (methinks it's unlikely...) and you're using a composite primary key, clustering the table using that index might help too.
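For instance, a one-off cluster on Postgres might look like this (a sketch; votes_pkey is an assumed index name, and CLUSTER takes an exclusive lock on the table while it runs):

class ClusterVotes < ActiveRecord::Migration
  def self.up
    # Rewrites the table in index order; the ordering is not maintained
    # for rows inserted afterwards.
    execute "CLUSTER votes USING votes_pkey"
  end

  def self.down
    # Nothing to undo; the physical ordering simply decays over time.
  end
end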
If the index really has an impact on write performance, and it's only a few people who run statistics now and then, you might consider another general approach:
You could separate your "transaction processing database" from your "reporting database".
You could update your reporting database on a regular basis and create reporting-only indexes there. What's more, reporting queries will not conflict with transaction-oriented traffic, and it doesn't matter how long they run.
Of course, this introduces a certain delay and increases system complexity. On the other hand, if you roll forward your reporting database on a regular basis, you can ensure that your backup scheme actually works.
This is related to a question a year and change ago.
I put up an example of the question that should work out of the box, provided you have sqlite3 available: https://github.com/cairo140/rails-eager-loading-counts-demo
Installation instructions (for the main branch)
git clone git://github.com/cairo140/rails-eager-loading-counts-demo.git
cd rails-eager-loading-counts-demo
rails s
I have a fuller write-up in the repository, but my general question is this.
How can I make Rails eager load counts in a way that minimizes db queries across the board?
The n+1 problem emerges whenever you use #count on an association, despite having included that association via #includes(:associated) in the ActiveRelation. A workaround is to use #length, but this works well only when the object it's called on has already been loaded, not to mention that I suspect it duplicates something the Rails internals have already done. Also, an issue with using #length is that it results in an unfortunate over-loading when the association was not loaded to begin with and the count is all you need.
From the readme:
We can dodge this issue by running #length on the posts array (see appendix), which is already loaded, but it would be nice to have count readily available as well. Not only is it more consistent; it provides a path of access that doesn't necessarily require posts to be loaded. For instance, if you have a partial that displays the count no matter what, but half the time, the partial is called with posts loaded and half the time without, you are faced with the following scenario:
Using #count
n COUNT style queries when posts are already loaded
n COUNT style queries when posts are not already loaded
Using #length
Zero additional queries when posts are already loaded
n SELECT * style queries when posts are not already loaded
Between these two choices, there is no dominant option. But it would be nice to revise #count to defer to #length, or to access the length that is otherwise stored behind the scenes, so that we can have the following scenario:
Using revised #count
Zero additional queries when posts are already loaded
n COUNT style queries when posts are not already loaded
So what's the correct approach here? Is there something I've overlooked (very, very likely)?
As @apneadiving suggested, counter_cache works well because the counter column gets automatically updated when records are added or removed. So when you load the parent object, the count is included in the object without needing to access the other table.
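For reference, the counter_cache setup is roughly this (a sketch; it assumes you've added an integer comments_count column, default 0, to posts):

class Comment < ActiveRecord::Base
  # Keeps posts.comments_count in sync on create and destroy
  belongs_to :post, :counter_cache => true
end

class Post < ActiveRecord::Base
  has_many :comments
end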
However, if for whatever reason you don't like that approach, you could do this:
Post.find(:all,
  :select => "posts.*, count(comments.id) `comments_count`",
  :joins => "left join comments on comments.post_id = posts.id",
  :group => "posts.id")
An alternative to Zubin's approach:
Post.select('posts.*, count(comments.id) `comments_count`').joins(:comments).group('posts.id')
Note that joins performs an INNER JOIN, so posts with no comments are omitted here; Zubin's explicit left join includes them with a count of zero.
It appears that the best way to implement this sort of facility might be to create SQL views (ref: here and here) for the separate model-and-child-count objects that you want, along with their associated ActiveRecord models.
You might be able to be very clever and use subclassing on the original model combined with set_table_name :sql_view_name to retain all the original methods on the objects, and maybe even some of their associations.
For instance, say we were to add 'Post.has_many :comments' to your example, like in @Zubin's answer above; then one might be able to do:
class CreatePostsWithCommentsCountsView < ActiveRecord::Migration
  def self.up
    # Create a SQL view called posts_with_comments_counts which maps over
    # the query zubin pointed out above. *Except* this is in SQL, so perhaps
    # we'll be able to run further reducing queries against it *as though
    # it were any other table.*
    execute <<-SQL
      CREATE VIEW posts_with_comments_counts AS
        SELECT posts.*, count(comments.id) AS comments_count
        FROM posts
        LEFT OUTER JOIN comments ON comments.post_id = posts.id
        GROUP BY posts.id
    SQL
  end

  def self.down
    execute "DROP VIEW posts_with_comments_counts"
  end
end
class PostWithCommentsCount < Post
  # Here there be cleverness: this class definition sets up
  # PostWithCommentsCount with all the regular methods of Post (pointing
  # to the posts table due to Rails' STI facility)...
  set_table_name :posts_with_comments_counts
  # ...but then we point it to the SQL view instead.
  #
  # If you don't really care about the methods of Post being in
  # PostWithCommentsCount, then you could just make it a normal subclass
  # of ActiveRecord::Base.
end
PostWithCommentsCount.all(:include => :user)
# Obviously, this sort of "upward-looking" include is best used in big lists
# like "latest posts" rather than "these posts for this user." But hopefully
# it illustrates the improved activerecordiness of this style of solution.

PostWithCommentsCount.all(:include => :comments)
# And I'm pretty sure you should be able to do this without issue as well.
# And it _should_ only be the two queries.
I have set up a small gem that adds an includes_count method to ActiveRecord, that uses a SELECT COUNT to fetch the number of records in an association, without resorting to a JOIN which might be expensive (depending on the case).
See https://github.com/manastech/includes-count
Hope it helps!
I've got a tiny model (let's call it "Node") that represents a tree-like structure. Each node contains only a name and a reference to its parent:
class Node < ActiveRecord::Base
  validates_presence_of :name, :parent_id
end
The table isn't very big: fewer than 100 elements. It's updated rarely; in the last 4 months, 20 new elements were added, on one occasion, by the site admin.
Yet it is used quite a lot on my application. Given its tree-like structure, on some occasions a request triggers more than 30 database hits (including ajax calls, which I use quite a lot).
I'd like to use some sort of caching in order to reduce database access. Since the table is so small, I thought about caching all the records in memory.
Is this possible in Rails 2.3? Is there a better way to deal with this?
Why don't you just load them all every time to avoid getting hit with multiple loads?
Here's a simple example:
before_filter :load_all_nodes

def load_all_nodes
  # Return the hash (h) from the block so inject builds up the full index
  @nodes = Node.all.inject({}) { |h, n| h[n.id] = n; h }
end
This will give you a hash indexed by Node#id so you can use this cache in place of a find call:
# Previously
@node = Node.find(params[:id])

# Now
@node = @nodes[params[:id].to_i]
For small, simple records, loading them all in one fetch is a fairly inexpensive operation.
Have you looked at any of the plugins that provide tree-like behaviour?
Ryan Bates has a railscast on acts_as_tree; however, acts_as_nested_set or one of the other projects inspired by it, such as awesome_nested_set or acts_as_better_nested_set, may be a better fit for your needs.
These projects allow you to get a node and all of its children with one SQL query (see the sketch below). The acts_as_better_nested_set site has a good description of how this method works.
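As a rough illustration of the nested-set API (a sketch based on awesome_nested_set; it assumes the lft/rgt columns the gem requires have been added):

class Node < ActiveRecord::Base
  acts_as_nested_set
end

# A single query fetches a node's whole subtree:
node = Node.find(params[:id])
node.self_and_descendants # the node plus all children, grandchildren, etc.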
After looking in several places, I think tadman's solution is the simplest one.
For a more flexible solution, I've found this gist:
http://gist.github.com/72250/
Regards!