BACKGROUND: I have a set of Posts that can be voted on. I'd like to sort Posts according to their "vote score" which is determined by the following equation:
( (@post.votes.count) / ( (Time.now - @post.created_at) ** 1 ) )
I am currently defining the vote score as such:
def vote_score(x)
  ( (x.votes.count) / ( (Time.now - x.created_at) ** 1 ) )
end
And sorting them as such:
@posts = @posts.sort! { |a,b| vote_score(b) <=> vote_score(a) }
OBJECTIVE: This method takes a tremendous toll on my app's load times. Is there a better, more efficient way to accomplish this kind of sorting?
If you are using MySQL you can do the entire thing using a query:
SELECT posts.id,
       -- votes divided by the age of the post in seconds
       (COUNT(votes.id) / (UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(posts.created_at))) AS score
FROM posts INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
ORDER BY score DESC
Or:
class Post
  scope :with_score, select('posts.*')
    .select('(COUNT(votes.id) / (UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(posts.created_at))) AS score')
    .joins(:votes)
    .group('posts.id')
    .order('score DESC')
end
Which would make your entire query:
@posts = Post.with_score.all
P.S: You can then modify your Post class to use the SQL version of score if it is present. You can also make the score function cached in an instance so you don't have to re-calculate it every time you ask for a post's score:
class Post
  def score
    # Prefer the SQL-computed score when the query selected it; otherwise compute once and cache.
    @score ||= self[:score] || (votes.count / (Time.now.utc - created_at.utc))
  end
end
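The memoization above can be seen in isolation with a small plain-Ruby sketch. ScoredPost, its constructor arguments, and the computations counter are hypothetical stand-ins for the ActiveRecord model, used only to show that the fallback calculation runs at most once and is skipped when a SQL-provided score exists:

```ruby
# Hypothetical stand-in for the Post model; no database is involved.
class ScoredPost
  attr_reader :computations

  def initialize(votes_count, age_in_seconds, sql_score = nil)
    @votes_count = votes_count
    @age_in_seconds = age_in_seconds
    @sql_score = sql_score   # plays the role of self[:score]
    @computations = 0        # counts how often the fallback runs
  end

  # Use the SQL score if present, otherwise compute once and cache the result.
  def score
    @score ||= @sql_score || compute_score
  end

  private

  def compute_score
    @computations += 1
    @votes_count / @age_in_seconds.to_f
  end
end

plain = ScoredPost.new(10, 100)
plain.score
plain.score                  # second call is served from @score

from_sql = ScoredPost.new(10, 100, 0.5)
from_sql.score               # fallback is never needed here
```

Note that `||=` would recompute if the cached value were nil or false, which cannot happen for a numeric score.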
P.S: The SQLite3 equivalent is:
strftime('%s','now') - strftime('%s',posts.created_at)
sort! already sorts the array in place, so there is no need to assign its result back to the same variable. You can change the sort to:
@posts.sort! { |a, b| vote_score(b) <=> vote_score(a) }
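A further option is sort_by, which evaluates its block once per element instead of on every comparison. A minimal sketch in plain Ruby, assuming a hypothetical FakePost struct with a precomputed votes_count (neither is in the original code), and passing the current time in so the result is reproducible:

```ruby
# Hypothetical stand-in for a Post record; votes_count plays the role of
# an already-loaded vote count, so no database is involved.
FakePost = Struct.new(:title, :votes_count, :created_at)

# Same formula as vote_score in the question, with `now` passed in.
def vote_score(post, now)
  post.votes_count / (now - post.created_at)
end

# sort_by computes one key per element, then sorts on the cached keys.
# Negating the score gives descending order.
def sort_by_score(posts, now)
  posts.sort_by { |post| -vote_score(post, now) }
end

now = Time.utc(2020, 1, 1, 12, 0, 0)
posts = [
  FakePost.new("old",   10, now - 1000),  # 10 votes over 1000 seconds
  FakePost.new("fresh", 10, now - 100),   # 10 votes over 100 seconds
  FakePost.new("quiet",  1, now - 50)     # 1 vote over 50 seconds
]
sorted_titles = sort_by_score(posts, now).map(&:title)
```

With a block-based sort, vote_score runs twice per comparison; with sort_by it runs once per post, which matters when each call would otherwise trigger a COUNT query.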
It looks like you are counting the votes for a Post every time it is compared with another Post, which hits the database repeatedly and is probably the source of the toll on your load times. You can use a counter_cache to update a count each time a vote is made and store it in the posts table. That way a single DB query against the posts table loads everything the sort needs.
http://guides.rubyonrails.org/association_basics.html
Related
I have a method that ranks users' response rates in our system called ranked_users
def ranked_users
  User.joins(:responds).group(:id).select(
    "users.*, SUM(CASE WHEN answers.response != 3 THEN 1 ELSE 0 END) avg, RANK() OVER (
      ORDER BY SUM(CASE WHEN answers.response != 3 THEN 1 ELSE 0 END) DESC,
      CASE WHEN users.id = '#{current_user.id}' THEN 1 ELSE 0 END DESC
    ) rank"
  )
  .where('users.active = true')
  .where('answers.created_at BETWEEN ? AND ?', Time.now - 12.months, Time.now)
end
result = ranked_users
I then take the top three with top_3 = ranked_users.limit(3)
If the user is not in the top 3, I want to append them with their rank to the list:
user_rank = result.find_by(id: current_user.id)
Whenever I call user_rank.rank it returns 1. I know this is because it's applying the find_by clause first and then ranking them. Is there a way to enforce the find_by clause happens only on the result of the first query? I tried doing result.load.find_by(...) but had the same issue. I could convert the entire result into an array but I want the solution to be highly scalable.
If you expect lots of users with lots of answers and a high load on your rating system, you can create a materialized view for the ranking query with (user_id, avg, rank, etc.) and refresh it periodically (say, a few times per day or even less often) instead of calculating the rank every time. The scenic gem supports this.
You can even have indexes on rank and user id on the view and your query will be two simple fast reads from it.
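Reading from a precomputed ranking also sidesteps the problem in the question, because the rank is fixed before any filter is applied. The difference between "rank, then find" and "find, then rank" can be sketched in plain Ruby; the RESPONSES data and the rank_all helper are hypothetical and only mimic what the window function does over whichever rows it is given:

```ruby
# Hypothetical response counts per user id; no database involved.
RESPONSES = { 1 => 40, 2 => 75, 3 => 10, 4 => 60 }.freeze

# Rank every entry by count, highest first (1-based).
# Returns a hash of { user_id => rank }.
def rank_all(counts)
  sorted_ids = counts.sort_by { |_id, count| -count }.map(&:first)
  sorted_ids.each_with_index.map { |id, index| [id, index + 1] }.to_h
end

# Rank over the full set, then look up one user.
rank_then_find = rank_all(RESPONSES)[3]

# Filter down to one user first, then rank: only one row is left to rank.
find_then_rank = rank_all(RESPONSES.select { |id, _count| id == 3 })[3]
```

In SQL terms, the usual way to get the first behavior is to apply the id filter outside the ranking query (as an outer query over a subquery, or against the materialized view), so that the ranking sees every user.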
I have a database query where I want to get an array of Users that are distinct for the set:
# @range is a predefined date range
# @shift_list is a list of filtered shifts
def listing
  Shift
    .where(date: @range, shiftname: @shift_list)
    .select(:user_id)
    .distinct
    .map { |id| User.find( id.user_id ) }
    .sort
end
and I read somewhere that for readability, or isolating for testing, or code reuse, you could split this into separate methods:
def listing
  shift_list
    .select(:user_id)
    .distinct
    .map { |id| User.find( id.user_id ) }
    .sort
end
def shift_list
  Shift
    .where(date: @range, shiftname: @shift_list)
end
So I rewrote this and some other code, and now the page takes 4 times as long to load.
My question is, does this type of method splitting cause the database to be hit twice? Or is it something that I did elsewhere?
And I'd love a suggestion to improve the efficiency of this code.
Further to the need to remove mapping from the code, this shift list is being created with the following code:
def _month_shift_list
  Shift
    .select(:shiftname)
    .distinct
    .where(date: @range)
    .map {|x| x.shiftname }
end
My intention is to create an array of shiftnames as strings.
I am obviously missing some key understanding in database access, as this method is clearly creating part of the problem.
And I think I have found the solution to this with the following:
def month_shift_list
  Shift
    .where(date: @range)
    .pluck(:shiftname)
    .uniq
end
Nope, the database will not be hit twice. The queries in both methods are lazily loaded. The slow page load comes from the map block, which calls User.find once per row, and that translates to multiple SELECTs against the DB. You can rewrite your query like this:
def listing
  User.
    joins(:shift).
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    uniq.
    sort
end
This has just one hit to the DB and will be much faster and should produce the same result as above.
The assumption here is that there is a has_one/has_many relationship on the User model for Shifts
class User < ActiveRecord::Base
has_one :shift
end
If you don't want to establish the has_one/has_many relationship on User, you can re-write it to:
def listing
  User.
    joins("INNER JOIN shifts on shifts.user_id = users.id").
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    uniq.
    sort
end
ALTERNATIVE:
You can use 2 queries if you experience issues with using ActiveRecord#merge.
def listing
  user_ids = Shift.where(date: @range, shiftname: @shift_list).uniq.pluck(:user_id).sort
  User.find(user_ids)
end
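The cost difference between one find per row and a single batched lookup can be illustrated without a database. In this sketch FakeUserTable, its queries counter, and the sample ids are all hypothetical; the class only counts how many lookups it is asked to serve:

```ruby
# Hypothetical in-memory "table" that counts how many lookups it serves.
class FakeUserTable
  attr_reader :queries

  def initialize(rows)
    @rows = rows      # { id => name }
    @queries = 0
  end

  # One lookup per call, like User.find(id).
  def find(id)
    @queries += 1
    @rows.fetch(id)
  end

  # One lookup for many ids, like User.find(ids) or WHERE id IN (...).
  def find_all(ids)
    @queries += 1
    @rows.values_at(*ids)
  end
end

rows = { 1 => "ann", 2 => "bob", 3 => "cy" }
shift_user_ids = [3, 1, 3, 2, 1]   # user ids as they appear on shifts

one_by_one = FakeUserTable.new(rows)
names_one_by_one = shift_user_ids.uniq.map { |id| one_by_one.find(id) }.sort

batched = FakeUserTable.new(rows)
names_batched = batched.find_all(shift_user_ids.uniq).sort
```

Both approaches return the same users, but the first issues one lookup per distinct user while the second issues one in total, which is the gap that grows with the number of shifts on the page.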
I have an Article model with a view_count attribute.
I want to create a rank method in the Article model that returns the rank of the article, i.e. the highest view_count gets rank 1.
How would I do this? My first instinct was to query for Article.all and write some ruby code to do this. Is there a more efficient way of doing this via queries?
Not particularly efficient, but something like
def rank
(Article.where('view_count > ?', self.view_count).count) + 1
end
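The counting idea behind this method, rank = (number of articles with a strictly higher view_count) + 1, can be checked in plain Ruby. VIEW_COUNTS and rank_of are hypothetical names for illustration; the sketch also shows that articles with equal view counts share a rank:

```ruby
# Hypothetical view counts keyed by article title; no database involved.
VIEW_COUNTS = { "a" => 50, "b" => 120, "c" => 50, "d" => 7 }.freeze

# Rank is one more than the number of entries with strictly more views.
def rank_of(title, view_counts)
  own_views = view_counts.fetch(title)
  view_counts.count { |_title, views| views > own_views } + 1
end

ranks = VIEW_COUNTS.keys.map { |title| [title, rank_of(title, VIEW_COUNTS)] }.to_h
```

Here "a" and "c" tie on 50 views and both get rank 2, and the next article gets rank 4, so ranks behave like SQL's RANK() rather than DENSE_RANK().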
You can order the elements while getting them from the database:
Article.order('view_count DESC'), maybe with a limit: Article.order('view_count DESC').limit(10)
If you really need to store a rank you could now do something like:
top_ten = Article.order('view_count DESC').limit(10)
top_ten.each_with_index do |article, i|
  article.rank = i + 1 # rank 1 is the highest view_count
  article.save
end
But you would have to update this every time any view_count changes, so it is better to just store the view_count and display the rank while you generate the list.
I have a table named acts, and I'd like to run a query that rolls up act values for a whole week. I'd like to make sure the query always returns one row for each day of the week, even if there are no records for that day. Right now I'm doing it like this:
def self.this_week_totals
  sunday = Time.now.beginning_of_week(:sunday).strftime("%Y-%m-%d")
  connection.select_values(<<-EOQ)
    SELECT COALESCE(SUM(end_time - start_time), '0:00:00') AS total_time
    FROM generate_series(0, 6) AS s(t)
    LEFT JOIN acts
    ON acts.day = '#{sunday}'::date + s.t
    WHERE deleted_at IS NULL
    GROUP BY s.t
    ORDER BY s.t
  EOQ
end
Is there any way I could make this a named scope on the Act class so it can be combined with other conditions, for example to filter the Acts by a client_id? Since acts isn't in my FROM, but is part of the LEFT JOIN, I'm guessing not, but perhaps someone out there knows a way.
Edit: Just to clarify, the goal is for this method to always return exactly 7 Act objects, regardless of what's in the database.
If you want your query object to be chainable, it must be an ActiveRecord::Relation object.
where, select, order and the other query methods return ActiveRecord::Relation objects that are chainable, so if the code below works you can chain off the returned query object.
Note that in Rails 3 and up, a class method that returns an ActiveRecord::Relation is basically the same as a scope; both are chainable query objects.
class Act
  def self.this_week_totals
    sunday = Time.now.beginning_of_week(:sunday).strftime("%Y-%m-%d")
    select("COALESCE(SUM(end_time - start_time), '0:00:00') AS total_time")
      .from("generate_series(0, 6) AS s(t)")
      .joins("LEFT JOIN acts ON acts.day = '#{sunday}'::date + s.t")
      .where("deleted_at IS NULL")
      .group("s.t")
      .order("s.t")
  end
  # ...
end
client_id = params[:client_id]
Act.this_week_totals.where("client_id = ?", client_id)
http://api.rubyonrails.org/classes/ActiveRecord/QueryMethods.html#method-i-from
Although I really thought I could use the solution from @house9, I don't see any way to avoid compromising on at least one of these goals:
Always yield 7 Act objects.
Return an ActiveRecord::Relation so I can compose this method with other scopes.
Permit joining to the clients table.
Here is the part-SQL/part-Ruby solution I'm actually using, which sadly gives up on point #2 above and also returns tuples rather than Acts:
def self.this_week(wk=0)
  sunday = Time.now.beginning_of_week(:sunday)
  sunday += wk.weeks
  not_deleted.where("day BETWEEN ? AND ?", sunday, sunday + 7.days)
end
scope :select_sum_total_hours,
  select("EXTRACT(EPOCH FROM COALESCE(SUM(end_time - start_time), '0:00:00'))/3600 AS total_hours")
scope :select_sum_total_fees,
  joins(:client).
  select("SUM(COALESCE(clients.rate, 0) * EXTRACT(EPOCH FROM COALESCE(end_time - start_time, '0:00:00'))/3600) AS total_fees")
def self.this_week_totals_by_day(wk=0)
  totals = Hash[
    this_week(wk)
      .select("EXTRACT(DAY FROM day) AS just_day")
      .select_sum_total_hours
      .select_sum_total_fees
      .group("day")
      .order("day")
      .map{|act| [act.just_day, [act.total_hours.to_f, act.total_fees.to_money]]}
  ]
  sunday = Time.now.beginning_of_week(:sunday)
  sunday += wk.weeks
  (0..6).map do |x|
    totals[(sunday + x.days).strftime("%d")] || [0, 0.to_money]
  end
end
That could be DRYed up a bit, and it would produce errors if there were ever a month with fewer than 7 days, but hopefully it shows what I'm doing. The scopes for this_week, select_sum_total_hours, and select_sum_total_fees are used elsewhere, so I want to pull them out into scopes rather than repeating them in several big raw SQL strings.
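The "always seven entries" step at the end of that method can be sketched on its own in plain Ruby. This is only an illustration under assumptions: week_totals and the totals_by_date hash are hypothetical, and the sketch keys the totals by full Date rather than by day of month, which avoids relying on day numbers being unique within the week:

```ruby
require "date"

# Expand sparse per-day totals into exactly 7 entries, Sunday first.
# `totals_by_date` is a hypothetical { Date => hours } hash, such as one
# built from a grouped query; days without a record default to 0.0.
def week_totals(sunday, totals_by_date)
  (0..6).map { |offset| totals_by_date.fetch(sunday + offset, 0.0) }
end

sunday = Date.new(2021, 2, 28)   # a Sunday; this week crosses into March
totals_by_date = {
  Date.new(2021, 2, 28) => 2.5,
  Date.new(2021, 3, 2)  => 4.0
}
week = week_totals(sunday, totals_by_date)
```

The grouped query can then stay a normal chainable relation, with the padding to seven days done in Ruby after the rows are loaded.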
I want to be able to show posts and have them sorted by a couple criteria, first by the amount of votes they have on them and second by the date at which they were created. I don't want posts that are more than a week old being displayed so only posts in the last week. I tried doing this:
<%= render @posts.sort_by { |post| post.votes.count if post.created_at < 1.week.ago.utc }.reverse %>
but it gave me an error of comparison of NilClass with 2 failed
I know the code works by just sorting posts by vote count but I also want to limit time so could someone tell me how this can be done. I'm still new so sorry for the simplicity.
Solution by @Salil is ok, but I would suggest adding a counter_cache column ( http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html ) and changing the recent_posts code (from this comment: https://stackoverflow.com/a/11498634/1392074 ) into:
def self.recent_posts
Post.where("created_at >= ?", 1.week.ago.utc).order("votes_count DESC, created_at DESC")
end
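The error in the question arises because the sort_by block returns nil for any post that fails the condition, and nil cannot be compared with an integer. The same "filter first, then sort by votes and date" logic can be sketched in plain Ruby; RecentPost, SECONDS_PER_WEEK, and the sample data are hypothetical, and the current time is passed in so the result is reproducible:

```ruby
# Hypothetical stand-in for a Post record with a cached vote count.
RecentPost = Struct.new(:title, :votes_count, :created_at)

SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Keep only posts from the last week, then sort by votes (descending)
# and, for equal votes, by creation time (newest first).
def recent_posts(posts, now)
  cutoff = now - SECONDS_PER_WEEK
  posts.select { |post| post.created_at >= cutoff }
       .sort_by { |post| [-post.votes_count, -post.created_at.to_f] }
end

now = Time.utc(2020, 1, 15)
day = 24 * 60 * 60
posts = [
  RecentPost.new("stale", 99, now - 8 * day),  # older than a week, dropped
  RecentPost.new("older",  2, now - 2 * day),
  RecentPost.new("top",    5, now - 3 * day),
  RecentPost.new("newer",  2, now - 1 * day)
]
titles = recent_posts(posts, now).map(&:title)
```

Because every element that reaches sort_by has a numeric key, no nil comparison can occur; in the Rails version the WHERE clause plays the role of select and ORDER BY the role of sort_by.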
The code to find posts should be in the model, not in the views.
It is always a good idea to fetch only the records you need to display, rather than fetching everything and showing some of it in the views.
You should do something like the following
in your post.rb
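def self.recent_posts
  # LEFT JOIN keeps posts that have no votes; the count alias is used for ordering.
  Post.select("posts.*, COUNT(votes.id) AS votes_total")
      .joins("LEFT JOIN votes ON votes.post_id = posts.id")
      .where("posts.created_at >= ?", 1.week.ago.utc)
      .group("posts.id")
      .order("votes_total DESC, posts.created_at DESC")
end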