I have a model called Impression, which tracks the views of each Card (also a model).
I am trying to print a table of the cards that have at least 10 views in the last month.
So I started with:
cards = Card.joins(:impressions).where("impressions.created_at > ? AND impressions.created_at < ?", Date.today-30.days, Date.today).uniq
And then I did:
cards.select {|card| card.impressions.count >= 10 }
But it takes a very long time to run. I want something much more efficient.
Any ideas for counting the number of impressions and sorting them?
I want to do it as efficiently as I can, without iterating over the whole DB and hitting the N+1 problem, because that could get pretty ugly.
Does the created_at column of your impressions table have an index?
If you are querying on it and there is none, you can add one by generating a migration that calls:
add_index(:impressions, :created_at)
You can then use plain SQL to add the count condition to your query, like this:
cards = Card.joins(:impressions)
.where("impressions.created_at > ? AND impressions.created_at < ?", Date.today-30.days, Date.today)
.group('cards.id')
.having('COUNT(impressions.id) >= 10')
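To illustrate what the GROUP BY / HAVING query computes (not how Rails or the database executes it), here is a plain-Ruby sketch over an in-memory list of impressions. `ImpressionRow` and `cards_with_min_views` are hypothetical names used only for this example, and the window bounds are exclusive to match the `>` / `<` conditions above.

```ruby
require 'date'

# Hypothetical stand-in for one row of the impressions table.
ImpressionRow = Struct.new(:card_id, :created_at)

# Mirrors: WHERE created_at > from AND created_at < to
#          GROUP BY card_id HAVING COUNT(*) >= min_views
def cards_with_min_views(impressions, from, to, min_views)
  counts = Hash.new(0)
  impressions.each do |imp|
    # Exclusive bounds, like the > and < in the SQL condition
    next unless imp.created_at > from && imp.created_at < to
    counts[imp.card_id] += 1
  end
  # HAVING: keep only the groups whose count reaches the threshold
  counts.select { |_card_id, count| count >= min_views }.keys.sort
end

today = Date.new(2024, 5, 31)
rows = []
12.times { |i| rows << ImpressionRow.new(1, today - 1 - i) } # card 1: 12 recent views
3.times  { |i| rows << ImpressionRow.new(2, today - 1 - i) } # card 2: only 3 views
15.times { rows << ImpressionRow.new(3, today - 60) }        # card 3: views too old

p cards_with_min_views(rows, today - 30, today, 10) # => [1]
```

The database does the same counting in one pass and only sends back the matching card ids, which is why the grouped query avoids loading every impression into Ruby.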
You can use the .last() method. Try this:
cards = Card.joins(:impressions).where("impressions.created_at BETWEEN ? AND ?", 1.month.ago.beginning_of_month , 1.month.ago.end_of_month).uniq.last(10)
I would use having and Arel. Try this:
cards = Card.joins(:impressions).where(impressions: { created_at: (Date.today - 30.days)..Date.today }).group('cards.id').having(Impression.arel_table[:id].count.gteq(10))
I don't know what your associations look like or how much data you have, so I'll assume they look like this:
Card
class Card < ActiveRecord::Base
has_many :impressions
end
Impression
class Impression < ActiveRecord::Base
belongs_to :card
end
My suggestion is to try something like this:
cards_id = Card.pluck(:id)
past_one_month_impressions_arel = Impression.where("impressions.created_at > ? AND impressions.created_at < ?", Date.today-30.days, Date.today)
cards_id_having_atleast_10_views = cards_id.select do |c_id|
arel = past_one_month_impressions_arel.where(impressions: { card_id: c_id })
arel.count >= 10
end
cards = Card.where(id: cards_id_having_atleast_10_views)
Also you may try using find_each to iterate in batches if you have a lot of data.
Hope that helps.
Thanks.
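To make the N+1 concern from the question concrete, here is a plain-Ruby sketch that counts how many "queries" each strategy would issue. `FakeDb` is a hypothetical stand-in that only increments a counter per call; it does not model a real database or ActiveRecord.

```ruby
# Hypothetical fake database that records how many queries were issued.
class FakeDb
  attr_reader :queries

  def initialize(impression_card_ids)
    @impression_card_ids = impression_card_ids
    @queries = 0
  end

  # Like SELECT id FROM cards (derived from the impressions here for simplicity)
  def card_ids
    @queries += 1
    @impression_card_ids.uniq
  end

  # Like SELECT COUNT(*) FROM impressions WHERE card_id = ?
  def count_for(card_id)
    @queries += 1
    @impression_card_ids.count(card_id)
  end

  # Like SELECT card_id FROM impressions GROUP BY card_id HAVING COUNT(*) >= ?
  def card_ids_with_at_least(min)
    @queries += 1
    counts = Hash.new(0)
    @impression_card_ids.each { |id| counts[id] += 1 }
    counts.select { |_id, c| c >= min }.keys
  end
end

ids = [1] * 12 + [2] * 3 + [3] * 10

per_card = FakeDb.new(ids)
slow = per_card.card_ids.select { |id| per_card.count_for(id) >= 10 }

grouped = FakeDb.new(ids)
fast = grouped.card_ids_with_at_least(10)

p [slow, per_card.queries] # => [[1, 3], 4]
p [fast, grouped.queries]  # => [[1, 3], 1]
```

Both strategies find the same cards, but the per-card loop costs one query per card plus one, while the grouped version stays at a single query no matter how many cards there are.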
Related
In my Rails app, I do a lot of range searches to group objects, like this:
scope :best_of_the_week, ->(time) do
start_time = time.beginning_of_week
end_time = time.end_of_week
where("created_at > ? AND created_at < ?", start_time, end_time).where('votes_count > ?', 300).order('votes_count DESC').first(8)
end
In this case, do I need to add an index to created_at? And what about votes_count?
Additionally, how can I elegantly combine the two where calls? Or does combining them make any difference?
If you want maximum performance for this query, create an index on both columns. If you don't want to create too many indexes, index created_at, since the dates cover an ever larger range as time goes on (and as the database grows).
I like to use find_by_sql and have the SELECT retrieve only the essential columns to improve performance; if you have many varchar fields, this can have a noticeable impact.
Just for syntactic sugar:
where("created_at BETWEEN ? AND ?", start_time, end_time) # ...followed by your other conditions
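One thing to keep in mind when switching to BETWEEN: it includes both endpoints, while the original `created_at > ? AND created_at < ?` excludes both. ActiveRecord also turns a hash condition with an inclusive Ruby range, such as `where(created_at: start_time..end_time)`, into a BETWEEN. The plain-Ruby sketch below (stdlib only, made-up timestamps) just shows how the three kinds of bounds behave at the edges.

```ruby
require 'time'

start_time = Time.parse('2024-05-27 00:00:00')
end_time   = Time.parse('2024-06-02 23:59:59')

# created_at > start AND created_at < end (both ends excluded)
def exclusive_both?(t, from, to)
  t > from && t < to
end

# created_at BETWEEN start AND end (both ends included)
def inclusive_both?(t, from, to)
  (from..to).cover?(t)
end

# created_at >= start AND created_at < end (Ruby's exclusive-end range)
def half_open?(t, from, to)
  (from...to).cover?(t)
end

[start_time, end_time].each do |t|
  puts [t.strftime('%F %T'),
        exclusive_both?(t, start_time, end_time),
        inclusive_both?(t, start_time, end_time),
        half_open?(t, start_time, end_time)].inspect
end
# ["2024-05-27 00:00:00", false, true, true]
# ["2024-06-02 23:59:59", false, true, false]
```

With timestamps taken from beginning_of_week and end_of_week the difference only matters for records stamped exactly on a boundary, but it is worth knowing which behavior you get.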
Let's say you have an association in one of your models like this:
class User
has_many :articles
end
Now assume you need to get 3 arrays: one for the articles written yesterday, one for the articles written in the last 7 days, and one for the articles written in the last 30 days.
Of course you might do this:
articles_yesterday = user.articles.where("posted_at >= ?", Date.yesterday)
articles_last7d = user.articles.where("posted_at >= ?", 7.days.ago.to_date)
articles_last30d = user.articles.where("posted_at >= ?", 30.days.ago.to_date)
However, this will run 3 separate database queries. More efficiently, you could do this:
articles_last30d = user.articles.where("posted_at >= ?", 30.days.ago.to_date)
articles_yesterday = articles_last30d.select { |article|
article.posted_at >= Date.yesterday
}
articles_last7d = articles_last30d.select { |article|
article.posted_at >= 7.days.ago.to_date
}
Now of course this is a contrived example and there is no guarantee that the array select will actually be faster than a database query, but let's just assume that it is.
My question is: is there any way (e.g. some gem) to write this code so that you simply specify the association conditions, and the application itself decides whether it needs to perform another database query or not?
ActiveRecord itself does not seem to cover this problem appropriately. You are forced to decide between querying the database every time or treating the association as an array.
There are a couple of ways to handle this:
You can create a separate association for each level you want by specifying conditions on the association definition (in Rails 4+ as a scope lambda; in Rails 3 this was the :conditions option). Then you can simply eager load these associations in your User query, and you will hit the DB three times for the entire operation instead of three times for each user.
class User < ActiveRecord::Base
# The lambda is evaluated at query time, so Date.yesterday stays current
has_many :articles_yesterday, -> { where('posted_at >= ?', Date.yesterday) }, class_name: 'Article'
# other associations the same way
end
User.where(...).includes(:articles_yesterday, :articles_7days, :articles_30days)
You could do a group by.
What it comes down to is that you need to profile your code and determine what's going to be fastest for your app (or whether you should even bother with it at all).
You can avoid having to decide each time whether to hit the database with something like the code below.
class User < ActiveRecord::Base
has_many :articles
def articles_last30d
@articles_last30d ||= articles.where("posted_at >= ?", 30.days.ago.to_date).to_a
end
def articles_last7d
@articles_last7d ||= articles_last30d.select { |article| article.posted_at >= 7.days.ago.to_date }
end
def articles_yesterday
@articles_yesterday ||= articles_last30d.select { |article| article.posted_at >= Date.yesterday }
end
end
What it does:
Makes at most one query, whichever of the three is used
Computes only the arrays that are actually used (plus the 30d one in any case), each only once
It does not, however, avoid the 30d query when you only need one of the shorter ranges. Is that enough, or do you need something more?
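The memoization idea can be tried out without a database. In this plain-Ruby sketch, `ArticleRow`, `ArticleCache` and the block passed to it are hypothetical stand-ins for the Article model, the User model and the single `posted_at >= ?` query; the block only counts how often it is called. `||=` caches each result in an instance variable, so the 30-day "query" runs at most once and the narrower lists are filtered from it in memory.

```ruby
require 'date'

# Hypothetical stand-in for an Article record.
ArticleRow = Struct.new(:title, :posted_at)

class ArticleCache
  attr_reader :fetch_count

  def initialize(today, &fetch_since)
    @today = today
    @fetch = fetch_since # stands in for the database query
    @fetch_count = 0
  end

  # The only method that "queries"; memoized after the first call.
  def articles_last30d
    @articles_last30d ||= begin
      @fetch_count += 1
      @fetch.call(@today - 30)
    end
  end

  # Filtered in memory from the 30-day list.
  def articles_last7d
    @articles_last7d ||= articles_last30d.select { |a| a.posted_at >= @today - 7 }
  end

  def articles_yesterday
    @articles_yesterday ||= articles_last30d.select { |a| a.posted_at >= @today - 1 }
  end
end

today = Date.new(2024, 6, 1)
all_articles = [
  ArticleRow.new('a', today - 1),
  ArticleRow.new('b', today - 5),
  ArticleRow.new('c', today - 20),
  ArticleRow.new('d', today - 90)
]

cache = ArticleCache.new(today) { |since| all_articles.select { |a| a.posted_at >= since } }
p cache.articles_yesterday.map(&:title) # => ["a"]
p cache.articles_last7d.map(&:title)    # => ["a", "b"]
p cache.articles_last30d.map(&:title)   # => ["a", "b", "c"]
p cache.fetch_count                     # => 1
```

One caveat of `||=`: if the cached value could legitimately be nil or false it would be recomputed on every call, which is not an issue here because the results are always arrays.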
So I have a Vendor model, and a Sale model. An entry is made in my Sale model whenever an order is placed via a vendor.
On my Vendor model, I have 3 cache columns: sales_today, sales_this_week, and sales_lifetime.
For the first two, I calculate them something like this:
def update_sales_today
today = Date.today.beginning_of_day
sales_today = Sale.where("created_at >= ?", today).find_all_by_vendor_id(self.id)
self.sales_today = 0
sales_today.each do |s|
self.sales_today = self.sales_today + s.amount
end
self.save
end
So that resets the value every time it is accessed and recalculates it based on the most current records.
The weekly one is similar but I use a range of dates instead of today.
But I am not quite sure how to handle the lifetime data.
I don't want to clear out my value and have to sum all the Sale.amount for all the sales records for my vendor, every single time I update this record. That's why I am even implementing a cache in the first place.
What's the best way to approach this, from a performance perspective?
I might use ActiveRecord's sum method in this case. All in one:
today = Date.today
vendor_sales = Sale.where(:vendor_id => self.id)
self.sales_today = vendor_sales.
where("created_at >= ?", today.beginning_of_day).
sum("amount")
self.sales_this_week = vendor_sales.
where("created_at >= ?", today.beginning_of_week).
sum("amount")
self.sales_lifetime = vendor_sales.sum("amount")
This way you don't have to load lots of Sale objects into memory just to add up the amounts.
You can use callbacks on the create and destroy events of your Sale model:
class Sale < ActiveRecord::Base
belongs_to :vendor
after_create :increment_vendor_lifetime_sales
before_destroy :decrement_vendor_lifetime_sales
def increment_vendor_lifetime_sales
vendor.update_attribute :sales_lifetime, vendor.sales_lifetime + amount
end
def decrement_vendor_lifetime_sales
vendor.update_attribute :sales_lifetime, vendor.sales_lifetime - amount
end
end
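The callback approach keeps sales_lifetime current by applying a delta on each create and destroy instead of re-summing every sale. Here is a plain-Ruby sketch of that bookkeeping; `VendorTotals`, `record_sale` and `remove_sale` are hypothetical names, and amounts are kept as integer cents (an assumption to avoid float rounding). It checks that the running total matches a full recomputation.

```ruby
# Hypothetical in-memory model of a vendor's cached lifetime total.
class VendorTotals
  attr_reader :sales_lifetime, :sales

  def initialize
    @sales = []          # stands in for the vendor's Sale rows
    @sales_lifetime = 0  # the cache column, in cents
  end

  # Mirrors an after_create callback: add the new amount once.
  def record_sale(amount_cents)
    @sales << amount_cents
    @sales_lifetime += amount_cents
  end

  # Mirrors a before_destroy callback: subtract the removed amount.
  def remove_sale(amount_cents)
    index = @sales.index(amount_cents)
    raise ArgumentError, "no such sale: #{amount_cents}" unless index
    @sales.delete_at(index)
    @sales_lifetime -= amount_cents
  end

  # The expensive alternative the cache avoids: re-sum every sale.
  def recomputed_lifetime
    @sales.inject(0, :+)
  end
end

v = VendorTotals.new
[1_999, 500, 12_000].each { |cents| v.record_sale(cents) }
v.remove_sale(500)
p v.sales_lifetime      # => 13999
p v.recomputed_lifetime # => 13999
```

Each create or destroy costs a constant amount of work, regardless of how many sales the vendor already has.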
This problem seems fairly simple, but I've never encountered one like this.
Here are the settings:
Post has_many :reader_links
Post has_many :readers, :through => :reader_links
I need to find out if there are readers reading a post.
@post.reader_links.where('created_at >= ?', 45.minutes.ago).any?
Works great.
@post.readers.where('created_at >= ?', 45.minutes.ago).any?
throws an ambiguous column error because it can't tell whether created_at refers to the reader's column or the reader_link's column (the class of a reader is actually User, and both tables have created_at). How do I query the readers whose reader_links were created within the last 45 minutes?
I'm looking for something like:
@post.readers.where('reader_link.created_at >= ?', 45.minutes.ago)
If I get it right, you just need to specify which created_at column you're talking about:
@post.readers.where('reader_links.created_at >= ?', 45.minutes.ago).any?
You could merge the scopes to get rid of the ambiguity errors, so that each condition applies only within its own scope.
using meta_where:
Post.scoped & (ReaderLink.scoped & User.where(:created_at.gt => 45.minutes.ago))
without meta_where:
Post.scoped.merge(ReaderLink.scoped.merge(User.where('created_at >= ?', 45.minutes.ago)))
This should return Post objects along with the reader_links and readers data for all readers created in the last 45 minutes. Please try it in the Rails console.
Edit: for a single post
post_with_fresh_users = Post.where('id = ?', some_id).merge(ReaderLink.scoped.merge(User.where('created_at >= ?', 45.minutes.ago)))
Edit: all fresh readers of a post (different order)
fresh_readers_for_post = User.where('created_at >= ?', 45.minutes.ago).merge(ReaderLink.scoped.merge(Post.where('id = ?', @post.id)))
How it works:
http://benhoskin.gs/2012/07/04/arel-merge-a-hidden-gem
I have two models: ScheduledCourse and ScheduledSession.
scheduled_course has_many scheduled_sessions
scheduled_session belongs_to scheduled_course
ScheduledCourse has a virtual attribute...
def start_at
s = ScheduledSession.where("scheduled_course_id = ?", self.id).order("happening_at ASC").limit(1)
s[0].happening_at
end
... the start_at virtual attribute checks all the ScheduledSessions that belong to the ScheduledCourse and picks the earliest one. So start_at is the date when the first session happens.
Now I need to write a query in the controller that gets only the courses that start today or later. I also need another query that gets only past courses.
I can't do the following because start_at is a virtual attribute:
@scheduled_courses = ScheduledCourse.where('start_at >= ?', Date.today).page(params[:page])
@scheduled_courses = ScheduledCourse.where('start_at <= ?', Date.today)
SQLite3::SQLException: no such column: start_at: SELECT "scheduled_courses".* FROM "scheduled_courses" WHERE (start_at >= '2012-03-13') LIMIT 25 OFFSET 0
You can't perform SQL queries on columns that aren't in the database. If you intend to query on it, consider making this a real database column instead of a virtual attribute; but if you just want to select items from this collection, you can still do so. You just have to do it in Ruby.
ScheduledCourse.page(params[:page]).find_all { |s| s.start_at >= Date.today }
Veraticus is right; you cannot use virtual attributes in queries.
However, I think you could just do:
ScheduledCourse.joins(:scheduled_sessions).where('scheduled_sessions.happening_at >= ?', Date.today).uniq
It will join the tables together by matching ids, and then you can look at the 'happening_at' column, which is what your 'start_at' attribute really is.
Disclaimer: Untested, but should work.
I wonder if this would be solved by a subquery (the subquery being to find the earliest date first). If so, perhaps the solution here might help point in a useful direction...
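To see why the earliest date has to be found first, here is a plain-Ruby sketch of that logic over in-memory sessions. `SessionRow` and `partition_courses` are hypothetical names; the per-course minimum corresponds roughly to a `MIN(happening_at)` grouped by `scheduled_course_id`, which is what such a subquery would compute. A course whose first session is already past but which still has future sessions counts as past here, whereas a plain join filtering on `happening_at >= today` would include it.

```ruby
require 'date'

# Hypothetical stand-in for one row of the scheduled_sessions table.
SessionRow = Struct.new(:scheduled_course_id, :happening_at)

# Returns [upcoming_ids, past_ids] based on each course's earliest session,
# i.e. the value of the start_at virtual attribute. Courses starting today
# count as upcoming.
def partition_courses(sessions, today)
  start_at = {}
  sessions.each do |s|
    current = start_at[s.scheduled_course_id]
    start_at[s.scheduled_course_id] = s.happening_at if current.nil? || s.happening_at < current
  end
  upcoming, past = start_at.partition { |_id, first| first >= today }
  [upcoming.map(&:first).sort, past.map(&:first).sort]
end

today = Date.new(2012, 3, 13)
sessions = [
  SessionRow.new(1, today + 2), SessionRow.new(1, today + 9), # starts in the future
  SessionRow.new(2, today - 5), SessionRow.new(2, today + 3), # started, still running
  SessionRow.new(3, today - 40)                               # finished
]

p partition_courses(sessions, today) # => [[1], [2, 3]]
```

In the database, the same idea would mean grouping the sessions by course and comparing the minimum date, so the filtering happens before pagination rather than after it.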