Making this ActiveRecord query more efficient? - ruby-on-rails

I have User and Gift models. A user can send gifts to other users. I have a relational table telling me which users received a gift. In addition, a user belongs to a School, which can be free or paid.
I want the count of users that have received a gift in the last week for a specific type of school (that is, free or paid).
I can do:
Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).collect(&:gift_recipients).flatten.uniq.count
Or, if I want to know how many users sent gifts in the last week, this works:
Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).collect(&:user_id).uniq.count
If I want to know how many users have sent or received a gift in the last week I can do:
(Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).collect(&:gift_recipients).flatten + Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).collect(&:user_id)).uniq.count
All this works fine but if the database is big enough this is really slow. Do you have any suggestions to make it more efficient, maybe using raw SQL where needed?
"gifts"
user_id (integer)
school_id (integer)
created_at (datetime)
updated_at (datetime)
"gift_recipients" is a table like
gift_id | recipient_id,

You do not want to do this using collect(), which loads all of the results into memory and filters them within an Array of ActiveRecord objects. This is slow and dangerous, as it could potentially use up all of the available memory, depending on the size of the data relative to your server.
Once you post your schema I can help you query/aggregate this in SQL, which is the right way to do it.
For example, instead of:
Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).collect(&:user_id).uniq.count
You should use:
Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).count('distinct user_id')
...which will count the distinct user_ids in SQL and return the result instead of returning all of the objects and counting them in memory.

I saw this old post and I wanted to make a couple of comments:
As Winfield said
Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).count('distinct user_id')
is a good way of doing this. I would do
Gift.joins(:school).count('distinct user_id', :conditions => ["gifts.created_at >= ? AND free_school = ?", Time.now.beginning_of_week, true])
but only because it is nicer to my eyes, purely a personal preference; you can check that both produce exactly the same SQL query. Note that it is necessary to write
gifts.created_at
to avoid ambiguity, because both tables have a column with this name. For the column
free_school
there is no ambiguity, as the gifts table has no column with that name. For the first query I was doing
Gift.joins(:school).where("gifts.created_at >= ? AND schools.free_school = ?", Time.now.beginning_of_week, true).collect(&:user_id).uniq.count
which is awkward. This works better
Gift.joins(:school).count("distinct user_id", :conditions => ["gifts.created_at >= ? AND free_school = ?", Time.now.beginning_of_week, true])
which avoids the problem of bringing the gifts into memory and filtering them in Ruby.
Up to this point there's nothing new. The key point is that my problem was calculating the number of users who sent or received a gift during the last week. For this I came up with the following:
senders_ids = Gift.joins(:school).find(:all, :select => 'distinct user_id', :conditions => ['gifts.created_at >= ? AND free_school = ?', Time.now.beginning_of_week, type]).map {|g| g.user_id}
receivers_ids = Gift.joins(:school).find(:all, :select => 'distinct rec.recipient_id', :conditions => ['gifts.created_at >= ? AND free_school = ?', Time.now.beginning_of_week, type], :joins => "INNER JOIN gift_recipients as rec on rec.gift_id = gifts.id").map {|g| g.recipient_id}
(senders_ids + receivers_ids).uniq.count
I'm pretty sure there is a better way of doing this, i.e. returning exactly this number in a single SQL query, but at least the results are arrays of objects containing only the id (recipient_id in the receivers case), rather than bringing all the objects into memory. Hopefully this is useful for someone who, like me, is new to SQL queries through Rails :).
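A single statement is possible: select sender ids and recipient ids in two subqueries, combine them with UNION, and count distinct ids over the derived table. The sketch below only builds that SQL and its bind values as a conditions array; the table and column names come from the schema posted in the question, while the helper name `active_users_count_sql`, its `kind` argument, and the `schools.id`/`gifts.id` primary keys are assumptions. It has not been run against the poster's database.

```ruby
# Hypothetical helper: one SQL statement counting distinct users who sent
# and/or received a gift since `since`, for free (true) or paid (false) schools.
# Table and column names follow the schema posted in the question.
SENDERS_SQL = <<~SQL.freeze
  SELECT gifts.user_id AS uid
  FROM gifts
  INNER JOIN schools ON schools.id = gifts.school_id
  WHERE gifts.created_at >= ? AND schools.free_school = ?
SQL

RECIPIENTS_SQL = <<~SQL.freeze
  SELECT gift_recipients.recipient_id AS uid
  FROM gifts
  INNER JOIN schools ON schools.id = gifts.school_id
  INNER JOIN gift_recipients ON gift_recipients.gift_id = gifts.id
  WHERE gifts.created_at >= ? AND schools.free_school = ?
SQL

# Returns [sql, *binds], the array form that ActiveRecord sanitizes.
def active_users_count_sql(since, free_school, kind = :either)
  inner =
    case kind
    when :senders    then SENDERS_SQL
    when :recipients then RECIPIENTS_SQL
    # UNION (unlike UNION ALL) drops duplicate ids across both subqueries.
    when :either     then "#{SENDERS_SQL} UNION #{RECIPIENTS_SQL}"
    else raise ArgumentError, "unknown kind: #{kind.inspect}"
    end
  # Each subquery has exactly two placeholders: since, free_school.
  binds = [since, free_school] * (inner.count('?') / 2)
  ["SELECT COUNT(DISTINCT uid) FROM (#{inner}) AS active_users", *binds]
end
```

In a Rails app the returned array could be passed to `Gift.count_by_sql`, which sanitizes the bind values, e.g. `Gift.count_by_sql(active_users_count_sql(Time.now.beginning_of_week, true))`. Counting DISTINCT over the derived table also keeps the `:senders` and `:recipients` variants correct, since those subqueries return one row per gift.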

Related

ActiveRecord query performance, performing a where after initial query has been executed

I have this query:
absences = Absence.joins(:user).where('users.company_id = ?', @company.id).where('"from" <= ? and "to" >= ?', self.date, self.date).group('user_id').select('user_id, sum(hours) as hours')
This will return user_id's with a total of hours.
Now I need to loop through all users of the company and do some calculations.
company.users.each do |user|
tc = TimeCheck.find_or_initialize_by(:user_id => user.id, :date => self.date)
tc.expected_hours = user.working_hours - absences.where('user_id = ?', user.id).first.hours
end
For performance reasons I want only one query against the absences table (the first one), and afterwards to look up the correct user in memory. How do I best accomplish this? I believe by default absences will be an ActiveRecord::Relation and not a result set. Is there a command I can use to instruct ActiveRecord to execute the query, and afterwards search in memory?
Or do I need to store absences as an array or a hash first?
One optimization you could make, so that a new SQL query is not issued for every user, is to
change:
absences.where('user_id = ?', user.id).first.hours
to:
absences.detect { |u| u.user_id == user.id }.hours
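Because detect iterates over the loaded records, the absences query runs once and its results are reused for every user. Also, you might not need to loop through company.users; you may be able to loop through absences instead, depending on the business requirements.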
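Going one step further, a hash keyed by user_id turns each lookup into a constant-time fetch instead of a linear `detect` scan, and lets you supply a default for users with no absences (where `.first.hours` or `detect(...).hours` would raise NoMethodError on nil). The sketch below uses plain Structs standing in for the ActiveRecord rows; `AbsenceRow`, `UserRow` and `expected_hours_by_user` are hypothetical names, and treating a missing row as 0 hours is an assumed business rule.

```ruby
# Stand-ins for ActiveRecord rows (hypothetical). In the app, `absences`
# would be the grouped relation from the question and `users` company.users.
AbsenceRow = Struct.new(:user_id, :hours)
UserRow    = Struct.new(:id, :working_hours)

# Returns { user_id => expected hours }, reading absences only once.
def expected_hours_by_user(users, absences)
  # One pass over the loaded rows builds the user_id => hours lookup table;
  # on a Relation, to_a runs the single grouped query.
  hours_by_user = absences.to_a.each_with_object({}) do |absence, lookup|
    lookup[absence.user_id] = absence.hours
  end

  users.each_with_object({}) do |user, result|
    # fetch with a default: users without an absence row count 0 hours
    # (assumed rule) instead of raising NoMethodError on nil.
    result[user.id] = user.working_hours - hours_by_user.fetch(user.id, 0)
  end
end
```

For example, two users with 8 working hours, where only user 1 has 3 hours of absence, give `{1 => 5, 2 => 8}`. The hash costs one extra pass over the absences but avoids scanning them once per user.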

How to query the result of a method in Ruby (rails)

I'm struggling with a particular feature.
In my users model I have a method to work out their age based on the current date less their first order date.
I'd like to be able to find all users who are older than X days. I can find active users by querying a column called state for 'active' users. But I'm unsure how to query the result of the age method to find users older than X.
Does anyone have any ideas?
Many thanks and season's greetings.
Edit:
In PostgreSQL I would write:
WITH
  firstbill as (
    SELECT
      DISTINCT(user_id) as customer,
      DATE(MIN(billed_at)) as first_order
    FROM orders
    WHERE state = 'shipped'
    GROUP BY 1
    ORDER BY 1)
SELECT count(*)
FROM
  (SELECT *, (current_date - first_order) as age
   FROM firstbill
   JOIN users on users.id = customer) as t2
WHERE age >= 21
I have tried using User.find_by_sql("above query"), but that returns an Array, not an ActiveRecord::Relation, which makes any further joins a little harder.
You cannot really query for the return value of a method, because to do so you need to load all users and then call that method on every user, like this: User.all.select(&:your_method?). That will be very slow if you have many users.
But for your particular example you can write something like this to let the database return the correct users (assuming you have a first_order column on your users table):
User.where('first_order <= ?', 90.days.ago)
or
User.where('first_order <= ?', 1.month.ago)
I think the following statement should return the same users as your PostgreSQL example:
User.
select('users.*, MIN(DATE(orders.billed_at)) AS first_order_on').
joins('INNER JOIN orders ON orders.user_id = users.id'). # or just joins(:orders) with has_many :orders on User
where('orders.state = ?', 'shipped').
group('users.id').
having('MIN(DATE(orders.billed_at)) <= ?', 21.days.ago.to_date) # PostgreSQL does not accept a SELECT alias in HAVING
Solved by using
scope :acquired, -> { joins(:orders).where("orders.state = ?", "shipped") }
scope :older_than_age, ->(age) {
acquired.group("users.id").having("(current_date - date(min(orders.shipped_at))) >= ?", age)
}
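The age filter can be written two ways: `current_date - first_order >= 21` (the PostgreSQL query and the scope above) or `first_order <= current_date - 21` (the `having` clause in the earlier answer), where the cutoff date is computed once. The plain-Ruby sketch below mirrors the CTE over hypothetical in-memory rows and checks that both forms select the same users; `OrderRow` and the helper names are made up for illustration, and it is meant as a sanity check or test aid, not a replacement for the database query.

```ruby
require 'date'

# Hypothetical stand-in for rows of the orders table in the question.
OrderRow = Struct.new(:user_id, :state, :billed_at)

# In-memory mirror of the CTE: earliest shipped order date per user.
def first_order_dates(orders)
  orders.select { |o| o.state == 'shipped' }
        .group_by(&:user_id)
        .transform_values { |rows| rows.map { |o| o.billed_at.to_date }.min }
end

# Form used in the PostgreSQL query: age in days = today - first_order.
# Date - Date yields a Rational number of days, hence to_i.
def older_than_by_age(orders, days, today)
  first_order_dates(orders).select { |_id, first| (today - first).to_i >= days }.keys
end

# Form used in the having clause: first_order <= today - days.
def older_than_by_cutoff(orders, days, today)
  cutoff = today - days
  first_order_dates(orders).select { |_id, first| first <= cutoff }.keys
end
```

For instance, with today = 2024-01-31, a first shipped order on 2024-01-10 is exactly 21 days old and is selected by both forms, while one on 2024-01-11 (20 days) is selected by neither.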

Ambiguous table reference

This problem seems fairly simple, but I've never encountered one like this.
Here are the settings:
Post has_many :reader_links
Post has_many :readers, :through => :reader_links
I need to find out if there are readers reading a post.
@post.reader_links.where('created_at >= ?', 45.minutes.ago).any?
Works great.
@post.readers.where('created_at >= ?', 45.minutes.ago).any?
throws an ambiguous column error, because it's unclear whether created_at refers to the reader's column or the reader_link's column. This happens because a reader is actually a User, and both the users and reader_links tables have a created_at column. How do I query readers whose reader_links were created within the last 45 minutes?
I'm looking for something like..
@post.readers.where('reader_link.created_at >= ?', 45.minutes.ago)
If I understand correctly, you just need to specify which created_at column you're talking about:
@post.readers.where('reader_links.created_at >= ?', 45.minutes.ago).any?
You could merge the scopes to get rid of the ambiguity errors, so that each scope has its own visibility range.
using meta_where:
Post.scoped & (ReaderLink.scoped & User.where(:created_at.gt => 45.minutes.ago))
without meta_where:
Post.scoped.merge(ReaderLink.scoped.merge(User.where('created_at >= ?', 45.minutes.ago)))
This will result in arrays of Post objects containing the reader_links and readers data for all readers younger than 45 minutes. Please try it in the rails console.
Edit: for a single post
post_with_fresh_users = Post.where('id = ?', some_id).merge(ReaderLink.scoped.merge(User.where('created_at >= ?', 45.minutes.ago)))
Edit: all fresh readers of a post (different order)
fresh_readers_for_post = User.where('created_at >= ?', 45.minutes.ago).merge(ReaderLink.scoped.merge(Post.where('id = ?', @post.id)))
How it works:
http://benhoskin.gs/2012/07/04/arel-merge-a-hidden-gem

Rails 3. How to perform a "where" query by a virtual attribute?

I have two models: ScheduledCourse and ScheduledSession.
scheduled_course has_many scheduled_sessions
scheduled_session belongs_to scheduled_course
ScheduledCourse has a virtual attribute...
def start_at
s = ScheduledSession.where("scheduled_course_id = ?", self.id).order("happening_at ASC").limit(1)
s[0].happening_at
end
... the start_at virtual attribute checks all the ScheduledSessions that belongs to the ScheduledCourse and it picks the earliest one. So start_at is the date when the first session happens.
Now, in the controller, I need to get only the records that start today or later. I also need another query that gets only past courses.
I can't do the following because start_at is a virtual attribute
@scheduled_courses = ScheduledCourse.where('start_at >= ?', Date.today).page(params[:page])
@scheduled_courses = ScheduledCourse.where('start_at <= ?', Date.today)
SQLite3::SQLException: no such column: start_at: SELECT "scheduled_courses".* FROM "scheduled_courses" WHERE (start_at >= '2012-03-13') LIMIT 25 OFFSET 0
You can't perform SQL queries on columns that aren't in the database. You should consider making this a real database column if you intend to do queries on it instead of a fake column; but if you want to select items from this collection, you can still do so. You just have to do it in Ruby.
ScheduledCourse.page(params[:page]).find_all { |s| s.start_at >= Date.today }
(Note that this only filters the records on the current page.)
Veraticus is right; you cannot use virtual attributes in queries.
However, I think you could just do:
ScheduledCourse.joins(:scheduled_sessions).where('scheduled_sessions.happening_at >= ?', Date.today)
It will join the tables together by matching ids, and then you can look at the 'happening_at' column, which is what your 'start_at' attribute really is.
Disclaimer: Untested, but should work.
I wonder if this would be solved by a subquery (the subquery being to find the earliest date first). If so, perhaps the solution here might help point in a useful direction...
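One way to follow the subquery idea without making `start_at` a real column is to express "earliest session time" as a correlated subquery inside `where`, so the result stays a relation and pagination keeps working. The sketch only builds the condition array in plain Ruby; `START_AT_SQL` and `start_at_condition` are hypothetical names, the table and column names are taken from the question, and it has not been tested against the poster's schema.

```ruby
require 'date'

# Hypothetical SQL expression for a course's earliest session time, i.e.
# what the start_at virtual attribute computes, but evaluated by the database.
START_AT_SQL = <<~SQL.strip.freeze
  (SELECT MIN(scheduled_sessions.happening_at)
     FROM scheduled_sessions
    WHERE scheduled_sessions.scheduled_course_id = scheduled_courses.id)
SQL

# Returns a conditions array for `where`:
# upcoming => start_at >= date, past => start_at < date.
def start_at_condition(date, upcoming: true)
  operator = upcoming ? '>=' : '<'
  ["#{START_AT_SQL} #{operator} ?", date]
end
```

Usage in the controller could look like `@scheduled_courses = ScheduledCourse.where(start_at_condition(Date.today)).page(params[:page])`, with `start_at_condition(Date.today, upcoming: false)` for past courses. Courses with no sessions get a NULL minimum and so match neither query, and using `<` for past courses (the question used `<=`) keeps a course starting today out of the past list.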

Rails 3 Query - IN (?) can be blank

This query won't return any records when hidden_episodes_ids is empty.
:conditions => ["episodes.show_id in (?) AND air_date >= ? AND air_date <= ? AND episodes.id NOT IN (?)", @show_ids, @start_day, @end_day, hidden_episodes_ids]
If it's empty, the SQL will look like NOT IN (NULL), and since any comparison with NULL is unknown rather than true, no row matches.
So my solution is:
if hidden_episodes_ids.any?
... :conditions => ["episodes.show_id in (?) AND air_date >= ? AND air_date <= ? AND episodes.id NOT IN (?)", @show_ids, @start_day, @end_day, hidden_episodes_ids]
else
... :conditions => ["episodes.show_id in (?) AND air_date >= ? AND air_date <= ?", @show_ids, @start_day, @end_day]
end
But it is rather ugly (my real query is actually 5 lines, with joins, selects, etc.).
Is there a way to use a single query and avoid the NOT IN (null)?
PS: These are old queries migrated into Rails 3, hence the :conditions
You should just use the where method instead as that'll help clean all of this up. You just chain it together:
scope = Thing.where(:episodes => { :show_id => @show_ids })
scope = scope.where('air_date BETWEEN ? AND ?', @start_day, @end_day)
if hidden_episodes_ids.any?
  scope = scope.where('episodes.id NOT IN (?)', hidden_episodes_ids)
end
Being able to conditionally modify the scope avoids a lot of duplication.
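The same idea also works with the old `:conditions` arrays from the question: build the clause list and the bind values separately, and append the NOT IN part only when there is something to exclude. `episode_conditions` is a hypothetical helper; it follows the answer in using BETWEEN, which is inclusive and so matches the original `>=`/`<=` pair.

```ruby
# Hypothetical helper: builds a Rails-style conditions array ([sql, *binds]),
# adding the NOT IN clause only when there are ids to exclude. An empty array
# would otherwise be rendered as NOT IN (NULL), which matches no rows.
def episode_conditions(show_ids, start_day, end_day, hidden_episodes_ids)
  clauses = ['episodes.show_id IN (?)', 'air_date BETWEEN ? AND ?']
  binds   = [show_ids, start_day, end_day]

  unless hidden_episodes_ids.empty?
    clauses << 'episodes.id NOT IN (?)'
    binds   << hidden_episodes_ids
  end

  [clauses.join(' AND '), *binds]
end
```

The returned array can be passed as `:conditions => episode_conditions(...)` in Rails 3 or to `where(...)` on the episodes model, so the rest of the five-line query is written only once.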
