Possible to override the way count works, or find a better way to do this altogether - ruby-on-rails

I have this scope in my Artist model that gives me artists in order of their popularity within a certain time period. The popularity column in the popularity_caches table is computed every day.
scope :by_popularity, lambda { |*args|
  options = default_popularity_options.merge(args[0] || {})
  select("SUM(popularity) AS popularity, artists.*").
    from("popularity_caches FORCE INDEX (popularity_cache_group), artists FORCE INDEX (index_artists_on_id_and_genre_id)").
    where("popularity_caches.target_type = 'Artist'").
    where("popularity_caches.target_id = artists.id").
    where("popularity_caches.time_frame = ?", options[:time_frame]).
    where("popularity_caches.started_on > ?", options[:started_on]).
    where("popularity_caches.started_on < ?", options[:ended_on]).
    group("artists.id").
    order("popularity DESC")
}
This seems to work, except when I want to get the count: Artist.by_popularity.count. Because of the GROUP BY, I get a funky hash back instead of a single number (one count per artists.id):
#<OrderedHash {295954=>1, 20143=>1, 157532=>1, 181291=>1, 300086=>1, 50100=>1, 262898=>1, 293888=>1, 130158=>2, 279943=>1, 336758=>1, 100201=>1, 134290=>2, 22726=>3, 144620=>2, 62497=>2 # snip
This is the SQL I probably want instead:
SELECT COUNT(DISTINCT(artists.id)) AS count_all
FROM popularity_caches FORCE INDEX (popularity_cache_group), artists FORCE INDEX (index_artists_on_id_and_genre_id)
WHERE (popularity_caches.target_type = 'Artist')
AND (popularity_caches.target_id = artists.id)
AND (popularity_caches.time_frame = 'week')
AND (popularity_caches.started_on > '2011-02-28 16:00:00')
AND (popularity_caches.started_on < '2011-10-05')
ORDER BY popularity DESC
To get the count, I had to make a separate method that pretty much does the same thing, except the SQL is formed differently. It kind of sucks though, because when I want to paginate, I have to pass two things:
@artists = Artist.by_popularity(some args).paginate(
  :total_entries => Artist.count_by_popularity(pass in the same args here as in Artist.by_popularity),
  :per_page => 5,
  :page => ...
)
That smells to me because it's very brittle.
Is there a way to do this in Arel? Maybe override how it counts things (COUNT(DISTINCT artists.id)) and remove the GROUP BY so it doesn't return a hash for the count?
Thanks!
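One approach (a sketch, assuming Rails 3's ActiveRecord::Relation, where except is available) is to reuse the scope and strip the clauses that break counting:

# Hypothetical count helper on Artist: reuse by_popularity, but drop the
# SELECT, GROUP BY and ORDER BY that make .count return a per-artist hash.
def self.count_by_popularity(options = {})
  by_popularity(options).
    except(:select, :group, :order).
    count("DISTINCT artists.id")  # => COUNT(DISTINCT artists.id)
end

The :total_entries option in the paginate call above could then be fed from this helper without duplicating the scope's SQL.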

Solved with the amazing scuttle.io:
PopularityCache.select(
  Artist.arel_table[:id].count(true).as('count_all')  # COUNT(DISTINCT artists.id)
).where(
  PopularityCache.arel_table[:target_type].eq('Artist').and(
    PopularityCache.arel_table[:target_id].eq(Artist.arel_table[:id]).and(
      PopularityCache.arel_table[:time_frame].eq('week').and(
        PopularityCache.arel_table[:started_on].gt('2011-02-28 16:00:00').and(
          PopularityCache.arel_table[:started_on].lt('2011-10-05')
        )
      )
    )
  )
).order(:popularity).reverse_order

Related

ActiveRecord query performance, performing a where after initial query has been executed

I have this query:
absences = Absence.joins(:user).where('users.company_id = ?', @company.id).where('"from" <= ? and "to" >= ?', self.date, self.date).group('user_id').select('user_id, sum(hours) as hours')
This will return user_ids with a total of hours.
Now I need to loop through all users of the company and do some calculations.
company.users.each do |user|
  tc = TimeCheck.find_or_initialize_by(:user_id => user.id, :date => self.date)
  tc.expected_hours = user.working_hours - absences.where('user_id = ?', user.id).first.hours
end
For performance reasons I want to have only one query to the absences table (the first one) and afterwards look in memory for the correct user. How do I best accomplish this? I believe by default absences will be an ActiveRecord::Relation and not a result set. Is there a command I can use to instruct ActiveRecord to execute the query, and afterwards search in memory?
Or do I need to store absences as array or hash first?
One optimization you could make is:
change:
absences.where('user_id = ?', user.id).first.hours
to:
absences.detect { |u| u.user_id == user.id }.hours
Also, you might not need to loop through company.users. You may be able to loop through absences instead, depending on the business requirements.
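If the company has many users, a hash keyed by user_id avoids detect's linear scan on every iteration. A sketch using ActiveSupport's index_by (calling it materializes the relation once, which is the one-query behaviour asked for); the nil guard for users with no absence row is an assumption:

# Run the absences query once, then index the rows in memory by user_id.
absences_by_user = absences.index_by(&:user_id)

company.users.each do |user|
  tc = TimeCheck.find_or_initialize_by(:user_id => user.id, :date => self.date)
  absence = absences_by_user[user.id]
  tc.expected_hours = user.working_hours - (absence ? absence.hours : 0)
end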

Most Efficient Way to Get Counts of Users with Certain Attributes in Ruby

I have a collection of users with various statuses: active, disabled, or deleted (as an enum). I want a count of users with each status as well as a count of the total number of users. What is the most efficient way for me to do that?
I've read the questions on size vs. length vs. count in Ruby and that makes me think I should load all of the user records and then iterate over the collection multiple times to get the length of each status array.
This is what my code looks like currently:
# pagination code omitted...
all_users = User.all
total_count = all_users.length
active_count = all_users.select {|u| u.status == User.statuses['active']}.length
disabled_count = all_users.select {|u| u.status == User.statuses['disabled']}.length
deleted_count = all_users.select {|u| u.status == User.statuses['deleted']}.length
The requests from the client take about 1.25-1.5 seconds as written for 1,000 users.
I've also tried making multiple DB queries with code like this:
# pagination code omitted...
total_count = User.count
active_count = User.where(status: User.statuses['active']).count
disabled_count = User.where(status: User.statuses['disabled']).count
deleted_count = User.where(status: User.statuses['deleted']).count
That might be marginally faster by ~100ms. Is there a faster way to do this?
I'm not sure if it is relevant, but for background info: I am using Rails as an API in this context to an AngularJS frontend. I am using Kaminari to paginate the collection, but I still need counts of each status. I am in a B2B environment so it is unlikely that any instance will have more than 1,000 users. I don't need to scale higher than that.
Thanks in advance!
Do it all at once, in the database, by grouping your count query.
User.group(:status).count
Then to get the total number of users just sum the result. Here's an example from one of my tables. Here I'm grouping on a boolean field, but you can group on whatever you want.
> Course.group(:is_enabled).count
=> {false=>46, true=>26524}
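To combine the grouped counts with the per-status variables from the question, something like this should work (a sketch; whether the hash keys come back as enum names or integers depends on the Rails version and how the enum is stored):

counts = User.group(:status).count
# e.g. => {"active"=>812, "disabled"=>150, "deleted"=>38}
total_count  = counts.values.sum
active_count = counts.fetch('active', 0)  # or counts[User.statuses['active']] on versions that return integer keys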
That might be marginally faster by ~100ms.
Create an index on your 'status' column in your database:
# in your terminal
rails g migration AddIndexOnStatusOfUsers
# in db/migrate/xxxxx_add_index_on_status_of_users.rb
def change
add_index :users, :status
end
You should benchmark them all and let us know. Would be interesting. Pure SQL answers are always more scalable of course...
users = User.select(:status)
active_count = 0
disabled_count = 0
deleted_count = 0
users.each do |u|
  if u.status == 'active'
    active_count += 1
  elsif u.status == 'deleted'
    deleted_count += 1
  else
    disabled_count += 1
  end
end
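For the benchmarking suggested above, Ruby's stdlib Benchmark module is enough; a minimal sketch comparing the three approaches discussed in this question:

require 'benchmark'

Benchmark.bm(16) do |x|
  x.report('load + select:') do
    users = User.all.to_a
    users.count { |u| u.status == 'active' }
  end
  x.report('where + count:') { User.where(status: User.statuses['active']).count }
  x.report('group + count:') { User.group(:status).count }
end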

find_each with order and limit

I need to limit and order batches of records and am using find_each. I've seen a lot of people asking for this and no really good solution. If I've missed it, please post a link!
I have 30M records and want to deal with 10M with the highest value in the weight column.
I tried using a method someone wrote, find_each_with_order, but can't get it to work.
The code from that site doesn't take order as an option. Seems strange given that the name is find_each_with_order. I added it as follows:
class ActiveRecord::Base
  # normal find_each does not use the given order but uses id asc
  def self.find_each_with_order(options = {})
    raise "offset is not yet supported" if options[:offset]
    page = 1
    limit = options[:limit] || 1000
    order = options[:order] || 'id asc'
    loop do
      offset = (page - 1) * limit
      batch = find(:all, options.merge(:limit => limit, :offset => offset, :order => order))
      page += 1
      batch.each { |x| yield x }
      break if batch.size < limit
    end
  end
end
and I'm trying to use it as follows:
class GetStuff
  def self.grab_em
    file = File.open("1000 things.txt", "w")
    things = Thing.find_each_with_order({ :limit => 100, :order => "weight desc" })
    binding.pry
    things.each do |t|
      binding.pry
      file.write("#{t.name} #{t.id} #{t.weight}\n")
      if t.id % 20 == 0
        puts t.id.to_s
      end
    end
    file.close
  end
end
BTW I have the data in Postgres and am going to grab a subset and move it to neo4j, so I'm tagging with neo4j in case any of you neo4j people know how to do this. Thanks.
Not exactly sure if this is what you're looking for, but you can do something like this:
weight = Thing.order(:weight).select(:weight).last(10_000_000).first.weight
Thing.where("weight > ?", weight).find_each do |t|
  # ...your code...
end
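If the batches themselves need to arrive in weight order (find_each always iterates by id), a keyset-style loop is one alternative. A sketch, assuming duplicate weights are rare; with many ties you would need a (weight, id) cursor instead:

# Walk the table in descending-weight order, batch_size rows at a time,
# using the last weight seen as a cursor instead of a growing OFFSET.
def each_heaviest_thing(limit = 10_000_000, batch_size = 1000)
  last_weight = nil
  fetched = 0
  loop do
    scope = Thing.order("weight DESC").limit(batch_size)
    scope = scope.where("weight < ?", last_weight) if last_weight
    batch = scope.to_a
    break if batch.empty?
    batch.each { |thing| yield thing }
    fetched += batch.size
    last_weight = batch.last.weight
    break if fetched >= limit || batch.size < batch_size
  end
end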

how to paginate records from multiple models? (do I need a polymorphic join?)

After quite a bit of searching, I'm still a bit lost. There are a few other similar questions out there that deal with paginating multiple models, but they are either unanswered or they paginate each model separately.
I need to paginate all records of an Account at once.
class Account
  has_many :emails
  has_many :tasks
  has_many :notes
end
So, I'd like to find the 30 most recent "things" no matter what they are. Is this even possible with the current pagination solutions out there?
Like using some combination of eager loading and Kaminari or will_paginate?
Or, should I first set up a polymorphic join of all these things, called Items. Then paginate the most recent 30 items, then do a lookup of the associated records of those items.
And if so, I'm not really sure what that code should look like. Any suggestions?
Which way is better? (or even possible)
Rails 3.1, Ruby 1.9.2, app not in production.
With will_paginate:
@records = ... # fetch the array of records you want to paginate (various types)
Then do the following:
current_page = params[:page] || 1
per_page = 10
@records = WillPaginate::Collection.create(current_page, per_page, @records.size) do |pager|
  pager.replace(@records[pager.offset, pager.per_page])
end
Then in your view:
<%= will_paginate @records %>
Good question... I'm not sure of a "good" solution, but you could do a hacky one in Ruby:
You'd need to first fetch out the 30 latest of each type of "thing", and put them into an array, indexed by created_at, then sort that array by created_at and take the top 30.
A totally non-refactored start might be something like:
emails = account.emails.all(:limit => 30, :order => 'created_at DESC')
tasks  = account.tasks.all(:limit => 30, :order => 'created_at DESC')
notes  = account.notes.all(:limit => 30, :order => 'created_at DESC')
thing_array = (emails + tasks + notes).map { |thing| [thing.created_at, thing] }
# sort by the first item of each pair (== the date), newest first
thing_array_sorted = thing_array.sort_by { |date, _thing| date }.reverse
# then just grab the top thirty
things_to_show = thing_array_sorted.slice(0, 30)
Note: not tested, could be full of bugs... ;)
emails = account.emails
tasks = account.tasks
notes = account.notes
@records = [emails + tasks + notes].flatten.sort_by(&:updated_at).reverse
@records = WillPaginate::Collection.create(params[:page] || 1, 30, @records.size) do |pager|
  pager.replace(@records[pager.offset, pager.per_page])
end
That's it... :)
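As for the polymorphic join idea from the question, a minimal sketch of what that could look like (the Item model, its columns, and the callback are all hypothetical):

# Hypothetical feed model, backed by an items table with
# account_id, thing_type, thing_id and created_at columns.
class Item < ActiveRecord::Base
  belongs_to :account
  belongs_to :thing, :polymorphic => true
end

class Account < ActiveRecord::Base
  has_many :items
  # has_many :emails, :tasks, :notes as before
end

class Email < ActiveRecord::Base
  belongs_to :account
  has_one :item, :as => :thing
  # keep the feed in sync; Task and Note would do the same
  after_create { |email| email.create_item(:account => email.account) }
end

# Paginate in SQL; the real records come back through the polymorphic association:
@items = account.items.order('items.created_at DESC').includes(:thing).paginate(:page => params[:page], :per_page => 30)

The upside is that the LIMIT/OFFSET happens in one query instead of loading 30 records per model and sorting in Ruby.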

Rails 2.3.5 Problem Building Conditions Array dynamically when using in (?)

Rails 2.3.5
I've looked at a number of other questions relating to building conditions dynamically for an ActiveRecord find.
I'm aware there are some great gems out there like searchlogic and that this is better in Rails 3. However, I'm using geokit for geospatial search and I'm trying to build just a standard conditions set that will allow me to combine a slew of different filters.
I have 12 different filters that I'm trying to combine dynamically for an advanced search. I need to be able to mix equality, greater than, less than, in (?) and IS NULLs conditions.
Here's an example of what I'm trying to get working:
conditions = []
conditions << ["sites.site_type in (?)", params[:site_categories]] if params[:site_categories]
conditions << ["sites.operational_status = ?", 'operational'] if params[:oponly] == 1
condition_set = [conditions.map { |c| c[0] }.join(" AND "), *conditions.map { |c| c[1..-1] }.flatten]
@sites = Site.find :all,
  :origin => [lat, lng],
  :units => distance_unit,
  :limit => limit,
  :within => range,
  :include => [:chargers, :site_reports, :networks],
  :conditions => condition_set,
  :order => 'distance asc'
I seem to be able to get this working fine when there are only single variables in the conditions expressions, but when I have something that uses in (?) with an array of values I get an error for the wrong number of bind variables. The way I'm joining and flattening the conditions (based on the answer from Combine arrays of conditions in Rails) seems not to handle an array properly, and I don't understand the flattening logic enough to track down the issue.
So let's say I have 3 values in params[:site_categories]. The above code leaves me with the following:
The conditions array is:
[["sites.operational_status = ?", "operational"], ["sites.site_type in (?)", ["shopping", "food", "lodging"]]]
The flattened attempt is:
["sites.operational_status = ? AND sites.site_type in (?)", ["operational"], [["shopping", "food", "lodging"]]]
Which gives me:
wrong number of bind variables (4 for 2)
I'm going to step back and work on converting all of this to named scopes but I'd really like to understand how to get this working this way.
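For what it's worth, the bind-variable mismatch comes from flatten with no depth argument: it flattens the array meant for in (?) into individual bind values. A sketch of a fix, assuming Ruby 1.8.7+ where Array#flatten takes a depth:

# Flatten only one level, so the site_categories array survives as a
# single bind value for the in (?) placeholder.
condition_set = [
  conditions.map { |c| c[0] }.join(" AND "),
  *conditions.map { |c| c[1..-1] }.flatten(1)
]
# e.g. => ["sites.site_type in (?) AND sites.operational_status = ?",
#          ["shopping", "food", "lodging"], "operational"]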
Rails 4
users = User.all
users = users.where(id: params[:id]) if params[:id].present?
users = users.where(state: states) if states.present?

users.each do |u|
  puts u.name
end
Old answer
Monkey patch the Array class. Create a file called monkey_patch.rb in the config/initializers directory.
class Array
  def where(*args)
    sql = args.first
    unless sql.is_a?(String) and sql.present?
      return self
    end
    self[0] = self.first.present? ? " #{self.first} AND #{sql} " : sql
    self.concat(args[1..-1])
  end
end
Now you can do this:
cond = []
cond.where("id = ?", params[:id]) if params[:id].present?
cond.where("state IN (?)", states) unless states.empty?
User.all(:conditions => cond)
