Rails ActiveRecord find_each with DISTINCT select - ruby-on-rails

I want to get a list of all unique emails I have in my database and process them in batch using find_each.
The code below works fine until it has more than 1000 records (the batch size) to process. Then it breaks after the 1000th record with the error message: Primary key not included in the custom select clause
Tourist.select('DISTINCT email').where("DATE(created_at) = ?", Date.today - 1).find_in_batches do |group|
something
end
So, how can I chain all this:
I only need a specific field (email)
I need them to be unique
I need a where clause
I need a find_each

You have to do it manually with a loop, using limit and offset:
batch_size = 1000
offset = 0
loop do
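# Order by email so that successive pages neither overlap nor skip rows
emails = Tourist.where("DATE(created_at) = ?", Date.today - 1).select('DISTINCT email').order(:email).limit(batch_size).offset(offset)
emails.each do |tourist|
# each element is a Tourist with only email loaded; use tourist.email for your stuff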
end
break if emails.size < batch_size
offset += batch_size
end
Of course, this is only needed if the query returns a large number of emails. Otherwise, simply use Tourist.where(condition).pluck('DISTINCT email').each { |email| your stuff }
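The limit/offset loop can be tried out without a database. This is only a minimal sketch: `fetch_page` is a hypothetical stand-in for the `select('DISTINCT email')...limit(...).offset(...)` query, and the in-memory array stands in for the table. It shows why a stable order matters and how the short last page ends the loop.

```ruby
# Hypothetical stand-in for the DISTINCT query: returns one page of unique,
# sorted emails. A stable order keeps pages from overlapping or skipping rows.
def fetch_page(all_emails, limit, offset)
  all_emails.uniq.sort.slice(offset, limit) || []
end

# Walks the unique emails page by page, yielding each one exactly once.
def each_unique_email(all_emails, batch_size)
  offset = 0
  loop do
    page = fetch_page(all_emails, batch_size, offset)
    page.each { |email| yield email }
    break if page.size < batch_size # a short page means we reached the end
    offset += batch_size
  end
end

seen = []
each_unique_email(%w[b@x.io a@x.io b@x.io c@x.io], 2) { |email| seen << email }
# seen now holds each distinct email once, in sorted order
```

When the number of distinct emails is an exact multiple of the batch size, the loop makes one extra call that returns an empty page and then stops.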

Related

Rails .where query chained to sql function, is there a way to call it on the results without converting them to an array?

I have a method that ranks user's response rates in our system called ranked_users
def ranked_users
User.joins(:responds).group(:id).select(
"users.*, SUM(CASE WHEN answers.response != 3 THEN 1 ELSE 0 END ) avg, RANK () OVER (
ORDER BY SUM(CASE WHEN answers.response != 3 THEN 1 ELSE 0 END ) DESC, CASE WHEN users.id = '#{
current_user.id
}' THEN 1 ELSE 0 END DESC
) rank"
)
.where('users.active = true')
.where('answers.created_at BETWEEN ? AND ?', Time.now - 12.months, Time.now)
end
result = ranked_users
I then take the top three with top_3 = ranked_users.limit(3)
If the user is not in the top 3, I want to append them with their rank to the list:
user_rank = result.find_by(id: current_user.id)
Whenever I call user_rank.rank it returns 1. I know this is because it's applying the find_by clause first and then ranking them. Is there a way to enforce the find_by clause happens only on the result of the first query? I tried doing result.load.find_by(...) but had the same issue. I could convert the entire result into an array but I want the solution to be highly scalable.
If you expect lots of users with lots of answers and a high load on your rating system, you can create a materialized view for the ranking query (with user_id, avg, rank, etc.) and refresh it periodically (say, a few times per day or even less often) instead of calculating the rank every time. The scenic gem helps with this.
You can even have indexes on rank and user id on the view and your query will be two simple fast reads from it.
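The reason find_by always reports rank 1 is that a window function only ranks the rows that survive the WHERE clause, so filtering down to one user leaves a single row to rank. A plain-Ruby sketch of the difference, using made-up scores (the `SCORES` data and the `rank_users` helper are illustrative assumptions, not part of the original code):

```ruby
# Hypothetical scores: user id => number of qualifying responses.
SCORES = { 1 => 40, 2 => 75, 3 => 10, 4 => 60 }.freeze

# Ranks the given rows by score, highest first (no tie handling in this sketch).
def rank_users(scores)
  scores.sort_by { |_id, score| -score }
        .each_with_index
        .map { |(id, score), index| { id: id, score: score, rank: index + 1 } }
end

# Filter first, then rank: only one row is ranked, so its rank is always 1.
filtered_then_ranked = rank_users(SCORES.select { |id, _| id == 3 }).first

# Rank everything, then look the user up: this is the order of operations
# that reading from a precomputed view gives you.
ranked_then_filtered = rank_users(SCORES).find { |row| row[:id] == 3 }
```

Reading from a precomputed ranking, such as the materialized view, corresponds to the second form: the rank is fixed before the lookup by id happens.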

Ruby on Rails with sqlite, trying to query and return results from the last 7 days?

Noob here, I'm trying to query my SQLite database for entries that have been made in the last 7 days and then return them.
This is the current attempt
user.rb
def featuredfeed
@x = []
@s = []
Recipe.all.each do |y|
@x << "SELECT id FROM recipes WHERE id = #{y.id} AND created_at > datetime('now','-7 days')"
end
Recipe.all.each do |d|
@t = "SELECT id FROM recipes where id = #{d.id}"
@x.each do |p|
if @t = p
@s << d
end
end
end
@s
end
This code returns each recipe 6 times (the total number of objects in the DB), regardless of how old it is.
@x should only contain 3 ids:
@x = [13,15,16]
If I run
SELECT id FROM recipes WHERE id = 13 AND created_at > datetime('now','-7 days')
1 row is returned, with id 13,
but if I look for an id that is more than 7 days old, such as 12,
SELECT id FROM recipes WHERE id = 12 AND created_at > datetime('now','-7 days')
0 rows are returned.
I'm probably overcomplicating this, but I've spent way too long on it at this point.
The return type has to be Recipe.
To return objects created within last 7 days just use where clause:
Recipe.where('created_at >= ?', 1.week.ago)
Check out docs for more info on querying db.
Edit according to comments:
Since you are using acts_as_votable gem, add the votes caching, so that filtering by votes score is straightforward:
Recipe.where('cached_votes_total >= ?', 10)
Ruby is expressive. I would take the opportunity to use a scope. With Active Record Scopes, this query can be represented in a meaningful way within your code, using syntactic sugar.
scope :from_recent_week, -> { where('created_at >= ?', Time.zone.now - 1.week) }
This allows you to chain your scoped query and enhance readability:
Recipe.from_recent_week.each do
something_more_meaningful_than_a_SQL_query
end
It looks to me that your problem is database abstraction, something Rails does for you. If you are looking for a function that returns the three ids you indicate, I think you would want to do this:
@x = Recipe.from_recent_week.map(&:id)
No need for any of the other fluff, no declarations necessary. I also would encourage you to use a different variable name instead of @x. Please use something more like:
@ids_from_recent_week = Recipe.from_recent_week.map(&:id)

Most Efficient Way to Get Counts of Users with Certain Attributes in Ruby

I have a collection of users with various statuses: active, disabled, or deleted (as an enum). I want a count of users with each status as well as a count of the total number of users. What is the most efficient way for me to do that?
I've read the questions on size vs. length vs. count in Ruby and that makes me think I should load all of the user records and then iterate over the collection multiple times to get the length of each status array.
This is what my code looks like currently:
# pagination code omitted...
all_users = User.all
total_count = all_users.length
active_count = all_users.select {|u| u.status == User.statuses['active']}.length
disabled_count = all_users.select {|u| u.status == User.statuses['disabled']}.length
deleted_count = all_users.select {|u| u.status == User.statuses['deleted']}.length
The requests from the client take about 1.25-1.5 seconds as written for 1,000 users.
I've also tried making multiple DB queries with code like this:
# pagination code omitted...
total_count = User.count
active_count = User.where(status: User.statuses['active']).count
disabled_count = User.where(status: User.statuses['disabled']).count
deleted_count = User.where(status: User.statuses['deleted']).count
That might be marginally faster by ~100ms. Is there a faster way to do this?
I'm not sure if it is relevant, but for background info: I am using Rails as an API in this context to an AngularJS frontend. I am using Kaminari to paginate the collection, but I still need counts of each status. I am in a B2B environment so it is unlikely that any instance will have more than 1,000 users. I don't need to scale higher than that.
Thanks in advance!
Do it all at once, in the database by grouping your count query.
User.group(:status).count
Then to get the total number of users just sum the result. Here's an example from one of my tables. Here I'm grouping on a boolean field, but you can group on whatever you want.
> Course.group(:is_enabled).count
=> {false=>46, true=>26524}
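Summing the grouped result can be sketched in plain Ruby. This is an in-memory analogue only: the `STATUSES` array is made-up data standing in for the users table, and `counts_by_status` is a hypothetical helper that mimics what a grouped count returns.

```ruby
# Hypothetical status values standing in for the users table.
STATUSES = %w[active active disabled deleted active].freeze

# Counts per status in one pass, the in-memory analogue of
# GROUP BY status with COUNT(*).
def counts_by_status(statuses)
  statuses.each_with_object(Hash.new(0)) { |status, counts| counts[status] += 1 }
end

counts = counts_by_status(STATUSES)
total  = counts.values.sum          # total users, no extra query needed
active = counts.fetch('active', 0)  # default to 0 for a status with no rows
```

A status that has no rows does not appear as a key in a grouped count at all, which is why the lookup uses a default of 0.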
Regarding "That might be marginally faster by ~100ms": create an index on the status column in your database:
# in your terminal
rails g migration AddIndexOnStatusOfUsers
# in db/migrate/xxxxx_add_index_on_status_of_users.rb
def change
add_index :users, :status
end
You should benchmark them all and let us know; it would be interesting. Pure SQL answers are always more scalable, of course...
# Load only the status column, then count in Ruby
users = User.select(:status)
active_count = 0
disabled_count = 0
deleted_count = 0
users.each do |user|
if user.status == 'active'
active_count += 1
elsif user.status == 'deleted'
deleted_count += 1
else
disabled_count += 1
end
end

find_each with order and limit

I need to limit and order batches of records and am using find_each. I've seen a lot of people asking for this and no really good solution. If I've missed it, please post a link!
I have 30M records and want to deal with 10M with the highest value in the weight column.
I tried using this method someone wrote: find_each_with_order but can't get it to work.
The code from that site doesn't take order as an option. Seems strange given that the name is find_each_with_order. I added it as follows:
class ActiveRecord::Base
# normal find_each does not use given order but uses id asc
def self.find_each_with_order(options={})
raise "offset is not yet supported" if options[:offset]
page = 1
limit = options[:limit] || 1000
order = options[:order] || 'id asc'
loop do
offset = (page-1) * limit
batch = find(:all, options.merge(:limit=>limit, :offset=>offset, :order=>order))
page += 1
batch.each{|x| yield x }
break if batch.size < limit
end
end
and I'm trying to use it as follows:
class GetStuff
def self.grab_em
file = File.open("1000 things.txt", "w")
rels = Thing.find_each_with_order({:limit=>100, :order=>"weight desc"})
binding.pry
things.each do |t|
binding.pry
file.write("#{t.name} #{t.id} #{t.weight}\n" )
if t.id % 20 == 0
puts t.id.to_s
end
end
file.close
end
end
BTW I have the data in postgres and am going to grab a subset and move it to neo4j, so I'm tagging with neo4j in case any of you neo4j people know how to do this. thanks.
Not exactly sure if this is what you're looking for, but you can do something like this:
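# Weight of the 10,000,000th heaviest record; only that one row is loaded
weight = Thing.order("weight DESC").select(:weight).offset(9_999_999).first.weight
# >= keeps the cutoff row itself; ties at the cutoff may add a few extra rows
Thing.where("weight >= ?", weight).find_each do |t|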
...your code...
end
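The idea in this answer is to turn "top N by weight" into a plain condition on weight, so that find_each can keep batching by primary key as it normally does. A small in-memory sketch of that cutoff technique, with made-up weights (`WEIGHTS`, `cutoff_weight` and `top_by_cutoff` are hypothetical names used only for illustration):

```ruby
# Hypothetical weights standing in for the weight column.
WEIGHTS = [5, 42, 17, 8, 99, 23, 61].freeze

# Returns the weight of the n-th heaviest item, i.e. the cutoff value.
def cutoff_weight(weights, n)
  weights.sort.reverse[n - 1]
end

# Everything at or above the cutoff, in the original (unsorted) order.
# Ties at the cutoff value can make this return more than n items.
def top_by_cutoff(weights, n)
  cutoff = cutoff_weight(weights, n)
  weights.select { |w| w >= cutoff }
end

top_three = top_by_cutoff(WEIGHTS, 3)
```

Note that the selected items come back in their original order, not sorted by weight, which matches find_each iterating in primary-key order.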

Rails: how and where to add this method

I have an app where I retrieve a list of users from a specific country.
I did this in the UsersController:
@fromcanada = User.find(:all, :conditions => { :country => 'canada' })
and then turned it into a scope on the User model
scope :canada, where(:country => 'Canada').order('created_at DESC')
but I also want to be able to retrieve a random person or multiple persons from the country. I found this method that's supposed to be an efficient way to retrieve a random user from the database.
module ActiveRecord
class Base
def self.random
if (c = count) != 0
find(:first, :offset =>rand(c))
end
end
end
end
However, I have a few questions about how to add it, and how the syntax works.
Where would I put that code? Direct in the User model?
Syntax: so that I don't use code that I don't understand, can you explain how the syntax is working? I don't get (c = count). What is count counting? What is rand(c) doing? Is it finding the first one starting at the offset? If rand is an expensive method (hence the need to create a different more efficient random method), why use the expensive 'rand' in this new more efficient random method?
How could I add the call to random on my find method in the UsersController? How to add it to the scope in the model?
Building on question 3, is there a way to get two or three random users?
I wouldn't monkey patch that (or anything else!) into ActiveRecord; putting it into your User model would make more sense.
The count is counting how many elements there are in your table and storing that number in c. Then rand(c) gives you a random integer in the interval [0,c) (i.e. 0 <= rand(c) < c). The :offset works the way you think it does.
rand isn't terribly expensive but doing order by random() inside the database can be very expensive. The random method that you're looking at is just a convenient way to get a random record/object from the database.
Adding it to your own User would look something like this:
def self.random
n = scoped.count
scoped.offset(rand(n)).first
end
That would allow you to chain random after a bunch of scopes:
u = User.canadians_eh.some_other_scope.random
but the result of random would be a single user so your chaining would stop there.
If you wanted multiple users you'd want to call random multiple times until you got the number of users you wanted. You could try this:
def self.random
n = scoped.count
scoped.offset(rand(n))
end
us = User.canadians_eh.random.limit(3)
to get three random users but the users would be clustered together in whatever order the database ended up with after your other scopes and that's probably not what you're after. If you want three you'd be better off with something like this:
# In User...
def self.random
n = scoped.count
scoped.offset(rand(n)).first
end
# Somewhere else...
scopes = User.canadians_eh.some_other_scope
users = 3.times.each_with_object([]) do |_, users|
users << scopes.random
scopes = scopes.where('id != :latest', :latest => users.last.id)
end
You'd just grab a random user, update your scope chain to exclude them, and repeat until you're done. You would, of course, want to make sure you had three users first.
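The pick, exclude, repeat loop can be sketched with a plain array in place of the scope chain. This is an illustration under assumptions: `random_pick` mirrors the count-then-offset idea and `random_distinct` mirrors the exclusion step; both names are hypothetical and neither touches the database.

```ruby
# Picks one random element the way offset(rand(n)).first does:
# count the candidates, then jump to a random position.
def random_pick(candidates)
  n = candidates.size
  return nil if n.zero?
  candidates[rand(n)]
end

# Picks up to how_many distinct elements by removing each pick
# from the candidate pool before drawing again.
def random_distinct(candidates, how_many)
  pool = candidates.dup
  picks = []
  how_many.times do
    pick = random_pick(pool)
    break if pick.nil? # fewer candidates than requested
    picks << pick
    pool -= [pick]
  end
  picks
end

ids = random_distinct((1..10).to_a, 3)
```

The early break covers the case mentioned above where there are fewer than three users to choose from.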
You might want to move the ordering out of your canada scope: one scope, one task.
That code is injecting a new method into ActiveRecord::Base. I would put it in lib/ext/activerecord/base.rb. But you can put it anywhere you want.
count is a method being called on self. self will be some class inheriting from ActiveRecord::Base, eg. User. User.count returns the number of user records (sql: SELECT count(*) from users;). rand is a ruby stdlib method Kernel#rand. rand(c) returns a random integer in the Range 0...c and c was previously computed by calling #count. rand is not expensive.
You don't call random with find; User.random is itself a find: it returns one random record from all User records. In your controller you say User.random and it returns a single random record (or nil if there are no user records at all).
To get more than one record, modify the ActiveRecord::Base.random method like so:
module ActiveRecord
class Base
def self.random( how_many = 1 )
if (c = count) != 0
res = (1..how_many).inject([]) do |m,i|
m << find(:first, :offset =>rand(c))
end
how_many == 1 ? res.first : res
end
end
end
end
User.random(3) # => [<User Rand1>,<User Rand2>,<User Rand3>]

Resources