Let's say I have a collection of users. Is there a way to use Mongoid to find n random users in the collection without returning the same user twice? For now, let's say the User model looks like this:
class User
include Mongoid::Document
field :name
end
Simple huh?
Thanks
If you just want one document, and don't want to define a new criteria method, you could just do this:
random_model = Model.skip(rand(Model.count)).first
If you want to find a random model based on some criteria:
criteria = Model.scoped_whatever.where(conditions) # query example
random_model = criteria.skip(rand(criteria.count)).first
The best solution is going to depend on the expected size of the collection.
For tiny collections, just load all the documents and call .shuffle.slice!(0, n) on the resulting array.
For small sizes of n, you can get away with something like this:
result = (0..User.count-1).sort_by{rand}.slice(0, n).collect! do |i| User.skip(i).first end
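As a rough sketch of the offset-picking step only: Array#sample with a count returns distinct elements, so it can stand in for the sort_by{rand}.slice combination. The helper name random_offsets is made up for illustration and is not part of Mongoid.

```ruby
# Pick up to n distinct offsets in 0...count.
# `random_offsets` is a hypothetical helper, not a Mongoid method.
def random_offsets(count, n, rng: Random.new)
  (0...count).to_a.sample(n, random: rng)
end

offsets = random_offsets(10, 3)
# With Mongoid, each offset would then be used as in the line above:
#   offsets.map { |i| User.skip(i).first }
```

If n is larger than the count, sample simply returns every offset once, so no extra bounds check is needed.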
For large sizes of n, I would recommend creating a "random" column to sort by. See here for details: http://cookbook.mongodb.org/patterns/random-attribute/ https://github.com/mongodb/cookbook/blob/master/content/patterns/random-attribute.txt
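To make the random-attribute idea concrete, here is an in-memory sketch in plain Ruby. It assumes each document stores a random value between 0 and 1; the Doc struct and the pick_by_random_attribute helper are invented for illustration, and a real collection would use an indexed query on that field instead of sorting an array.

```ruby
# In-memory stand-in for documents that carry a stored random value.
Doc = Struct.new(:name, :random)

# Return the document with the smallest `random` value >= r,
# wrapping around to the smallest value when r is above every stored value.
def pick_by_random_attribute(docs, r = rand)
  sorted = docs.sort_by(&:random)
  sorted.find { |doc| doc.random >= r } || sorted.first
end

docs = [Doc.new("a", 0.1), Doc.new("b", 0.5), Doc.new("c", 0.9)]
pick_by_random_attribute(docs, 0.3).name   # picks the doc with random = 0.5
pick_by_random_attribute(docs, 0.95).name  # wraps around to random = 0.1
```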
MongoDB 3.2 comes to the rescue with $sample.
EDIT: The most recent version of Mongoid has implemented $sample, so you can call YourCollection.all.sample(5)
Previous versions of Mongoid
Mongoid doesn't support sample until Mongoid 6, so you have to run this aggregate query with the Mongo driver:
samples = User.collection.aggregate([ { '$sample': { size: 3 } } ])
# call samples.to_a if you want to get the objects in memory
What you can do with that
I believe the functionality should make its way to Mongoid soon, but in the meantime:
module Utility
module_function
def sample(model, count)
ids = model.collection.aggregate([
{ '$sample': { size: count } }, # Sample from the collection
{ '$project': { _id: 1} } # Keep only ID fields
]).to_a.map(&:values).flatten # Some Ruby magic
model.find(ids)
end
end
Utility.sample(User, 50)
If you really want simplicity you could use this instead:
class Mongoid::Criteria
def random(n = 1)
indexes = (0..self.count-1).sort_by{rand}.slice(0,n)
if n == 1
return self.skip(indexes.first).first
else
return indexes.map{ |index| self.skip(index).first }
end
end
end
module Mongoid
module Finders
def random(n = 1)
criteria.random(n)
end
end
end
You just have to call User.random(5) and you'll get 5 random users.
It'll also work with filtering, so if you want only registered users you can do User.where(:registered => true).random(5).
This will take a while for large collections, so I recommend an alternative method: take a random slice of the count (e.g. 25 000 to 30 000) and randomize within that range.
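One possible reading of that suggestion, sketched in plain Ruby: choose a random contiguous window of offsets, then sample distinct offsets inside it. The helper name windowed_offsets and the default window size are assumptions made for this example, not anything Mongoid provides.

```ruby
# Choose a random window of offsets, then sample n distinct offsets inside it.
# `windowed_offsets` and the `window` default are hypothetical.
def windowed_offsets(count, n, window: 5_000, rng: Random.new)
  return [] if count <= 0
  window = [[window, n].max, count].min          # never smaller than n, never larger than count
  start  = rng.rand(count - window + 1)          # first offset of the window
  (start...(start + window)).to_a.sample(n, random: rng)
end

offsets = windowed_offsets(100_000, 5)
# Each offset could then be passed to skip(offset).first on the criteria.
```

The trade-off is that the results all come from one region of the collection, so this is less random than sampling over the whole range.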
You can do this by generating a random offset that still leaves room to pick the next n elements (without exceeding the limit).
Assume the count is 10 and n is 5:
1. Check whether the given n is less than the total count.
2. If not, set the offset to 0 and go to step 6.
3. If so, subtract n from the total count, which gives 5.
4. Use this to pick a random number between 0 and 5 (assume 2).
5. Use the random number 2 as the offset.
6. Take the 5 random users by passing this offset and n (5) as the limit.
You now get users 3 to 7.
code
>> cnt = User.count
=> 10
>> n = 5
=> 5
>> offset = 0
=> 0
>> if n<cnt
>> offset = rand(cnt-n+1)
>> end
=> 2
>> User.skip(offset).limit(n)
and you can put this in a method
def get_random_users(n)
offset = 0
cnt = User.count
if n < cnt
offset = rand(cnt-n+1)
end
User.skip(offset).limit(n)
end
and call it like
rand_users = get_random_users(5)
hope this helps
Since I want to keep a criteria, I do:
scope :random, ->{
random_field_for_ordering = fields.keys.sample
random_direction_to_order = %w(asc desc).sample
order_by([[random_field_for_ordering, random_direction_to_order]])
}
Just encountered such a problem. Tried
Model.all.sample
and it works for me
The approach from @moox is really interesting, but I doubt that monkeypatching the whole of Mongoid is a good idea here. So my approach is just to write a concern, Randomizable, that can be included in each model where you use this feature. This goes in app/models/concerns/randomizable.rb:
module Randomizable
extend ActiveSupport::Concern
module ClassMethods
def random(n = 1)
indexes = (0..count - 1).sort_by { rand }.slice(0, n)
return skip(indexes.first).first if n == 1
indexes.map { |index| skip(index).first }
end
end
end
Then your User model would look like this:
class User
include Mongoid::Document
include Randomizable
field :name
end
And the tests....
require 'spec_helper'
class RandomizableCollection
include Mongoid::Document
include Randomizable
field :name
end
describe RandomizableCollection do
before do
RandomizableCollection.create name: 'Hans Bratwurst'
RandomizableCollection.create name: 'Werner Salami'
RandomizableCollection.create name: 'Susi Wienerli'
end
it 'returns a random document' do
srand(2)
expect(RandomizableCollection.random(1).name).to eq 'Werner Salami'
end
it 'returns an array of random documents' do
srand(1)
expect(RandomizableCollection.random(2).map &:name).to eq ['Susi Wienerli', 'Hans Bratwurst']
end
end
I think it is better to focus on randomizing the returned result set so I tried:
Model.all.to_a.shuffle
Hope this helps.
Related
Given this model:
class User < ActiveRecord::Base
has_many :things
end
Then we can do this:
@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }
There are a large number of @user.things and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?
The goal is to:
not load the entire thing_ids array into memory at once
still only load arrays of thing_ids, and not instantiate a Thing for each id
Rails 5 introduced in_batches method, which yields a relation and uses pluck(primary_key) internally. And we can make use of the where_values_hash method of the relation in order to retrieve already-plucked ids:
@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }
Note that in_batches has order and limit restrictions similar to find_each.
This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.
You can try something like the code below; each_slice takes 4 elements at a time, and then you can loop over those 4:
@user.thing_ids.each_slice(4) do |batch|
batch.each do |id|
puts id
end
end
Unfortunately, there is no one-liner or helper that will let you do this, so instead:
limit = 1000
offset = 0
loop do
batch = @user.things.limit(limit).offset(offset).pluck(:id)
batch.each { |id| puts id }
break if batch.count < limit
offset += limit
end
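A variation on the same loop is to page by "id greater than the last id seen" instead of by a growing offset. The sketch below shows only the control flow in plain Ruby: each_id_batch and the fetch callable are hypothetical names, and the lambda at the bottom is an in-memory stand-in for a query such as @user.things.where("id > ?", last_id).order(:id).limit(limit).pluck(:id).

```ruby
# Yield arrays of ids, paging by "id greater than the last id seen".
# `fetch` is a hypothetical callable standing in for the database:
# fetch.call(after_id, limit) returns up to `limit` ids > after_id, ascending.
def each_id_batch(fetch, batch_size: 1000)
  last_id = 0 # assumes positive integer ids
  loop do
    batch = fetch.call(last_id, batch_size)
    break if batch.empty?
    yield batch
    break if batch.size < batch_size
    last_id = batch.last
  end
end

# In-memory stand-in for the ids stored in the database.
all_ids = (1..10).to_a
fetch = ->(after_id, limit) { all_ids.select { |id| id > after_id }.first(limit) }

batches = []
each_id_batch(fetch, batch_size: 4) { |ids| batches << ids }
# batches is now [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

Only one array of ids is held in memory at a time, and no model objects are instantiated.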
UPDATE Final EDIT:
I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)
Here is my solution, tested and working, so you can accept this as the answer if it pleases you.
Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it will return the activerecord relation to your block, so you can then use your desired method 'pluck' to get only the ids of the target query.
#put this file in your lib directory:
#active_record_extension.rb
module ARAExtension
extend ActiveSupport::Concern
def find_in_batches(options = {})
options.assert_valid_keys(:start, :batch_size, :relation)
relation = self
start = options[:start]
batch_size = options[:batch_size] || 1000
unless block_given?
return to_enum(:find_in_batches, options) do
total = start ? where(table[primary_key].gteq(start)).size : size
(total - 1).div(batch_size) + 1
end
end
if logger && (arel.orders.present? || arel.taken.present?)
logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
end
relation = relation.reorder(batch_order).limit(batch_size)
records = start ? relation.where(table[primary_key].gteq(start)) : relation
records = records.to_a unless options[:relation]
while records.any?
records_size = records.size
primary_key_offset = records.last.id
raise "Primary key not included in the custom select clause" unless primary_key_offset
yield records
break if records_size < batch_size
records = relation.where(table[primary_key].gt(primary_key_offset))
records = records.to_a unless options[:relation]
end
end
end
ActiveRecord::Relation.send(:include, ARAExtension)
here is the initializer
#put this file in config/initializers directory:
#extensions.rb
require "active_record_extension"
Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned it to you. Now, I optionally allow you to return the query before the conversion to the array happens. Here is an example of how to use it:
@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
# do any kind of further querying/filtering/mapping that you want
# show that this is actually an activerecord relation, not an array of AR objects
puts batch_query.to_sql
# add more conditions to this query, this is just an example
batch_query = batch_query.where(:color=>"blue")
# pluck just the ids
puts batch_query.pluck(:id)
end
Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.
Previous EDIT
In response to your comment (because my comment would not fit):
calling
thing_ids
internally uses
pluck
pluck internally uses
select_all
...which instantiates an activerecord Result
Previous 2nd EDIT:
This line of code within pluck returns an activerecord Result:
....
result = klass.connection.select_all(relation.arel, nil, bound_attributes)
...
I just stepped through the source code for you. Using select_all will save you some memory, but in the end, an activerecord Result was still created and mapped over even when you are using the pluck method.
I would use something like this:
@user.things.find_each(batch_size: 1000).map(&:id)
This will give you an array of the ids.
I need to take some random documents using Rails and Mongoid. Since I plan to have very large collections, I decided to put a 'random' field in each document and to select documents using that field. I wrote the following method in the model:
def random(qty)
if count <= qty
all
else
collection = [ ]
while collection.size < qty
collection << where(:random_field.gt => rand).first
end
collection
end
end
This function actually works and the collection is filled with qty random elements. But when I try to use it as a scope, like this:
User.students.random(5)
I get:
undefined method `random' for #<Array:0x0000000bf78748>
If I instead define the method as a lambda scope, I get:
undefined method `to_criteria' for #<Array:0x0000000df824f8>
Given that I'm not interested in applying any other scopes after the random one, how can I use my method in a chain?
Thanks in advance.
I ended up extending the Mongoid::Criteria class with the following. I don't know if it's the best option; I believe it's quite slow, since it executes at least qty queries.
I don't know if not_in is available for normal ActiveRecord models. However, you can remove the not_in part if needed; it's just an optimization to reduce the number of queries.
On collections that have at least twice as many documents as qty, you should have exactly qty queries.
module Mongoid
class Criteria
def random(qty)
if count <= qty
all
else
res = [ ]
ids = [ ]
while res.size < qty
el = where(:random_field.gt => rand).not_in(id: ids).first
unless el.nil?
res << el
ids << el._id
end
end
res
end
end
end
end
Hope you find this useful :)
I am trying to calculate a weighted average of a variable in my model based on a second variable in my model and I'm having trouble finding a way to do it through ActiveRecord.
class Employer < ActiveRecord::Base
attr_accessible :name, :number_of_employees, :average_age
def self.wt_avg_age
#return sum(number_of_employees * average_age)/sum(number_of_employees)
end
end
In straight SQL, I would use:
SELECT id, SUM(number_of_employees*average_age)/SUM(number_of_employees)
FROM employer
GROUP BY name
Can I execute something like this on an ActiveRecord relation in an eloquent way (i.e., without pulling down separate arrays and iterating through every record to get my numerator)? I have tried different combinations using .select(), .pluck(), and sum() without any luck. I'm having trouble getting the ActiveRecord object to perform the sumproduct.
You should be able to do something like:
Employer.select("name, (SUM(number_of_employees*average_age)/SUM(number_of_employees)) as sum").group(:name)
That will return Employer instances to you, but they will only have the .name and .sum attributes on them. This will run the exact SQL query that you wanted.
It looks like ActiveRecord::Calculations#sum takes a block:
# File activerecord/lib/active_record/relation/calculations.rb, line 92
def sum(*args)
if block_given?
self.to_a.sum(*args) {|*block_args| yield(*block_args)}
else
calculate(:sum, *args)
end
end
(also see http://api.rubyonrails.org/classes/Enumerable.html#method-i-sum)
So you might try:
def self.wt_avg_age
numerator = self.all.sum { |e| e.number_of_employees * e.average_age }
denominator = self.sum :number_of_employees
return numerator.to_f / denominator
end
Give this a try; it may work:
def self.wt_avg_age
a = Employer.sum("number_of_employees * average_age")
b = Employer.sum('number_of_employees')
a/b
end
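For reference, the arithmetic itself can be checked in plain Ruby without a database. This is only a sketch: EmployerRow and weighted_average_age are names invented for the example, and the struct stands in for records loaded from the employers table.

```ruby
# Stand-in for employer records; not the ActiveRecord model itself.
EmployerRow = Struct.new(:name, :number_of_employees, :average_age)

# Weighted average: sum(weight * value) / sum(weight), using float division.
def weighted_average_age(employers)
  total = employers.sum(&:number_of_employees)
  return nil if total.zero? # avoid dividing by zero when there are no employees
  employers.sum { |e| e.number_of_employees * e.average_age }.fdiv(total)
end

employers = [EmployerRow.new("A", 10, 30), EmployerRow.new("B", 30, 40)]
weighted_average_age(employers) # (10*30 + 30*40) / 40 = 37.5
```

Using fdiv avoids the truncation that plain integer division would give when both columns are integers.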
I find it very verbose and tedious to test whether records coming from the database are correctly ordered.
I'm thinking of using the array '==' method to compare two arrays of search results. The arrays' elements and order must be the same, so it seems a good fit. The issue is that if elements are missing, the test will fail even though the remaining elements are ordered properly.
I wonder if there is a better way...
Rails 4
app/models/person.rb
default_scope { order(name: :asc) }
test/models/person_test.rb
test "people should be ordered by name" do
xavier = Person.create(name: 'xavier')
albert = Person.create(name: 'albert')
all = Person.all
assert_operator all.index(albert), :<, all.index(xavier)
end
Rails 3
app/models/person.rb
default_scope order('name ASC')
test/unit/person_test.rb
test "people should be ordered by name" do
xavier = Person.create name: 'xavier'
albert = Person.create name: 'albert'
assert Person.all.index(albert) < Person.all.index(xavier)
end
I haven't come across a built-in way to do this nicely but here's a way to check if an array of objects is sorted by a member:
class MyObject
attr_reader :a
def initialize(value)
@a = value
end
end
a = MyObject.new(2)
b = MyObject.new(3)
c = MyObject.new(4)
myobjects = [a, b, c]
class Array
def sorted_by?(method)
self.each_cons(2) do |a|
return false if a[0].send(method) > a[1].send(method)
end
true
end
end
p myobjects.sorted_by?(:a) #=> true
Then you can use it like this:
test "people should be ordered by name by default" do
people = Person.all.to_a
assert people.sorted_by?(:name)
end
I came across what I was looking for when I asked this question. Using the each_cons method, it makes the test very neat:
assert Person.all.each_cons(2).all?{|i,j| i.name <= j.name}
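The same each_cons check can be tried in plain Ruby outside of Rails. This is a small sketch with an invented PersonRow struct and sorted_by_name? helper; for ascending order, every adjacent pair must satisfy a.name <= b.name.

```ruby
# Stand-in for Person records; not the ActiveRecord model itself.
PersonRow = Struct.new(:name)

# True when every adjacent pair is in ascending order by name.
def sorted_by_name?(people)
  people.each_cons(2).all? { |a, b| a.name <= b.name }
end

sorted_by_name?([PersonRow.new("albert"), PersonRow.new("xavier")]) # true
sorted_by_name?([PersonRow.new("xavier"), PersonRow.new("albert")]) # false
```

Because only neighbouring pairs are compared, missing records do not make the check fail, which was the concern with comparing whole arrays using ==.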
I think ordering your record selection will give you a properly ordered result set, and in fact it's always good to order your results.
That way, I think you will not need the array == method.
HTH
sameera
Is there an easy way to obtain the average of an attribute in a collection?
For instance, each user has a score.
Given a collection of user(s) (@users), how can you get the average score for the group?
Is there anything like @users.average(:score)? I think I came across something like this for database fields, but I need it to work for a collection...
For your question, one could actually do:
@users.collect(&:score).sum.to_f/@users.length if @users.length > 0
Earlier I thought @users.collect(&:score).average would have worked. For database fields, User.average(:score) will work. You can also add :conditions like other ActiveRecord queries.
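As a self-contained sketch of the in-memory case, the one-liner can be wrapped in a small helper. ScoredUser and average_score are names made up for this example; the struct stands in for already-loaded user records.

```ruby
# Stand-in for loaded user records; not the ActiveRecord model itself.
ScoredUser = Struct.new(:score)

# Average score of an in-memory collection, or nil when it is empty.
def average_score(users)
  return nil if users.empty?
  users.sum(&:score).fdiv(users.size)
end

users = [ScoredUser.new(10), ScoredUser.new(20), ScoredUser.new(30), ScoredUser.new(40)]
average_score(users) # (10 + 20 + 30 + 40) / 4 = 25.0
```

Returning nil for an empty collection is a design choice in this sketch; returning 0.0 would work equally well if callers prefer a number.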
I usually extend our friend Array with this method:
class Array
# Calculates average of anything that responds to :"+" and :to_f
def avg
blank? and 0.0 or sum.to_f/size
end
end
Here's a little snippet to not only get the average but also the standard deviation.
class User
attr_accessor :score
def initialize(score)
@score = score
end
end
@users=[User.new(10), User.new(20), User.new(30), User.new(40)]
mean=@users.inject(0){|acc, user| acc + user.score} / @users.length.to_f
stddev = Math.sqrt(@users.inject(0) { |sum, u| sum + (u.score - mean) ** 2 } / @users.length.to_f )
You can use this:
http://api.rubyonrails.org/classes/ActiveRecord/Calculations.html#method-i-average