Given this model:
class User < ActiveRecord::Base
has_many :things
end
Then we can do this:
@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }
There are a large number of @user.things, and I want to iterate through only their ids in batches, as with find_each. Is there a handy way to do this?
The goal is to:
not load the entire thing_ids array into memory at once
still only load arrays of thing_ids, and not instantiate a Thing for each id
Rails 5 introduced the in_batches method, which yields a relation and uses pluck(primary_key) internally. We can use the relation's where_values_hash method to retrieve the already-plucked ids:
@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }
Note that in_batches has order and limit restrictions similar to find_each.
This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.
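If relying on the internals of in_batches feels too fragile, one alternative is to batch by primary key yourself, so that each batch costs a single pluck. The following is only a minimal sketch, not part of any Rails API: each_id_batch and the fetch_ids callable are hypothetical names, and an in-memory array stands in for the database so the snippet runs on its own.

```ruby
# Keyset batching: ask for the next `batch_size` ids greater than the last id seen.
# `fetch_ids` is a hypothetical callable taking (after_id, limit) and returning sorted ids.
def each_id_batch(batch_size:, fetch_ids:)
  last_id = nil
  loop do
    ids = fetch_ids.call(last_id, batch_size)
    break if ids.empty?
    yield ids
    break if ids.size < batch_size # a short batch means nothing is left
    last_id = ids.last
  end
end

# In a Rails app the callable could look roughly like this (assumption, not run here):
#   fetch = lambda do |after_id, limit|
#     scope = @user.things.reorder(:id).limit(limit)
#     scope = scope.where("things.id > ?", after_id) if after_id
#     scope.pluck(:id)
#   end

# In-memory stand-in for the things table.
all_ids = (1..10).to_a
fetch = ->(after_id, limit) { all_ids.select { |id| after_id.nil? || id > after_id }.first(limit) }

batches = []
each_id_batch(batch_size: 4, fetch_ids: fetch) { |ids| batches << ids }
p batches
```

Only one batch of ids is held in memory at a time, and no Thing objects are instantiated.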
You can try something like the code below: each_slice takes 4 elements at a time, and then you can loop over those 4.
@user.thing_ids.each_slice(4) do |batch|
batch.each do |id|
puts id
end
end
Unfortunately, there is no one-liner or helper that will do this, so instead:
limit = 1000
offset = 0
loop do
# order by id so that successive offsets page through a stable ordering
batch = @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
batch.each { |id| puts id }
break if batch.size < limit
offset += limit
end
UPDATE Final EDIT:
I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)
Here is my solution, tested and working, so you can accept this as the answer if it pleases you.
Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it yields the ActiveRecord relation to your block, so you can then call pluck on it to get only the ids of the target query.
#put this file in your lib directory:
#active_record_extension.rb
module ARAExtension
extend ActiveSupport::Concern
def find_in_batches(options = {})
options.assert_valid_keys(:start, :batch_size, :relation)
relation = self
start = options[:start]
batch_size = options[:batch_size] || 1000
unless block_given?
return to_enum(:find_in_batches, options) do
total = start ? where(table[primary_key].gteq(start)).size : size
(total - 1).div(batch_size) + 1
end
end
if logger && (arel.orders.present? || arel.taken.present?)
logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
end
relation = relation.reorder(batch_order).limit(batch_size)
records = start ? relation.where(table[primary_key].gteq(start)) : relation
records = records.to_a unless options[:relation]
while records.any?
records_size = records.size
primary_key_offset = records.last.id
raise "Primary key not included in the custom select clause" unless primary_key_offset
yield records
break if records_size < batch_size
records = relation.where(table[primary_key].gt(primary_key_offset))
records = records.to_a unless options[:relation]
end
end
end
ActiveRecord::Relation.send(:include, ARAExtension)
Here is the initializer:
#put this file in config/initializers directory:
#extensions.rb
require "active_record_extension"
Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned it to you. Now, I optionally allow you to get the relation back before the conversion to an array happens. Here is an example of how to use it:
@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
# do any kind of further querying/filtering/mapping that you want
# show that this is actually an activerecord relation, not an array of AR objects
puts batch_query.to_sql
# add more conditions to this query, this is just an example
batch_query = batch_query.where(:color=>"blue")
# pluck just the ids
puts batch_query.pluck(:id)
end
Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.
Previous EDIT
In response to your comment (because my comment would not fit):
Calling thing_ids internally uses pluck, and pluck internally uses select_all, which instantiates an ActiveRecord Result.
Previous 2nd EDIT:
This line of code within pluck returns an activerecord Result:
....
result = klass.connection.select_all(relation.arel, nil, bound_attributes)
...
I just stepped through the source code for you. Using select_all will save you some memory, but in the end, an activerecord Result was still created and mapped over even when you are using the pluck method.
I would use something like this:
@user.things.find_each(batch_size: 1000).map(&:id)
This will give you an array of the ids, although it still instantiates a Thing for each row along the way.
Related
I have the following code snippet that works perfectly and as intended:
# Prepares the object design categories and connects them via bit mapping with the objects.design_category_flag
def prepare_bit_flag_positions
# Updates the bit_flag_position and the corresponding data in the object table with one transaction
ActiveRecord::Base.transaction do
# Sets the bit flag for object design category
ObjectDesignCategory.where('0 = (@rownum:=0)').update_all('bit_flag_position = 1 << (@rownum := 1 + @rownum)')
# Resets the object design category flag
Object.update_all(design_category_flag: 0)
# Sets the new object design category bit flag
object_group_relation = Object.joins(:object_design_categories).select('BIT_OR(bit_flag_position) AS flag, objects.id AS object_id').group(:id)
join_str = "JOIN (#{object_group_relation.to_sql}) sub ON sub.object_id = objects.id"
Object.joins(join_str).update_all('design_category_flag = sub.flag')
end
end
But in my opinion it is quite difficult to read. So I tried to rewrite this code without raw SQL. What I created was this:
def prepare_bit_flag_positions
# Updates the bit_flag_position and the corresponding data in the object table within one transaction
ActiveRecord::Base.transaction do
# Sets the bit flag for the object color group
ObjectColorGroup.find_each.with_index do |group, index|
group.update(bit_flag_position: 1 << index)
end
# Resets the object color group flag
Object.update_all(color_group_flag: 0)
# Sets the new object color group bit flag
Object.find_each do |object|
object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
end
end
end
This also works fine, but when I run a benchmark for about 2000+ records, the second option is about a factor of 65 slower than the first. So my question is:
Does anyone have an idea how to redesign this code so that it doesn't require raw SQL and is still fast?
I can see 3 sources of slowness:
1. N+1 problem
2. Instantiating objects
3. Calls to DB
This code has the N+1 problem. I think this may be the major cause of the slowdown.
Object.find_each do |object|
object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
end
Change to
Object.includes(:object_color_groups).find_each do |object|
...
end
You can also use the Object.update class method on this code (see below).
I don't think you can get around #2 without using raw SQL. But you will need many objects (10K or 100K or more) to see a big difference.
To limit the calls to the DB, you can use the Object.update class method to update many records at once.
ObjectColorGroup.find_each.with_index do |group, index|
group.update(bit_flag_position: 1 << index)
end
to
color_groups = ObjectColorGroup.all.each_with_index.map do |group, index|
[group.id, { bit_flag_position: 1 << index }]
end.to_h
ObjectColorGroup.update(color_groups.keys, color_groups.values)
The following is a single query, so no need to change.
Object.update_all(color_group_flag: 0)
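To make the bit arithmetic concrete, here is a small plain-Ruby sketch of what the two steps compute. The ids are made up for illustration; in the app they would come from ObjectColorGroup records and from each object's associated groups.

```ruby
# Hypothetical ids standing in for ObjectColorGroup rows.
group_ids = [10, 11, 12]

# Step 1: give every group its own bit, keyed by id
# (the shape that the update(ids, attributes) class method expects).
updates = group_ids.each_with_index.map { |id, index| [id, { bit_flag_position: 1 << index }] }.to_h

# Step 2: an object's flag combines the bits of the groups it belongs to.
# Because every group owns a distinct bit, summing gives the same result as a bitwise OR.
object_group_ids = [10, 12]
bits = object_group_ids.map { |id| updates[id][:bit_flag_position] }
flag_by_sum = bits.sum
flag_by_or  = bits.reduce(0) { |acc, bit| acc | bit }

p updates
p flag_by_sum
p flag_by_or
```

This is why sum(:bit_flag_position) in the rewritten code and BIT_OR in the raw SQL version agree, as long as no two groups share a bit.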
Reference:
ActiveRecord#update class method API
ActiveRecord#update class method blog post
Rails Eager Loading
Here's the situation:
I have an Event model and I want to add prev / next buttons to a view to get the next event, but sorted by the event start datetime, not the ID/created_at.
The events aren't created in the order that they start, so I can't compare IDs or get the next highest ID or anything like that. E.g. Event ID 3 starts before Event ID 2, so Event.next(3) should return Event ID 2.
At first I was passing the start datetime as a param and getting the next one, but this failed when there were 2 events with the same start. The param start datetime doesn't include microseconds, so what would happen is something like this:
where("start > ?", current_start).first
would keep returning the same event over and over because current_start wouldn't include microseconds, so the current event would technically be later than current_start by 0.000000124 seconds or something like that.
The way I got it to work for everything was with a concern like this:
module PrevNext
extend ActiveSupport::Concern
module ClassMethods
def next(id)
find_by(id: chron_ids[current_index(id)+1])
end
def prev(id)
find_by(id: chron_ids[current_index(id)-1])
end
def chron_ids
@chron_ids ||= order("#{order_by_attr} ASC").ids
end
def current_index(id)
chron_ids.find_index(id)
end
def order_by_attr
@order_by_attr ||= 'created_at'
end
end
end
Model:
class Event < ActiveRecord::Base
...
include PrevNext
def self.order_by_attr
@order_by_attr ||= "start_datetime"
end
...
end
I know pulling all the IDs into an array is bad and dumb* but I don't know how to
Get a list of the records in the order I want
Jump to a specific record in that list (current event)
and then get the next record
...all in one ActiveRecord query. (Using Rails 4 w/ PostgreSQL)
*This table will likely never have more than 10k records, so it's not catastrophically bad and dumb.
The best I could manage was to pull out only the IDs in order and then memoize them.
Ideally, I'd like to do this by just passing the Event ID, rather than a start date param, since it's passed via GET param, so the less URL encoding and decoding the better.
There has to be a better way to do this. I posted it on Reddit as well, but the only suggested response didn't actually work.
Reddit Link
Any help or insight is appreciated. Thanks!
You can get the next record by using the SQL OFFSET keyword:
china = Country.order(:population).first
india = Country.order(:population).offset(1).take
# SELECT * FROM countries ORDER BY population LIMIT 1 OFFSET 1
This is how pagination, for example, is often done:
@countries = Country.order(:population).limit(50)
@countries = @countries.offset( params[:page].to_i * 50 ) if params[:page]
Another way to do this would be to use query cursors. However, ActiveRecord does not support them, and building a generally reusable solution would be quite a task and may not be very useful in the end.
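Because several events can share a start time, a stable prev/next needs a tie-breaker. One common approach, sketched here in plain Ruby with made-up event hashes, is to order by the pair (start_datetime, id) and pick the nearest pair above or below the current one. The helper names are hypothetical, and the ActiveRecord/PostgreSQL form in the comment is an untested assumption.

```ruby
# Sort key: start time first, id as the tie-breaker.
def sort_key(event)
  [event[:start_datetime], event[:id]]
end

# The event whose key is the smallest one greater than the current key.
def next_event(events, current)
  events.select { |e| (sort_key(e) <=> sort_key(current)) == 1 }.min_by { |e| sort_key(e) }
end

# The event whose key is the largest one smaller than the current key.
def prev_event(events, current)
  events.select { |e| (sort_key(e) <=> sort_key(current)) == -1 }.max_by { |e| sort_key(e) }
end

# Rough ActiveRecord/PostgreSQL equivalent of next_event (assumption, not run here):
#   Event.where("(start_datetime, id) > (?, ?)", current.start_datetime, current.id)
#        .order(:start_datetime, :id).first

events = [
  { id: 1, start_datetime: Time.utc(2020, 1, 1) },
  { id: 2, start_datetime: Time.utc(2020, 1, 3) },
  { id: 3, start_datetime: Time.utc(2020, 1, 2) },
  { id: 4, start_datetime: Time.utc(2020, 1, 2) } # same start as event 3
]

current = events.find { |e| e[:id] == 3 }
p next_event(events, current)
p prev_event(events, current)
```

Only the event id needs to travel in the URL; the start time is looked up from the current record, so microsecond precision is never lost.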
This is existing code written by someone else, and I am trying to enhance it. I am a Java developer working on Ruby on Rails, so kindly be considerate.
I have entities like this
User
Delivery entity,
Delivery
belongs_to :user
named_scope :for_abcs, :conditions => {'deliveries.xyz_type' => ['Xyz1', 'Xyz2']},
many such named-scopes are defined.
Now, to fetch the deliveries, it is written like this:
@deliveries = current_user.deliveries.send("for_abcs").with(:xyz, :sender, :receiver)
...
...
...
# few other conditions added to @deliveries
finally
@deliveries.sort(...)
This sort triggers a huge SQL query and causes performance issues. I want to use find_each, but find_each is only available on ActiveRecord models. How can I achieve this (if possible) without much code change?
Earlier I used to write
Delivery.find_each
wherever there was
Delivery.find
Now I can't, as it is an array. What is the workaround or the right procedure for this in Ruby on Rails?
EDIT :
What I tried :
deliveries_temp = []
@deliveries.find_each(:batch_size=>999) do |delivery_temp|
deliveries_temp.push(delivery_temp)
end
This gave me error
undefined method `find_each' for []:Array
type(@deliveries) returned ActiveRecord::NamedScope::Scope; the Rails version is 2.3.18.
find_each should work on anything that returns a Relation (which includes scopes).
@deliveries = current_user.deliveries.for_abcs(:xyz, :sender, :receiver).find_each
Update
It sounds like you're using Rails 2.3. find_each is a class method in 2.3, so you'll need a way to extract the conditions from your scope and pass them to find_each. I found an article that looks promising, so give this a try:
Delivery.find_each(current_user.deliveries.for_abcs.scope(:find))
Also, I'm still not sure what that #with is doing. Maybe it's supposed to be #includes?
After a week of research and reading the named_scope source code, I understood what the problem was. @deliveries is an object of class ActiveRecord::NamedScope::Scope, and this class does not have a find_each method. So I wrote a new named_scope for limit and offset in the Delivery model file as follows:
named_scope :limit_and_offset, lambda { |lim,off| { :limit => lim, :offset=>off } }
After this, I called it in a loop, passing offset and limit: for example, the first iteration has offset=0, limit=999, and the second has offset=999, limit=999. I add all the results to an empty array. The loop continues until the result size is less than the limit value. This works exactly the way I wanted, in batches.
set = 1
total_deliveries = []
set_limit=999
original_condition = @deliveries
loop do
offset = (set-1) * set_limit
temp_condition = original_condition.limit_and_offset(set_limit,offset)
temp_deliveries = temp_condition.find(:all)
total_deliveries+= temp_deliveries
set += 1
break if temp_deliveries.size < set_limit
end
@deliveries = total_deliveries.sort do |a, b|
I need to take some random documents using Rails and Mongoid. Since I plan to have very large collections, I decided to put a 'random' field in each document and to select documents using that field. I wrote the following method in the model:
def random(qty)
if count <= qty
all
else
collection = [ ]
while collection.size < qty
collection << where(:random_field.gt => rand).first
end
collection
end
end
This function actually works, and the collection is filled with qty random elements. But when I try to use it as a scope, like this:
User.students.random(5)
I get:
undefined method `random' for #<Array:0x0000000bf78748>
If instead I try to make the method like a lambda scope I get:
undefined method `to_criteria' for #<Array:0x0000000df824f8>
Given that I'm not interested in applying any other scopes after the random one, how can I use my method in a chain?
Thanks in advance.
I ended up extending the Mongoid::Criteria class with the following. Don't know if it's the best option. Actually I believe it's quite slow since it executes at least qty queries.
I don't know if not_in is available for normal ActiveRecord models. However, you can remove the not_in part if needed. It's just an optimization to reduce the number of queries.
On collections that have at least twice as many documents as qty, you should have exactly qty queries.
module Mongoid
class Criteria
def random(qty)
if count <= qty
all
else
res = [ ]
ids = [ ]
while res.size < qty
el = where(:random_field.gt => rand).not_in(id: ids).first
unless el.nil?
res << el
ids << el._id
end
end
res
end
end
end
end
Hope you find this useful :)
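The same random-field idea can be tried outside the database. The following is only a sketch under stated assumptions: documents are plain hashes, sample_by_random_field is a hypothetical helper, candidates are ordered by random_field (the Mongoid code above does not specify an order), and the wrap-around to the smallest remaining value when nothing lies above the threshold is an addition that is not in the answer above.

```ruby
# Pick `qty` distinct docs using a precomputed random_field in [0, 1).
def sample_by_random_field(docs, qty)
  return docs.dup if docs.size <= qty

  picked = {}
  while picked.size < qty
    threshold = rand
    remaining = docs.reject { |d| picked.key?(d[:id]) }
    # Take the first unpicked doc above the threshold;
    # wrap around to the smallest remaining one if none is above it.
    above = remaining.select { |d| d[:random_field] > threshold }
    candidate = (above.empty? ? remaining : above).min_by { |d| d[:random_field] }
    picked[candidate[:id]] = candidate
  end
  picked.values
end

docs = (1..20).map { |i| { id: i, random_field: rand } }
sample = sample_by_random_field(docs, 5)
p sample.map { |d| d[:id] }
```

Every pass through the loop adds exactly one document, so the loop runs qty times, which corresponds to one query per selected document in the database version.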
named_scope :with_country, lambda { |country_id| ...}
named_scope :with_language, lambda { |language_id| ...}
named_scope :with_gender, lambda { |gender_id| ...}
if params[:country_id]
Event.with_country(params[:country_id])
elsif params[:language_id]
Event.with_language(params[:language_id])
else
......
#so many combinations
end
If I get both country and language then I need to apply both of them. In my real application I have 8 different named_scopes that could be applied depending on the case. How to apply named_scopes incrementally or hold on to named_scopes somewhere and then later apply in one shot.
I tried holding on to values like this
tmp = Event.with_country(1)
but that fires the sql instantly.
I guess I can write something like
if !params[:country_id].blank? && !params[:language_id].blank? && !params[:gender_id].blank?
Event.with_country(params[:country_id]).with_language(..).with_gender
elsif country && language
elsif country && gender
elsif language && gender
.. you see the problem
Actually, the SQL does not fire instantly. Though I haven't bothered to look up how Rails pulls off this magic (though now I'm curious), the query isn't fired until you actually inspect the result set's contents.
So if you run the following in the console:
wc = Event.with_country(Country.first.id);nil # line returns nil, so wc remains uninspected
wc.with_state(State.first.id)
you'll note that no Event query is fired for the first line, whereas one large Event query is fired for the second. As such, you can safely store Event.with_country(params[:country_id]) as a variable and add more scopes to it later, since the query will only be fired at the end.
To confirm that this is true, try the approach I'm describing, and check your server logs to confirm that only one query is being fired on the page itself for events.
Check Anonymous Scopes.
I had to do something similar, having many filters applied in a view. What I did was create named_scopes with conditions:
named_scope :with_filter, lambda{|filter| { :conditions => {:field => filter}} unless filter.blank?}
In the same class there is a method which receives the params from the action and returns the filtered records:
def self.filter(params)
ClassObject
.with_filter(params[:filter1])
.with_filter2(params[:filter2])
end
Like that you can add all the filters using named_scopes and they are used depending on the params that are sent.
I took the idea from here: http://www.idolhands.com/ruby-on-rails/guides-tips-and-tutorials/add-filters-to-views-using-named-scopes-in-rails
Event.with_country(params[:country_id]).with_state(params[:language_id])
will work and won't fire the SQL until the end. (If you try it in the console, it will happen right away, because the console calls inspect on the result. In a real request the SQL won't fire until the result is actually used.)
I suspect you also need to be sure each named_scope tests the existence of what is passed in:
named_scope :with_country, lambda { |country_id| country_id.nil? ? {} : {:conditions=>...} }
This will be easy with Rails 3:
products = Product.where("price = 100").limit(5) # No query executed yet
products = products.order("created_at DESC") # Adding to the query, still no execution
products.each { |product| puts product.price } # That's when the SQL query is actually fired
class Product < ActiveRecord::Base
scope :pricey, where("price > 100")
scope :latest, order("created_at DESC").limit(10)
end
The short answer is to simply shift the scope as required, narrowing it down depending on what parameters are present:
scope = Example
# Only apply to parameters that are present and not empty
scope = scope.with_foo(params[:foo]) if params[:foo].present?
scope = scope.with_bar(params[:bar]) if params[:bar].present?
results = scope.all
A better approach would be to use something like Searchlogic (http://github.com/binarylogic/searchlogic) which encapsulates all of this for you.
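With eight optional filters, the chain of if statements can also be driven by a table. The following is a minimal sketch of that idea, not an existing Rails or Searchlogic API: apply_filters and the filters hash are hypothetical names, and a plain array stands in for the lazily evaluated scope so the snippet runs without a database.

```ruby
# Apply each filter only when its parameter is present and not blank.
def apply_filters(scope, params, filters)
  filters.reduce(scope) do |current, (param, filter)|
    value = params[param]
    blank = value.nil? || value.to_s.strip.empty?
    blank ? current : filter.call(current, value)
  end
end

# In Rails each callable would chain a named scope, e.g. (assumption, not run here):
#   filters = { country_id: ->(s, v) { s.with_country(v) }, ... }
#   events  = apply_filters(Event, params, filters)

# Stand-in: the "scope" is just a list of the conditions applied so far.
filters = {
  country_id:  ->(scope, v) { scope + [[:with_country, v]] },
  language_id: ->(scope, v) { scope + [[:with_language, v]] },
  gender_id:   ->(scope, v) { scope + [[:with_gender, v]] }
}

applied = apply_filters([], { country_id: 1, language_id: "", gender_id: 3 }, filters)
p applied
```

Adding a ninth filter then means adding one entry to the hash rather than another branch.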