Batch search for named_scope - ruby-on-rails

This is existing code written by someone else, and I am trying to enhance it. I am a Java developer working on Ruby on Rails, so kindly be considerate.
I have entities like this: a User entity and a Delivery entity.
class Delivery < ActiveRecord::Base
  belongs_to :user
  named_scope :for_abcs, :conditions => {'deliveries.xyz_type' => ['Xyz1', 'Xyz2']}
end
Many such named scopes are defined.
Now, to fetch the deliveries, it's written like this:
@deliveries = current_user.deliveries.send("for_abcs").with(:xyz, :sender, :receiver)
...
...
...
# few other conditions added to @deliveries
Finally:
@deliveries.sort(...)
This sort generates a huge SQL query and causes performance issues. I want to use find_each, but find_each is only available on Active Record model classes in Ruby on Rails. How can I achieve this (if possible) without much code change?
Earlier, I replaced
Delivery.find
with
Delivery.find_each
wherever it appeared. Now I can't do that because it is an array. What is the workaround or the right procedure for this in Ruby on Rails?
EDIT:
What I tried:
deliveries_temp = []
@deliveries.find_each(:batch_size => 999) do |delivery_temp|
  deliveries_temp.push(delivery_temp)
end
This gave me the error:
undefined method `find_each' for []:Array
Checking the type of @deliveries returned ActiveRecord::NamedScope::Scope. The Rails version is 2.3.18.

find_each should work on anything that returns a Relation (which includes scopes).
@deliveries = current_user.deliveries.for_abcs(:xyz, :sender, :receiver).find_each
Update
It sounds like you're using Rails 2.3. find_each is a class method in 2.3, so you'll need a way to extract the conditions from your scope and pass them to find_each. I found an article that looks promising, so give this a try:
Delivery.find_each(current_user.deliveries.for_abcs.scope(:find))
Also, I'm still not sure what that #with is doing. Maybe it's supposed to be #includes?

After a week of research and learning about named_scopes by reading their source code, I understood what the problem was. @deliveries is an object of class ActiveRecord::NamedScope::Scope. This class does not have a find_each method. So I wrote a new named_scope for limit and offset in the Delivery model file as follows:
named_scope :limit_and_offset, lambda { |lim,off| { :limit => lim, :offset=>off } }
After this, I called it in a loop, passing the limit and offset: for example, the first iteration has offset=0, limit=999, and the second has offset=999, limit=999. I add all the results to an empty array. The loop continues until the result size is less than the limit value. This works exactly the way I wanted, in batches.
set = 1
total_deliveries = []
set_limit = 999
original_condition = @deliveries
loop do
  offset = (set - 1) * set_limit
  temp_condition = original_condition.limit_and_offset(set_limit, offset)
  temp_deliveries = temp_condition.find(:all)
  total_deliveries += temp_deliveries
  set += 1
  break if temp_deliveries.size < set_limit
end
@deliveries = total_deliveries.sort do |a, b|

Related

Equivalent of find_each for foo_ids?

Given this model:
class User < ActiveRecord::Base
has_many :things
end
Then we can do this:
@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }
There are a large number of @user.things and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?
The goal is to:
not load the entire thing_ids array into memory at once
still only load arrays of thing_ids, and not instantiate a Thing for each id
Rails 5 introduced the in_batches method, which yields a relation and uses pluck(primary_key) internally. We can use the where_values_hash method of the relation to retrieve the already-plucked ids:
@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }
Note that in_batches has order and limit restrictions similar to find_each.
This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.
You can try something like the following: each_slice takes 4 elements at a time, and then you can loop over those 4:
@user.thing_ids.each_slice(4) do |batch|
  batch.each do |id|
    puts id
  end
end
Unfortunately, there is no one-liner or helper that will do this, so instead:
limit = 1000
offset = 0
loop do
  batch = @user.things.limit(limit).offset(offset).pluck(:id)
  batch.each { |id| puts id }
  break if batch.count < limit
  offset += limit
end
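The same limit/offset loop can be wrapped in a small reusable helper. This is only a sketch in plain Ruby: each_id_batch and the fetch lambda are hypothetical names, and the array slice stands in for the limit/offset/pluck query. Offset paging also assumes a stable ordering (for example by primary key); otherwise rows can be skipped or repeated between batches.

```ruby
# Hypothetical helper: yields arrays of ids, one batch at a time.
# `fetch` is any callable taking (limit, offset) and returning an array.
def each_id_batch(batch_size, fetch)
  offset = 0
  loop do
    batch = fetch.call(batch_size, offset)
    yield batch unless batch.empty?
    break if batch.size < batch_size  # a short batch means we reached the end
    offset += batch_size
  end
end

# Stand-in data source (assumption). In Rails this might instead be
# @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
ALL_IDS = (1..10).to_a
fetch_ids = ->(limit, offset) { ALL_IDS[offset, limit] || [] }

id_batches = []
each_id_batch(4, fetch_ids) { |batch| id_batches << batch }
```

Only one batch of ids is held by the helper at a time; the id_batches array here exists purely to make the result visible.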
UPDATE Final EDIT:
I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)
Here is my solution, tested and working, so you can accept this as the answer if it pleases you.
Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it yields the ActiveRecord relation to your block, so you can then use a method such as pluck to get only the ids of the target query.
# put this file in your lib directory:
# active_record_extension.rb
module ARAExtension
  extend ActiveSupport::Concern

  def find_in_batches(options = {})
    options.assert_valid_keys(:start, :batch_size, :relation)
    relation = self
    start = options[:start]
    batch_size = options[:batch_size] || 1000
    unless block_given?
      return to_enum(:find_in_batches, options) do
        total = start ? where(table[primary_key].gteq(start)).size : size
        (total - 1).div(batch_size) + 1
      end
    end
    if logger && (arel.orders.present? || arel.taken.present?)
      logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    end
    relation = relation.reorder(batch_order).limit(batch_size)
    records = start ? relation.where(table[primary_key].gteq(start)) : relation
    records = records.to_a unless options[:relation]
    while records.any?
      records_size = records.size
      primary_key_offset = records.last.id
      raise "Primary key not included in the custom select clause" unless primary_key_offset
      yield records
      break if records_size < batch_size
      records = relation.where(table[primary_key].gt(primary_key_offset))
      records = records.to_a unless options[:relation]
    end
  end
end
ActiveRecord::Relation.send(:include, ARAExtension)
Here is the initializer:
# put this file in config/initializers directory:
# extensions.rb
require "active_record_extension"
Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned it to you. Now, I optionally allow you to receive the query before the conversion to an array happens. Here is an example of how to use it:
@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
  # do any kind of further querying/filtering/mapping that you want
  # show that this is actually an activerecord relation, not an array of AR objects
  puts batch_query.to_sql
  # add more conditions to this query, this is just an example
  batch_query = batch_query.where(:color=>"blue")
  # pluck just the ids
  puts batch_query.pluck(:id)
end
Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.
Previous EDIT
In response to your comment (because my comment would not fit):
Calling thing_ids internally uses pluck, and pluck internally uses select_all, which instantiates an ActiveRecord Result.
Previous 2nd EDIT:
This line of code within pluck returns an activerecord Result:
....
result = klass.connection.select_all(relation.arel, nil, bound_attributes)
...
I just stepped through the source code for you. Using select_all will save you some memory, but in the end, an activerecord Result was still created and mapped over even when you are using the pluck method.
I would use something like this:
@user.things.find_each(batch_size: 1000).map(&:id)
This will give you an array of the ids, although it still instantiates a Thing for each record.

Pagination Best Practices Ruby

So I am developing a rails app, and I am working on paginating the feed. While I was doing it I wondered if I was doing it the right way because my load times were over 1500ms. My code was:
stories = Story.feed
@stories = Kaminari.paginate_array(stories).page(params[:page]).per(params[:pageSize])
I have a few questions about this:
Should I be paginating Story.feed, or is there some sort of method that returns only the stories I need?
Is this load time normal?
What other things can I do to optimize this?
Also, Story.feed returns an array of story objects. The code for that is here:
def self.feed
  rawStories = Story.includes([:likes, :viewers, :user, :storyblocks]).all
  newFeaturedStories = rawStories.where(:featured => true).where(:updated_at.gte => (Date.today - 3)).desc(:created_at).entries
  normalStories = rawStories.not_in(:featured => true, :or => [:updated_at.gte => (Date.today - 3)]).desc(:created_at).entries
  newFeaturedStories.entries.concat(normalStories.entries)
end
I am using Mongoid and MongoDB.
The issue is that you load the whole feed from the DB into an array, and this takes a long time.
I suggest you use the any_of query from this great gem.
From there, do:
def self.feed_stories
  newFeaturedStories = Story.where(:featured => true).where(:updated_at.gte => (Date.today - 3.days))
  normalStories = Story.not_in(:featured => true, :or => [:updated_at.gte => (Date.today - 3.days)])
  Story.includes([:likes, :viewers, :user, :storyblocks]).any_of(newFeaturedStories, normalStories).desc(:created_at)
end
Then paginate this (with Kaminari, per is only available after page has been called):
selected_stories = Story.feed_stories.page(page).per(page_size)
I don't really understand what your entries are, but fetch them at this point.
To sum up: the idea is to make a single paginated DB query.
I suspect that when you call Kaminari.paginate_array on an ActiveRecord::Relation, it causes the whole result set to be fetched from DB and loaded in memory similar to calling Model.all.to_a.
To avoid this, I'd first find a way to turn Story.feed into a scope, rather than a class method. Superficially they'll seem the same, but the differences are subtle and deep. See Active Record scopes vs class methods.
Next, ditch paginate_array in favor of chaining Kaminari's page() and per() scopes.
For example (simplified version of yours):
class Article < ActiveRecord::Base
  scope :featured, -> { where(featured: true) }
  scope :last_3_days, -> { where(:updated_at.gte => (Date.today - 3)).desc(:created_at) }
  scope :feed, -> { featured.last_3_days }
end
And then paginate simply by going:
Article.feed.page(page).per(page_size)
The biggest advantage of this is that Kaminari can chain into the generated SQL inserting the proper LIMIT and OFFSET clauses thereby reducing the size of the result set returned to only what needs to be displayed, as opposed to returning every matching record.
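To make the LIMIT and OFFSET point concrete, here is a rough sketch of the arithmetic a paginator performs. page_window is a hypothetical helper written for illustration, not Kaminari's API, and pages are assumed to be 1-based.

```ruby
# Hypothetical helper (not part of Kaminari): translate a 1-based page
# number and a page size into the LIMIT and OFFSET a paginated query needs.
def page_window(page, per_page)
  page = [page.to_i, 1].max # treat missing or invalid pages as page 1
  { limit: per_page, offset: (page - 1) * per_page }
end

window = page_window(3, 20)
puts "LIMIT #{window[:limit]} OFFSET #{window[:offset]}"
```

With 20 records per page, page 3 skips the first 40 rows and the database returns at most 20, no matter how many records match in total.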
I think Will Paginate will help you out here -> mislav/will_paginate.
From there you can simply give your controller action .per_page(20) for example and after 20 objects (you can define the objects, see the wiki) there will be pagination

Execute method on mongoid scope chain

I need to take some random documents using Rails and Mongoid. Since I plan to have very large collections, I decided to put a 'random' field in each document and to select documents using that field. I wrote the following method in the model:
def random(qty)
  if count <= qty
    all
  else
    collection = [ ]
    while collection.size < qty
      collection << where(:random_field.gt => rand).first
    end
    collection
  end
end
This function actually works and the collection is filled with qty random elements. But as I try to use it like a scope like this:
User.students.random(5)
I get:
undefined method `random' for #<Array:0x0000000bf78748>
If instead I try to make the method like a lambda scope I get:
undefined method `to_criteria' for #<Array:0x0000000df824f8>
Given that I'm not interested in applying any other scopes after the random one, how can I use my method in a chain?
Thanks in advance.
I ended up extending the Mongoid::Criteria class with the following. Don't know if it's the best option. Actually I believe it's quite slow since it executes at least qty queries.
I don't know if not_in is available for normal ActiveRecord models. However, you can remove the not_in part if needed. It's just an optimization to reduce the number of queries.
On collections that have a double (or larger) number of documents than qty, you should have exactly qty queries.
module Mongoid
  class Criteria
    def random(qty)
      if count <= qty
        all
      else
        res = [ ]
        ids = [ ]
        while res.size < qty
          el = where(:random_field.gt => rand).not_in(id: ids).first
          unless el.nil?
            res << el
            ids << el._id
          end
        end
        res
      end
    end
  end
end
Hope you find this useful :)

How do I find the .max of an attribute value among a group of different Models?

everyone: I am also open to just straight-up refactoring what I'm finding to be pretty repetitive, but to give a baseline of how it's working....
I have for every contact a Campaign, which has_many of three types of Models: Email, Call, and Letter.
When an Email (Call or Letter) has been executed for a specific contact, I have a Contact_Email(_or_Call_or_Letter) which belongs to both the Contact and the Model (Email_or_Call_or_Letter).
Each Contact_Email for example pairing has a :date_sent attribute. So does each Contact_Call and Contact_Letter.
How do I find the latest of all of them?
Here is the code I wrote that can find the latest Email. I find myself retyping similar code for Call and Letter, and I'm stuck on how to do a .max on all of them:
def last_email(contact)
  # get campaign the contact belongs to
  @campaign = Campaign.find_by_id(contact.campaign_id)
  @last_email = ContactEmail.find(:last,
    :conditions => "contact_id = #{contact.id}",
    :order => "date_sent DESC")
  @last_call = ContactCall.find(:last,
    :conditions => "contact_id = #{contact.id}",
    :order => "date_sent DESC")
  @last_letter = ContactLetter.find(:last,
    :conditions => "contact_id = #{contact.id}",
    :order => "date_sent DESC")
  # how do I get the latest of all of these to display?
  @email_template = Email.find_by_id(@last_email.email_id)
  if @last_email.nil?
    return "no email sent"
  else
    return @last_email.date_sent.to_s(:long) + link_to('email was sent', @email_template)
  end
end
Question 1: With what I have, how can I effectively find @last_event, given that I can find the last Email, last Call, and last Letter for every contact?
Question 2: How can I remove the repetitive code that I have to write for each Model?
Do you have has_many associations setup in Contact referring to the other models? Something like:
class Contact < ActiveRecord::Base
has_many :contact_emails
has_many :contact_calls
has_many :contact_letters
end
If so, you can then create a latest_event method on the Contact model:
def latest_event
  [contact_emails, contact_calls, contact_letters].map do |assoc|
    assoc.first(:order => 'date_sent DESC')
  end.compact.sort_by { |e| e.date_sent }.last
end
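The same selection can be tried without a database. In this sketch the Sent struct and the three arrays are hypothetical stand-ins for the associations, and max_by is used in place of sort_by followed by last; it gives the same result without sorting the whole list.

```ruby
require 'date'

# Hypothetical stand-in for ContactEmail / ContactCall / ContactLetter rows.
Sent = Struct.new(:kind, :date_sent)

emails  = [Sent.new(:email, Date.new(2010, 1, 5)), Sent.new(:email, Date.new(2010, 3, 1))]
calls   = [Sent.new(:call, Date.new(2010, 2, 10))]
letters = [] # nothing sent yet

latest = [emails, calls, letters]
  .map { |group| group.max_by(&:date_sent) } # newest per group, nil if empty
  .compact                                   # drop groups with no records
  .max_by(&:date_sent)                       # newest overall
```

If every group is empty, latest is nil, which is the case the nil handling discussed next has to cover.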
Handling nil
When using the latest_event method you will get nil if there are no associated records. There are a couple of ways you can work around this. The first is to check for nil first with something like:
contact.latest_event && contact.latest_event.date_sent
On later versions of Rails/Ruby you can also use Object#try, which will call the method if it exists:
contact.latest_event.try(:date_sent)
I prefer not to use this as it doesn't check for nil but only whether the object can respond to a method. This has caused some interesting errors when you expect nil if the object is nil but are calling a method which nil itself responds to.
Finally, my preferred method for the simple case is to use the andand gem which provides Object#andand. This greatly shortens the safe case above and saves calling of latest_event multiple times:
contact.latest_event.andand.date_sent
date_sent, nil and You.
For your example usage of calling to_s(:long), you could either use && or andand:
contact.latest_event.andand.date_sent.andand.to_s(:long)
or
contact.latest_event && contact.latest_event.date_sent.to_s(:long)
The first is safer if date_sent itself may be nil. Without using andand this could be written as:
contact.latest_event &&
contact.latest_event.date_sent &&
contact.latest_event.date_sent.to_s(:long)
which is rather complex and unwieldy in my opinion. I would recommend looking into andand.
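On Ruby 2.3 and later, the safe navigation operator (&.) covers the same case without a gem: the call chain returns nil as soon as a receiver is nil. A minimal sketch, where the SentEvent struct and formatted_date are hypothetical names, and strftime stands in for Rails' to_s(:long):

```ruby
require 'date'

# Hypothetical stand-in for a ContactEmail/ContactCall/ContactLetter record.
SentEvent = Struct.new(:date_sent)

# Returns a formatted date, or nil if the event or its date is missing.
def formatted_date(event)
  event&.date_sent&.strftime('%B %d, %Y')
end

sent   = SentEvent.new(Date.new(2010, 5, 17))
unsent = SentEvent.new(nil)

formatted_date(sent)   # a String built from date_sent
formatted_date(unsent) # nil, because date_sent is nil
formatted_date(nil)    # nil, because there is no event at all
```

Unlike try, &. only short-circuits on nil, so a misspelled method name on a real object still raises NoMethodError.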
For question 1:
Just do
@last_event = [@last_letter, @last_email, @last_call].compact.max_by { |m| m.date_sent }
For question 2:
Well, this is more interesting. It depends on exactly how your models look, but you might want to consider Single Table Inheritance for this type of scenario.

How to apply named_scopes incrementally in Rails

named_scope :with_country, lambda { |country_id| ...}
named_scope :with_language, lambda { |language_id| ...}
named_scope :with_gender, lambda { |gender_id| ...}
if params[:country_id]
  Event.with_country(params[:country_id])
elsif params[:language_id]
  Event.with_language(params[:language_id])
else
  ......
  #so many combinations
end
If I get both country and language, then I need to apply both of them. In my real application I have 8 different named_scopes that could be applied depending on the case. How can I apply named_scopes incrementally, or hold on to named_scopes somewhere and apply them later in one shot?
I tried holding on to values like this
tmp = Event.with_country(1)
but that fires the sql instantly.
I guess I can write something like
if !params[:country_id].blank? && !params[:language_id].blank? && !params[:gender_id].blank?
Event.with_country(params[:country_id]).with_language(..).with_gender
elsif country && language
elsif country && gender
elsif country && gender
.. you see the problem
Actually, the SQL does not fire instantly. Though I haven't bothered to look up how Rails pulls off this magic (though now I'm curious), the query isn't fired until you actually inspect the result set's contents.
So if you run the following in the console:
wc = Event.with_country(Country.first.id);nil # line returns nil, so wc remains uninspected
wc.with_state(State.first.id)
you'll note that no Event query is fired for the first line, whereas one large Event query is fired for the second. As such, you can safely store Event.with_country(params[:country_id]) as a variable and add more scopes to it later, since the query will only be fired at the end.
To confirm that this is true, try the approach I'm describing, and check your server logs to confirm that only one query is being fired on the page itself for events.
Check Anonymous Scopes.
I had to do something similar, having many filters applied in a view. What I did was create named_scopes with conditions:
named_scope :with_filter, lambda{|filter| { :conditions => {:field => filter}} unless filter.blank?}
In the same class there is a method which receives the params from the action and returns the filtered records:
def self.filter(params)
  ClassObject
    .with_filter(params[:filter1])
    .with_filter2(params[:filter2])
end
Like that you can add all the filters using named_scopes and they are used depending on the params that are sent.
I took the idea from here: http://www.idolhands.com/ruby-on-rails/guides-tips-and-tutorials/add-filters-to-views-using-named-scopes-in-rails
Event.with_country(params[:country_id]).with_language(params[:language_id])
will work and won't fire the SQL until the end (if you try it in the console, it'll happen right away because the console will call inspect on the results. IRL the SQL won't fire until the end).
I suspect you also need to be sure each named_scope tests the existence of what is passed in:
named_scope :with_country, lambda { |country_id| country_id.nil? ? {} : {:conditions=>...} }
This will be easy with Rails 3:
products = Product.where("price = 100").limit(5) # No query executed yet
products = products.order("created_at DESC") # Adding to the query, still no execution
products.each { |product| puts product.price } # That's when the SQL query is actually fired
class Product < ActiveRecord::Base
  # Rails 3 renames named_scope to scope
  scope :pricey, where("price > 100")
  scope :latest, order("created_at DESC").limit(10)
end
The short answer is to simply shift the scope as required, narrowing it down depending on what parameters are present:
scope = Example
# Only apply to parameters that are present and not empty
if (!params[:foo].blank?)
  scope = scope.with_foo(params[:foo])
end
if (!params[:bar].blank?)
  scope = scope.with_bar(params[:bar])
end
results = scope.all
A better approach would be to use something like Searchlogic (http://github.com/binarylogic/searchlogic) which encapsulates all of this for you.
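When there are many optional filters, the repeated if blocks can also be folded into a loop over the parameter names. A rough sketch in plain Ruby: FakeScope is a hypothetical stand-in for a model with chainable named scopes, apply_filters is an illustrative helper, and the with_<name> naming convention is an assumption.

```ruby
# Hypothetical stand-in for a model with chainable scopes: each with_* call
# returns a new object carrying the accumulated conditions, and nothing is
# "executed" until the conditions are read.
class FakeScope
  attr_reader :conditions

  def initialize(conditions = {})
    @conditions = conditions
  end

  %w[foo bar baz].each do |name|
    define_method("with_#{name}") do |value|
      FakeScope.new(conditions.merge(name.to_sym => value))
    end
  end
end

# Apply one scope per parameter that is present and non-empty.
def apply_filters(scope, params, names)
  names.inject(scope) do |current, name|
    value = params[name]
    if value.nil? || value.to_s.strip.empty?
      current # skip filters that were not supplied
    else
      current.public_send("with_#{name}", value)
    end
  end
end

params = { foo: '1', bar: '', baz: nil }
filtered = apply_filters(FakeScope.new, params, [:foo, :bar, :baz])
```

Here only with_foo ends up applied, since bar is empty and baz is nil. With real named scopes the query would still run once, at the point where the final scope is enumerated.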

Resources