Rails: query execution even with eager load - ruby-on-rails

In Rails 3, given an Order that has many OrderRows,
if I call the sum function, I get an unwanted query executed
Order.eager_load(:order_rows).find(xxx).order_rows.sum{|x| x.unit_taxed_cents}
OrderRow Load (0.6ms) SELECT "order_rows".* FROM "order_rows" WHERE "order_rows"."order_id" = XXXX
=> 17900
If I instead use, for example, inject, the additional query is not executed:
Order.eager_load(:order_rows).find(xxx).order_rows.inject(0){|sum, r| sum + r.unit_taxed_cents }
=> 17900
By definition (http://apidock.com/rails/v3.1.0/Enumerable/sum) the two calls should be equivalent.
Why doesn't it behave as expected?
Thanks
EDIT:
exploring better what rails does,
Order.eager_load(:order_rows).find(xxx).order_rows.method(:sum).source_location
returns
gems/activesupport-3.2.18/lib/active_support/core_ext/enumerable.rb
the definition of the function is:
def sum(identity = 0, &block)
  if block_given?
    map(&block).sum(identity)
  else
    inject(:+) || identity
  end
end
hence, I'd expect the same useless additional query if I run
Order.eager_load(:order_rows).find(xxx).order_rows.map{|x| x.subtotal_taxed}.sum
Instead it doesn't happen, and the behavior is correct
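To compare the two calls in isolation from Active Record, here is a minimal sketch that copies the quoted definition onto a plain Array. The names QuotedSum, as_sum and Row are hypothetical and exist only for this illustration (as_sum avoids clobbering Ruby's built-in Array#sum). On an in-memory array both forms give the same total, which suggests the extra query comes from how the association object handles sum rather than from the Enumerable definition itself.

```ruby
# Standalone copy of the ActiveSupport definition quoted above.
# `as_sum` is a hypothetical name, used so Ruby's own Array#sum is untouched.
module QuotedSum
  def as_sum(identity = 0, &block)
    if block_given?
      map(&block).as_sum(identity)
    else
      inject(:+) || identity
    end
  end
end

class Array
  include QuotedSum
end

# Hypothetical stand-in for loaded OrderRow records.
Row = Struct.new(:unit_taxed_cents)
rows = [Row.new(10_000), Row.new(7_900)]

with_block  = rows.as_sum { |r| r.unit_taxed_cents }
with_inject = rows.inject(0) { |sum, r| sum + r.unit_taxed_cents }
empty_total = [].as_sum # falls back to the identity value

puts with_block, with_inject, empty_total
```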

Related

How to access raw SQL statement generated by update_all (ActiveRecord method)

I'm just wondering if there's a way to access the raw SQL that's executed for an update_all ActiveRecord request. As an example, take the simple example below:
Something.update_all( ["to_update = ?"], ["id = ?", my_id] )
In the rails console I can see the raw SQL statement so I'm guessing it's available for me to access in some way?
PS - I'm specifically interested in update_all and can't change it to anything else.
Thanks!
If you look at the way update_all is implemented, you can't call to_sql on it like you can on relations, since it executes directly and returns an integer (the number of rows affected).
There is no way to tap into the flow or get the desired result except by duplicating the entire method and changing the last line:
module ActiveRecord
  # = Active Record \Relation
  class Relation
    def update_all_to_sql(updates)
      raise ArgumentError, "Empty list of attributes to change" if updates.blank?

      if eager_loading?
        relation = apply_join_dependency
        # Recurse into the SQL-returning variant so nothing is executed here either
        return relation.update_all_to_sql(updates)
      end

      stmt = Arel::UpdateManager.new
      stmt.set Arel.sql(@klass.sanitize_sql_for_assignment(updates))
      stmt.table(table)

      if has_join_values? || offset_value
        @klass.connection.join_to_update(stmt, arel, arel_attribute(primary_key))
      else
        stmt.key = arel_attribute(primary_key)
        stmt.take(arel.limit)
        stmt.order(*arel.orders)
        stmt.wheres = arel.constraints
      end

      # Original last line, replaced by to_sql:
      # @klass.connection.update stmt, "#{@klass} Update All"
      stmt.to_sql
    end
  end
end
The reason you see the log statements is that they are logged by the connection when it executes the statements. While you can override the logging, it's not really possible to do it for calls from a single AR method.
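The distinction between building a statement and executing it can be sketched without Rails. FakeConnection and FakeUpdate below are hypothetical stand-ins, not Active Record classes; the only point is that logging lives in the object that executes, so a to_sql call on the statement leaves no trace in the log.

```ruby
# Hypothetical stand-ins, not Active Record classes.
class FakeConnection
  attr_reader :log

  def initialize
    @log = []
  end

  # Executing is the only place where logging happens.
  def update(sql)
    @log << sql
    1 # pretend one row was affected
  end
end

class FakeUpdate
  def initialize(table, assignment, condition)
    @table = table
    @assignment = assignment
    @condition = condition
  end

  # Builds the SQL string without touching any connection.
  def to_sql
    "UPDATE #{@table} SET #{@assignment} WHERE #{@condition}"
  end
end

conn = FakeConnection.new
stmt = FakeUpdate.new("somethings", "to_update = 1", "id = 123")

sql_only      = stmt.to_sql           # nothing is logged yet
logged_before = conn.log.size
affected      = conn.update(stmt.to_sql) # now the statement is logged
```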
If you have set RAILS_LOG_LEVEL=debug Rails shows you which SQL statement it executed.
# Start Rails console in debug mode
$ RAILS_LOG_LEVEL=debug rails c
# Run your query
[1] pry(main)> Something.update_all( ["to_update = ?"], ["id = ?", my_id] )
SQL (619.8ms) UPDATE "somethings" SET to_update = my_id WHERE id = 123;
# ^it prints out the query it executed

Equivalent of find_each for foo_ids?

Given this model:
class User < ActiveRecord::Base
has_many :things
end
Then we can do this:
@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }
There are a large number of @user.things and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?
The goal is to:
not load the entire thing_ids array into memory at once
still only load arrays of thing_ids, and not instantiate a Thing for each id
Rails 5 introduced in_batches method, which yields a relation and uses pluck(primary_key) internally. And we can make use of the where_values_hash method of the relation in order to retrieve already-plucked ids:
@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }
Note that in_batches has order and limit restrictions similar to find_each.
This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.
You can try something like the code below: each_slice takes 4 elements at a time, and then you can loop over those 4.
@user.thing_ids.each_slice(4) do |batch|
batch.each do |id|
puts id
end
end
Unfortunately, there is no one-liner or helper that will do this for you, so instead:
limit = 1000
offset = 0
loop do
batch = @user.things.limit(limit).offset(offset).pluck(:id)
batch.each { |id| puts id }
break if batch.count < limit
offset += limit
end
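A variation on the loop above, sketched under assumptions: instead of an offset, each batch asks for ids greater than the last id already seen, which avoids skipping or repeating rows when records are inserted or deleted during iteration. The fetch_ids lambda and ALL_IDS array are hypothetical stand-ins for a query along the lines of things.where("id > ?", last_id).order(:id).limit(size).pluck(:id); each_id_batch is likewise a made-up helper name.

```ruby
# Pretend table contents; stands in for the ids stored in the database.
ALL_IDS = (1..10).map { |n| n * 3 }

# Hypothetical stand-in for a "WHERE id > ? ORDER BY id LIMIT ?" pluck query.
fetch_ids = lambda do |after_id, size|
  ALL_IDS.select { |id| after_id.nil? || id > after_id }.sort.first(size)
end

# Yields arrays of ids, at most batch_size at a time, paging by last seen id.
def each_id_batch(fetch_ids, batch_size)
  last_id = nil
  loop do
    batch = fetch_ids.call(last_id, batch_size)
    break if batch.empty?

    yield batch
    break if batch.size < batch_size # short batch means we reached the end

    last_id = batch.last
  end
end

batches = []
each_id_batch(fetch_ids, 4) { |batch| batches << batch }
batches.each { |batch| puts batch.inspect }
```

Only one batch of plain integers is held at a time, and no model objects are built.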
UPDATE Final EDIT:
I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)
Here is my solution, tested and working, so you can accept this as the answer if it pleases you.
Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it will return the activerecord relation to your block, so you can then use your desired method 'pluck' to get only the ids of the target query.
#put this file in your lib directory:
#active_record_extension.rb
module ARAExtension
  extend ActiveSupport::Concern

  def find_in_batches(options = {})
    options.assert_valid_keys(:start, :batch_size, :relation)
    relation = self
    start = options[:start]
    batch_size = options[:batch_size] || 1000

    unless block_given?
      return to_enum(:find_in_batches, options) do
        total = start ? where(table[primary_key].gteq(start)).size : size
        (total - 1).div(batch_size) + 1
      end
    end

    if logger && (arel.orders.present? || arel.taken.present?)
      logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    end

    relation = relation.reorder(batch_order).limit(batch_size)
    records = start ? relation.where(table[primary_key].gteq(start)) : relation
    records = records.to_a unless options[:relation]

    while records.any?
      records_size = records.size
      primary_key_offset = records.last.id
      raise "Primary key not included in the custom select clause" unless primary_key_offset

      yield records

      break if records_size < batch_size

      records = relation.where(table[primary_key].gt(primary_key_offset))
      records = records.to_a unless options[:relation]
    end
  end
end

ActiveRecord::Relation.send(:include, ARAExtension)
here is the initializer
#put this file in config/initializers directory:
#extensions.rb
require "active_record_extension"
Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned it to you. Now, I optionally allow you to return the query before the conversion to the array happens. Here is an example of how to use it:
@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
# do any kind of further querying/filtering/mapping that you want
# show that this is actually an activerecord relation, not an array of AR objects
puts batch_query.to_sql
# add more conditions to this query, this is just an example
batch_query = batch_query.where(:color=>"blue")
# pluck just the ids
puts batch_query.pluck(:id)
end
Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.
Previous EDIT
In response to your comment (because my comment would not fit):
calling
thing_ids
internally uses
pluck
pluck internally uses
select_all
...which instantiates an activerecord Result
Previous 2nd EDIT:
This line of code within pluck returns an activerecord Result:
....
result = klass.connection.select_all(relation.arel, nil, bound_attributes)
...
I just stepped through the source code for you. Using select_all will save you some memory, but in the end, an activerecord Result was still created and mapped over even when you are using the pluck method.
I would use something like this:
@user.things.find_each(batch_size: 1000).map(&:id)
This will give you an array of the ids, although it still instantiates a Thing for each record.

Rails find with a block

I have seen Rails find method taking a block as
Consumer.find do |c|
c.id == 3
end
Which is similar to Consumer.find(3).
What are some of the use cases where we can actually use block for a find ?
It's a shortcut for .to_a.find { ... }. Here's the method's source code:
def find(*args)
  if block_given?
    to_a.find(*args) { |*block_args| yield(*block_args) }
  else
    find_with_ids(*args)
  end
end
If you pass a block, it calls .to_a (loading all records) and invokes Enumerable#find on the array.
In other words, it allows you to use Enumerable#find on an ActiveRecord::Relation. This can be useful if your condition can't be expressed or evaluated in SQL, e.g. querying serialized attributes:
Consumer.find { |c| c.preferences[:foo] == :bar }
To avoid confusion, I'd prefer the more explicit version, though:
Consumer.all.to_a.find { |c| c.preferences[:foo] == :bar }
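What the block form does after loading can be sketched with plain Ruby objects. The Consumer struct and its data below are hypothetical stand-ins for records already loaded by to_a; the point is that Enumerable#find returns the first element for which the block is truthy, and nil when nothing matches.

```ruby
# Hypothetical stand-in for loaded records; not an Active Record model.
Consumer = Struct.new(:id, :preferences)

consumers = [
  Consumer.new(1, { foo: :baz }),
  Consumer.new(2, { foo: :bar }),
  Consumer.new(3, { foo: :bar })
]

# Enumerable#find stops at the first match, scanning in array order.
match = consumers.find { |c| c.preferences[:foo] == :bar }

# With no match it returns nil instead of raising.
none = consumers.find { |c| c.preferences[:foo] == :missing }

puts match.id
puts none.inspect
```

Every record has to be in memory before the block runs, so this only makes sense on a collection that is already small or already narrowed by SQL conditions.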
The result may be similar, but the SQL query is not similar to Consumer.find(3)
It fetches all the consumers and then filters them based on the block. I can't think of a use case where this might be useful.
Here is a sample query in the console
consumer = Consumer.find {|c|c.id == 2}
# Consumer Load (0.3ms) SELECT `consumers`.* FROM `consumers`
# => #<Consumer id: 2, name: "xyz", ..>
A good example of a use-case is if you have a JSON/JSONB column and don't want to get involved in the more complex JSON SQL.
required_item = item_collection.find do |item|
item.jsondata['json_array_property'][index]['property'] == clause
end
This is useful if you can constrain the scope of the item_collection to a date-range, for example, and have a smaller set of items that require filtering further.

Dynamic find methods Vs conditional statements

Student.find(:all, :conditions => ['name = ? and status = ?', 'mohit', 1])
Vs
Student.find_all_by_name_and_status('mohit', 1)
Both queries return the same set of rows, but I understand the first one is preferable because the second goes through method_missing: the method is not found at first, and Rails then tries to interpret the name as a dynamic finder and, if it is valid, returns the result set.
Can anybody explain this clearly? What exactly is happening behind the scenes? Please correct me if I am wrong.
You are right, the second way will go through a method_missing. ActiveRecord will parse the method name and if it is a valid name, it will generate a method on the fly.
If you look in the source of ActiveRecord::Base, in method_missing you'll see that the developers left us a comment showing what this generated method would look like:
# def self.find_by_login_and_activated(*args)
# options = args.extract_options!
# attributes = construct_attributes_from_arguments(
# [:login,:activated],
# args
# )
# finder_options = { :conditions => attributes }
# validate_find_options(options)
# set_readonly_option!(options)
#
# if options[:conditions]
# with_scope(:find => finder_options) do
# find(:first, options)
# end
# else
# find(:first, options.merge(finder_options))
# end
# end
So you see that generally it boils down to the same find method.
I would not say that the first way is preferable because of method_missing, because the performance penalty for that is negligible. The second way reads better and works well if you just need to fetch records based on attributes equal to some values.
However, this second form does not allow you to do anything beyond equality comparison (e.g., range comparison, "not equal to" expressions, joins, etc.). In such cases, you'll just have to use the find method with appropriate conditions and other parameters.
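The parse-then-define mechanism described above can be sketched in plain Ruby. TinyModel and its RECORDS array are hypothetical and only mimic the idea; this is not how ActiveRecord::Base is written. The first call goes through method_missing, which parses the attribute names out of the method name and defines a real method, so later calls no longer pay the method_missing cost.

```ruby
# Hypothetical model backed by an in-memory array instead of a table.
class TinyModel
  RECORDS = [
    { name: "mohit", status: 1 },
    { name: "mohit", status: 0 },
    { name: "asha",  status: 1 }
  ]

  def self.method_missing(name, *args, &block)
    match = /\Afind_all_by_(\w+)\z/.match(name.to_s)
    return super unless match

    attributes = match[1].split("_and_").map(&:to_sym)

    # Define a real singleton method so later calls skip method_missing.
    define_singleton_method(name) do |*values|
      conditions = attributes.zip(values)
      RECORDS.select { |row| conditions.all? { |attr, value| row[attr] == value } }
    end

    public_send(name, *args)
  end

  def self.respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_all_by_") || super
  end
end

first_call    = TinyModel.find_all_by_name_and_status("mohit", 1)
defined_after = TinyModel.singleton_methods.include?(:find_all_by_name_and_status)
second_call   = TinyModel.find_all_by_name_and_status("asha", 1)
```

As in the answer above, only equality comparisons fit this scheme; anything else needs explicit conditions.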

How to apply named_scopes incrementally in Rails

named_scope :with_country, lambda { |country_id| ...}
named_scope :with_language, lambda { |language_id| ...}
named_scope :with_gender, lambda { |gender_id| ...}
if params[:country_id]
Event.with_country(params[:country_id])
elsif params[:language_id]
Event.with_language(params[:language_id])
else
......
#so many combinations
end
If I get both country and language then I need to apply both of them. In my real application I have 8 different named_scopes that could be applied depending on the case. How to apply named_scopes incrementally or hold on to named_scopes somewhere and then later apply in one shot.
I tried holding on to values like this
tmp = Event.with_country(1)
but that fires the sql instantly.
I guess I can write something like
if !params[:country_id].blank? && !params[:language_id].blank? && !params[:gender_id].blank?
Event.with_country(params[:country_id]).with_language(..).with_gender
elsif country && language
elsif country && gender
elsif language && gender
.. you see the problem
Actually, the SQL does not fire instantly. Though I haven't bothered to look up how Rails pulls off this magic (though now I'm curious), the query isn't fired until you actually inspect the result set's contents.
So if you run the following in the console:
wc = Event.with_country(Country.first.id);nil # line returns nil, so wc remains uninspected
wc.with_state(State.first.id)
you'll note that no Event query is fired for the first line, whereas one large Event query is fired for the second. As such, you can safely store Event.with_country(params[:country_id]) as a variable and add more scopes to it later, since the query will only be fired at the end.
To confirm that this is true, try the approach I'm describing, and check your server logs to confirm that only one query is being fired on the page itself for events.
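The deferred behaviour can be illustrated with a small plain-Ruby sketch. LazyQuery is a hypothetical class, not the Rails scope implementation: each where call only records a filter and returns a new object, and the run counter increases only when the results are enumerated.

```ruby
# Hypothetical lazy query object; filters accumulate, evaluation is deferred.
class LazyQuery
  include Enumerable

  def initialize(rows, filters = [], stats = { runs: 0 })
    @rows = rows
    @filters = filters
    @stats = stats # shared between chained copies to count executions
  end

  # Returns a new query; nothing is evaluated here.
  def where(&filter)
    LazyQuery.new(@rows, @filters + [filter], @stats)
  end

  def runs
    @stats[:runs]
  end

  # The "query" runs only when the results are enumerated.
  def each(&block)
    @stats[:runs] += 1
    @rows.select { |row| @filters.all? { |f| f.call(row) } }.each(&block)
  end
end

events = LazyQuery.new([
  { country: 1, state: 5 },
  { country: 1, state: 6 },
  { country: 2, state: 5 }
])

scoped = events.where { |e| e[:country] == 1 } # stored in a variable, not run
scoped = scoped.where { |e| e[:state] == 5 }   # still not run
runs_before = scoped.runs
result      = scoped.to_a                      # enumeration triggers the run
runs_after  = scoped.runs
```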
Check Anonymous Scopes.
I had to do something similar, having many filters applied in a view. What I did was create named_scopes with conditions:
named_scope :with_filter, lambda{|filter| { :conditions => {:field => filter}} unless filter.blank?}
In the same class there is a method which receives the params from the action and returns the filtered records:
def self.filter(params)
ClassObject
.with_filter(params[:filter1])
.with_filter2(params[:filter2])
end
Like that you can add all the filters using named_scopes and they are used depending on the params that are sent.
I took the idea from here: http://www.idolhands.com/ruby-on-rails/guides-tips-and-tutorials/add-filters-to-views-using-named-scopes-in-rails
Event.with_country(params[:country_id]).with_state(params[:language_id])
will work and won't fire the SQL until the results are actually used. (If you try it in the console, the query runs right away because the console inspects the returned value in order to print it; in application code it is deferred until the records are needed.)
I suspect you also need to be sure each named_scope tests the existence of what is passed in:
named_scope :with_country, lambda { |country_id| country_id.nil? ? {} : {:conditions=>...} }
This will be easy with Rails 3:
products = Product.where("price = 100").limit(5) # No query executed yet
products = products.order("created_at DESC") # Adding to the query, still no execution
products.each { |product| puts product.price } # That's when the SQL query is actually fired
class Product < ActiveRecord::Base
named_scope :pricey, where("price > 100")
named_scope :latest, order("created_at DESC").limit(10)
end
The short answer is to simply shift the scope as required, narrowing it down depending on what parameters are present:
scope = Example
# Only apply to parameters that are present and not empty
if (!params[:foo].blank?)
scope = scope.with_foo(params[:foo])
end
if (!params[:bar].blank?)
scope = scope.with_bar(params[:bar])
end
results = scope.all
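With eight or so optional filters, the repeated if blocks can be folded into a loop. The sketch below is plain Ruby over an array of hashes; ITEMS, FILTERS and apply_filters are hypothetical names, and in the Rails case each lambda would correspond to a named scope such as with_foo or with_bar.

```ruby
# Hypothetical in-memory data standing in for database rows.
ITEMS = [
  { foo: "a", bar: "x" },
  { foo: "a", bar: "y" },
  { foo: "b", bar: "x" }
]

# One entry per optional filter; each takes the current scope and a value.
FILTERS = {
  foo: ->(items, value) { items.select { |i| i[:foo] == value } },
  bar: ->(items, value) { items.select { |i| i[:bar] == value } }
}

# Narrows the scope step by step, skipping parameters that are absent.
def apply_filters(items, params)
  FILTERS.inject(items) do |scope, (key, filter)|
    value = params[key]
    # Skip missing or empty parameters, like the blank? checks above.
    blank = value.nil? || value.to_s.strip.empty?
    blank ? scope : filter.call(scope, value)
  end
end

both     = apply_filters(ITEMS, { foo: "a", bar: "x" })
only_foo = apply_filters(ITEMS, { foo: "a", bar: "" })
neither  = apply_filters(ITEMS, {})
```

Adding a ninth filter then means adding one entry to the table rather than another branch.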
A better approach would be to use something like Searchlogic (http://github.com/binarylogic/searchlogic) which encapsulates all of this for you.
