Ruby on Rails Performance - update row or check first

I have a Product model with the attributes description and code; code is indexed.
I would like to update products based on a CSV file.
Which is faster?
@p = Product.find_by_code(row[:code])
if @p.description != row[:desc]
  @p.update_attribute(:description, row[:desc])
end
or
@p = Product.find_by_code(row[:code])
@p.update_attribute(:description, row[:desc])
Let's consider all cases: the descriptions may be equal, or they may differ completely.
How is the != comparison implemented for strings and text columns?

You should use Ruby's Benchmark module and measure it directly!
require 'benchmark'
# Assumes a loaded Rails environment and that `row` is the current CSV row.
Benchmark.bm do |x|
  x.report('check first') do
    @p = Product.find_by(code: row[:code])
    if @p.description != row[:desc]
      @p.description = row[:desc]
      @p.save
    end
  end
  x.report('always save') do
    @p = Product.find_by(code: row[:code])
    @p.description = row[:desc]
    @p.save
  end
end

Ruby on Rails is clever enough to know whether an attribute has actually changed, so it won't make a round trip to the database to update a field that hasn't changed. You can see this in the Rails console (rails c): run your update_attribute code with the same value, and then with a changed value. You'll only see SQL log output when the value has changed.
If you use update_attributes instead (which takes a hash of attributes to change) and there is nothing to update, you'll see that it still begins and ends a database transaction, albeit with no statements inside it.
Hope that helps!
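To make the idea concrete, here is a minimal plain-Ruby sketch of "skip the write when nothing changed". TrackedRecord and update_description are hypothetical names invented for this illustration; this is not how ActiveRecord is implemented, only the same principle in miniature.

```ruby
# Hypothetical stand-in for a model; NOT ActiveRecord. It only counts writes.
class TrackedRecord
  attr_reader :description, :writes

  def initialize(description)
    @description = description
    @writes = 0
  end

  # Returns true if a write happened, false if the value was unchanged.
  def update_description(new_value)
    return false if new_value == @description

    @description = new_value
    @writes += 1 # a real model would issue an UPDATE here
    true
  end
end

record = TrackedRecord.new("blue widget")
record.update_description("blue widget") # unchanged, no write
record.update_description("red widget")  # changed, one write
puts record.writes
```

In this sketch the comparison happens in memory, so the unchanged case costs a string comparison rather than a database round trip.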

Related

How to access raw SQL statement generated by update_all (ActiveRecord method)

I'm just wondering if there's a way to access the raw SQL that's executed for an update_all ActiveRecord request. As an example, take the simple example below:
Something.update_all( ["to_update = ?"], ["id = ?", my_id] )
In the rails console I can see the raw SQL statement so I'm guessing it's available for me to access in some way?
PS - I'm specifically interested in update_all and can't change it to anything else.
Thanks!
If you look at the way update_all is implemented, you'll see that you can't call to_sql on it as you can on relations, because it executes directly and returns an integer (the number of rows affected).
There is no way to tap into the flow or get the desired result except by duplicating the entire method and changing the last line:
module ActiveRecord
  # = Active Record \Relation
  class Relation
    def update_all_to_sql(updates)
      raise ArgumentError, "Empty list of attributes to change" if updates.blank?
      if eager_loading?
        relation = apply_join_dependency
        return relation.update_all(updates)
      end
      stmt = Arel::UpdateManager.new
      stmt.set Arel.sql(@klass.sanitize_sql_for_assignment(updates))
      stmt.table(table)
      if has_join_values? || offset_value
        @klass.connection.join_to_update(stmt, arel, arel_attribute(primary_key))
      else
        stmt.key = arel_attribute(primary_key)
        stmt.take(arel.limit)
        stmt.order(*arel.orders)
        stmt.wheres = arel.constraints
      end
      #- @klass.connection.update stmt, "#{@klass} Update All"
      stmt.to_sql
    end
  end
end
The reason you see the log statements is that they are logged by the connection when it executes the statements. While you can override the logging, it's not really possible to do so for calls from a single AR method.
If you have set RAILS_LOG_LEVEL=debug Rails shows you which SQL statement it executed.
# Start Rails console in debug mode
$ RAILS_LOG_LEVEL=debug rails c
# Run your query
[1] pry(main)> Something.update_all( ["to_update = ?"], ["id = ?", my_id] )
SQL (619.8ms) UPDATE "somethings" SET to_update = my_id WHERE id = 123;
# ^it prints out the query it executed

Equivalent of find_each for foo_ids?

Given this model:
class User < ActiveRecord::Base
has_many :things
end
Then we can do this:
@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }
There are a large number of @user.things and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?
The goal is to:
not load the entire thing_ids array into memory at once
still only load arrays of thing_ids, and not instantiate a Thing for each id
Rails 5 introduced the in_batches method, which yields a relation and uses pluck(primary_key) internally. We can use the relation's where_values_hash method to retrieve the already-plucked ids:
@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }
Note that in_batches has order and limit restrictions similar to find_each.
This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.
You can try something like the code below. each_slice takes 4 elements at a time, and then you can loop over those 4:
@user.thing_ids.each_slice(4) do |batch|
  batch.each do |id|
    puts id
  end
end
There is, unfortunately, no one-liner or helper that will let you do this, so instead:
limit = 1000
offset = 0
loop do
  batch = @user.things.limit(limit).offset(offset).pluck(:id)
  batch.each { |id| puts id }
  break if batch.count < limit
  offset += limit
end
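The limit/offset loop above can be wrapped in a small reusable helper. The following is a sketch only: each_id_batch and fetch_page are hypothetical names, and a plain array stands in for the database so that the example runs without Rails. Offset paging generally needs a stable ordering to avoid skipped or repeated rows, which is why the comment suggests an order clause.

```ruby
# Hypothetical helper: yields ids in batches by calling a page-fetching lambda.
# fetch_page.call(limit, offset) must return an Array of at most `limit` ids.
def each_id_batch(fetch_page, limit: 1000)
  offset = 0
  loop do
    batch = fetch_page.call(limit, offset)
    yield batch unless batch.empty?
    break if batch.size < limit

    offset += limit
  end
end

# Stand-in data source. With ActiveRecord the lambda body might be something like
#   @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
all_ids = (1..10).to_a
fetch = ->(limit, offset) { all_ids[offset, limit] || [] }

batches = []
each_id_batch(fetch, limit: 4) { |batch| batches << batch }
p batches
```

Only one batch of ids is held at a time inside the helper; the batches array here exists purely to show the result.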
UPDATE Final EDIT:
I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)
Here is my solution, tested and working, so you can accept this as the answer if it pleases you.
Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it will return the activerecord relation to your block, so you can then use your desired method 'pluck' to get only the ids of the target query.
#put this file in your lib directory:
#active_record_extension.rb
module ARAExtension
extend ActiveSupport::Concern
def find_in_batches(options = {})
options.assert_valid_keys(:start, :batch_size, :relation)
relation = self
start = options[:start]
batch_size = options[:batch_size] || 1000
unless block_given?
return to_enum(:find_in_batches, options) do
total = start ? where(table[primary_key].gteq(start)).size : size
(total - 1).div(batch_size) + 1
end
end
if logger && (arel.orders.present? || arel.taken.present?)
logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
end
relation = relation.reorder(batch_order).limit(batch_size)
records = start ? relation.where(table[primary_key].gteq(start)) : relation
records = records.to_a unless options[:relation]
while records.any?
records_size = records.size
primary_key_offset = records.last.id
raise "Primary key not included in the custom select clause" unless primary_key_offset
yield records
break if records_size < batch_size
records = relation.where(table[primary_key].gt(primary_key_offset))
records = records.to_a unless options[:relation]
end
end
end
ActiveRecord::Relation.send(:include, ARAExtension)
Here is the initializer:
#put this file in config/initializers directory:
#extensions.rb
require "active_record_extension"
Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned it to you. Now, I optionally allow you to get the relation back before the conversion to an array happens. Here is an example of how to use it:
@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
  # do any kind of further querying/filtering/mapping that you want
  # show that this is actually an activerecord relation, not an array of AR objects
  puts batch_query.to_sql
  # add more conditions to this query, this is just an example
  batch_query = batch_query.where(:color=>"blue")
  # pluck just the ids
  puts batch_query.pluck(:id)
end
Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.
Previous EDIT
In response to your comment (because my comment would not fit):
Calling thing_ids internally uses pluck, and pluck internally uses select_all, which instantiates an ActiveRecord::Result.
Previous 2nd EDIT:
This line of code within pluck returns an activerecord Result:
....
result = klass.connection.select_all(relation.arel, nil, bound_attributes)
...
I just stepped through the source code for you. Using select_all will save you some memory, but in the end, an activerecord Result was still created and mapped over even when you are using the pluck method.
I would use something like this:
@user.things.find_each(batch_size: 1000).map(&:id)
This will give you an array of the ids.

Converting Rails model to SQL insert Query?

Is there a way to convert a Rails model into an insert query?
For instance, if I have a model like:
m = Model.new
m.url = "url"
m.header = "header"
How can I get the corresponding SQL query ActiveRecord would generate if I did m.save?
I want to get: "INSERT INTO models(url, header) VALUES('url', 'header')" if possible.
Note: I don't want to actually save the model and get the query back (from log file, etc). I want to get the query IF I chose to save it.
On Rails 4.1, I found the below code snippet working:
record = Post.new(:title => 'Yay', :body => 'This is some insert SQL')
record.class.arel_table.create_insert
.tap { |im| im.insert(record.send(
:arel_attributes_with_values_for_create,
record.attribute_names)) }
.to_sql
Thanks to https://coderwall.com/p/obrxhq/how-to-generate-activerecord-insert-sql
Tested in Rails 3.2.13: I think I got it right this time, it definitely does not persist to the db this time. It also won't fire validations or callbacks so anything they change won't be in the results unless you've called them some other way.
Save this in lib as insert_sqlable.rb and you can then
#in your models or you can send it to ActiveRecord::Base
include InsertSqlable
Then it is model.insert_sql to see it.
#lib/insert_sqlable
module InsertSqlable
def insert_sql
values = arel_attributes_values
primary_key_value = nil
if self.class.primary_key && Hash === values
primary_key_value = values[values.keys.find { |k|
k.name == self.class.primary_key
}]
if !primary_key_value && connection.prefetch_primary_key?(self.class.table_name)
primary_key_value = connection.next_sequence_value(self.class.sequence_name)
values[self.class.arel_table[self.class.primary_key]] = primary_key_value
end
end
im = self.class.arel_table.create_insert
im.into self.class.arel_table
conn = self.class.connection
substitutes = values.sort_by { |arel_attr,_| arel_attr.name }
binds = substitutes.map do |arel_attr, value|
[self.class.columns_hash[arel_attr.name], value]
end
substitutes.each_with_index do |tuple, i|
tuple[1] = conn.substitute_at(binds[i][0], i)
end
if values.empty? # empty insert
im.values = Arel.sql(self.class.connection.empty_insert_statement_value)
else
im.insert substitutes
end
conn.to_sql(im,binds)
end
end
It turns out the code is in ActiveRecord::Relation and not ActiveRecord::Persistence. The only significant change is the last line which generates the sql instead of performing it.
If you don't want the saved model to persist, call m.destroy when you are done with the object.
You can log the SQL query by building it yourself, like this:
Rails.logger.debug "INSERT INTO models(url, header) VALUES('#{m.url}', '#{m.header}')"
After searching a lot across the Internet and forums, I think I found a better solution for your problem: it requires just two lines of code.
I found a good gem that does exactly what you want, but it only works for Rails 3.2 and older. I talked with the author and he doesn't want to support the gem anymore. So I worked out how to support Rails 4.0 myself, and I'm now maintaining this gem.
Download the "models-to-sql-rails" gem here, supporting Rails 4.0 and older.
With this gem, you can easily do the following. (The values in the examples are just a joke; you will get the correct values when using it on your own objects.)
For objects:
object.to_sql_insert
# INSERT INTO modelName (field1, field2) VALUES ('Wow, amaze gem', 'much doge')
For an array of objects:
array_of_objects.to_sql_insert
# INSERT INTO modelName (field1, field2) VALUES ('Awesome doge', "im fucking cop")
# INSERT INTO modelName (field1, field2) VALUES ('much profit', 'much doge')
# (...)
See the project's GitHub page to find out how to install and use this gem.

How to disable ActiveRecord logging for a certain column?

I'm running into a problem which, in my opinion, must be a problem for most rails users but I could not find any solution for it yet.
When, for instance, performing a file upload of a potentially large, binary file and storing it in the database, you most certainly don't want rails or ActiveRecord to log this specific field in development mode (log file, stdout). In case of a fairly big file, this causes the query execution to break and almost kills my terminal.
Is there any reliable and non-hacky method of disabling logging for particular fields? Remember, I'm not talking about disabling logging for request parameters - this has been solved quite nicely.
Thanks for any information on that!
If this helps anyone, here is a Rails 4.1 compatible version of the snippet above that also includes redaction of non-binary bind params (e.g. a text or json column), and increases the logging to 100 char before redaction. Thanks for everyone's help here!
class ActiveRecord::ConnectionAdapters::AbstractAdapter
protected
def log_with_binary_truncate(sql, name="SQL", binds=[], statement_name = nil, &block)
binds = binds.map do |col, data|
if data.is_a?(String) && data.size > 100
data = "#{data[0,10]} [REDACTED #{data.size - 20} bytes] #{data[-10,10]}"
end
[col, data]
end
sql = sql.gsub(/(?<='\\x[0-9a-f]{100})[0-9a-f]{100,}?(?=[0-9a-f]{100}')/) do |match|
"[REDACTED #{match.size} chars]"
end
log_without_binary_truncate(sql, name, binds, statement_name, &block)
end
alias_method_chain :log, :binary_truncate
end
Create a file in config/initializers which modifies ActiveRecord::ConnectionAdapters::AbstractAdapter like so:
class ActiveRecord::ConnectionAdapters::AbstractAdapter
protected
def log_with_trunkate(sql, name="SQL", binds=[], &block)
b = binds.map {|k,v|
v = v.truncate(20) if v.is_a? String and v.size > 20
[k,v]
}
log_without_trunkate(sql, name, b, &block)
end
alias_method_chain :log, :trunkate
end
This will truncate all fields that are longer than 20 chars in the output log.
NOTE: Works with rails 3, but apparently not 4 (which was not released when this question was answered)
In your application.rb file:
config.filter_parameters << :parameter_name
This will remove that attribute from displaying in your logs, replacing it with [FILTERED]
The common use case for filtering parameters is of course passwords, but I see no reason it shouldn't work with your binary file field.
Here's an implementation of the approach suggested by @Patrik that works for both inserts and updates against PostgreSQL. The regex may need to be tweaked depending upon the formatting of the SQL for other databases.
class ActiveRecord::ConnectionAdapters::AbstractAdapter
protected
def log_with_binary_truncate(sql, name="SQL", binds=[], &block)
binds = binds.map do |col, data|
if col.type == :binary && data.is_a?(String) && data.size > 27
data = "#{data[0,10]}[REDACTED #{data.size - 20} bytes]#{data[-10,10]}"
end
[col, data]
end
sql = sql.gsub(/(?<='\\x[0-9a-f]{20})[0-9a-f]{20,}?(?=[0-9a-f]{20}')/) do |match|
"[REDACTED #{match.size} chars]"
end
log_without_binary_truncate(sql, name, binds, &block)
end
alias_method_chain :log, :binary_truncate
end
I'm not deliriously happy with it, but it's good enough for now. It preserves the first and last 10 bytes of the binary string and indicates how many bytes/chars were removed out of the middle. It doesn't redact unless the redacted text is longer than the replacing text (i.e. if there aren't at least 20 chars to remove, then "[REDACTED xx chars]" would be longer than the replaced text, so there's no point). I did not do performance testing to determine whether using greedy or lazy repetition for the redacted chunk was faster. My instinct was to go lazy, so I did, but it's possible that greedy would be faster especially if there is only one binary field in the SQL.
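The redaction rule described above (keep the first and last 10 characters, and only redact when the marker is shorter than what it replaces) can be tried in isolation. This is a plain-Ruby sketch; redact_middle is a hypothetical helper written for this illustration and is not part of Rails or of the patch above.

```ruby
# Hypothetical helper illustrating the redaction rule; not part of Rails.
def redact_middle(data, keep: 10)
  removed = data.size - 2 * keep
  marker = "[REDACTED #{removed} chars]"
  # Only redact when the marker is no longer than the text it replaces.
  return data if removed < marker.size

  "#{data[0, keep]}#{marker}#{data[-keep, keep]}"
end

short = "0123456789abcdef"
long = ("a" * 10) + ("x" * 50) + ("b" * 10)

puts redact_middle(short)
puts redact_middle(long)
```

Short values pass through untouched, so ordinary log lines stay readable.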
In Rails 5 you could put this in an initializer:
module SqlLogFilter
FILTERS = Set.new(%w(geo_data value timeline))
def render_bind(attribute)
return [attribute.name, '<filtered>'] if FILTERS.include?(attribute.name)
super
end
end
ActiveRecord::LogSubscriber.prepend SqlLogFilter
This filters the attributes geo_data, value and timeline, for instance.
Here is a Rails 5 version. Out of the box Rails 5 truncates binary data, but not long text columns.
module LogTruncater
def render_bind(attribute)
num_chars = Integer(ENV['ACTIVERECORD_SQL_LOG_MAX_VALUE']) rescue 120
half_num_chars = num_chars / 2
value = if attribute.type.binary? && attribute.value
if attribute.value.is_a?(Hash)
"<#{attribute.value_for_database.to_s.bytesize} bytes of binary data>"
else
"<#{attribute.value.bytesize} bytes of binary data>"
end
else
attribute.value_for_database
end
if value.is_a?(String) && value.size > num_chars
value = "#{value[0,half_num_chars]} [REDACTED #{value.size - num_chars} chars] #{value[-half_num_chars,half_num_chars]}"
end
[attribute.name, value]
end
end
class ActiveRecord::LogSubscriber
prepend LogTruncater
end
I didn't find much on this either, though one thing you could do is
ActiveRecord::Base.logger = nil
to disable logging entirely, though you would probably not want to do that. A better solution might be to set the ActiveRecord logger to some custom subclass that doesn't log messages over a certain size, or does something smarter to parse out specific sections of a message that are too large.
This doesn't seem ideal, but it does seem like a workable solution, though I haven't looked at specific implementation details. I would be really interested to hear any better solutions.
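As a rough illustration of the "logger that shortens over-long messages" idea, here is a sketch using only Ruby's standard Logger with a custom formatter. The 120-character cutoff and the output format are assumptions chosen for the example; in a Rails app, a logger configured this way would be assigned to ActiveRecord::Base.logger.

```ruby
require "logger"
require "stringio"

max_chars = 120 # assumed cutoff; pick whatever suits your logs

# Formatter that shortens over-long messages before they are written.
truncating_formatter = proc do |severity, _time, _progname, msg|
  text = msg.to_s
  if text.size > max_chars
    text = "#{text[0, max_chars]}... [#{text.size - max_chars} chars truncated]"
  end
  "#{severity}: #{text}\n"
end

output = StringIO.new # stands in for the log file or STDOUT
logger = Logger.new(output)
logger.formatter = truncating_formatter

logger.debug("INSERT INTO uploads (file_data) VALUES ('#{'ab' * 500}')")
logger.debug("SELECT 1")
puts output.string
```

This truncates every long message rather than a specific column, so it is a blunt instrument compared with the column-aware patches in the other answers.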
I encountered the same problem, but I couldn't figure out a clean solution to the problem. I ended up writing a custom formatter for the Rails logger that filters out the blob.
The code above needs to be placed in config/initializers, and replace file_data with the column you want to remove and file_name with the column that appears after in the regular expression.
Version for Rails 5.2+:
module LogTruncater
def render_bind(attr, value)
num_chars = Integer(ENV['ACTIVERECORD_SQL_LOG_MAX_VALUE']) rescue 120
half_num_chars = num_chars / 2
if attr.is_a?(Array)
attr = attr.first
elsif attr.type.binary? && attr.value
value = "<#{attr.value_for_database.to_s.bytesize} bytes of binary data>"
end
if value.is_a?(String) && value.size > num_chars
value = "#{value[0,half_num_chars]} [REDACTED #{value.size - num_chars} chars] #{value[-half_num_chars,half_num_chars]}"
end
[attr && attr.name, value]
end
end
class ActiveRecord::LogSubscriber
prepend LogTruncater
end
This is what works for me for Rails 6:
# initializers/scrub_logs.rb
module ActiveSupport
module TaggedLogging
module Formatter # :nodoc:
# Hide PlaygroundTemplate#yaml column from SQL queries because it's huge.
def scrub_yaml_source(input)
input.gsub(/\["yaml", ".*, \["/, '["yaml", "REDACTED"], ["')
end
alias orig_call call
def call(severity, timestamp, progname, msg)
orig_call(severity, timestamp, progname, scrub_yaml_source(msg))
end
end
end
end
Replace yaml with the name of your column.

Setting many key/value pairs

I'm working on a rake task which imports from a JSON feed into an ActiveRecord called Person.
Person has quite a few attributes and rather than write lines of code for setting each attribute I'm trying different methods.
The closest I've got is shown below. This works nicely as far as outputting to the screen goes, but when I check whether the values have actually been set on the ActiveRecord object itself, they are always nil.
So it looks like I can't use .to_sym to solve my problem?
Any suggestions?
I should also mention that I'm just starting out with Ruby, have been doing quite a bit of Objective-C and now need to embrace the Interwebs :)
http = Net::HTTP.new(url.host, url.port)
http.read_timeout = 30
json = http.get(url.to_s).body
parsed = JSON.parse(json)
if parsed.has_key? 'code'
  updatePerson = Person.find_or_initialize_by_code(parsed['code'])
  puts updatePerson.code
  parsed.each do |key, value|
    puts "#{key} is #{value}"
    symkey = key.to_sym
    updatePerson[:symkey] = value.to_s
    updatePerson.save
    puts "#{key}....." # shows the current key
    puts updatePerson[:symkey] # shows the correct value
    puts updatePerson.first_name # a sample key, it's returning nil
  end
end
You're probably looking for update_attributes():
if parsed.has_key?('code')
code = parsed.delete('code')
person = Person.find_or_initialize_by_code(code)
if person.update_attributes(parsed)
puts "#{person.first_name} successfully saved"
else
puts "Failed to save #{person.first_name}"
end
end
Your code cannot assign any attribute, because you are always assigning to the single attribute named "symkey":
symkey = key.to_sym
updatePerson[:symkey] = value.to_s # assigns to attribute "symkey", not to the attribute with the name stored in variable symkey
If you want to make key into a symbol (which is probably not even necessary) and then use that as an index to access the attribute in updatePerson, you can write:
updatePerson[key.to_sym] = value.to_s
updatePerson.save
But this is more or less the same as
updatePerson.update_attribute(key.to_sym, value.to_s) # update and save
except that no validation is triggered, so use with care.
Performance-wise, it might not be a good idea to save the person after each assignment, so you may want to defer the save call until after you have assigned all attributes.
Nevertheless, update_attributes(...) is something you might want to look into. If you do, read up on attr_protected and attr_accessible, as they protect attributes from mass assignment.
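The difference between the literal :symkey and a variable that holds a symbol is easy to see with a plain Hash. This sketch uses an ordinary Hash as a stand-in for the model's attributes; the values are made up for the example.

```ruby
attributes = {}
key = "first_name"
symkey = key.to_sym

attributes[:symkey] = "Ada"   # literal symbol: writes to the key :symkey
attributes[symkey] = "Grace"  # variable: writes to the key :first_name

p attributes.keys
```

The first assignment never touches first_name, which is why the original loop left every real attribute nil.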
You can use write_attribute:
parsed.each do |key, value|
updatePerson.write_attribute(key, value)
end
updatePerson.save
