Rails benchmark reports less time than activerecord query - ruby-on-rails

I'm trying to benchmark this query:
Person.where(age: 60)
When I run it in the console it says:
Person Load (1.2ms) SELECT "people".* FROM "people" WHERE "people"."age" = ? [["age", 60]]
When I benchmark it, it reports 0.17ms
def self.get_ages_sql
sixties = Person.where(age: 60)
end
Benchmark.bmbm do |x|
x.report('sql') {Person.get_ages_sql}
end
What's the discrepancy between:
0.17ms (benchmark)
vs
1.2ms (reported when I run the command in the console)?

This code does not actually hit the database:
Person.where(age: 60)
It just builds an ActiveRecord::Relation.
You can verify this by executing the following code line by line in the console and watching which line actually produces a DB request:
relation = Person.where(age: 60); 1
relation.class.name
relation.to_a
But the console misleads you with "Person Load ..." because it calls an additional method, #inspect, on the result of each line. That extra call is what triggers the DB request:
relation = Person.where(age: 60).inspect; 1
And that is why your benchmark is wrong - you are measuring the creation of the query, not the whole DB request. It should look like:
def self.get_ages_sql
Person.where(age: 60).to_a
end
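For illustration, here is a minimal sketch (using the same Person query) that benchmarks both versions side by side; the relation-only report should stay tiny, while the .to_a report should be close to the time the console shows:
require 'benchmark'

Benchmark.bmbm do |x|
  # Builds the relation only - no SQL is sent to the database
  x.report('relation only') { Person.where(age: 60) }
  # Forces the query to run and the records to be instantiated
  x.report('full query') { Person.where(age: 60).to_a }
end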
Added: To dig deeper into how the console behaves, create
class ConsoleTest
def inspect
data.inspect
end
def data
'Doing DB request'
end
def self.test_data
ct = self.new
puts 'No request yet'
ct.data
end
end
and then try in the console:
ct = ConsoleTest.new
and
ct = ConsoleTest.new; 1
ct.data
and
ConsoleTest.test_data


How can I count the number of accesses/queries to database through Mongoid?

I'm using Mongoid in a Rails project. To improve the performance of large queries, I'm using the includes method to eager load the relationships.
I would like to know if there is an easy way to count the real number of queries performed by a block of code so that I can check if my includes really reduced the number of DB accesses as expected. Something like:
# It will perform a large query to gather data from companies and their relationships
count = Mongoid.count_queries do
Company.to_csv
end
puts count # Number of DB access
I want to use this feature to add RSpec tests to prove that my query remains efficient after changes (e.g. when adding data from a new relationship). In Python's Django framework, for instance, one may use the assertNumQueries method to this end.
Checking on rubygems.org didn't yield anything that seems to do what you want.
You might be better off looking into app performance tools like New Relic, Scout, or DataDog. You may be able to get some out-of-the-gate benchmarking specs with
https://github.com/piotrmurach/rspec-benchmark
I just implemented this feature to count Mongo queries in my RSpec suite, in a small module using Mongo's Command Monitoring.
It can be used like this:
expect { code }.to change { finds("users") }.by(3)
expect { code }.to change { updates("contents") }.by(1)
expect { code }.not_to change { inserts }
Or:
MongoSpy.flush
# ..code..
expect(MongoSpy.queries).to match(
"find" => { "users" => 1, "contents" => 1 },
"update" => { "users" => 1 }
)
Here is the Gist (ready to copy) for the last up-to-date version: https://gist.github.com/jarthod/ab712e8a31798799841c5677cea3d1a0
And here is the current version:
module MongoSpy
module Helpers
%w(find delete insert update).each do |op|
define_method(op.pluralize) { |ns = nil|
ns ? MongoSpy.queries[op][ns] : MongoSpy.queries[op].values.sum
}
end
end
class << self
def queries
@queries ||= Hash.new { |h, k| h[k] = Hash.new(0) }
end
def flush
@queries = nil
end
def started(event)
op = event.command.keys.first # find, update, delete, createIndexes, etc.
ns = event.command[op] # collection name
return unless ns.is_a?(String)
queries[op][ns] += 1
end
def succeeded(_); end
def failed(_); end
end
end
Mongo::Monitoring::Global.subscribe(Mongo::Monitoring::COMMAND, MongoSpy)
RSpec.configure do |config|
config.include MongoSpy::Helpers
end
What you're looking for is command monitoring. With Mongoid and the Ruby Driver, you can create a custom command monitoring class that you can use to subscribe to all commands made to the server.
I've adapted this from the Command Monitoring Guide for the Mongo Ruby Driver.
For this particular example, make sure that your Rails app has the log level set to debug. You can read more about the Rails logger here.
The first thing you want to do is define a subscriber class. This is the class that tells your application what to do when the Mongo::Client performs commands against the database. Here is the example class from the documentation:
class CommandLogSubscriber
include Mongo::Loggable
# called when a command is started
def started(event)
log_debug("#{prefix(event)} | STARTED | #{format_command(event.command)}")
end
# called when a command finishes successfully
def succeeded(event)
log_debug("#{prefix(event)} | SUCCEEDED | #{event.duration}s")
end
# called when a command terminates with a failure
def failed(event)
log_debug("#{prefix(event)} | FAILED | #{event.message} | #{event.duration}s")
end
private
def logger
Mongo::Logger.logger
end
def format_command(args)
begin
args.inspect
rescue Exception
'<Unable to inspect arguments>'
end
end
def format_message(message)
format("COMMAND | %s".freeze, message)
end
def prefix(event)
"#{event.address.to_s} | #{event.database_name}.#{event.command_name}"
end
end
(Make sure this class is auto-loaded in your Rails application.)
Next, you want to attach this subscriber to the client you use to perform commands.
subscriber = CommandLogSubscriber.new
Mongo::Monitoring::Global.subscribe(Mongo::Monitoring::COMMAND, subscriber)
# This is the name of the default client, but it's possible you've defined
# a client with a custom name in config/mongoid.yml
client = Mongoid::Clients.from_name('default')
client.subscribe( Mongo::Monitoring::COMMAND, subscriber)
Now, when Mongoid executes any commands against the database, those commands will be logged to your console.
# For example, if you have a model called Book
Book.create(title: "Narnia")
# => D, [2020-03-27T10:29:07.426209 #43656] DEBUG -- : COMMAND | localhost:27017 | mongoid_test_development.insert | STARTED | {"insert"=>"books", "ordered"=>true, "documents"=>[{"_id"=>BSON::ObjectId('5e7e0db3f8f498aa88b26e5d'), "title"=>"Narnia", "updated_at"=>2020-03-27 14:29:07.42239 UTC, "created_at"=>2020-03-27 14:29:07.42239 UTC}], "lsid"=>{"id"=><BSON::Binary:0x10600 type=uuid data=0xfff8a93b6c964acb...>}}
# => ...
You can modify the CommandLogSubscriber class to do something other than logging (such as incrementing a global counter).
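For example, here is a minimal sketch (the CommandCounter class name is made up, not part of the driver) of a subscriber that counts commands instead of logging them:
class CommandCounter
  class << self
    def count
      @count ||= 0
    end

    def reset!
      @count = 0
    end

    # called by the driver for every command that is started
    def started(_event)
      @count = count + 1
    end

    def succeeded(_event); end
    def failed(_event); end
  end
end

Mongo::Monitoring::Global.subscribe(Mongo::Monitoring::COMMAND, CommandCounter)

CommandCounter.reset!
Book.create(title: "Narnia")
puts CommandCounter.count # => 1 (typically just the insert command)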

Ruby on Rails best way to update 100k records

I am in a situation where I have to update more than 100k records in the database in the most efficient way. Please see my code below:
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.find_each(batch_size: 10000) do |orders|
order_action = orders.actions.where("sender LIKE ?", "%ConfirmJob%").first if orders.actions
if !order_action.blank?
orders.update_attribute(:confirmed_at, order_action.created_at)
puts "order id = #{orders.id} has been updated.".green
end
end
puts "== completed ==".blue
end
end
Here I am breaking the records into batches of 10000 and then trying to update each record on the basis of some conditions, so could anyone suggest a more efficient way to do the same task?
Thank you in advance!
You can try update_all:
Payments::Order.joins(:actions).where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).update_all("confirmed_at = actions.created_at")
So your code will look like this:
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.joins(:actions).where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).update_all("confirmed_at = actions.created_at")
puts "== completed ==".blue
end
end
Update:
I've investigated the issue and found out that bulk updates with a joined table are a long-standing issue in Rails.
Since the SET part uses the string parameter as-is, I suggest adding a FROM clause there:
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.joins(:actions).
where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).
update_all("confirmed_at = actions.created_at FROM actions")
puts "== completed ==".blue
end
end
You are doing Payments::Order.find_each, so your solution will loop over every Payments::Order when you only want to loop over the ones whose actions.sender is like '%ConfirmJob%', so I would go with this solution:
Payments::Order
.includes(:actions)
.joins(:actions)
.where("actions.server like '%?%'", "ConfirmJob")
.find_each do |order|
order_action = order.actions.first
order.update!(confirmed_at: order_action.created_at)
end
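If validations and callbacks are not needed for this backfill, the per-record loop can be tightened further. A rough sketch (assuming the same models as above) that skips callbacks via update_columns and wraps the writes in a single transaction:
Payments::Order.transaction do
  Payments::Order
    .joins(:actions)
    .includes(:actions)
    .where("actions.sender LIKE ?", "%ConfirmJob%")
    .find_each do |order|
      # update_columns writes straight to the database, skipping
      # validations, callbacks and the updated_at timestamp
      order.update_columns(confirmed_at: order.actions.first.created_at)
    end
end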

How to speed up a very frequently made query using raw SQL and without ORM?

I have an API endpoint that accounts for a little less than half of the average response time (on average taking about 514 ms, yikes). The endpoint simply returns some statistics about stored data scoped to particular time periods, such as this week, last week, this month, and so on...
There are a number of ways we could reduce its impact, like getting the clients to hit it less and with more particular queries, such as only querying for "this week" when only that data is used. Here we focus on what can be done at the database level first. In our current implementation we generate this data for all "time scopes" on the fly, and the number of queries is enormous and made multiple times per second. No caching is used, but maybe there is a way to use Rails's cache_key, or the low-level Rails.cache?
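(As an aside, low-level caching here could look roughly like the sketch below; the cache key parts and the expiry are placeholders, not something we have settled on:)
def self.generate_for(user)
  Rails.cache.fetch(["foo_summaries", user.id, user.updated_at], expires_in: 10.minutes) do
    # ... build and return the summaries struct as before ...
  end
end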
The current implementation looks something like this:
class FooSummaries
include SummaryStructs
def self.generate_for(user)
@user = user
summaries = Struct::Summaries.new
TimeScope::TIME_SCOPES.each do |scope|
foos = user.foos.by_scope(scope.to_sym)
summary = Struct::Summary.new
# e.g: summaries.last_week = build_summary(foos)
summaries.send("#{scope}=", build_summary(summary, foos))
end
summaries
end
private_class_method
def self.build_summary(summary, foos)
summary.all_quuz = @user.foos_count
summary.all_quux = all_quux(foos)
summary.quuw = quuw(foos).to_f
%w[foo bar baz qux].product(
%w[quux quuz corge]
).each do |a, b|
# e.g: summary.foo_quux = quux(foos, "foo")
summary.send("#{a.downcase}_#{b}=", send(b, foos, a) || 0)
end
summary
end
def self.all_quuz(foos)
foos.count
end
def self.all_quux(foos)
foos.sum(:quux)
end
def self.quuw(foos)
foos.quuwable.total_quuw
end
def self.corge(foos, foo_type)
return if foos.count.zero?
count = self.quuz(foos, foo_type) || 0
count.to_f / foos.count
end
def self.quux(foos, foo_type)
case foo_type
when "foo"
foos.where(foo: true).sum(:quux)
when "bar"
foos.bar.where(foo: false).sum(:quux)
when "baz"
foos.baz.where(foo: false).sum(:quux)
when "qux"
foos.qux.sum(:quux)
end
end
def self.quuz(foos, foo_type)
case foo_type
when "foo"
foos.where(foo: true).count
when "bar"
foos.bar.where(foo: false).count
when "baz"
foos.baz.where(foo: false).count
when "qux"
foos.qux.count
end
end
end
To avoid making changes to the model, or creating migrations to create a table to store this data (both of which may be valid and better solutions) I decided maybe it would be easier to construct one large sql query that will be executed at once in the hopes that it will be faster to build the query string and execute it without the overhead of active record set up and tear down of SQL queries.
The new approach looks something like this; it is horrifying to me, and I know there must be a more elegant way:
class FooSummaries
include SummaryStructs
def self.generate_for(user)
results = ActiveRecord::Base.connection.execute(build_query_for(user))
results.each do |result|
# build up summary struct from query results
end
end
def self.build_query_for(user)
TimeScope::TIME_SCOPES.map do |scope|
time_scope = TimeScope.new(scope)
%w[foo bar baz qux].map do |foo_type|
%[
select
'#{scope}_#{foo_type}',
sum(quux) as quux,
count(*) as quuz,
round(100.0 * (count(*) / #{user.foos_count.to_f}), 3) as corge
from
"foos"
where
"foo"."user_id" = #{user.id}
and "foos"."foo_type" = '#{foo_type.humanize}'
and "foos"."end_time" between '#{time_scope.from}' AND '#{time_scope.to}'
and "foos"."foo" = '#{foo_type == 'foo' ? 't' : 'f'}'
union
]
end
end.join.reverse.sub("union".reverse, "").reverse
end
end
The funny way of replacing the last occurrence of union also horrifies me, but it seems to work. There must be a better way, as there are probably many things wrong with the above implementation(s). It may be helpful to note that I use PostgreSQL and have no problem with writing queries that are not portable to other DBs. Any advice is truly appreciated!
Thanks for reading!
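(One simple alternative to the reverse/sub trick, sketched with a hypothetical build_fragment helper that returns the same SELECT as above without the trailing union: collect the fragments first and join them with the keyword instead.)
def self.build_query_for(user)
  fragments = TimeScope::TIME_SCOPES.flat_map do |scope|
    %w[foo bar baz qux].map do |foo_type|
      # build_fragment is hypothetical: the SELECT string shown above,
      # minus the trailing "union"
      build_fragment(user, scope, foo_type)
    end
  end
  fragments.join("\nunion\n")
end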
Update: I found a solution that works for me and sped up the endpoint that uses this service object by 500%! Essentially the idea is that, instead of building a query string and then executing it for each set of parameters, we create a prepared statement using prepare followed by exec_prepared, passing in the parameters to the query. Since this query is made many times over, this is a useful optimization because, as per the documentation:
A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
We prepare the query like so:
def prepare_query!
ActiveRecord::Base.transaction do
connection.prepare("foos_summary",
%[with scoped_foos as (
select
*
from
"foos"
where
"foos"."user_id" = $3
and ("foos"."end_time" between $4 and $5)
)
select
$1::text as scope,
$2::text as foo_type,
sum(quux)::float as quux,
sum(eggs + bacon + ham)::float as food,
count(*) as count,
round((sum(quux) / nullif(
(select
sum(quux)
from
scoped_foos), 0))::numeric,
5)::float as quuz
from
scoped_foos
where
(case $6
when 'Baz'
then (baz = 't')
else
(baz = 'f' and foo_type = $6)
end
)
])
end
end
You can see in this query we use a common table expression for more readability and to avoid writing the same select query twice over.
Then we execute the query, passing in the parameters we need:
def connection
@connection ||= ActiveRecord::Base.connection.raw_connection
end
def query_results
prepare_query! unless query_already_prepared?
@results ||= TimeScope::TIME_SCOPES.map do |scope|
time_scope = TimeScope.new(scope)
%w[bacon eggs ham spam].map do |foo_type|
connection.exec_prepared("foos_summary",
[scope,
foo_type,
@user.id,
time_scope.from,
time_scope.to,
foo_type.humanize])
end
end
end
Where query_already_prepared? is a simple check against the prepared statements table maintained by Postgres:
def query_already_prepared?
connection.exec(%(select
name
from
pg_prepared_statements
where name = 'foos_summary')).count.positive?
end
A nice solution, I thought! Hopefully the technique illustrated here will help others with similar problems.
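For completeness, a rough sketch of reading the PG::Result objects that exec_prepared returns (column names as defined in the prepared statement above); note that the driver hands every value back as a String, so numeric fields need explicit casting:
query_results.flatten.each do |pg_result|
  pg_result.each do |row|
    # row is a Hash of column name => String value
    puts "#{row['scope']} / #{row['foo_type']}: " \
         "quux=#{row['quux'].to_f}, count=#{row['count'].to_i}"
  end
end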

Logging raw SQL errors in Rake Tasks

I'm using raw sql bulk updates (for performance reasons) in the context of a rake task. Something like the following:
update_sql = Book.connection.execute("UPDATE books AS b SET
stock = vs.stock,
promotion = vs.promotion,
sales = vs.sales
FROM (values #{values_string}) AS vs
(id, stock, promotion, sales) WHERE b.id = vs.id;")
While everything is "transparent" in local development, if this SQL fails in production during the execution of the rake task (for example because the promotion column is nil and the statement becomes invalid), no error is logged.
I can manually log this by catching the exception, like below; however, an option that would allow for automatic logging would be better.
begin
...
rescue ActiveRecord::StatementInvalid => e
Rails.logger.fatal "Books update: ActiveRecord::StatementInvalid: "+ e.to_s
end
You can make your own custom class in your model folder:
app/models/custom_sql_logger.rb :
class CustomSqlLogger
def self.debug(msg=nil)
@custom_log ||= Logger.new("#{Rails.root}/log/custom_sql.log")
@custom_log.debug(msg) unless msg.nil?
end
end
Then go to the rake task where you would like to debug the updated fields, for example lib/tasks/calculate_averages.rake, and call your custom logger:
CustomSqlLogger.debug "The field was successfully updated into DB"
Example from my project:
require 'rake'
task :calculate_averages => :environment do
products = Product.all
products.each do |product|
puts "Calculating average rating for #{product.name}..."
product.update_attribute(:average_rating, product.reviews.average("rating"))
CustomSqlLogger.debug "#{product.name} was susscefully updated into DB"
end
end
The custom logger will create a new file, log/custom_sql.log, and save all of the information there. Beware of the log file's size after a while.
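Tying this back to the original question, a small sketch (bulk_update_sql stands in for the UPDATE ... FROM (values ...) statement built in the question) of calling the custom logger from the rescue block, so the failure ends up in its own log file:
begin
  Book.connection.execute(bulk_update_sql)
rescue ActiveRecord::StatementInvalid => e
  # Write the failure to log/custom_sql.log, then re-raise so the
  # rake task still fails visibly in production
  CustomSqlLogger.debug "Books bulk update failed: #{e.message}"
  raise
end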

Inconsistent results from active-record query

This is my Person model which has the following query in a method:
def self.get_uniq_person_ids
uniq_person_ids = select('person_id').where(:state => '1').uniq
uniq_person_ids
end
My Test is as follows:
def test_uniqueness
Person.delete_all
assert_equal(0, Person.count)
# ..... Adding 8 rows to the database with 2 unique person_id.....
pids = Person.get_uniq_person_ids
assert_equal(pids.size, 2)
end
Test fails with the following:
Failure:
<8> expected but was
<2>.
There are 8 rows but only 2 unique person_id values in the table.
This is what I tried:
Adding puts pids before the assert: it prints only 2 objects, but the test still fails with the above message.
Adding binding.pry right before the query: the size is 2, which is expected, and the test passes this time.
Why is the result so inconsistent? Is it a timing issue?
Note: I am using sqlite as my database.
Okay, so I am not sure what the actual issue was but the following solved it:
def self.get_uniq_person_ids
uniq_person_ids = select('person_id').where(:state => '1').uniq
uniq_person_ids = uniq_person_ids.map(&:person_id)
uniq_person_ids
end
I added the line uniq_person_ids = uniq_person_ids.map(&:person_id) to create the array of person_ids.
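As an aside (not part of the original fix), a more direct way to get the unique ids on a reasonably recent Rails is distinct plus pluck, which returns a plain Ruby array rather than a relation of Person objects:
def self.get_uniq_person_ids
  # pluck returns an Array of person_id values directly, so there is
  # no ambiguity between the relation's size and the loaded records
  where(:state => '1').distinct.pluck(:person_id)
end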
