Handling a massive query in Rails - ruby-on-rails

What's the best way to handle a large result set with Rails and Postgres? I didn't have a problem until today, but now I'm trying to return a 124,000-record @network_hosts object, which has effectively DoS'd my development server.
My ActiveRecord code isn't the prettiest, but I'm pretty sure cleaning it up won't help performance.
@network_hosts = []
@host_count = 0
@company.locations.each do |l|
  if l.grace_enabled == nil || l.grace_enabled == false
    l.network_hosts.each do |h|
      @host_count += 1
      @network_hosts.push(h)
      @network_hosts.sort! { |x,y| x.ip_address <=> y.ip_address }
      @network_hosts = @network_hosts.first(5)
    end
  end
end
In the end, I need to be able to return @network_hosts to the controller for processing into the view.
Is this something that Sidekiq would be able to help with, or is it going to take just as long? If Sidekiq is the path to take, how do I handle not having the @network_hosts object on page load, since the job runs asynchronously?

I believe you want to (1) get rid of all that looping (you've got a lot of queries going on) and (2) do your sorting with your AR query instead of in the array.
Perhaps something like:
# grace_enabled: [nil, false] also matches NULL; where.not(grace_enabled: true) would skip NULL rows
NetworkHost.
  where(location: Location.where(grace_enabled: [nil, false], company: @company)).
  order(ip_address: :asc).
  tap do |network_hosts|
    @network_hosts = network_hosts.limit(5)
    @host_count = network_hosts.count
  end
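Something like that ought to let the database do the filtering, sorting, and limiting, so you end up with two queries (one for the five hosts, one for the count) instead of one per location.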
I had to make some assumptions about how your associations are set up and that you're looking for locations where grace_enabled isn't true (nil or false).
I haven't tested this, so it may well be buggy. But, I think the direction is correct.

Something to remember: Rails won't execute any SQL queries until the result of the query is actually needed. (I'll be using User instead of NetworkHost so I can show you the console output as I go.)
@users = User.where(first_name: 'Random');nil # No query run
=> nil
@users # query is now run because the results are needed (they are being output to the IRB window)
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2 [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>
@users = User.where(first_name: 'Random') # query will be run because the results are needed for the output into the IRB window
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2 [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>
Why is this important? It allows you to store the query you want to run in the instance variable and not execute it until you get to a view where you can use some of the nice methods of ActiveRecord::Batches. In particular, if you have some view (or export function, etc.) where you are iterating the @network_hosts, you can use find_each.
# Controller
@users = User.where(first_name: 'Random') # No query run
# view
@users.find_each(batch_size: 1) do |user|
  puts "User's ID is #{user.id}"
end
# User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# User's ID is 1
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 1) ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# User's ID is 2
# User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 2) ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# => nil
Your query is not executed until the view. With the default batch size (the example above uses batch_size: 1 only to make the log short), it will load only 1,000 records into memory at a time. Once it reaches the end of those 1,000 records, it will automatically run another query to fetch the next 1,000 records. So your memory use is much more sane, at the cost of extra database queries (which are usually pretty quick).
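The deferred execution this answer relies on can be imitated in plain Ruby. This is only a sketch: LazyQuery and ROWS are made-up names, and the class mimics the behaviour of a relation rather than showing how ActiveRecord implements it. Chaining where only records the conditions; the filter runs when the results are enumerated.

```ruby
# Plain-Ruby stand-in for a lazy relation. LazyQuery and ROWS are
# hypothetical; they only mimic how a relation defers its work.
class LazyQuery
  include Enumerable

  attr_reader :executions

  def initialize(rows, conditions = {})
    @rows = rows
    @conditions = conditions
    @executions = 0 # how many times the "query" actually ran
  end

  # Chaining builds a new query object; nothing is filtered yet.
  def where(extra)
    LazyQuery.new(@rows, @conditions.merge(extra))
  end

  # Enumerating is the moment the results are needed, so filter here.
  def each(&block)
    @executions += 1
    @rows.select { |row| @conditions.all? { |key, value| row[key] == value } }
         .each(&block)
  end
end

ROWS = [
  { id: 1, first_name: "Random" },
  { id: 2, first_name: "Other" },
  { id: 3, first_name: "Random" }
].freeze

users = LazyQuery.new(ROWS).where(first_name: "Random")
puts users.executions                  # still 0: nothing has run
puts users.map { |u| u[:id] }.inspect  # enumerating runs the filter
puts users.executions                  # now 1
```

Assigning the query in the controller corresponds to the first line of the demo; the view is where map (or find_each in the real thing) finally triggers the work.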

Related

Connections breaks includes

I have the following setup:
class Product < ApplicationRecord
  has_many :variants
end
class Variant < ApplicationRecord
  belongs_to :product
end
Types::QueryType = GraphQL::ObjectType.define do
  connection :products, Types::ProductType.connection_type do
    resolve -> (obj, _, _) do
      Product.all.includes(:variants)
    end
  end
end
Types::ProductType = GraphQL::ObjectType.define do
  connection :variants, Types::VariantType.connection_type do
    resolve -> (obj, _, _) { obj.variants }
  end
end
And running the following query:
{
  products {
    edges {
      node {
        variants {
          edges {
            node {
              id
            }
          }
        }
      }
    }
  }
}
produces the following SQL queries:
Product Load (2.7ms) SELECT "products".* FROM "products" LIMIT $1 [["LIMIT", 25]]
Variant Load (8.6ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" IN (1, 2, 3)
Variant Load (19.0ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" = $1 LIMIT $2 [["product_id", 1], ["LIMIT", 25]]
Variant Load (13.6ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" = $1 LIMIT $2 [["product_id", 2], ["LIMIT", 25]]
Variant Load (2.4ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" = $1 LIMIT $2 [["product_id", 3], ["LIMIT", 25]]
As the SQL output shows, includes works, but GraphQL ignores the eager-loaded data and makes N+1 queries anyway. Is that normal behaviour, so that I'm forced to use something like graphql-batch, or is something wrong with my setup? From everything I've read, includes should be enough for such a simple scenario, and GraphQL should use the eager-loaded data instead of producing the N+1. Have I done anything wrong here?
I'm on graphql-ruby 1.7.9
I just received a reply on the graphql-ruby issue tracker:
Hey, I noticed that LIMIT 25 is being applied to those queries. Do you
know where that's being applied? If you want to use the result from
the initial query, you should remove the LIMIT clause. (I'm guessing
that if you ask for .limit(25), ActiveRecord won't use a cached
relation.) Maybe you have a default_max_page_size? What happens if you
remove it?
So, long story short, I removed the default_max_page_size config from my schema and that resolved the issue.

Is there any way to access the parent object that called a singleton method?

Given the following functional snippet I'm having trouble reducing the database queries:
class User < ApplicationRecord
  belongs_to :account
  def self.do_something
    self.find_each do |user|
      puts "#{self.new.account.name}:#{user.name} did something"
    end
  end
end
class Account < ApplicationRecord
  has_many :users
end
a = Account.first
puts 'starting'
a.users.do_something
Account Load (0.4ms) SELECT "accounts".* FROM "accounts" WHERE
"accounts"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
starting
Account Load (0.3ms) SELECT "accounts".* FROM "accounts" WHERE
"accounts"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Test:User did something
Account Load (0.3ms) SELECT "accounts".* FROM "accounts" WHERE
"accounts"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Test:User did something
Account Load (0.3ms) SELECT "accounts".* FROM "accounts" WHERE
"accounts"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Test:User did something
Account Load (0.3ms) SELECT "accounts".* FROM "accounts" WHERE
"accounts"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
Test:User did something
You can see that the Account model is being fetched from the database per user!
I was hoping to use something like self.account in the Singleton method to reference the original account, but the relationship obviously doesn't exist by default which is why I'm currently using self.new.account.
Is there anywhere else I can fetch the original Account model saved in a from inside self.do_something? I can obviously pass the account in a parameter, but that seems tedious especially if I may add arguments later...
Inside your find_each loop, you should be able to use user.account.
Outside that loop, I don't believe there's a documented / supported / won't-disappear-without-warning way to find the object. Depending on your Rails version, something like self.current_scope.proxy_association.owner might give you the answer you need... but do prefer user.account if at all possible: the more you use private APIs, the harder future upgrades can be.
Alternatively, consider using association extensions to define your do_something method inside the has_many association only -- if it's not suited to be called as User.do_something or User.where(name: "Bob").do_something (because those don't have an associated account), maybe it shouldn't be a top-level method after all.
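The idea behind the association-extension suggestion is that the method lives on the collection, which already knows the record it was reached through. Here is a plain-Ruby analogy, not ActiveRecord code: OwnedUsers and user_list are made-up names, and the Struct-based User and Account only stand in for the models. In Rails itself, a method defined in a has_many extension block can reach the owner through proxy_association.owner.

```ruby
# Plain-Ruby analogy for an association extension. OwnedUsers is a
# hypothetical wrapper that remembers the owner it was built from.
User = Struct.new(:name)

class OwnedUsers
  include Enumerable

  def initialize(owner, users)
    @owner = owner
    @users = users
  end

  def each(&block)
    @users.each(&block)
  end

  # Defined on the collection, so the owner is known without another lookup.
  def do_something
    map { |user| "#{@owner.name}:#{user.name} did something" }
  end
end

Account = Struct.new(:name, :user_list) do
  # Every access hands the collection a reference back to this account.
  def users
    OwnedUsers.new(self, user_list)
  end
end

account = Account.new("Test", [User.new("User"), User.new("Admin")])
puts account.users.do_something
```

Because do_something only exists on the owned collection, there is no way to call it without an account, which is the point the answer makes about User.do_something.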

How to delete another record in same table using after_destroy callback

I have a table with the columns below:
id, user_id, friend_user_id
If A is a friend of B, I want to insert two records:
1. user_id: 1, friend_user_id: 2
2. user_id: 2, friend_user_id: 1
I did this using the after_create callback, as below:
after_create do
  Friend.create(user_id: friend_user_id, friend_user_id: user_id)
end
I want to delete both records if either one of them is deleted.
I tried the after_destroy callback as below:
after_destroy do
  Friend.where(friend_user_id: user_id, user_id: friend_user_id).first.destroy
end
But I'm getting the error below.
2.3.0 :002 > Friend.first.destroy
Friend Load (0.4ms) SELECT "friends".* FROM "friends" ORDER BY "friends"."id" ASC LIMIT 1
(0.2ms) begin transaction
SQL (0.7ms) DELETE FROM "friends" WHERE "friends"."id" = ? [["id", 10]]
Friend Load (0.3ms) SELECT "friends".* FROM "friends" WHERE "friends"."friend_user_id" = ? AND "friends"."user_id" = ? ORDER BY "friends"."id" ASC LIMIT 1 [["friend_user_id", 1], ["user_id", 2]]
SQL (0.1ms) DELETE FROM "friends" WHERE "friends"."id" = ? [["id", 11]]
Friend Load (0.1ms) SELECT "friends".* FROM "friends" WHERE "friends"."friend_user_id" = ? AND "friends"."user_id" = ? ORDER BY "friends"."id" ASC LIMIT 1 [["friend_user_id", 2], ["user_id", 1]]
(0.3ms) rollback transaction
NoMethodError: undefined method `destroy' for nil:NilClass
from /home/ubuntu/workspace/app/models/friend.rb:39:in `block in <class:Friend>'
I'm new to RoR. Any help would be appreciated.
Your callbacks trigger each other, and that happens in both places, after_create and after_destroy. When you destroy a record, its after_destroy callback runs and destroys the mirrored record. Destroying the mirrored record runs after_destroy again, which looks for the first record, but that one has already been deleted, so where(...).first returns nil and you get NoMethodError: undefined method `destroy' for nil:NilClass. The same mutual triggering in after_create can end in a SystemStackError. To get a clearer picture, use the same code, do a Friend.create, and you will see what I am saying.
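One way to break the loop is to make each callback check before it acts. The sketch below imitates the callback flow with an in-memory array; Friendship, STORE, and find_pair are made-up names, and nothing here is ActiveRecord. The mirror row is created only when it is missing, and it is removed directly, without running its own destroy callback.

```ruby
# In-memory stand-in for the friends table. Friendship, STORE, and
# find_pair are hypothetical; they only imitate the callback flow.
class Friendship
  STORE = []

  attr_reader :user_id, :friend_user_id

  def self.find_pair(user_id, friend_user_id)
    STORE.find { |f| f.user_id == user_id && f.friend_user_id == friend_user_id }
  end

  def self.create(user_id, friend_user_id)
    record = new(user_id, friend_user_id)
    STORE << record
    record.after_create
    record
  end

  def initialize(user_id, friend_user_id)
    @user_id = user_id
    @friend_user_id = friend_user_id
  end

  # Create the mirror row only if it is missing, which stops the recursion.
  def after_create
    return if self.class.find_pair(friend_user_id, user_id)

    self.class.create(friend_user_id, user_id)
  end

  def destroy
    STORE.delete(self)
    after_destroy
  end

  # Remove the mirror row directly (no callback), tolerating its absence.
  def after_destroy
    mirror = self.class.find_pair(friend_user_id, user_id)
    STORE.delete(mirror) if mirror
  end
end

Friendship.create(1, 2)
puts Friendship::STORE.size # both directions exist
Friendship.find_pair(1, 2).destroy
puts Friendship::STORE.size # both directions are gone
```

In the real model the same idea would presumably be an existence check in after_create and a delete_all on the where(...) relation in after_destroy, since delete_all removes rows without running callbacks.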

Ruby on Rails: dependent object destroyed when transferred from guest user to registered user

Here is my problem:
I'm using Devise's guest_user, which includes a logging_in method to transfer the guest user's data to the registered user when he logs in. In my case, the user has_many periods, dependent: :destroy, so here is the logging_in method:
def logging_in
  guest_periods = guest_user.periods.all
  guest_periods.each do |p|
    p.user_id = current_user.id
    p.save!
  end
  current_user.latest_entry = guest_user.latest_entry
  current_user.is_in_zone = guest_user.is_in_zone
  current_user.save
end
However, when a guest_user logs in, his periods get destroyed instead of being transferred. Here is the log:
Started GET "/" for ::1 at 2015-05-11 00:18:03 +0300
Processing by WelcomeController#index as HTML
User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 24]]
User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1 [["id", 23]]
Period Load (0.3ms) SELECT "periods".* FROM "periods" WHERE "periods"."user_id" = $1 [["user_id", 23]]
(0.2ms) BEGIN
CACHE (0.0ms) SELECT "periods".* FROM "periods" WHERE "periods"."user_id" = $1 [["user_id", 23]]
SQL (0.8ms) UPDATE "periods" SET "user_id" = $1, "updated_at" = $2 WHERE "periods"."id" = $3 [["user_id", 24], ["updated_at", "2015-05-10 21:18:03.863162"], ["id", 170]]
(0.9ms) COMMIT
(0.2ms) BEGIN
SQL (2.1ms) UPDATE "users" SET "is_in_zone" = $1, "latest_entry" = $2, "updated_at" = $3 WHERE "users"."id" = $4 [["is_in_zone", "t"], ["latest_entry", "2015-05-04"], ["updated_at", "2015-05-10 21:18:03.875572"], ["id", 24]]
(15.8ms) COMMIT
(0.5ms) BEGIN
SQL (0.3ms) DELETE FROM "periods" WHERE "periods"."id" = $1 [["id", 170]]
SQL (0.7ms) DELETE FROM "users" WHERE "users"."id" = $1 [["id", 23]]
(1.2ms) COMMIT
So we can see that the transfer is done, but in the end the periods are destroyed anyway. They should not be, as they no longer belong to the user being destroyed.
Why is it happening?
Even though Period#user_id has changed, guest_user.periods is still loaded in memory and is what gets destroyed when you destroy the guest user. If you guest_user.reload, its associations will clear out and it becomes safe to destroy. You could also guest_user.periods(true) to force reload of just the periods.
Another option is:
guest_user.periods.update_all(user_id: current_user.id)
This executes a single query to perform the update, which will be nice if there are a lot of periods, and also doesn't load the guest_user.periods association, so it will load fresh during the destroy and find the correct empty set.
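The stale-cache effect described in this answer can be reproduced without Rails. In the sketch below, Owner and Period are made-up stand-ins and the shared array plays the role of the periods table; the memoized periods method behaves like a loaded association that does not notice a changed user_id.

```ruby
# Plain-Ruby imitation of a cached has_many. Owner and Period are
# hypothetical; the shared array stands in for the periods table.
Period = Struct.new(:id, :user_id)

class Owner
  attr_reader :id

  def initialize(id, table)
    @id = id
    @table = table
    @periods = nil # association cache, filled on first access
  end

  # Memoized like a loaded association: later user_id changes go unseen.
  def periods
    @periods ||= @table.select { |p| p.user_id == id }
  end

  # Drop the cache so the next access looks at the table again.
  def reload
    @periods = nil
    self
  end

  # dependent: :destroy analogue: removes whatever the cache holds.
  def destroy
    periods.each { |p| @table.delete(p) }
  end
end

table = [Period.new(170, 23)]
guest = Owner.new(23, table)
guest.periods.each { |p| p.user_id = 24 } # transfer; the cache still holds the row
guest.reload.destroy                      # reload first, so nothing is removed
puts table.size
```

Dropping the reload call makes destroy remove the transferred row, which mirrors the log in the question.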

Ruby where with find_each

I am looking at the official Rails documentation which shows how to use the "find_each" method. Here is an example they gave
Person.where("age > 21").find_each do |person|
person.party_all_night!
end
This processes 1000 records at a time. However, I am still confused. How does this translate to SQL? What happens behind the scenes that allows Ruby to only process 1000 records at a time?
The reason I am sort of confused is because it seems Person.where("age > 21") would execute first, which would return ALL results.
For instance:
Person.where("age > 21").limit(10)
would return all persons in memory first, then give you the first 10, right?
Person.where("age > 21") returns an ActiveRecord relation only. It doesn't return all the results.
Person.where("age > 21").limit(10) does NOT load all the models in memory, that would be awful and unusable. It just loads 10.
find_each doesn't really process 1000 records at a time. It loads 1000 records, and then processes each one of them.
I'd suggest running this from the console and looking at the SQL or reading the source code.
For example:
User.find_each(:batch_size => 40) do |user| end
User Load (1.0ms) SELECT "users".* FROM "users" WHERE ("users"."id" >= 0) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 96) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 156) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 219) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 272) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 314) ORDER BY "users"."id" ASC LIMIT 40
User Load (0.8ms) SELECT "users".* FROM "users" WHERE ("users"."id" > 355) ORDER BY "users"."id" ASC LIMIT 40
Or
bundle show activerecord
point your favorite code editor at that location and find the source
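The SQL in the log above follows a simple pattern: order by id, take one batch, then ask for ids greater than the last one seen. A plain-Ruby imitation of that loop is below. It is a sketch of the pattern only; each_in_batches and PEOPLE are made-up names, and this is not the ActiveRecord source.

```ruby
# Plain-Ruby imitation of the batching pattern seen in the SQL log.
# each_in_batches is a hypothetical helper, not the ActiveRecord code.
def each_in_batches(rows, batch_size)
  last_id = nil
  loop do
    # Equivalent of: WHERE id > last_id ORDER BY id ASC LIMIT batch_size
    batch = rows.select { |row| last_id.nil? || row[:id] > last_id }
                .sort_by { |row| row[:id] }
                .first(batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last[:id] # the next "query" starts after this id
  end
end

PEOPLE = [5, 1, 9, 3, 7].map { |id| { id: id } }.freeze

each_in_batches(PEOPLE, 2) { |row| puts row[:id] }
```

Only batch_size rows are held at once, which is why memory stays flat no matter how many rows match.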
Ruby has a lovely feature called code blocks. What makes it really great is that every method can silently receive a block as its last argument. You can check dynamically whether a block was given with block_given?.
I guess you wonder why where alone seems to return data, while a where.whatever chain just prepares the query? Well, ActiveRecord implicitly checks whether a block was given, and either executes the underlying SQL statement and iterates through the result, or returns an iterator with a prepared but not yet executed SQL statement. The latter is executed lazily and cached on demand. The same practice is used in, say, Array#each. Behind the scenes, something like this is performed:
sql_prepare
if block_given?
  @cache = sql_execute_and_cache
  @cache.each { |record| yield record }
end
Hope it helps.
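For a version of that idea you can actually run, here is a small plain-Ruby sketch. RecordSet and each_record are made-up names; only block_given? and enum_for are real Ruby. With a block the method iterates right away, and without one it hands back an Enumerator that has not iterated anything yet.

```ruby
# RecordSet and each_record are hypothetical; block_given? and enum_for
# are the standard Ruby pieces being illustrated.
class RecordSet
  def initialize(records)
    @records = records
  end

  def each_record
    # Without a block, return a lazy Enumerator instead of iterating now.
    return enum_for(:each_record) unless block_given?

    @records.each { |record| yield record }
  end
end

set = RecordSet.new(%w[alice bob])
set.each_record { |name| puts name } # block given: iterates immediately
enum = set.each_record               # no block: nothing iterated yet
puts enum.next                       # pulls the first record on demand
```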
