Avoid n+1 query when indexing on elasticsearch rails - ruby-on-rails

I have a Genre model whose name is translated in a genre_translations table (using the globalize gem).
I'm trying to index the model using the elasticsearch-rails gem:
def as_indexed_json(options = {})
  as_json(
    only: %i(type available),
    methods: %i(name),
  )
end
but when I do Genre.import I get the following on my rails console:
[1] pry(main)> Genre.import
Genre Load (27.1ms) SELECT "genres".* FROM "genres" ORDER BY "genres"."id" ASC LIMIT 1000
Genre::Translation Load (23.9ms) SELECT "genre_translations".* FROM "genre_translations" WHERE "genre_translations"."genre_id" = $1 [["genre_id", 1]]
Genre::Translation Load (0.3ms) SELECT "genre_translations".* FROM "genre_translations" WHERE "genre_translations"."genre_id" = $1 [["genre_id", 2]]
Genre::Translation Load (0.3ms) SELECT "genre_translations".* FROM "genre_translations" WHERE "genre_translations"."genre_id" = $1 [["genre_id", 3]]
...
Any suggestion on how to index all the Genre items with a join to avoid the N+1 behaviour?

From the documentation of import:
# @example Pass an ActiveRecord query to limit the imported records
#
# Article.import query: -> { where(author_id: author_id) }
So you could do:
Genre.import query: -> { includes(:translations) }
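To see why eager loading removes the N+1, here is a toy model in plain Ruby with no Rails or Elasticsearch involved. FakeDb, sample_db and the two names_* methods are hypothetical names made up for this sketch; it only mimics the query pattern from the console output in the question.

```ruby
# Toy in-memory "database" that counts how many queries it receives.
# Everything here is hypothetical; it only mimics the query pattern in the logs.
class FakeDb
  attr_reader :query_count

  def initialize(genres, translations)
    @genres = genres
    @translations = translations
    @query_count = 0
  end

  # Stands in for: SELECT genres.* FROM genres
  def all_genres
    @query_count += 1
    @genres
  end

  # Stands in for: SELECT genre_translations.* WHERE genre_id IN (...)
  def translations_for(genre_ids)
    @query_count += 1
    @translations.select { |t| genre_ids.include?(t[:genre_id]) }
  end
end

# N+1: one query for the genres, then one translation query per genre.
def names_lazily(db)
  db.all_genres.map do |genre|
    db.translations_for([genre[:id]]).first[:name]
  end
end

# Eager loading: one query for the genres, one for all their translations.
def names_eagerly(db)
  genres = db.all_genres
  by_genre = db.translations_for(genres.map { |g| g[:id] })
               .group_by { |t| t[:genre_id] }
  genres.map { |genre| by_genre[genre[:id]].first[:name] }
end

def sample_db
  genres = [{ id: 1 }, { id: 2 }, { id: 3 }]
  translations = [
    { genre_id: 1, name: 'Rock' },
    { genre_id: 2, name: 'Jazz' },
    { genre_id: 3, name: 'Folk' }
  ]
  FakeDb.new(genres, translations)
end
```

With three genres the lazy version issues four queries and the eager one issues two, and the gap grows with every extra genre in the batch.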

Related

ActiveRecord::StatementInvalid: PG::UndefinedTable in many to many relation but table exists

I have a simple many-to-many relation that doesn't work, and I can't understand why. I'm sure it's something obvious, but...
class Content < ApplicationRecord
  has_many :content_brands
  has_many :brands, through: :content_brands
end
class ContentBrand < ApplicationRecord
  belongs_to :content
  belongs_to :brand
end
class Brand < ApplicationRecord
  establish_connection Rails.application.config.brands_database_configuration
  has_many :content_brands
  has_many :contents, through: :content_brands
end
But
irb(main):002:0> Content.first.brands
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "content_brands" does not exist
LINE 1: SELECT "brands".* FROM "brands" INNER JOIN "content_brands"...
^
: SELECT "brands".* FROM "brands" INNER JOIN "content_brands" ON "brands"."id" = "content_brands"."brand_id" WHERE "content_brands"."content_id" = $1 ORDER BY "brands"."name" ASC LIMIT $2
The table exists, I can query it
irb(main):006:0> ContentBrand.first.brand
ContentBrand Load (0.5ms) SELECT "content_brands".* FROM "content_brands" ORDER BY "content_brands"."id" ASC LIMIT $1 [["LIMIT", 1]]
Brand Load (27.4ms) SELECT "brands".* FROM "brands" WHERE "brands"."id" = $1 ORDER BY "brands"."name" ASC LIMIT $2 [["id", 1], ["LIMIT", 1]]
=> #<Brand id: 1, name: "Nokia", logo: "nokia.jpeg", created_at: "2016-12-08 15:50:48", updated_at: "2017-02-02 15:51:43", web_site: "http://www.nokia.it">
Why?
It's driving me crazy, because the inverse relation works:
Brand.first.contents
Brand Load (25.8ms) SELECT "brands".* FROM "brands" ORDER BY "brands"."name" ASC LIMIT $1 [["LIMIT", 1]]
Content Load (0.7ms) SELECT "contents".* FROM "contents" INNER JOIN "content_brands" ON "contents"."id" = "content_brands"."content_id" WHERE "content_brands"."brand_id" = $1 ORDER BY "contents"."published_at" DESC LIMIT $2 [["brand_id", 391], ["LIMIT", 11]]
=> #<ActiveRecord::Associations::CollectionProxy []>
irb(main):011:0>
Update: I forgot to mention that Brand is in another database.
You can't set up associations to a model that is stored in another database in ActiveRecord. That makes sense, since you can't join across databases in a single Postgres query without jumping through some pretty serious hoops (postgres_fdw). And given the polyglot nature of ActiveRecord, supporting this would add too much complexity for a very limited use case.
If it's in any way possible, I would switch to a single-database setup, even if it means that you have to duplicate data.
If you look at the "inverse query" you can see that it works because it's not a single query:
# queries the "brands" database
Brand Load (25.8ms) SELECT "brands".* FROM "brands" ORDER BY "brands"."name" ASC LIMIT $1 [["LIMIT", 1]]
# queries your main database
Content Load (0.7ms) SELECT "contents".* FROM "contents" INNER JOIN "content_brands" ON "contents"."id" = "content_brands"."content_id" WHERE "content_brands"."brand_id" = $1 ORDER BY "contents"."published_at" DESC LIMIT $2 [["brand_id", 391], ["LIMIT", 11]]
However, the fact that this direction happens to work does not mean that the concept is feasible in general.
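The working direction suggests a general shape: each step talks to exactly one database, and the IDs are carried over in Ruby. Below is a minimal plain-Ruby sketch of that idea. MAIN_DB, BRANDS_DB, the sample rows and brands_for_content are all hypothetical stand-ins, not ActiveRecord code.

```ruby
# Two separate in-memory stores stand in for the two databases.
# All names and rows are hypothetical.
MAIN_DB = {
  content_brands: [
    { content_id: 1, brand_id: 10 },
    { content_id: 1, brand_id: 11 },
    { content_id: 2, brand_id: 10 }
  ]
}.freeze

BRANDS_DB = {
  brands: [
    { id: 10, name: 'Nokia' },
    { id: 11, name: 'Acme' }
  ]
}.freeze

# Step 1 touches only the main database, step 2 only the brands database,
# so no single query ever has to join across the two.
def brands_for_content(content_id)
  brand_ids = MAIN_DB[:content_brands]
              .select { |row| row[:content_id] == content_id }
              .map { |row| row[:brand_id] }
  BRANDS_DB[:brands].select { |brand| brand_ids.include?(brand[:id]) }
end
```

In ActiveRecord the same two-step shape would presumably look something like Brand.where(id: ContentBrand.where(content_id: content.id).pluck(:brand_id)). That is an untested assumption about this particular setup, and it gives up the convenience of a real association.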

Connections breaks includes

I have the following setup:
class Product < ApplicationRecord
  has_many :variants
end
class Variant < ApplicationRecord
  belongs_to :product
end
Types::QueryType = GraphQL::ObjectType.define do
  connection :products, Types::ProductType.connection_type do
    resolve -> (obj, _, _) do
      Product.all.includes(:variants)
    end
  end
end
Types::ProductType = GraphQL::ObjectType.define do
  connection :variants, Types::VariantType.connection_type do
    resolve -> (obj, _, _) { obj.variants }
  end
end
Running the following query:
{
  products {
    edges {
      node {
        variants {
          edges {
            node {
              id
            }
          }
        }
      }
    }
  }
}
produces the following SQL queries:
Product Load (2.7ms) SELECT "products".* FROM "products" LIMIT $1 [["LIMIT", 25]]
Variant Load (8.6ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" IN (1, 2, 3)
Variant Load (19.0ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" = $1 LIMIT $2 [["product_id", 1], ["LIMIT", 25]]
Variant Load (13.6ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" = $1 LIMIT $2 [["product_id", 2], ["LIMIT", 25]]
Variant Load (2.4ms) SELECT "variants".* FROM "variants" WHERE "variants"."product_id" = $1 LIMIT $2 [["product_id", 3], ["LIMIT", 25]]
As we can see in the SQL output, includes works, but GraphQL ignores the preloaded records and runs an N+1 anyway. Is that normal behaviour, forcing me to use something like graphql-batch, or is something wrong with my setup? From what I have seen online, using includes should be enough for such a simple scenario, and GraphQL should use the eager-loaded data instead of producing the N+1. Have I done anything wrong here?
I'm on graphql-ruby 1.7.9
I just received a reply on the graphql-ruby issue tracker:
Hey, I noticed that LIMIT 25 is being applied to those queries. Do you
know where that's being applied? If you want to use the result from
the initial query, you should remove the LIMIT clause. (I'm guessing
that if you ask for .limit(25), ActiveRecord won't use a cached
relation.) Maybe you have a default_max_page_size? What happens if you
remove it?
So, long story short, I removed the default_max_page_size config from my schema and it resolved the issue.

Handling a massive query in Rails

What's the best way to handle a large result set with Rails and Postgres? I didn't have a problem until today, but now I'm trying to return a 124,000 record object of @network_hosts, which has effectively DoS'd my development server.
My ActiveRecord code isn't the prettiest, but I'm pretty sure cleaning it up isn't going to help in relation to performance.
@network_hosts = []
@host_count = 0
@company.locations.each do |l|
  if l.grace_enabled == nil || l.grace_enabled == false
    l.network_hosts.each do |h|
      @host_count += 1
      @network_hosts.push(h)
      @network_hosts.sort! { |x,y| x.ip_address <=> y.ip_address }
      @network_hosts = @network_hosts.first(5)
    end
  end
end
In the end, I need to be able to return @network_hosts to the controller for processing into the view.
Is this something that Sidekiq would be able to help with, or is it going to take just as long? If Sidekiq is the path to take, how do I handle not having the @network_hosts object upon page load, since the job runs asynchronously?
I believe you want to (1) get rid of all that looping (you've got a lot of queries going on) and (2) do your sorting with your AR query instead of in the array.
Perhaps something like:
NetworkHost.
  # [nil, false] also matches NULL; where.not(grace_enabled: true) would skip NULL rows
  where(location: Location.where(grace_enabled: [nil, false]).where(company: @company)).
  order(ip_address: :asc).
  tap do |network_hosts|
    @network_hosts = network_hosts.limit(5)
    @host_count = network_hosts.count
  end
Something like that ought to replace the per-location queries with just two: one SELECT with a LIMIT for the hosts and one COUNT.
I had to make some assumptions about how your associations are set up and that you're looking for locations where grace_enabled isn't true (nil or false).
I haven't tested this, so it may well be buggy. But, I think the direction is correct.
Something to remember: Rails won't execute any SQL queries until the result of the query is actually needed. (I'll be using User instead of NetworkHost so I can show you the console output as I go.)
@users = User.where(first_name: 'Random');nil # No query run
=> nil
@users # query is now run because the results are needed (they are being output to the IRB window)
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2 [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>
@users = User.where(first_name: 'Random') # query will be run because the results are needed for the output into the IRB window
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2 [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>
Why is this important? It allows you to store the query you want to run in the instance variable and not execute it until you get to a view where you can use some of the nice methods of ActiveRecord::Batches. In particular, if you have some view (or export function, etc.) where you are iterating the @network_hosts, you can use find_each.
# Controller
@users = User.where(first_name: 'Random') # No query run
# view
@users.find_each(batch_size: 1) do |user|
  puts "User's ID is #{user.id}"
end
# User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# User's ID is 1
# User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 1) ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# User's ID is 2
# User Load (0.3ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 2) ORDER BY "users"."id" ASC LIMIT $2 [["first_name", "Random"], ["LIMIT", 1]]
# => nil
Your query is not executed until the view, where find_each loads only 1,000 records into memory at a time by default (configurable via batch_size; the example above uses 1 to make the batching visible). Once it reaches the end of a batch, it automatically runs another query to fetch the next one. So your memory use stays much more sane, at the cost of extra database queries (which are usually pretty quick).
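The batching pattern visible in the log above (ORDER BY id, id > last seen id, LIMIT batch size) can be sketched in plain Ruby. This is a toy over an in-memory array, not ActiveRecord's implementation; fetch_batch and each_in_batches are hypothetical names, and it assumes positive integer ids.

```ruby
# Hypothetical in-memory table. Each "query" returns at most batch_size rows
# whose id is greater than the last one seen, ordered by id.
def fetch_batch(rows, after_id, batch_size)
  rows.select { |row| row[:id] > after_id }
      .sort_by { |row| row[:id] }
      .first(batch_size)
end

# Yields every row while holding only one batch in memory at a time.
# Returns the number of batch queries that were issued.
def each_in_batches(rows, batch_size: 1000)
  queries = 0
  last_id = 0 # assumes ids start above zero
  loop do
    batch = fetch_batch(rows, last_id, batch_size)
    queries += 1
    break if batch.empty?

    batch.each { |row| yield row }
    # A short batch means there is nothing left to fetch.
    break if batch.size < batch_size

    last_id = batch.last[:id]
  end
  queries
end
```

With two rows and batch_size: 1 this sketch issues three queries, the last one coming back empty, which matches the three User Load lines in the console output above.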

Ruby on Rails query yielding unexpected results

This is a followup to an earlier thread: Ruby on Rails query not working properly.
As noted, I have several listings. In particular, a listing has_many :spaces, through: :designations and has_many :amenities, through: :offerings.
I define filters to restrict the listings that get shown.
The two main ones are:
# filter by amenities
if params[:search][:amenity_ids].present? && params[:search][:amenity_ids].reject(&:blank?).size > 0
  @listings = @listings.joins(:amenities).where(amenities: { id: params[:search][:amenity_ids].reject(&:blank?) }).group('listings.id').having('count(*) >= ?', params[:search][:amenity_ids].reject(&:blank?).size)
end
# filter by space type
if params[:search][:space_ids].present? && params[:search][:space_ids].reject(&:blank?).size > 0
  @listings = @listings.joins(:spaces).where('space_id IN (?)', params[:search][:space_ids].reject(&:blank?)).uniq
end
(Note that these reflect the solution indicated in the earlier thread.)
The first filter says: get all of the listings that have ALL of the selected amenities.
The second filter says: get all of the listings that match ANY of the selected space types.
But one issue remains. If I filter for space types 1 and 2 and amenities 1 and 2, I get listing A (which has space types 1 and 2 and amenity 2).
But I should presumably get [] since no listing has both amenities 1 and 2.
What is going on with these queries? Should they not be independent, but chainable?
Here is the output (I disabled the other filters for clarity):
Started GET "/listings/search?utf8=%E2%9C%93&search%5Baddress%5D=London%2C+United+Kingdom&search%5Bprice_min%5D=0&search%5Bprice_max%5D=1000.0&search%5Bprice_lower%5D=0&search%5Bprice_upper%5D=1000&search%5Bsize_min%5D=0&search%5Bsize_max%5D=1000&search%5Bsize_lower%5D=0&search%5Bsize_upper%5D=1000&search%5Bspace_ids%5D%5B%5D=1&search%5Bspace_ids%5D%5B%5D=2&search%5Bspace_ids%5D%5B%5D=&search%5Bamenity_ids%5D%5B%5D=1&search%5Bamenity_ids%5D%5B%5D=2&search%5Bamenity_ids%5D%5B%5D=&search%5Bsort_by%5D=Distance&commit=Apply+Filters" for ::1 at 2015-10-31 14:25:58 +0000
ActiveRecord::SchemaMigration Load (0.4ms) SELECT "schema_migrations".* FROM "schema_migrations"
Processing by ListingsController#search as HTML
Parameters: {"utf8"=>"✓", "search"=>{"address"=>"London, United Kingdom", "price_min"=>"0", "price_max"=>"1000.0", "price_lower"=>"0", "price_upper"=>"1000", "size_min"=>"0", "size_max"=>"1000", "size_lower"=>"0", "size_upper"=>"1000", "space_ids"=>["1", "2", ""], "amenity_ids"=>["1", "2", ""], "sort_by"=>"Distance"}, "commit"=>"Apply Filters"}
(1.5ms) SELECT MAX("listings"."price") FROM "listings"
(0.6ms) SELECT MAX("listings"."size") FROM "listings"
Listing Load (4.4ms) SELECT DISTINCT "listings".* FROM "listings" INNER JOIN "offerings" ON "offerings"."listing_id" = "listings"."id" INNER JOIN "amenities" ON "amenities"."id" = "offerings"."amenity_id" INNER JOIN "designations" ON "designations"."listing_id" = "listings"."id" INNER JOIN "spaces" ON "spaces"."id" = "designations"."space_id" WHERE "amenities"."id" IN (1, 2) AND (space_id IN ('1','2')) GROUP BY listings.id HAVING count(*) >= 2 LIMIT 24 OFFSET 0
Image Load (0.5ms) SELECT "images".* FROM "images" WHERE "images"."listing_id" = $1 ORDER BY "images"."id" ASC LIMIT 1 [["listing_id", 1]]
Space Load (0.6ms) SELECT "spaces".* FROM "spaces" INNER JOIN "designations" ON "spaces"."id" = "designations"."space_id" WHERE "designations"."listing_id" = $1 [["listing_id", 1]]
Rendered listings/_map_infowindow.html.erb (56.1ms)
Rendered listings/_price_slider.html.erb (0.7ms)
Rendered listings/_size_slider.html.erb (0.6ms)
Space Load (0.4ms) SELECT "spaces".* FROM "spaces"
Amenity Load (0.4ms) SELECT "amenities".* FROM "amenities"
Rendered scripts/_checkbox_toggle.html.erb (0.5ms)
Rendered listings/_search_filters.html.erb (75.5ms)
(0.4ms) SELECT "spaces"."name" FROM "spaces" INNER JOIN "designations" ON "spaces"."id" = "designations"."space_id" WHERE "designations"."listing_id" = $1 [["listing_id", 1]]
CACHE (0.0ms) SELECT "images".* FROM "images" WHERE "images"."listing_id" = $1 ORDER BY "images"."id" ASC LIMIT 1 [["listing_id", 1]]
User Load (0.7ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1 [["id", 3]]
Avatar Load (0.7ms) SELECT "avatars".* FROM "avatars" WHERE "avatars"."user_id" = $1 ORDER BY "avatars"."id" ASC LIMIT 1 [["user_id", 3]]
Rendered listings/_listing_grid.html.erb (80.8ms)
(3.1ms) SELECT DISTINCT COUNT(DISTINCT "listings"."id") AS count_id, listings.id AS listings_id FROM "listings" INNER JOIN "offerings" ON "offerings"."listing_id" = "listings"."id" INNER JOIN "amenities" ON "amenities"."id" = "offerings"."amenity_id" INNER JOIN "designations" ON "designations"."listing_id" = "listings"."id" INNER JOIN "spaces" ON "spaces"."id" = "designations"."space_id" WHERE "amenities"."id" IN (1, 2) AND (space_id IN ('1','2')) GROUP BY listings.id HAVING count(*) >= 2
Rendered scripts/_map.html.erb (2.9ms)
Rendered scripts/_shuffle.html.erb (0.3ms)
Rendered listings/search.html.erb within layouts/application (178.7ms)
Rendered layouts/_head.html.erb (475.7ms)
Rendered scripts/_address_autocomplete.html.erb (0.3ms)
Rendered listings/_search_address.html.erb (13.7ms)
User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 3]]
(0.5ms) SELECT DISTINCT "conversations"."id" FROM "conversations" WHERE (sender_id = 3 OR recipient_id = 3)
(0.5ms) SELECT DISTINCT "messages"."conversation_id" FROM "messages" WHERE ("messages"."user_id" != $1) AND "messages"."read" = $2 [["user_id", 3], ["read", "false"]]
CACHE (0.0ms) SELECT "avatars".* FROM "avatars" WHERE "avatars"."user_id" = $1 ORDER BY "avatars"."id" ASC LIMIT 1 [["user_id", 3]]
Rendered layouts/_navbar.html.erb (32.5ms)
Rendered scripts/_fade_error.html.erb (0.4ms)
Rendered scripts/_transparent_navbar.html.erb (0.3ms)
Completed 200 OK in 1045ms (Views: 688.6ms | ActiveRecord: 30.6ms)
I have also tried adding raise 'test' in order to do some testing in the better_errors live shell. I discovered:
>> @listings
=> #<ActiveRecord::Relation []>
>> @listings = @listings.joins(:spaces).where('space_id IN (?)', params[:search][:space_ids].reject(&:blank?)).uniq
=> #<ActiveRecord::Relation [#<Listing id: 1, title: "Test 1", address: "New Inn Passage, London WC2A 2AE, UK", latitude: 51.5139664, longitude: -0.1167323, size: 1000, min_lease: 1, price: #<BigDecimal:7f89ec245c98,'0.1E4',9(18)>, description: "Test 1", user_id: 3, state: "public", created_at: "2015-10-30 17:37:04", updated_at: "2015-10-30 17:37:04">]>
>>
Why is this happening and how can I fix it?
Any help would be greatly appreciated.
The issue is with how you are determining that all of the amenities have been matched.
When you are only joining the amenities, the count of the rows (prior to grouping) for a listing is the number of matched amenities, so the having clause does what you want.
When you join the spaces table too, the number of rows (again prior to grouping) for a listing is the number of matched amenities times the number of matched spaces. In your example there are 2 spaces and 1 amenity, so the count is 2 and your having clause is satisfied.
If instead of filtering on count(*) you filtered on count(distinct amenities.id), you would be counting the number of distinct amenities that were joined, which should produce the desired result.
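The row multiplication is easy to reproduce in plain Ruby. The sketch below uses hypothetical in-memory rows that mirror the example (listing 1 has spaces 1 and 2 but only amenity 2); joined_rows and the two passes_* helpers are names invented here to stand in for the SQL join and the two HAVING variants.

```ruby
# Hypothetical data mirroring the example: listing 1 has spaces 1 and 2
# but only amenity 2.
DESIGNATIONS = [
  { listing_id: 1, space_id: 1 },
  { listing_id: 1, space_id: 2 }
].freeze

OFFERINGS = [
  { listing_id: 1, amenity_id: 2 }
].freeze

# Rows produced by joining offerings and designations for one listing,
# after applying both IN (...) filters.
def joined_rows(listing_id, amenity_ids, space_ids)
  offerings = OFFERINGS.select do |o|
    o[:listing_id] == listing_id && amenity_ids.include?(o[:amenity_id])
  end
  designations = DESIGNATIONS.select do |d|
    d[:listing_id] == listing_id && space_ids.include?(d[:space_id])
  end
  offerings.product(designations).map { |o, d| o.merge(d) }
end

# Stands in for: HAVING count(*) >= wanted
def passes_count_star?(rows, wanted)
  rows.size >= wanted
end

# Stands in for: HAVING count(DISTINCT amenities.id) >= wanted
def passes_count_distinct?(rows, wanted)
  rows.map { |row| row[:amenity_id] }.uniq.size >= wanted
end
```

For amenities 1 and 2 with spaces 1 and 2, the join yields two rows for listing 1, so the count(*) check passes even though only one amenity matched, while the distinct count correctly rejects the listing.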
I may have figured out the issue. I did the following in the console to test:
Set @listings = Listing.all.
Set @listings = @listings.joins(:amenities).where(amenities: { id: ['1', '2'].reject(&:blank?) }).group('listings.id').having('count(*) >= ?', ['1', '2'].reject(&:blank?).size).
This produces: => #<ActiveRecord::Relation []>, as desired.
I then checked to see what would happen if I were to do: @listings.joins(:spaces).
This produces: => #<ActiveRecord::Relation [#<Listing id: 1, title: "Test 1", address: "New Inn Passage, London WC2A 2AE, UK", latitude: 51.5139664, longitude: -0.1167323, size: 1000, min_lease: 1, price: #<BigDecimal:7ffcb02ce890,'0.1E4',9(18)>, description: "Test 1", user_id: 3, state: "public", created_at: "2015-10-30 17:37:04", updated_at: "2015-10-30 17:37:04">]>, even though @listings was initially [].
So the problem has to do with the joins(:spaces) in the second filter.
In order to make sure that @listings remains [] in the event that that is the result of the first filter, I added the extra condition && @listings.present? to the second filter, yielding:
if params[:search][:space_ids].present? && params[:search][:space_ids].reject(&:blank?).size > 0 && @listings.present?
That extra condition prevents the second filter from being executed and returning results that should not be returned.
This feels like an ugly hack, and I would welcome better solutions, but it seems to work.

How do I stop these PG::ProtocolViolation errors?

My method is:
class Survey < ActiveRecord::Base
  def create_matching_batteries
    unless inactive?
      update_column :battery_id, nil unless battery
      @battery = Battery.where(:review_id => review.id, :question_id => question.id).first_or_create
      @battery.surveys << self
    end
  end
end
When I run @survey.create_matching_batteries, I get this:
Survey Load (5.2ms) SELECT "surveys".* FROM "surveys" WHERE "surveys"."competitor_id" = $1 AND "surveys"."id" = $1 ORDER BY "surveys"."id" ASC LIMIT 1 [["competitor_id", 248], ["id", 15183]]
D, [2014-01-21T22:28:08.830446 #3392] DEBUG -- : Survey Load (5.2ms) SELECT "surveys".* FROM "surveys" WHERE "surveys"."competitor_id" = $1 AND "surveys"."id" = $1 ORDER BY "surveys"."id" ASC LIMIT 1 [["competitor_id", 248], ["id", 15183]]
PG::ProtocolViolation: ERROR: bind message supplies 2 parameters, but prepared statement "a15" requires 1
: SELECT "surveys".* FROM "surveys" WHERE "surveys"."competitor_id" = $1 AND "surveys"."id" = $1 ORDER BY "surveys"."id" ASC LIMIT 1
E, [2014-01-21T22:28:08.830528 #3392] ERROR -- : PG::ProtocolViolation: ERROR: bind message supplies 2 parameters, but prepared statement "a15" requires 1
: SELECT "surveys".* FROM "surveys" WHERE "surveys"."competitor_id" = $1 AND "surveys"."id" = $1 ORDER BY "surveys"."id" ASC LIMIT 1
(0.3ms) ROLLBACK
D, [2014-01-21T22:28:08.833655 #3392] DEBUG -- : (0.3ms) ROLLBACK
ActiveRecord::StatementInvalid: PG::ProtocolViolation: ERROR: bind message supplies 2 parameters, but prepared statement "a15" requires 1
: SELECT "surveys".* FROM "surveys" WHERE "surveys"."competitor_id" = $1 AND "surveys"."id" = $1 ORDER BY "surveys"."id" ASC LIMIT 1
from /Users/steven/.rvm/gems/ruby-2.1.0/gems/activerecord-4.0.2/lib/active_record/connection_adapters/postgresql_adapter.rb:786:in `get_last_result'
In "Railspeak" not "Postgrespeak", what does ActiveRecord::StatementInvalid: PG::ProtocolViolation: ERROR: bind message supplies 2 parameters, but prepared statement "a15" requires 1 mean? And how can it help me to debug my method?
My environment:
$ rails -v
Rails 4.0.2
$ ruby -v
ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]
$ psql --version
psql (PostgreSQL) 9.3.1
Disable prepared statements in config/database.yml:
production:
  adapter: postgresql
  prepared_statements: false
Check out http://edgeguides.rubyonrails.org/configuring.html
I experienced the same problem, but after cross-checking my code I found that I had used the same placeholder for two different fields; that is, I had written fieldA=$2 and also fieldB=$2. After correcting that, everything worked fine.
My problem was that I used quotes around the placeholder:
quiz_subject = subject_id WHERE subject_name = '$1';
I removed those quotes:
quiz_subject = subject_id WHERE subject_name = $1;
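Both of these cases, and the SQL in the original error, come down to the statement text declaring fewer parameters than the number of values bound to it. The plain-Ruby check below is a rough sketch of that comparison, assuming the server takes the highest $n outside string literals as the parameter count. expected_param_count and bind_mismatch? are hypothetical helpers, and the quote handling is deliberately simplistic.

```ruby
# Rough estimate of how many parameters a statement declares: the highest
# $n placeholder in the SQL text, ignoring anything inside single quotes.
# This is a simplification, not a real SQL parser.
def expected_param_count(sql)
  without_literals = sql.gsub(/'[^']*'/, '')
  without_literals.scan(/\$(\d+)/).flatten.map(&:to_i).max || 0
end

# True when the number of bound values differs from what the SQL declares.
def bind_mismatch?(sql, binds)
  expected_param_count(sql) != binds.size
end
```

Run against the failing statement from the question, which uses $1 twice but is sent with two bind values, the check reports a mismatch: the text declares one parameter and receives two.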
