Speeding up large data graph query and rendering with AMS - ruby-on-rails

I have a query that spans multiple tables which in the end uses Active Model Serializers to render the data. Currently a large portion of the time is spent in the serializers since I am forced to query some of the data from within the serializer itself. I want to find a way to speed this up, and that may be not using AMS (this is okay).
My data model is as follows:
Location
-> Images
-> Recent Images
-> Days Images
Image
-> User
The recent_images and days_images are the same as the images but with a scope to do a where to filter by days and limit to 6.
Currently this whole process takes about 15 seconds locally and 2-4 seconds in production. I feel like I can perform this much quicker but am not entirely sure how I can modify my code.
The query to fetch the Locations is:
ids = #company
.locations
.active
.has_image_taken
.order(last_image_taken: :desc)
.page(page)
.per(per_page)
.pluck(:id)
Location.fetch_multi(ids)
fetch_multi is from the identity_cache gem. These results then hit the serializer which is:
class V1::RecentLocationSerializer < ActiveModel::Serializer
has_many :recent_images, serializer: V1::RecentImageSerializer do
if scope[:view_user_photos]
object.fetch_recent_images.take(6)
else
ids = object.recent_images.where(user_id: scope[:current_user].id).limit(6).pluck(:id)
Image.fetch_multi(ids)
end
end
has_many :days_images do
if scope[:view_user_photos]
object.fetch_days_images
else
ids = object.days_images.where(user_id: scope[:current_user].id).pluck(:id)
Image.fetch_multi(ids)
end
end
end
The scopes for recent and days images is:
scope :days_images, -> { includes(:user).active.where('date_uploaded > ?', DateTime.now.beginning_of_day).ordered_desc_by_date }
scope :recent_images, -> { includes(:user).active.ordered_desc_by_date }
My question is if you think I need to ditch AMS so I don't have to query in the serializer, and if so, how would you recommend to render this?

I may have missed the point here - what part is slow? What do the logs look like? Are you missing a db index? I don't see a lot of joins in there, so maybe you just need an index on date_uploaded (and maybe user_id). I don't see anything in there that is doing a bunch of serializing.
You can easily speed up the sql in many ways - like, a trigger (in ruby or sql) that updates the image with a user_active boolean so you can dump that join. (you'd include that in the index)
OR build a cached table just with the IDs in it. Kind of like an inverted index (I'd also do it in redis, but that's me) that has a row for each user/location that is updated when an image is uploaded.
Push the work to the image upload rather than where it is now, viewing the list.

Related

How to add attribute/property to each record/object in an array? Rails

I'm not sure if this is just a lacking of the Rails language, or if I am searching all the wrong things here on Stack Overflow, but I cannot find out how to add an attribute to each record in an array.
Here is an example of what I'm trying to do:
#news_stories.each do |individual_news_story|
#user_for_record = User.where(:id => individual_news_story[:user_id]).pluck('name', 'profile_image_url');
individual_news_story.attributes(:author_name) = #user_for_record[0][0]
individual_news_story.attributes(:author_avatar) = #user_for_record[0][1]
end
Any ideas?
If the NewsStory model (or whatever its name is) has a belongs_to relationship to User, then you don't have to do any of this. You can access the attributes of the associated User directly:
#news_stories.each do |news_story|
news_story.user.name # gives you the name of the associated user
news_story.user.profile_image_url # same for the avatar
end
To avoid an N+1 query, you can preload the associated user record for every news story at once by using includes in the NewsStory query:
NewsStory.includes(:user)... # rest of the query
If you do this, you won't need the #user_for_record query — Rails will do the heavy lifting for you, and you could even see a performance improvement, thanks to not issuing a separate pluck query for every single news story in the collection.
If you need to have those extra attributes there regardless:
You can select them as extra attributes in your NewsStory query:
NewsStory.
includes(:user).
joins(:user).
select([
NewsStory.arel_table[Arel.star],
User.arel_table[:name].as("author_name"),
User.arel_table[:profile_image_url].as("author_avatar"),
]).
where(...) # rest of the query
It looks like you're trying to cache the name and avatar of the user on the NewsStory model, in which case, what you want is this:
#news_stories.each do |individual_news_story|
user_for_record = User.find(individual_news_story.user_id)
individual_news_story.author_name = user_for_record.name
individual_news_story.author_avatar = user_for_record.profile_image_url
end
A couple of notes.
I've used find instead of where. find returns a single record identified by it's primary key (id); where returns an array of records. There are definitely more efficient ways to do this -- eager-loading, for one -- but since you're just starting out, I think it's more important to learn the basics before you dig into the advanced stuff to make things more performant.
I've gotten rid of the pluck call, because here again, you're just learning and pluck is a performance optimization useful when you're working with large amounts of data, and if that's what you're doing then activerecord has a batch api you should look into.
I've changed #user_for_record to user_for_record. The # denote instance variables in ruby. Instance variables are shared and accessible from any instance method in an instance of a class. In this case, all you need is a local variable.

Removing an item from an in-memory collection

I have a collection that contains a class like:
locations = Location.all
class Location < ActiveRecord::Base
end
The location class has a property: code
I wan to remove an item from the collection if code == "unused".
How many different ways can I do this in ruby?
I am currently doing this:
locations = Location.all.select { |l| l.code != "unused" }
This works great but just wondering what other ways I could do this just for learning purposes (if there big performance advantages in another way that would be good to know also).
Update
Please ignore the fact that I am loading my collection initially from the database, that wasn't the point. I want to learn how to remove things in-memory not simple where clauses :)
You can simply fetch records from your database what you need:
Rails 4 onwards:
locations = Location.where.not(code: "unused")
Before Rails 4:
locations = Location.where("code != ?", "unused")
If you have a collection and you want to reject some items from it, then you can try this:
locations.reject! {|location| location.code != "unused"}
You are doing this the wrong way. In your case, you are retrieving all records from DB and getting an array of records. Then you are looking for records you need in the array. Instead, you should get the records directly from DB:
Location.where("code != 'unused'")
# or in Rails 4 and latest
Location.where.not(code: "unused")
If you need to remove records from DB, you can do it like this:
Location.where.not(code: "unused").destroy_all
If you just want to know what is the best way to remove elements from an existing array, I think you are on the right track. Besides select there are reject, reject!, delete_if methods. You can learn more about them in the documentation http://ruby-doc.org/core-2.3.1/Array.html
There is a related post that might give more information: Ruby .reject! vs .delete_if

Rails / Postgres Lookup Performance

I have a status dashboard that shows the status of remote hardware devices that 'ping' the application every minute and log their status.
class Sensor < ActiveRecord::Base
has_many :logs
def most_recent_log
logs.order("id DESC").first
end
end
class Log < ActiveRecord::Base
belongs_to :sensor
end
Given I'm only interested in showing the current status, the dashboard only shows the most recent log for all sensors. This application has been running for a long time now and there are tens of millions of Log records.
The problem I have is that the dashboard takes around 8 seconds to load. From what I can tell, this is largely because there is an N+1 Query fetching these logs.
Completed 200 OK in 4729.5ms (Views: 4246.3ms | ActiveRecord: 480.5ms)
I do have the following index in place:
add_index "logs", ["sensor_id", "id"], :name => "index_logs_on_sensor_id_and_id", :order => {"id"=>:desc}
My controller / lookup code is the following:
class SensorsController < ApplicationController
def index
#sensors = Sensor.all
end
end
How do I make the load time reasonable?
Is there a way to avoid the N+1 and reload this?
I had thought of putting a latest_log_id reference on to Sensor and then updating this every time a new log for that sensor is posted - but something in my head is telling me that other developers would say this is a bad thing. Is this the case?
How are problems like this usually solved?
There are 2 relatively easy ways to do this:
Use ActiveRecord eager loading to pull in just the most recent logs
Roll your own mini eager loading system (as a Hash) for just this purpose
Basic ActiveRecord approach:
subquery = Log.group(:sensor_id).select("MAX('id')")
#sensors = Sensor.eager_load(:logs).where(logs: {id: subquery}).all
Note that you should NOT use your most_recent_log method for each sensor (that will trigger an N+1), but rather logs.first. Only the latest log for each sensor will actually be prefetched in the logs collection.
Rolling your own may be more efficient from a SQL perspective, but more complex to read and use:
#sensors = Sensor.all
logs = Log.where(id: Log.group(:sensor_id).select("MAX('id')"))
#sensor_logs = logs.each_with_object({}){|log, hash|
hash[log.sensor_id] = log
}
#sensor_logs is a Hash, permitting a fast lookup for the latest log by sensor.id.
Regarding your comment about storing the latest log id - you are essentially asking if you should build a cache. The answer would be 'it depends'. There are many advantages and many disadvantages to caching, so it comes down to 'is the benefit worth the cost'. From what you are describing, it doesn't appear that you are familiar with the difficulties they introduce (Google 'cache invalidation') or if they are applicable in your case. I'd recommend against it until you can demonstrate that a) it is adding real value over a non-cache solution, and b) it can be safely applied for your scenario.
There's 3 options:
eager loading
joining
caching the current status
--
is explained by PinnyM
You can do a join from the Sensor just to the latest Log record for each row, so everything gets fetched in the one query. Not sure off hand how that'll perform with the number of rows you have, likely it'll still be slower than you want.
The thing you mentioned - caching the latest_log_id (or even caching just the latest_status if that's all you need for the dashboard) is actually OK. It's called denormalization and it's a useful thing if used carefully. You've likely come across "counter cache" plugins for rails which are in the same vein - duplicating data, in the interests of being able to optimise read performance.

ActiveRecord scope cache for remote attribute?

I have a large project, there are several separated db instance, database name and different table names, all need to be processed in single one Sinatra controller.
Something like this
class ItemLog < ActiveRecord::Base
def address
http_get('http://x.com/get_address_by_item_type?id=' + self.type_id).to_json['address']
end
end
so if I iterated through a long list of ItemLog, the http get called multiple times per each ItemLog.
But since ItemLog.type_id is a limited set, but it's saved in a remote db, It's ideal that for each unique type_id, we only need to issue exactly one http request.
I know how to declare a standalone hash for caching like this:
address_cache = Hash[ItemLog.where(...).limit(500).pluck(:type_id).uniq.map {|type_id| [type_id, get_address(type_id)] }]
ItemLog.where(...).limit(500).each { |item| item.address = address_cache[item.type_id }
My question is is there anyway to write more elegant code for this? Something like "scope cache" works like a {item_type_id: address_string} pairs, but directly in a encapsulated ActiveRecord model

Ruby on Rails - ActiveRecord::Relation count method is wrong?

I'm writing an application that allows users to send one another messages about an 'offer'.
I thought I'd save myself some work and use the Mailboxer gem.
I'm following a test driven development approach with RSpec. I'm writing a test that should ensure that only one Conversation is allowed per offer. An offer belongs_to two different users (the user that made the offer, and the user that received the offer).
Here is my failing test:
describe "after a message is sent to the same user twice" do
before do
2.times { sending_user.message_user_regarding_offer! offer, receiving_user, random_string }
end
specify { sending_user.mailbox.conversations.count.should == 1 }
end
So before the test runs a user sending_user sends a message to the receiving_user twice. The message_user_regarding_offer! looks like this:
def message_user_regarding_offer! offer, receiver, body
conversation = offer.conversation
if conversation.nil?
self.send_message(receiver, body, offer.conversation_subject)
else
self.reply_to_conversation(conversation, body)
# I put a binding.pry here to examine in console
end
offer.create_activity key: PublicActivityKeys.message_received, owner: self, recipient: receiver
end
On the first iteration in the test (when the first message is sent) the conversation variable is nil therefore a message is sent and a conversation is created between the two users.
On the second iteration the conversation created in the first iteration is returned and the user replies to that conversation, but a new conversation isn't created.
This all works, but the test fails and I cannot understand why!
When I place a pry binding in the code in the location specified above I can examine what is going on... now riddle me this:
self.mailbox.conversations[0] returns a Conversation instance
self.mailbox.conversations[1] returns nil
self.mailbox.conversations clearly shows a collection containing ONE object.
self.mailbox.conversations.count returns 2?!
What is going on there? the count method is incorrect and my test is failing...
What am I missing? Or is this a bug?!
EDIT
offer.conversation looks like this:
def conversation
Conversation.where({subject: conversation_subject}).last
end
and offer.conversation_subject:
def conversation_subject
"offer-#{self.id}"
end
EDIT 2 - Showing the first and second iteration in pry
Also...
Conversation.all.count returns 1!
and:
Conversation.all == self.mailbox.conversations returns true
and
Conversation.all.count == self.mailbox.conversations.count returns false
How can that be if the arrays are equal? I don't know what's going on here, blown hours on this now. Think it's a bug?!
EDIT 3
From the source of the Mailboxer gem...
def conversations(options = {})
conv = Conversation.participant(#messageable)
if options[:mailbox_type].present?
case options[:mailbox_type]
when 'inbox'
conv = Conversation.inbox(#messageable)
when 'sentbox'
conv = Conversation.sentbox(#messageable)
when 'trash'
conv = Conversation.trash(#messageable)
when 'not_trash'
conv = Conversation.not_trash(#messageable)
end
end
if (options.has_key?(:read) && options[:read]==false) || (options.has_key?(:unread) && options[:unread]==true)
conv = conv.unread(#messageable)
end
conv
end
The reply_to_convesation code is available here -> http://rubydoc.info/gems/mailboxer/frames.
Just can't see what I'm doing wrong! Might rework my tests to get around this. Or ditch the gem and write my own.
see this Rails 3: Difference between Relation.count and Relation.all.count
In short Rails ignores the select columns (if more than one) when you apply count to the query. This is because
SQL's COUNT allows only one or less columns as parameters.
From Mailbox code
scope :participant, lambda {|participant|
select('DISTINCT conversations.*').
where('notifications.type'=> Message.name).
order("conversations.updated_at DESC").
joins(:receipts).merge(Receipt.recipient(participant))
}
self.mailbox.conversations.count ignores the select('DISTINCT conversations.*') and counts the join table with receipts, essentially counting number of receipts with duplicate conversations in it.
On the other hand, self.mailbox.conversations.all.count first gets the records applying the select, which gets unique conversations and then counts it.
self.mailbox.conversations.all == self.mailbox.conversations since both of them query the db with the select.
To solve your problem you can use sending_user.mailbox.conversations.all.count or sending_user.mailbox.conversations.group('conversations.id').length
I have tended to use the size method in my code. As per the ActiveRecord code, size will use a cached count if available and also returns the correct number when models have been created through relations and have not yet been saved.
# File activerecord/lib/active_record/relation.rb, line 228
def size
loaded? ? #records.length : count
end
There is a blog on this here.
In Ruby, #length and #size are synonyms and both do the same thing: they tell you how many elements are in an array or hash. Technically #length is the method and #size is an alias to it.
In ActiveRecord, there are several ways to find out how many records are in an association, and there are some subtle differences in how they work.
post.comments.count - Determine the number of elements with an SQL COUNT query. You can also specify conditions to count only a subset of the associated elements (e.g. :conditions => {:author_name => "josh"}). If you set up a counter cache on the association, #count will return that cached value instead of executing a new query.
post.comments.length - This always loads the contents of the association into memory, then returns the number of elements loaded. Note that this won't force an update if the association had been previously loaded and then new comments were created through another way (e.g. Comment.create(...) instead of post.comments.create(...)).
post.comments.size - This works as a combination of the two previous options. If the collection has already been loaded, it will return its length just like calling #length. If it hasn't been loaded yet, it's like calling #count.
It is also worth mentioning to be careful if you are not creating models through associations, as the related model will not necessarily have those instances in its association proxy/collection.
# do this
mailbox.conversations.build(attrs)
# or this
mailbox.conversations << Conversation.new(attrs)
# or this
mailbox.conversations.create(attrs)
# or this
mailbox.conversations.create!(attrs)
# NOT this
Conversation.new(mailbox_id: some_id, ....)
I don't know if this explains what's going on, but the ActiveRecord count method queries the database for the number of records stored. The length of the Relation could be different, as discussed in http://archive.railsforum.com/viewtopic.php?id=6255, although in that example, the number of records in the database was less than the number of items in the Rails data structure.
Try
self.mailbox.conversations.reload; self.mailbox.conversations.count
or perhaps
self.mailbox.reload; self.mailbox.conversations.count
or, if neither of those work, just try reloading as many of the objects as possible to see if you can get it to work (self, mailbox, conversations, etc.).
My guess is that something is messed up between memory and the DB. This is definitely a really weird error though, might wanna put in an issue on Rails to see why this would be the case.
The result of mailbox.conversations is cached after the first call. To reload it write mailbox.conversations(true)

Resources