I am building a Rails backend to an iPhone app.
After profiling my application, I have found the following call to be especially expensive in terms of performance:
@messages.as_json
This call returns about 30 message objects, each including many child records. As you can see, composing a single message's JSON response may take many DB calls:
def as_json(options={})
  super(:only => [...],
        :include => {
          :user => {...},
          :checkin => {...},
          :likes => {:only => [...],
                     :include => { :user => {...} }},
          :comments => {:only => [...],
                        :include => { :user => {:only => [...] }}}
        },
        :methods => :top_highlight)
end
On average the @messages.as_json call (all 30 objects) takes almost 1100ms.
Wanting to optimize I've employed memcached. With the solution below, when all my message objects are in cache, average response is now 200-300ms. I'm happy with this, but the issue I have is that this has made cache miss scenarios even slower. In cases where nothing is in cache, it now takes over 2000ms to compute.
# Note: @messages has the 30 message objects in it, but none of the child records have been grabbed
@messages.each_with_index do |m, i|
  @messages[i] = Rails.cache.fetch("message/#{m.id}/#{m.updated_at.to_i}") do
    m.as_json
  end
end
I understand that there will have to be some overhead to check the cache for each object. But I'm guessing there is a more efficient way to do it than the way I am now, which is basically serially, one-by-one. Any pointers on making this more efficient?
I believe Rails.cache uses the ActiveSupport::Cache::Store interface, which has a read_multi method for this exact purpose. [1]
I think swapping out fetch for read_multi will improve your performance because ActiveSupport::Cache::MemCacheStore has an optimized implementation of read_multi. [2]
Code
Here's the updated implementation:
keys = @messages.collect { |m| "message/#{m.id}/#{m.updated_at.to_i}" }
hits = Rails.cache.read_multi(*keys)
keys.each_with_index do |key, i|
  if hits.include?(key)
    @messages[i] = hits[key]
  else
    Rails.cache.write(key, @messages[i] = @messages[i].as_json)
  end
end
The cache writes are still performed synchronously with one round trip to the cache for each miss. If you want to cut down on that overhead, look into running background code asynchronously with something like workling.
Be careful that the overhead of starting the asynchronous job is actually less than the overhead of Rails.cache.write before you start expanding your architecture.
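To make the round-trip difference concrete, here is a minimal sketch in plain Ruby. CountingCache is a hypothetical in-memory stand-in for the cache store (it is not Rails.cache and not part of ActiveSupport); all it does is count the calls that would each have cost a network round trip. fetch_all is likewise just an illustrative helper name.

```ruby
# Hypothetical in-memory stand-in for a cache store; it is NOT Rails.cache.
# It only counts how many calls would each cost a network round trip.
class CountingCache
  attr_reader :round_trips

  def initialize
    @data = {}
    @round_trips = 0
  end

  # One round trip for any number of keys; returns only the keys that hit.
  def read_multi(*keys)
    @round_trips += 1
    @data.select { |key, _| keys.include?(key) }
  end

  # One round trip per write.
  def write(key, value)
    @round_trips += 1
    @data[key] = value
  end
end

# Illustrative helper: read every key in one batch, then compute and
# write back only the misses. The block builds the value for a missed key.
def fetch_all(cache, keys)
  hits = cache.read_multi(*keys)
  keys.map do |key|
    hits.fetch(key) { yield(key).tap { |value| cache.write(key, value) } }
  end
end

cache = CountingCache.new
keys = %w[message/1 message/2 message/3]

fetch_all(cache, keys) { |key| key.upcase }   # cold cache: 1 read + 3 writes
cold_trips = cache.round_trips
fetch_all(cache, keys) { |key| key.upcase }   # warm cache: 1 read, no writes
warm_trips = cache.round_trips - cold_trips

puts "cold: #{cold_trips} round trips, warm: #{warm_trips}"
```

With a one-by-one fetch, the warm case alone would cost one round trip per key; here the miss case is the only one that still scales with the number of objects.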
Memcached Multi-Set
It looks like the Memcached team has at least considered providing Multi-Set (batch write) commands, but there aren't any ActiveSupport interfaces for it yet and it's unclear what level of support is provided by implementations. [3]
As of Rails 4.1, you can now do fetch_multi and pass in a block.
http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch_multi
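keys = @messages.collect { |m| "message/#{m.id}/#{m.updated_at.to_i}" }
messages_by_key = Hash[keys.zip(@messages)]
# The block is called once per missing key; its return value is cached.
hits = Rails.cache.fetch_multi(*keys) do |key|
  messages_by_key[key].as_json
end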
Note: if you're setting many items, you may want to consider writing to the cache in some sort of background worker.
Related
We've recently upgraded our stack from Rails 3.2.22.5 to Rails 4.2.8. We have both a big CRUD application in Rails and a Sinatra API app. They are both on the same Ruby version (2.4.1), the same ActiveRecord and ActiveSupport versions, and share the same database (PostgreSQL) and all of the model files.
We've also upgraded the Ruby versions, but switching back to the old Ruby version did not change anything, so I'll keep talking about the setup as-is.
It looks like, since the upgrade, our API responses have become 40%-50% slower in the worst cases (the Rails app responses have also slowed, but not as much). This example looks at the worst case, where we query for 100-200 "Message" records which contain a quite deep eager loaded hierarchy of associated models, using also ActsAsList and STI relations. Here is how they're queried:
messages = property.
messages.
visible(property, opts[:days]).
includes(:translations).
includes(:gallery_images).
includes(:place).
includes(:property => :places).
includes(:address => [:translations, template: :translations]).
includes(:tags).
includes(:interests => :interest_disablings).
includes(:template => [
:translations, {:interests => :interest_disablings}, :gallery_images, :property, :tags, {:address => [:translations, template: :translations]},
{:template => [:translations, {:interests => :interest_disablings}, :gallery_images, :property, :tags, {:address => :translations}]}
]).
includes({current_availability: [:property, :timing, :capacity, { custom_booking_fields: :translations }]}).
includes(:availabilities).all.to_a
(The two nested :template references go to the same "messages" table, messages have two levels of "parent" templates. The "translations" come in via the Globalize gem)
Both this part (after which no more SQL calls are being made) and later processing (Hash/Array and Time/Timezone manipulation to build up an API response to be serialized to JSON) have slowed down.
When looking at RubyProf profiles before and after, I can see no obvious bottlenecks, except that there seems to be more Arel/Association code running now, but I can't be sure of it.
I can also see similar slowdown by simply benchmarking this:
rails3:
>> Benchmark.measure { 1000.times { Message.find(187812) } }.total
=> 0.29000000000000026
rails4:
>> Benchmark.measure { 1000.times { Message.find(187812) } }.total
=> 0.5200000000000005
I can provide the profilings and any benchmarks, if needed.
Is this something that is known, i.e. were there tradeoffs being made in Rails and I'm now hitting a too-many-records case?
Update: I created blank Ruby on Rails apps each for the old (Ruby 2.3.1/Rails 3.2.22.5) and the new setup (Ruby 2.4.1/Rails 4.2.8), and copied the database.yml over to it so the blank apps use the same DB. Using also bare-bones model files (the originals have a lot of code), I still see a difference:
2.3.1 :009 > ActiveRecord::Base.logger = nil; Benchmark.measure { 1000.times { Message.find(133147) } }.total
=> 0.29000000000000004
2.4.1 :009 > ActiveRecord::Base.logger = nil; Benchmark.measure { 1000.times { Message.find(133147) } }.total
=> 0.41000000000000014
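When reading these numbers, it may help to remember that .total on the Benchmark::Tms result is CPU time (user plus system), not wall-clock time, so time spent waiting on the database is not included in it. A small sketch using only the standard library, with sleep standing in for any wait that uses no CPU:

```ruby
require 'benchmark'

# Benchmark.measure returns a Benchmark::Tms. #total is CPU time
# (user + system, including child processes); #real is wall-clock time.
tms = Benchmark.measure { sleep 0.05 }  # sleeping uses almost no CPU

puts format("total (CPU): %.4fs", tms.total)
puts format("real (wall): %.4fs", tms.real)
```

Since the figures above are .total values, the difference they show would be CPU time spent in Ruby rather than time waiting on PostgreSQL.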
I'm migrating our app from 3.0 to 3.2.x. Earlier, streaming was done by assigning a proc to response_body, like so:
self.response_body = proc do |response, output|
target_obj = StreamingOutputWrapper.new(output)
lib_obj.xml_generator(target_obj)
end
As you can imagine, the StreamingOutputWrapper responds to <<.
This way is deprecated in Rails 3.2.x. The suggested way is to assign an object that responds to each.
The problem I'm facing now is in making the lib_obj.xml_generator each-aware.
The current version of it looks like this:
def xml_generator(target, conditions = [])
builder = Builder::XmlMarkup.new(:target => target)
builder.root do
builder.elementA do
Model1.find_each(:conditions => conditions) { |model1| target << model1.xml_chunk_string }
end
end
end
where target is a StreamingOutputWrapper object.
The question is, how do I modify the code - the xml_generator, and the controller code, to make the response xml stream properly.
Important stuff: Building the xml in memory is not an option as the model records are huge. The typical size of the xml response is around 150MB.
What you are looking for is SAX parsing. SAX reads a file in chunks instead of loading the whole file into a DOM. This is super convenient, and fortunately a lot of people before you have wanted to do the same thing. Nokogiri offers XML::SAX methods, but the documentation is poor and the syntax is a mess, so it can get really confusing. I would suggest looking into something that sits on top of Nokogiri and makes getting the job done a lot simpler.
Here are a few options -
SAX_stream:
Mapping out objects in sax_stream is super simple:
require 'sax_stream/mapper'
class Product
include SaxStream::Mapper
node 'product'
map :id, :to => '@id'
map :status, :to => '@status'
map :name_confirmed, :to => 'name/@confirmed'
map :name, :to => 'name'
end
and calling the parser is also simple:
require 'sax_stream/parser'
require 'sax_stream/collectors/naive_collector'
collector = SaxStream::Collectors::NaiveCollector.new
parser = SaxStream::Parser.new(collector, [Product])
parser.parse_stream(File.open('products.xml'))
However, working with the collectors (or writing your own) can end up being slightly confusing, so I would actually go with:
Saxerator:
Saxerator gets the job done and has some really handy methods for traversing into nodes that can be a little less complex than sax_stream. Saxerator also has a few really great configuration options that are well documented. A simple Saxerator example is below:
parser = Saxerator.parser(File.new("rss.xml"))
parser.for_tag(:item).each do |item|
# where the xml contains <item><title>...</title><author>...</author></item>
# item will look like {'title' => '...', 'author' => '...'}
puts "#{item['title']}: #{item['author']}"
end
# a String is returned here since the given element contains only character data
puts "First title: #{parser.for_tag(:title).first}"
If you end up having to pull the XML from an external source (or it is updated frequently and you don't want to update the copy on your server manually), check out THIS QUESTION and the accepted answer; it works great.
You could always monkey-patch the response object:
response.stream.instance_eval do
alias :<< :write
end
builder = Builder::XmlMarkup.new(:target => response.stream)
...
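Alternatively, the each-based approach mentioned in the question can be sketched roughly as follows. This is plain Ruby with hypothetical names: XmlStreamBody is not a Rails class, and Record is a stand-in for a model that has the xml_chunk_string method from the question. The point is only that the body yields one chunk at a time, so the full document is never built in memory.

```ruby
# Hypothetical body object: it only needs to respond to #each and yield
# strings. Records are pulled lazily, so the full document is never held
# in memory. `records` stands in for whatever iterates over the models.
class XmlStreamBody
  def initialize(records)
    @records = records
  end

  def each
    yield "<root><elementA>"
    @records.each { |record| yield record.xml_chunk_string }
    yield "</elementA></root>"
  end
end

# Stand-in for a model with an xml_chunk_string method (an assumption
# borrowed from the question's code).
Record = Struct.new(:id) do
  def xml_chunk_string
    "<model1 id=\"#{id}\"/>"
  end
end

body = XmlStreamBody.new([Record.new(1), Record.new(2)])
body.each { |chunk| print chunk }
puts
```

In the controller, an object like this would presumably be assigned to response_body in place of the proc; how the records are iterated in batches is left open here.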
So I am developing a rails app, and I am working on paginating the feed. While I was doing it I wondered if I was doing it the right way because my load times were over 1500ms. My code was:
stories = Story.feed
@stories = Kaminari.paginate_array(stories).page(params[:page]).per(params[:pageSize])
I have a few questions about this:
Should I be paginating Story.feed, or is there some sort of method that only returns the stories I need?
Is this load time normal?
What other things can I do to optimize this?
Also, Story.feed returns an array of story objects. The code for that is here:
def self.feed
rawStories = Story.includes([:likes, :viewers, :user, :storyblocks]).all
newFeaturedStories = rawStories.where(:featured => true).where(:updated_at.gte => (Date.today - 3)).desc(:created_at).entries
normalStories = rawStories.not_in(:featured => true, :or => [:updated_at.gte => (Date.today - 3)]).desc(:created_at).entries
newFeaturedStories.entries.concat(normalStories.entries)
end
I am using mongoid and mongodb
The issue is that you load all the stories from the DB into an array, and this takes a long time.
I suggest you use the any_of query from this great gem.
From there, do:
def self.feed_stories
newFeaturedStories = Story.where(:featured => true).where(:updated_at.gte => (Date.today - 3.days))
normalStories = Story.not_in(:featured => true, :or => [:updated_at.gte => (Date.today - 3.days)])
Story.includes([:likes, :viewers, :user, :storyblocks]).any_of(newFeaturedStories, normalStories).desc(:created_at)
end
Then paginate this:
selected_stories = Story.feed_stories.page(page).per(page_size)
I don't really understand what your entries are, but you can get them at this point.
To sum up: the idea is to make a single paginated DB query.
I suspect that when you call Kaminari.paginate_array on an ActiveRecord::Relation, it causes the whole result set to be fetched from DB and loaded in memory similar to calling Model.all.to_a.
To avoid this, I'd first find a way to turn Story.feed into a scope, rather than a class method. Superficially they'll seem the same—the differences are subtle but deep. See Active Record scopes vs class methods.
Next, ditch paginate_array in favor of chaining Kaminari's page() and per() scopes.
For example (simplified version of yours):
class Article < ActiveRecord::Base
  scope :featured, -> { where(featured: true) }
  scope :last_3_days, -> { where(:updated_at.gte => (Date.today - 3)).desc(:created_at) }
  scope :feed, -> { featured.last_3_days }
end
And then paginate simply by going:
Article.feed.page(page).per(page_size)
The biggest advantage of this is that Kaminari can chain into the generated SQL inserting the proper LIMIT and OFFSET clauses thereby reducing the size of the result set returned to only what needs to be displayed, as opposed to returning every matching record.
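As a rough illustration of what those clauses amount to, here is a small sketch in plain Ruby. limit_and_offset is an illustrative helper, not part of Kaminari; it only shows the arithmetic that maps a page number and page size to a LIMIT/OFFSET pair, assuming pages are numbered from 1.

```ruby
# Illustrative helper (not part of Kaminari): the LIMIT/OFFSET pair that a
# page number and page size translate to, with pages numbered from 1.
def limit_and_offset(page, per_page)
  page = [page.to_i, 1].max          # treat nil, 0 or negatives as page 1
  { limit: per_page, offset: (page - 1) * per_page }
end

p limit_and_offset(1, 25)
p limit_and_offset(3, 25)
```

For page 3 with 25 per page, the database skips 50 rows and returns at most 25, instead of handing every matching record to Ruby.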
I think Will Paginate will help you out here -> mislav/will_paginate.
From there you can simply give your controller action .per_page(20) for example and after 20 objects (you can define the objects, see the wiki) there will be pagination
What's the best way to test scopes in Rails 3? In Rails 2, I would do something like:
Rspec:
it 'should have a top_level scope' do
Category.top_level.proxy_options.should == {:conditions => {:parent_id => nil}}
end
This fails in rails 3 with a "undefined method `proxy_options' for []:ActiveRecord::Relation" error.
How are people testing that a scope is specified with the correct options? I see you could examine the arel object and might be able to make some expectations on that, but I'm not sure what the best way to do it would be.
Leaving the question of 'how-to-test' aside... here's how to achieve similar stuff in Rails3...
In Rails3 named scopes are different in that they just generate Arel relational operators.
But, investigate!
If you go to your console and type:
# All the guts of arel!
Category.top_level.arel.inspect
You'll see internal parts of Arel. It's used to build up the relation, but can also be introspected for current state. You'll notice public methods like #where_clauses and such.
However, the scope itself has a lot of helpful introspection public methods that make it easier than directly accessing #arel:
# Basic stuff:
=> [:table, :primary_key, :to_sql]
# and these to check-out all parts of your relation:
=> [:includes_values, :eager_load_values, :preload_values,
:select_values, :group_values, :order_values, :reorder_flag,
:joins_values, :where_values, :having_values, :limit_value,
:offset_value, :readonly_value, :create_with_value, :from_value]
# With 'where_values' you can see the whole tree of conditions:
Category.top_level.where_values.first.methods - Object.new.methods
=> [:operator, :operand1, :operand2, :left, :left=,
:right, :right=, :not, :or, :and, :to_sql, :each]
# You can see each condition to_sql
Category.top_level.where_values.map(&:to_sql)
=> ["`categories`.`parent_id` IS NULL"]
# More to the point, use #where_values_hash to see rails2-like :conditions hash:
Category.top_level.where_values_hash
=> {"parent_id"=>nil}
Use this last one: #where_values_hash to test scopes in a similar way to #proxy_options in Rails2....
Ideally your unit tests should treat models (classes) and instances thereof as black boxes. After all, it's not really the implementation you care about but the behavior of the interface.
So instead of testing that the scope is implemented in a particular way (i.e. with a particular set of conditions), try testing that it behaves correctly—that it returns instances it should and doesn't return instances it shouldn't.
describe Category do
describe ".top_level" do
it "should return root categories" do
frameworks = Category.create(:name => "Frameworks")
Category.top_level.should include(frameworks)
end
it "should not return child categories" do
frameworks = Category.create(:name => "Frameworks")
rails = Category.create(:name => "Ruby on Rails", :parent => frameworks)
Category.top_level.should_not include(rails)
end
end
end
If you write your tests in this way, you'll be free to re-factor your implementations as you please without needing to modify your tests or, more importantly, without needing to worry about unknowingly breaking your application.
This is how I check them. Think of this scope:
scope :item_type, lambda { |item_type|
where("game_items.item_type = ?", item_type )
}
that gets all the game_items whose item_type equals a given value (like 'Weapon'):
it "should get a list of all possible game weapons if called like GameItem.item_type('Weapon'), with no arguments" do
Factory(:game_item, :item_type => 'Weapon')
Factory(:game_item, :item_type => 'Gloves')
weapons = GameItem.item_type('Weapon')
weapons.each { |weapon| weapon.item_type.should == 'Weapon' }
end
I test that the weapons array holds only Weapon item_types and not something else like Gloves that are specified in the spec.
Don't know if this helps or not, but I'm looking for a solution and ran across this question.
I just did this and it works for me
it { User.nickname('hello').should == User.where(:nickname => 'hello') }
FWIW, I agree with your original method (Rails 2). Creating models just for testing them makes your tests way too slow to run in continuous testing, so another approach is needed.
Loving Rails 3, but definitely missing the convenience of proxy_options!
Quickly Check the Clauses of a Scope
I agree with others here that testing the actual results you get back and ensuring they are what you expect is by far the best way to go, but a simple check to ensure that a scope is adding the correct clause can also be useful for faster tests that don't hit the database.
You can use the where_values_hash to test where conditions. Here's an example using Rspec:
it 'should have a top_level scope' do
  # Parentheses are required; otherwise Ruby parses the braces as a block passed to eq.
  Category.top_level.where_values_hash.should eq({"parent_id" => nil})
end
Although the documentation is very slim and sometimes non-existent, there are similar methods for other condition-types, such as:
order_values
Category.order(:id).order_values
# => [:id]
select_values
Category.select(:id).select_values
# => [:id]
group_values
Category.group(:id).group_values
# => [:id]
having_values
Category.having(:id).having_values
# => [:id]
etc.
Default Scope
For default scopes, you have to handle them a little differently. Check this answer out for a better explanation.
This is a snippet of code from an update method in my application. The method is POSTed an array of user IDs in params[:assigned_users_list_id].
The idea is to synchronise the DB association entries with the ones that were just submitted, by removing the right ones (those that exist in the DB but not in the list) and adding the right ones (vice versa).
@list_assigned_users = User.find(:all, :conditions => { :id => params[:assigned_users_list_id]})
@assigned_users_to_remove = @task.assigned_users - @list_assigned_users
@assigned_users_to_add = @list_assigned_users - @task.assigned_users
@assigned_users_to_add.each do |user|
  unless @task.assigned_users.include?(user)
    @task.assigned_users << user
  end
end
@assigned_users_to_remove.each do |user|
  if @task.assigned_users.include?(user)
    @task.assigned_users.delete user
  end
end
It works - great!
My first question is, are those 'if' and 'unless' statements totally redundant, or is it prudent to leave them in place?
My next question is, I want to repeat this exact code immediately after this, but with 'subscribed' in place of 'assigned'... To achieve this I just did a find & replace in my text editor, leaving me with almost the same code in my app twice. That's hardly in keeping with the DRY principle!
Just to be clear, every instance of the letters 'assigned' becomes 'subscribed'. It is passed params[:subscribed_users_list_id], and uses @task.subscribed_users.delete user etc...
How can I repeat this code without repeating it?
Thanks as usual
You don't need if and unless statements.
As for the repetition you can make array of hashes representing what you need.
Like this:
[
  { :where_clause => params[:assigned_users_list_id], :user_list => @task.assigned_users },
  { :where_clause => params[:subscribed_users_list_id], :user_list => @task.subscribed_users }
].each do |list|
  @list_users = User.find(:all, :conditions => { :id => list[:where_clause] })
  @users_to_remove = list[:user_list] - @list_users
  @users_to_add = @list_users - list[:user_list]
  @users_to_add.each do |user|
    list[:user_list] << user
  end
  @users_to_remove.each do |user|
    list[:user_list].delete user
  end
end
My variable names are not the happiest choice so you can change them to improve readability.
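The same idea can also be pulled into a small helper method. The sketch below is plain Ruby: sync_list is an illustrative name, and ordinary arrays of IDs stand in for the association collections (an assumption made only so the example runs on its own; << and delete are available on arrays too).

```ruby
# Illustrative helper: bring `current` in line with `desired` using array
# difference, mutating `current` in place. Plain arrays stand in for the
# association collections here.
def sync_list(current, desired)
  (current - desired).each { |item| current.delete(item) }  # no longer wanted
  (desired - current).each { |item| current << item }       # newly wanted
  current
end

assigned   = [1, 2, 3]
subscribed = [2]

sync_list(assigned,   [2, 3, 4])
sync_list(subscribed, [1, 2])

p assigned
p subscribed
```

Because each difference is computed before anything is changed, the extra include? checks are unnecessary, and the assigned and subscribed cases become two calls to one method.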
I seem to be missing something here, but aren't you just doing this?
@task.assigned_users = User.find(params[:assigned_users_list_id])