I'm doing some optimisation on my Rails (2.3.5) app, and can't seem to find an elegant way of benchmarking the filter chain. I'm load testing the site with ApacheBench (ab), something like:
ab -n 200 -c 3 -i -k http://localtestingserver:80/test
/test is set up with nothing in the controller and nothing in the page, so it's just loading our default filter chain plus rendering the layout. I get an average of 86ms per request, fine.
When I disable the filters (skip_filter filter_chain) it drops to 37ms, and without the layout (render :layout => false) it drops to 16ms. Is there a way I can benchmark, perhaps with Benchmark.realtime, each function run in the filter chain before the controller is called (or indeed after)? Can I even output a list of all filters being called on a request?
Thanks,
Dan
Edit
I'm using the Hodel3000 logger and Oink, so I get output per request like:
Jan 27 17:56:55 testing rails[19611]: Memory usage: 98748 | PID: 19611
Jan 27 17:56:55 testing rails[19611]: Instantiation Breakdown: Total: 2 | Room: 1 | User: 1
Jan 27 17:56:55 testing rails[19611]: Completed in 240ms (View: 28, DB: 0) | 200 OK [/test]
I'd just like to better understand and profile what happens before the controller is called - I can profile the controllers themselves fine. For example, where does the extra 212ms in the request above come from? Obviously I could drop timing code into each of my own before_filters, but I hoped there was a way to wrap every filter in one go (including ones from gems, etc.).
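For the "list all filters" part, Rails 2.3 does expose the chain on the controller class. A minimal sketch - note that filter_chain is internal API, so the exact accessors may vary between patch levels:

ApplicationController.filter_chain.each do |filter|
  Rails.logger.info "#{filter.class.name}: #{filter.method.inspect}"
end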
The Performance Testing Rails Applications guide looks like a good place to start.
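And for timing filters individually without editing each one, here's a hedged sketch that rebinds a named filter method so it logs its own runtime. benchmarked_filter is a hypothetical helper, not a Rails API, and it only catches filters implemented as controller methods (procs or filter classes added by gems would first need identifying via the chain listing above):

require 'benchmark'

class ApplicationController < ActionController::Base
  # benchmarked_filter is a hypothetical helper, not part of Rails:
  # wraps an already-declared filter method so it logs how long it takes
  def self.benchmarked_filter(name)
    original = instance_method(name)
    define_method(name) do |*args|
      result = nil
      seconds = Benchmark.realtime { result = original.bind(self).call(*args) }
      logger.info "filter #{name}: #{'%.1f' % (seconds * 1000)}ms"
      result
    end
  end

  # usage, after the filter methods are defined:
  # benchmarked_filter :authenticate_user
end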
Related
I have a status dashboard that shows the status of remote hardware devices that 'ping' the application every minute and log their status.
class Sensor < ActiveRecord::Base
  has_many :logs

  def most_recent_log
    logs.order("id DESC").first
  end
end

class Log < ActiveRecord::Base
  belongs_to :sensor
end
Given I'm only interested in showing the current status, the dashboard shows just the most recent log for each sensor. This application has been running for a long time now, and there are tens of millions of Log records.

The problem I have is that the dashboard takes around 8 seconds to load. From what I can tell, this is largely because of an N+1 query fetching these logs.
Completed 200 OK in 4729.5ms (Views: 4246.3ms | ActiveRecord: 480.5ms)
I do have the following index in place:
add_index "logs", ["sensor_id", "id"], :name => "index_logs_on_sensor_id_and_id", :order => {"id"=>:desc}
My controller / lookup code is the following:
class SensorsController < ApplicationController
  def index
    @sensors = Sensor.all
  end
end
How do I make the load time reasonable?
Is there a way to avoid the N+1 and preload this?

I had thought of putting a latest_log_id reference onto Sensor and then updating it every time a new log for that sensor is posted - but something in my head is telling me that other developers would say this is a bad thing. Is that the case?
How are problems like this usually solved?
There are 2 relatively easy ways to do this:
Use ActiveRecord eager loading to pull in just the most recent logs
Roll your own mini eager loading system (as a Hash) for just this purpose
Basic ActiveRecord approach:
subquery = Log.group(:sensor_id).select("MAX(id)")
@sensors = Sensor.eager_load(:logs).where(logs: { id: subquery }).all
Note that you should NOT call your most_recent_log method on each sensor (that will trigger an N+1); use logs.first instead. Only the latest log for each sensor will actually be prefetched into the logs collection.
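For example, a minimal sketch of reading the prefetched association:

@sensors.each do |sensor|
  latest = sensor.logs.first # served from the eager-loaded collection, no extra query
end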
Rolling your own may be more efficient from a SQL perspective, but more complex to read and use:
@sensors = Sensor.all
logs = Log.where(id: Log.group(:sensor_id).select("MAX(id)"))
@sensor_logs = logs.each_with_object({}) do |log, hash|
  hash[log.sensor_id] = log
end
@sensor_logs is a Hash, permitting a fast lookup of the latest log by sensor.id.
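In the dashboard code, that lookup is then a straight hash access - a sketch, with status as a placeholder attribute:

@sensors.each do |sensor|
  latest = @sensor_logs[sensor.id]
  status = latest && latest.status # nil-safe: a sensor may have no logs yet
end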
Regarding your comment about storing the latest log id - you are essentially asking if you should build a cache. The answer is 'it depends'. There are many advantages and many disadvantages to caching, so it comes down to whether the benefit is worth the cost. From what you are describing, it doesn't appear that you are familiar with the difficulties caches introduce (Google 'cache invalidation') or whether they are applicable in your case. I'd recommend against one until you can demonstrate that a) it adds real value over a non-cache solution, and b) it can be safely applied to your scenario.
There are 3 options:
eager loading
joining
caching the current status
Eager loading is explained by PinnyM in his answer above.
For joining, you can join from Sensor to just the latest Log record for each sensor, so everything gets fetched in one query. I'm not sure offhand how that will perform with the number of rows you have; it'll likely still be slower than you want.
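A sketch of that join - logs.status is an assumed column, so swap in whatever the dashboard actually displays:

@sensors = Sensor
  .joins("INNER JOIN logs ON logs.sensor_id = sensors.id")
  .joins("INNER JOIN (SELECT sensor_id, MAX(id) AS max_id
                      FROM logs GROUP BY sensor_id) latest
          ON latest.max_id = logs.id")
  .select("sensors.*, logs.status AS latest_status")

Each returned Sensor then responds to latest_status without ever touching the logs association.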
The thing you mentioned - caching the latest_log_id (or even caching just the latest_status, if that's all the dashboard needs) - is actually OK. It's called denormalization, and it's a useful technique if used carefully. You've likely come across "counter cache" plugins for Rails, which are in the same vein: duplicating data in the interests of read performance.
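If you do go the denormalization route, a minimal sketch (this assumes a migration has added an integer latest_log_id column to sensors):

class Log < ActiveRecord::Base
  belongs_to :sensor

  after_create :cache_on_sensor

  private

  def cache_on_sensor
    # one extra UPDATE per new log; update_column skips validations/callbacks
    sensor.update_column(:latest_log_id, id)
  end
end

class Sensor < ActiveRecord::Base
  has_many :logs
  belongs_to :latest_log, class_name: "Log"
end

The dashboard query then becomes Sensor.includes(:latest_log) - two queries regardless of how large the logs table gets, at the cost of keeping the pointer in sync.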
I have an AngularJS app, and there's one page in my application - only one - that takes 2 minutes to load. It does load a bit of data, but the data itself is only 700KB, and I benchmarked the entire Rails action from the beginning until right before the render: it only takes 15-20 seconds. But when I look at the actual network call, or put a timer before the Angular $http POST call and another in its success callback, both show the call taking almost 2 minutes. I can't figure out what's going on between the render and the success on the Angular side that would cause this extreme time difference. Does anyone know how I could debug this further, or what could be causing it?

The Rails action just does a couple of big database calls, all optimized, then does some work on the data, and finally renders the data (already JSONified with to_json).
Rails action ends with Completed 200 OK in 20458ms (Views: 913.8ms | ActiveRecord: 139.6ms)
Edit: If I put a limit on my data it's almost instant, so it definitely has to do with the amount of data. But I'm not sure what could cause the minute-and-a-half gap between when the Rails action finishes and the http post success begins.
Edit 2: An AJAX call takes an equal amount of time, so there must be an issue with how the data is being parsed on the front end; I'm not sure of the best way to debug that, since the problem is clearly between the render and the page receiving the data.
Turns out the issue was the extremely complex hash my old coworker wrote. The whole thing was pretty unnecessary, so I deleted all 90 lines of code where he built the hash from scratch and replaced them with 3 lines.

I now have the two ActiveRecord queries with the proper includes, and one render statement on those ActiveRecord objects using as_json with the proper include and only parameters, and the page now loads in 25 seconds in development. I can only imagine it'll be faster in production/staging. I don't know why the hand-built hashes were so slow to render as JSON, but calling as_json on the ActiveRecord objects within the render statement completely fixed my issue.
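For reference, the shape of the fix, with placeholder model and attribute names since the original models aren't shown:

@rooms  = Room.includes(:users).where(active: true)
@events = Event.includes(:attendees).where(active: true)

render json: {
  rooms:  @rooms.as_json(only: [:id, :name],
                         include: { users: { only: [:id, :email] } }),
  events: @events.as_json(only: [:id, :starts_at],
                          include: { attendees: { only: [:id] } })
}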
So I've been using MongoDB (with the Mongoid Ruby gem) for a while now, and as our app and its data have grown I've noticed requests taking longer and longer. Here is what a typical request for my app looks like - it takes about 500ms, just for the DB stuff.
Nothing special here just some controller stuff:
Started GET "/cities/san-francisco?date_range=past_week" for 127.0.0.1 at 2011-11-15 11:13:04 -0800
Processing by CitiesController#show as HTML
Parameters: {"date_range"=>"past_week", "id"=>"san-francisco"}
Then the queries run, but what I don't understand is that for every query it runs, it first performs a MONGODB dashboard_development['system.namespaces'].find({})! Why?
MONGODB dashboard_development['system.namespaces'].find({})
MONGODB dashboard_development['users'].find({:_id=>BSON::ObjectId('4e80e0090f6d2e306f000001')})
MONGODB dashboard_development['system.namespaces'].find({})
MONGODB dashboard_development['cities'].find({:slug=>"san-francisco"})
MONGODB dashboard_development['system.namespaces'].find({})
MONGODB dashboard_development['accounts'].find({:_id=>BSON::ObjectId('4e80e0090f6d2e306f000002')})
MONGODB dashboard_development['system.namespaces'].find({})
MONGODB dashboard_development['neighborhoods'].find({"city_id"=>BSON::ObjectId('4e80e00a0f6d2e306f000005')})
Then the views get rendered. They are pretty slow too, but that is a separate problem altogether - I'll address it at another time.
Rendered cities/_title_and_scope.html.erb (109.3ms)
Rendered application/_dropdown.html.erb (0.1ms)
Rendered application/_date_range_selector.html.erb (6.2ms)
Rendered cities/show.html.erb within layouts/application (122.7ms)
Rendered application/_user_dropdown.html.erb (0.9ms)
Rendered application/_main_navigation.html.erb (5.8ms)
So minus the views, the request took about 500ms - that's too long for a really simple query, and the app is going to grow, so that time is going to grow as well. Also, this example is faster than requests usually are - sometimes they take 1000ms or more!
Completed 200 OK in 628ms (Views: 144.9ms)
Additionally, I wanted to ask which fields are most appropriate for indexes? Maybe that's my problem, as I'm not really using them at all. Any help understanding this would be really appreciated. Thanks!
You need to use indexes - otherwise, your Mongo queries are executing what is best described as a full table scan: loading the entirety of your collection's documents into memory and then evaluating each one to determine whether it should be included in the response.
Strings, dates, and numbers can all be used as indexes - the trick is to have an index on each attribute you are doing a "where" on.
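A sketch using the Mongoid 2.x-era declaration syntax (it changed in Mongoid 3+), with field names taken from the queries above:

class City
  include Mongoid::Document
  field :slug
  index :slug, :unique => true # covers cities.find(:slug => "san-francisco")
end

class Neighborhood
  include Mongoid::Document
  index :city_id # covers neighborhoods.find("city_id" => ...)
end

Then rake db:mongoid:create_indexes builds them on the server.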
You can also turn off table scans in your mongo config (the notablescan option) to help find table scans and destroy them!
I have a controller that returns JSON or XML from a fairly complex relational query with some controller logic as well.
I've tuned on the DB side by refining my query and making sure my indexes are correct for my query.
In my log I see items like this:
Completed in 740ms (View: 1, DB: 50)
So if I understand correctly, this means the view took 1ms to render and the DB queries took 50ms. Is all the remaining time spent in the controller? I've tried bypassing my controller logic and just leaving the to_json and to_xml in there, and it is just as slow. As a point of reference, my average returned JSON result set is 168k.
Are there other steps that go into the Completed in time? Does it include time until last byte for the network transfer?
Update: I wrapped various parts of my controller in benchmarking blocks:
self.class.benchmark("Active Record Find") do
  # my query here
end
What I found was that even though the log line says DB: 50, my ActiveRecord find is taking almost all of the remaining time. So now I'm confused about what that DB number means, and why the benchmark block reports ~600ms when the DB: time is ~50.
Thanks
Your DB number is the time actually spent in the database, but it does not include the time spent loading the results into ActiveRecord objects.

So if you're loading 168,000 Ruby ActiveRecord objects to render them as JSON, that would explain your 550ms (or more!).
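One way to see the split for yourself - a rough sketch for a Rails 2.x console, where Item stands in for your model:

require 'benchmark'

sql = "SELECT * FROM items"

db_only = Benchmark.realtime do
  # roughly what the "DB:" number measures: the query itself
  ActiveRecord::Base.connection.select_all(sql)
end

with_objects = Benchmark.realtime do
  # the same rows, but instantiated as ActiveRecord objects
  Item.find(:all)
end

puts "raw query: #{'%.0f' % (db_only * 1000)}ms"
puts "find + instantiate: #{'%.0f' % (with_objects * 1000)}ms"

The gap between the two numbers is the instantiation cost, which the log line attributes to neither View nor DB.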
If these times are observed in the development environment, the additional time is probably because the application classes are not being cached. The application files are being reloaded on every request.
As suggested in this answer, try setting config.cache_classes = true in config/environments/development.rb and restarting your server to see what effect this has on your response times. Be sure to change it back to config.cache_classes = false and restart your server once you're done.
I am working on an "optimization" of my application and I am trying to understand the output that Rails (version 2.2.2) gives at the end of a render.
Here is the "old" way:
Rendered user/_old_log (25.7ms)
Completed in 466ms (View: 195, DB: 8) | 200 OK
And the "new" way:
Rendered user/_new_log (48.6ms)
Completed in 337ms (View: 192, DB: 33) | 200 OK
These requests were exactly the same; the difference is that the old way parses log files while the new way queries the database log table.
The actual speed of the page is not the issue (the user understands that this is a slow request), but I would like the page to respond as quickly as possible even though it is a "slow" page.

So, my question is: what do the numbers represent/mean? In other words, which way was the faster method, and why?
This:
Rendered user/_old_log (25.7ms)
is the time to render just the _old_log partial template; it comes from an ActiveSupport::Notifications event processed by ActionView::LogSubscriber.
This:
Completed 200 OK in 466ms
is the HTTP status returned, as well as the total time for the entire request. It comes from ActionController::LogSubscriber.
Also, note those parenthetical items at the end:
(Views: 124.6ms | ActiveRecord: 10.8ms)
Those are the total times for rendering the entire view (partials & everything) and all database requests, respectively, and come from ActionController::LogSubscriber as well.
Jordan's answer is correct. To paraphrase, the first number is the time the page took to load. The second is how long the view took to generate. The last number is how long it took for your database to handle all queries you sent to it.
You can also get an estimate of how long your controller and model code took by subtracting the last two numbers from the first, but a better way is to use the Benchmark.measure method (http://www.ruby-doc.org/stdlib/libdoc/benchmark/rdoc/classes/Benchmark.html).
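For example, a sketch of timing one suspect block inside a Rails 2.x action (the query is a placeholder):

require 'benchmark'

timing = Benchmark.measure do
  @entries = LogEntry.find(:all, :conditions => ["created_at > ?", 1.week.ago])
end
logger.info "log lookup took #{'%.1f' % (timing.real * 1000)}ms"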
Your new way appears to have improved because the controller/model code completes faster: it spends less time overall, though more time rendering the template.