How do I get a more detailed transaction trace with the Ruby New Relic agent?

I'm running a Rails 3.0 application on Heroku and using the New Relic add-on/service.
I have been looking at the transaction traces feature (available in the Pro version) to understand a little more about the performance characteristics of the application. However, a significant portion of the time (30-50%) is "uninstrumented time". After a few stabs at adding method tracers in various places and going through the reasonably slow cycle of testing whether I get more info, I feel this is going nowhere fast.
It seems the PHP New Relic agent has a great feature to get very detailed traces without needing to guess where to put method tracers: http://newrelic.com/docs/php/php-agent-faq#top100
Is there anything similar to this for Ruby?
Note: I'm already using rpm_contrib to get some more info and have garbage collection stats enabled. Also, this is not about fixing a performance problem, just understanding how to better use the performance tools available and scratch a niggling itch about that uninstrumented time.

There isn't currently anything similar for Ruby. I'll mention it to the Ruby engineer when I get a chance. My guess is unless a lot of requests come in for it, it won't be at the top of the list for a while, though. In the meantime, you can use the method tracers to figure out the uninstrumented time.
Hope that helps.

Method tracers can work well, but if you have a lot of code in your controller, try a binary search using trace_execution_scoped, which records the time spent in a block of code:
http://newrelic.github.com/rpm/NewRelic/Agent/MethodTracer/InstanceMethods/TraceExecutionScoped.html#method-i-trace_execution_scoped
Add a couple calls to this, give each metric a sensible name like "Custom/MySlowControllerAction/block0" (first argument to trace_execution_scoped), and repeat.
The metrics you name will show up not just in Transaction Traces, but also in the Performance Breakdown for the controller action under the Web Transactions tab, so you'll see average time in that block of code across all requests, not just the slow ones.
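For reference, here is a minimal sketch of that pattern in a controller action (the controller name and the two helper methods are made up; the metric names follow the convention above):
class MySlowController < ApplicationController
  include NewRelic::Agent::MethodTracer

  def index
    trace_execution_scoped(['Custom/MySlowControllerAction/block0']) do
      load_widgets     # hypothetical first chunk of the action's work
    end
    trace_execution_scoped(['Custom/MySlowControllerAction/block1']) do
      render_report    # hypothetical second chunk
    end
  end
end
If block0 accounts for most of the uninstrumented time, split it again and repeat until the slow spot is obvious.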

Related

Frequently refreshing web page during long-running process

I've been hunting around my issue for a while; probably the best I've come up with is another Stack Overflow question: How should I perform a long-running task in ASP.NET 4?
I'm in a similar place in that I'm wanting to understand what my options are, but I don't feel I know enough specifically about MVC to come to a view. I'm using MVC 5 but with the 4.8 framework, plus I note that technologies such as SignalR have become available since this question was asked. I was wondering if any experienced MVC'ers could give me a view?
I too have a long running process. More specifically, the user is importing a file. The file is delimited so the import happens line by line. The file might be thousands of lines long. Each line will be parsed and imported in a fraction of a second but the whole operation might take several minutes.
I don't particularly need behaviour to be asynchronous, but because of the length of the entire process I want to regularly update the user on progress. I'm wondering what options I have?
I've got a vague recollection that I might have looked at this problem 20-odd years ago (Classic ASP), and solved it by regular flushes, sending a bit more of the page to the client every few seconds, but I'm trying also to use a _Layout page now, so I've sent the page back already. So I don't think I have that option, even assuming such a mechanism still exists. A bit more recently, but still a while ago, I might have used javascript to poll but everything I'm reading now seems to point me to newer technologies which I'm not sure I fully understand yet.
I'm just wondering how would you solve this problem?
I would not be performing any of the file parsing on the web server, especially if it's thousands of rows long. I would delegate this to a background service of sorts, whether that be a Lambda service in the cloud or a Windows service or a scheduled task. You could then call your SignalR hub from the background task (whatever that might be) to update the progress of the import.

Rails app: Troubleshooting frequent RequestTimeOut errors

I have a large web app whose load times I have recently been working hard to reduce. I have two controllers, Generator (some 20,000 items) and Product (some 1,500 items), that have been slow for a while, but I have worked on indexes and smarter queries. On my dev app the response time is about 500 ms.
From time to time I still get RequestTimeOut errors on the app and I need help troubleshooting them. I understand what the error means (a request has taken too much time) and I have installed the 'rack-timeout' gem and set it to 15 seconds (which works fine).
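(For context, the 15-second limit from the rack-timeout gem is typically set in an initializer roughly like the sketch below; the exact setter name varies between gem versions, so check the README for yours.)
# config/initializers/rack_timeout.rb -- a sketch; older rack-timeout releases
# used Rack::Timeout.timeout=, newer ones use a service_timeout setting instead.
Rack::Timeout.timeout = 15   # seconds before an in-flight request is aborted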
I have gone through the entire app (and especially the two slowest parts: Generator & Product) in search of time to save. I have had some issues with caching that I am currently trying to fix (caching would help quite a bit).
It seems that these timeouts happen mostly when bots (Yandex.ru especially) spider through my site, going through one generator after another. The pages may not be very slow any more, but loading so many in a row creates a lot of requests.
Now I am out of ideas and need some help to figure out what to look at next in my troubleshooting:
Is there anything else besides response time that causes this error, e.g. a memory leak or something? Or is it just a matter of lots of requests hitting slow controllers?
I haven't been able to test this on my development platform. Is there a way to benchmark and see how the app would handle requests like the ones from the bots? I seem to remember there was an "Apache thing" one could use to simulate traffic like this.
Are there any other ways of looking at the problem, or of troubleshooting this issue from a high-level point of view? Any ideas and thoughts are welcome!

Memory constantly increasing in Rails app

I recently launched a new Ruby on Rails application that worked well in development mode. After the launch the memory in use has been constantly increasing:
UPDATED: By the time this screen dump (the one below) from New Relic was taken, I had scheduled a web dyno restart every hour (for one of the two web dynos). Thus, it does not reach the 500 MB crash level and the graph shows a bit of a sawtooth pattern. This does not resolve the problem at all, though, only some of the symptoms. As you can see, the morning is not so busy but the afternoon is busier. I made an upload at 11.30 for a small detail; it could not have affected the problem even though it appears that way in the stats.
It should also be noted that it is the MIN memory that keeps increasing even though the graph shows AVG memory. Even when the graph seems to dip temporarily, the MIN memory stays the same or increases; it never decreases!
Without the dyno restarts, the app's memory would increase until it reached Heroku's limit and the app crashed with "execution expired"-type errors.
I am not a great programmer but I have made a few apps before without having this type of problem.
Troubleshooting performed
A. I thought the problem would lie within the before_filter in the application_controller (Will variables in application controller cause a memory leak in Rails?) but that wasn't the problem.
B. I installed oink but it does not give any results (at all). It creates an oink.log but does not give any results when I run "heroku run oink -m log/oink.log", no matter what threshold.
C. I tried bleak_house but it was deprecated and could not be installed
D. I have googled and read most articles in the topic but I am none the wiser.
E. I would love to test memprof but I can't install it (I have Ruby 1.9x and don't really know how to downgrade it to 1.8x)
My questions:
Q1. What I really would love to know is the name(s) of the variable(s) that are increasing for each request, or at least which controller is using the most memory.
Q2. Will controller code like the following increase memory usage?
related_feed_categories = []
@gift.tags.each do |tag|
  tag.category_connections.each do |cc|
    related_feed_categories << cc.category_from_feed
  end
end
Do I need to "kill off" related_feed_categories with "related_feed_categories = nil" afterwards or does the Garbage Collector handle that?
Q3. What would be my major things to look for? Right now I can't narrow it down AT ALL. I don't know which part of the code to look deeper into, and I don't really know what to look for.
Q4. In case I really cannot solve the problem: are there any online consulting services where I can send my code and have them find the problem?
Thanks!
UPDATED: After receiving comments, it seems this could have to do with sessions. This is a part of the code that I guess could be bad:
# Create sessions for last generation
friend_data_arr = [@generator.age, @generator.price_low, @generator.price_high]
friend_positive_tags_arr = []
friend_negative_tags_arr = []
friend_positive_tags_arr << @positive_tags
friend_negative_tags_arr << @negative_tags
session["last_generator"] = [friend_data_arr, friend_positive_tags_arr, friend_negative_tags_arr]
# Clean variables
friend_data_arr = nil
friend_positive_tags_arr = nil
friend_negative_tags_arr = nil
It is used in the generator#show action. When some gifts have been generated through my gift-generating engine I save the input in the session (in case the user wants to use that info at a later stage). I never kill or expire these sessions, so maybe this could cause the memory increase.
Updated again: I removed this piece of code but the memory still increases, so I guess this part is not it, though similar code might be causing the problem?
It's unlikely that your related_feed_categories code provokes this.
Are you using a lot of files?
How long do you keep session data? It looks like you have an e-commerce site; are you keeping objects in sessions?
Basically, I think it is files, or sessions, or an increase in temporary data that is only flushed when the server crashes (memcache?).
In the middle of the night, I guess you have fewer customers. Can you post the same memory chart for peak hours?
It may be related to this problem: Memory grows indefinitely in an empty Rails app
UPDATE:
Rails doesn't store all the data on the client side. I don't remember the default store, but unless you choose the cookie store, Rails only sends data like the session_id to the client.
There are a few guidelines about sessions; ActiveRecord::SessionStore seems to be the best choice for performance. And you shouldn't keep large objects or secret data in sessions. More on sessions here: http://guides.rubyonrails.org/security.html#what-are-sessions
Section 2.9 of that guide explains how to destroy sessions that have been unused for a certain time.
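As a sketch of what that cleanup can look like when sessions live in the database via ActiveRecord::SessionStore (the two-week cutoff is arbitrary), run periodically from a cron job or rake task:
# Purge sessions that haven't been touched in two weeks (assumes the
# sessions table created by ActiveRecord::SessionStore).
ActiveRecord::SessionStore::Session.delete_all(["updated_at < ?", 2.weeks.ago])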
Instead of storing objects in sessions, I suggest you store the URL that produces the search results. You may even store it in the database, which lets you offer your customers the ability to save a few searches, and/or load the last one used by default.
But at this stage we are still not totally sure that sessions are the culprit. To be sure, you could try, on a test server, to stress test your application with expiring sessions. Basically, you create a large number of sessions, and maybe 20 minutes later Rails has to remove them. If you see any difference in memory consumption, it will narrow things down.
First case: memory drops significantly when sessions expire; then you know it is session related.
Second case: memory increases at a faster rate but doesn't drop when sessions expire; then you know it is user related but not session related.
Third case: nothing changes (memory increases as usual), so you know it does not depend on the number of users. But I don't know what could cause this.
When I say stress test, I mean a significant number of sessions, not really a stress test. The number of sessions you need depends on your average number of users. If you had 50 users before your app crashed, 20-30 sessions may be significant. So create them by hand and configure a higher expiry time limit. We are just looking for differences in memory consumption.
Update 2:
So this is most likely a memory leak. Use ObjectSpace; it has a count_objects method that will display counts of all the objects currently in use. It should narrow things down. Use it once memory has already increased a lot.
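A rough sketch of using it from a console (for example via heroku run console) once memory has climbed; comparing two snapshots shows which object types keep growing:
before = ObjectSpace.count_objects
# ... let the app serve traffic for a while, or exercise the suspect pages ...
after = ObjectSpace.count_objects
after.each do |type, count|
  delta = count - before.fetch(type, 0)
  puts "#{type}: +#{delta}" if delta > 0
end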
Otherwise, there is bleak_house, a gem able to find memory leaks. Ruby tools for memory leaks are not as efficient as Java ones, but it's worth a try.
Github : https://github.com/evan/bleak_house
Update 3:
This may be an explanation. It is not really a memory leak, but it does grow memory:
http://www.tricksonrails.com/2010/06/avoid-memory-leaks-in-ruby-rails-code-and-protect-against-denial-of-service/
In short, symbols are kept in memory until you restart Ruby. So if symbols are created with random names, memory will grow until your app crashes. This doesn't happen with strings; they are garbage collected.
A bit old, but still valid for Ruby 1.9.x. Try this: Symbol.all_symbols.size
Update 4:
So your symbols are probably the memory leak. Now we still have to find where it occurs. Use Symbol.all_symbols. It gives you the list. You could store this somewhere and diff it against a later snapshot, in order to see what was added.
It may be i18n, or it may be something else that generates symbols implicitly the way i18n can. Either way, something is probably generating symbols with random data in the name, and those symbols are never used again.
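A sketch of that diff approach (run_suspect_request stands in for whatever exercises the code path you suspect):
before = Symbol.all_symbols
run_suspect_request                # hypothetical: hit the suspect page(s)
added = Symbol.all_symbols - before
puts "#{added.size} new symbols interned"
puts added.first(50).inspect       # look for random-looking names (ids, dates, etc.)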
Assuming category_from_feed returns a string (or perhaps a symbol), an increase on the order of 300 MB is quite unlikely. You can get a rough feel for this by profiling something like:
4_000_000.times { related_feed_categories << "Loooooooooooooong string" }
This snippet would shoot the memory usage up by about 110 MB.
I'd look at DB connections or methods that read a file and then don't close it. I can see that it's related to feeds which probably means you might be using XML. That can be a starting point too.
Posting this as answer because this looks bad in comments :/

Ruby on Rails: what performance can I realistically aim for?

I've been building an application in Ruby on Rails 3, and I'm starting to worry about performance optimization. Now I hope that my question is not too subjective for this site, but I'm interested in facts, not a discussion, so here goes:
While I'm trying to get my views to render faster, there is one thing I simply do not know: What should I aim for? Given a reasonably complex page, what load time is realistic? I simply don't have any reference.
What I'm typically seeing for my application is something like this:
Completed 200 OK in 397ms (Views: 341.1ms | ActiveRecord: 17.7ms)
This is on my production server, running Apache/Passenger.
I am the only one (!) making requests on that server, it's a root server (not virtual), running Ubuntu, AMD Athlon 64 X2 5600+, 4 GB RAM
That is, for most of my more complicated actions (not unusually complicated, just assume it's a paginated listing of 20 objects with 5 computed properties each or something) the ActiveRecord times are almost always fine (<20-30ms), but the "views" number is usually >200 ms.
Now, to my question: When I started using RoR my expectation (maybe unrealistic) was that for most consumer-oriented applications with average complexity (let's say something like Facebook, Twitter, etc. WITHOUT the millions of users) I would get < 20 ms load times as long as I was the only one making requests, and that for a single server load times would only approach 100ms or more if there were lots of people making requests at the same time.
My expectation was also that database requests would be the major bottleneck, since all the rest is just relatively simple computations without any real complexity. I thought that it might take 10ms to get all the objects from the database, and then maybe another 5 ms to run the controller code, build the view, etc.
Since I've never been in charge of any production app, I don't know if this expectation was in any way realistic. So I would like somebody with experience point out to me what my realistic expectation should be.
(e.g. "pretty much everything but really nasty stuff should render in 50 ms tops as long as you are the only one making requests")
or ("actually 300 ms is not unusual for RoR applications, even if you're the only user")
or ("Are you kidding? I get < 10 ms with 150 concurrent requests on a smaller server than yours. There must be something very wrong with your app)
Again, I hope this is not too subjective, but I'm not really interested in an opinion of whether or not RoR is fast, I want facts from someone with more experience on what numbers are average and to be expected from production RoR applications. Otherwise I simply have no clue at what point I should stop optimizing and just accept that I'll never get 10 ms load times.
Gosh, I'm not sure I'm the one to answer this, but since I've been around these waters enough times, I may have an incomplete idea of things to look at.
First of all, response times are pretty subjective. Meaning, it's good enough if it's good enough for you. From my experience, pages resembling your description seem to take about as much time as what you're describing. So, you're not orders of magnitude off in either direction.
If you want to optimize your view renders with your current architecture, your next step is here, I think. Greg Pollack does a great job breaking this stuff down for you and will make sure you're on track. You'll be sure to get your assets cached and your stack fine-tuned. That'll be your most practical general advice.
If you're willing to look at your deployment architecture, Ilya Grigorik raises some great questions in this article and then answers them with Goliath. If your bottleneck is the server-client round trip, that's probably the approach to take.
I try to pay attention to anything Aaron Patterson says about performance, like in this talk. He's going to teach general optimization ideas, most of them for your server-side code. You may catch a few things that relate to your current problem.
I was pulled aside by a former co-worker at MWRC this year and told that I'm absolutely nuts if I'm not building with JRuby these days. It's a bit of a commitment, and I've resisted making major changes like that until I have truly painful response times, which I don't, and it doesn't sound like you're having either. However, JRuby's a very mainstream thing to do now, and you and I will likely embrace this for some projects at some point in the future.
So, bottom line, I think you're in the realm of a spry app as you are. I think I'd work down these resources in the order I presented them.
Not knowing what you're rendering, it's hard to comment on the performance, but I would venture to say that 200ms is very high. Don't forget that the debug information in your logs can be a little misleading: if you're querying your DB or some external resource from within a view, as opposed to preloading that data in your controller, then that time will be attributed to view rendering.
A common culprit: you load Model X in your controller, but then access an association in your view, which triggers a bunch of selects under the hood. The time to fetch Model X is low, but the time spent loading the associated records shows up as "view time".
In other words, dig into the logs, and if it is actually your view code, then bring out a profiler.
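A sketch of that pattern with made-up model names; the fix is to preload in the controller so the extra queries (and their time) stop being attributed to view rendering:
# Controller: loads the posts, but not their comments.
@posts = Post.limit(20)
# View: every post.comments access now fires another SELECT, and that time is
# counted as view time in the "Completed 200 OK ..." log line.

# Preloading collapses the N extra queries into one and moves the time back
# into the ActiveRecord column:
@posts = Post.includes(:comments).limit(20)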
I'm getting view times < 20ms on a $20/month linode server. That's well-optimized code, for a request of medium complexity, running on JRuby. You haven't hit Rails' performance limits by any means. Time to use a profiler and see what's taking so long.
I don't think your 200 ms view time is abnormal, or even high in any way.
However, you have room for improvement. You say " (not unusually complicated, just assume it's a paginated listing of 20 objects with 5 computed properties each or something)"
To me, that's 100 operations that could be pre-calculated, and would speed up your view rendering time.
Finally -- Rendering time doesn't usually have a direct correlation to number of users. Under most deployments, as a request comes in, it is handled by a process and then responded to. Other requests wait until the first is completed before they are processed.
Use static content where possible. Outside of that, use caching where possible, at the highest possible level, preferably at the page level. When content can't be cached, try to get -something- static or cacheable back to the user quickly. You might, for instance, serve up a static page with the basic layout, and an animated busy-image where the content belongs, and then use JavaScript to load the dynamic content.
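For what it's worth, here is a sketch of page- and action-level caching as it looked in Rails 3 (the controller and action names are made up):
class PagesController < ApplicationController
  caches_page :about    # whole response written to public/, served without hitting Rails
end

class ReportsController < ApplicationController
  caches_action :index, :expires_in => 10.minutes    # response cached, but before_filters still run
end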

How to prepare to be tech crunched

There is a good chance that we will be tech crunched in the next few days. Unfortunately, we have not gone live yet so we don't have a good estimation of how our system handles a production audience.
Our production setup consists of 2 EngineYard slices each with 3 mongrel instances, using Postgres as the database server.
Obviously a huge portion of how our app will hold up has to do with our actual code and queries etc. However, it would be good to see if there are any tips/pointers on what kind of load to expect, or experiences from people who have been through it. Do 6 mongrel instances (possibly 8 if the servers can take it) sound like they will handle the load, or at least most of it?
I have worked on several rails applications that experienced high load due to viral growth on Facebook.
Your mongrel count should be based on several factors. If your mongrels make API calls or deliver email and must wait for responses, then you should run as many as possible. Otherwise, try to maintain one mongrel per CPU core, with maybe a couple extra left over.
Make sure your server is using a Fair Proxy Balancer (not round robin). Here is the nginx module that does this: http://github.com/gnosek/nginx-upstream-fair/tree/master
And here are some other tips on improving and benchmarking your application performance to handle the load:
ActiveRecord
The most common problem Rails applications face is poor usage of ActiveRecord objects. It can be quite easy to make 100's of queries when only one is necessary. The easiest way to determine if this could be a problem with your application is to set up New Relic. After making a request to each major page on your site, take a look at the newrelic SQL overview. If you see a large number of very similar queries sequentially (select * from posts where id = 1, select * from posts where id = 2, select * from posts...) this may be a sign that you need to use a :include in one of your ActiveRecord calls.
Some other basic ActiveRecord tips (these are just the ones I can think of off the top of my head; a short sketch illustrating them follows the list):
If you're not doing it already, make sure to correctly use indexes on your database tables.
Avoid making database calls in views, especially partials; it can be very easy to lose track of how many database queries you are making from views. Push all queries and calculations into your models or controllers.
Avoid making queries in iterators. Usually this can be done by using an :include.
Avoid having Rails build ActiveRecord objects for large datasets as much as possible. When you make a call like Post.find(:all).size, a new object is instantiated for every Post in your database (and it could be a large query too). In this case you would want to use Post.count(:all), which will make a single fast query and return an integer without instantiating any objects.
Associations like has_many :objects on User create both a user.objects and a user.object_ids method. The latter skips instantiation of ActiveRecord objects and can be much faster. Especially when dealing with large numbers of objects this is a good way to speed things up.
Learn and use named_scope whenever possible. It will help you keep your code tiny and makes it much easier to have efficient queries.
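A short sketch illustrating the tips above, using the Rails 2-era syntax this answer assumes (Post, comments, published and user are made-up or assumed names):
# Eager-load an association instead of querying inside the loop:
posts = Post.find(:all, :include => :comments, :limit => 20)

# Count without instantiating a record for every row:
Post.count    # instead of Post.find(:all).size

# Fetch only ids, skipping ActiveRecord object instantiation
# (user is an instance of a User model with has_many :objects):
ids = user.object_ids

# Keep query logic in the model with named_scope:
class Post < ActiveRecord::Base
  named_scope :published, :conditions => { :published => true }
end
Post.published.count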
External APIs & ActionMailer
As much as you can, do not make API calls to external services while handling a request. Your server will stop executing code until a response is received. Not only will this add to load times, but your mongrel will not be able to handle new requests.
If you absolutely must make external calls during a request, you will need to run as many mongrels as possible since you may run into a situation where many of them are waiting for an API response and not doing anything else. (This is a very common problem when building Facebook applications)
The same applies to sending emails in some cases. If you expect many users to sign up in a short period of time, be sure to benchmark the time it takes for ActionMailer to deliver a message. If it's not almost instantaneous then you should consider storing emails in your database and using a separate script to deliver them.
Tools like BackgroundRB have been created to solve this problem.
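A sketch of the store-then-deliver approach (PendingEmail, its sent flag, and UserMailer#welcome are all made-up names), run from cron or a worker process:
namespace :emails do
  task :deliver_pending => :environment do
    PendingEmail.find(:all, :conditions => { :sent => false }).each do |pending|
      UserMailer.deliver_welcome(pending.user)   # Rails 2-style ActionMailer delivery
      pending.update_attribute(:sent, true)
    end
  end
end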
Caching
Here's a good guide on the different methods of caching in rails.
Benchmarking (Locating performance problems)
If you suspect a method may be slow, try benchmarking it in console. Here's an example:
>> Benchmark.measure { User.find(4).pending_invitations }
=> #<Benchmark::Tms:0x77934b4 @cutime=0.0, @label="", @total=0.0, @stime=0.0, @real=0.00199985504150391, @utime=0.0, @cstime=0.0>
Keep track of methods that are slow in your application. Those are the ones you want to avoid executing frequently. In some cases only the first call will be slow since Rails has a query cache. You can also cache the method yourself using Memoization.
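For example, here is a minimal memoization sketch for the method benchmarked above (the invitations association and its accepted flag are assumptions):
class User < ActiveRecord::Base
  def pending_invitations
    # Computed once per object and reused; note that ||= will re-run the
    # query whenever the cached result is nil or false.
    @pending_invitations ||= invitations.find(:all, :conditions => { :accepted => false })
  end
end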
NewRelic will also provide a nice overview of how long methods and SQL calls take to execute.
Good luck!
Look into some load testing software like WEBLoad or if you have money, Quick Test Pro. This will help give you some idea. WEBLoad might be the best test in your situation.
You can generate thousands of virtual nodes hitting your site and you can inspect the performance of your servers from that load.
In my experience, having watched some of our customers absorb a crunching, the traffic was fairly modest, not the bone-crushing spike people seem to expect. Now, if you get syndicated and make it onto Yahoo's front page or something, things may be different.
Search for the experiences of Facestat.com if you want to read about how they handled it (the Yahoo FP.)
My advice is just to be prepared to turn off signups or switch to a more static version of your site if your servers get too hot. Using a monitoring/profiling tool is a good idea as well; I like the FiveRuns Manage tool for its ease of setup.
Since you're using EngineYard, you should be able to allocate more machines to handle the load if necessary
Your big problem will probably not be the number of incoming requests, but the amount of data in your database, which will show you where your queries aren't using the indexes you're expecting or are returning too much data. For example, the User List page works with 10 users, but dies when you try to show 10,000 users on that one page because you didn't add pagination (the will_paginate plugin is almost your friend; watch out for the 'select count(*)' queries that are generated for you).
So the two things to watch:
Missing indexes
Too much data per page
For #1, there's a plugin that runs an 'explain ...' query after every query so you can check index usage manually
There is also a plugin that can generate data of various types for you, which may help you fill your database up enough to test these queries.
For #2, use will_paginate plugin or some other way to reduce data per page.
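A sketch of the basic will_paginate call (User is just an example model); the :total_entries option is worth knowing about when the automatic count(*) mentioned above gets expensive:
# Controller: fetch one page of records.
@users = User.paginate(:page => params[:page], :per_page => 20)
# Pass :total_entries => some_cached_count to skip the automatic SELECT COUNT(*).
# In the view: <%= will_paginate @users %> renders the pager links.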
We've got basically the same setup as you, 2 prod slices and a staging slice at EY. We found ab to be a great load testing tool - just write a bash script with the urls that you expect to get hit and point it at your slice. Watch NewRelic stats and it should give you some idea of the load your app can handle and where you might need to optimise.
We also found query_reviewer to be very useful. It is great for finding those un-indexed tables and n+1 queries.
