New Relic is showing Avg. CPU usage as 11200% for my app. What could be the issue? The app seems to work fine on my iPhone, and no user has ever reported battery degradation because of it. Is anyone else facing the same issue? How do I debug this?
The Infrastructure Hosts page calculates the CPU average using several attributes.
CPU percentage is not collected directly by New Relic, but derived from several other metrics. Specifically, the cpuPercent attribute is an aggregation of cpuUserPercent, cpuSystemPercent, cpuIOWaitPercent and cpuStealPercent.
Probably at least one of them is reporting an unexpected value.
You can ask this question (or report a bug) on the New Relic discussion forum (discuss.newrelic.com), under Infrastructure.
Or write an NRQL query in Insights to check these attributes using the SystemSample event type.
To query Infrastructure event data, use the NRQL syntax with the Insights Data Explorer:
Go to insights.newrelic.com > Data Explorer.
From the query command line, use FROM before the event type.
SELECT cpuUserPercent, cpuSystemPercent, cpuIOWaitPercent, cpuStealPercent FROM SystemSample
It means there is more than one CPU core: the percentage can add up across all cores, so values above 100% are expected on multi-core hosts.
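For example (purely illustrative arithmetic; the core count below is an assumption, not something New Relic reported):

    # If cpuPercent adds up across cores, 11200% would correspond to
    # roughly 112 cores each running at ~100%.
    total_cpu_percent  = 11_200.0
    assumed_core_count = 112   # hypothetical; check the host's actual core count
    per_core_average   = total_cpu_percent / assumed_core_count
    puts per_core_average      # => 100.0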
New Relic docs
Thanks.
I recently launched a new Ruby on Rails application that worked well in development mode. After the launch, the memory in use has been constantly increasing:
UPDATED: By the time this screenshot (the one below) from New Relic was taken, I had scheduled a web dyno restart every hour (for one of the two web dynos). Thus it no longer reaches the 500 MB crash level and the graph takes on a bit of a sawtooth pattern. This does not resolve the problem at all though, only some of the symptoms. As you can see, the morning is not so busy but the afternoon is busier. I deployed a small change at 11.30; it could not have affected the problem even though it appears that way in the stats.
It could be noted as well that it is the MIN memory that keeps increasing, even though the graph shows AVG memory. Even when the graph seems to dip temporarily, the MIN memory stays the same or increases. The MIN memory never decreases!
Without the dyno restarts, the app's memory would keep increasing until it reached Heroku's limit and the app crashed with "execution expired"-type errors.
I am not a great programmer but I have made a few apps before without having this type of problem.
Troubleshooting performed
A. I thought the problem would lie within the before_filter in the application_controller (Will variables in application controller cause a memory leak in Rails?) but that wasn't the problem.
B. I installed oink but it gives no results at all. It creates an oink.log, but running "heroku run oink -m log/oink.log" returns nothing, no matter what threshold I use.
C. I tried bleak_house but it was deprecated and could not be installed
D. I have googled and read most articles in the topic but I am none the wiser.
E. I would love to test memprof but I can't install it (I have Ruby 1.9.x and don't really know how to downgrade to 1.8.x).
My questions:
Q1. What I really would love to know is the name(s) of the variable(s) that are increasing for each request, or at least which controller is using the most memory.
Q2. Will a controller with code like the below keep increasing memory?
related_feed_categories = []
@gift.tags.each do |tag|
  tag.category_connections.each do |cc|
    related_feed_categories << cc.category_from_feed
  end
end
(sorry, SO won't re-format the code to be easily readable for some reason).
Do I need to "kill off" related_feed_categories with "related_feed_categories = nil" afterwards or does the Garbage Collector handle that?
Q3. What would be my major things to look for? Right now I can't narrow it down AT ALL. I don't know which part of the code to look deeper into, and I don't really know what to look for.
Q4. In case I really cannot solve the problem: is there any online consulting service where I can send my code and have them find the problem?
Thanks!
UPDATED: After receiving comments, it may have to do with sessions. This is the part of the code that I guess could be bad:
# Create sessions for last generation
friend_data_arr = [@generator.age, @generator.price_low, @generator.price_high]
friend_positive_tags_arr = []
friend_negative_tags_arr = []
friend_positive_tags_arr << @positive_tags
friend_negative_tags_arr << @negative_tags
session["last_generator"] = [friend_data_arr, friend_positive_tags_arr, friend_negative_tags_arr]
# Clean variables
friend_data_arr = nil
friend_positive_tags_arr = nil
friend_negative_tags_arr = nil
It is used in the generator#show action. When some gifts have been generated through my gift-generating engine, I save the input in the session (in case the user wants to use that info at a later stage). I never kill or expire these sessions, so that could perhaps cause the memory increase.
Updated again: I removed this piece of code but the memory still increases, so I guess this part is not it, but similar code might be causing the problem?
It's unlikely that related_feed_categories provokes this.
Are you using a lot of files?
How long do you keep session data? It looks like you have an e-commerce site; are you keeping objects in sessions?
Basically, I think it is files, or sessions, or an accumulation of temporary data that only gets flushed when the server crashes (memcache?).
In the middle of the night, I guess you have fewer customers. Can you post the same memory chart for peak hours?
It may be related to this problem: Memory grows indefinitely in an empty Rails app.
UPDATE:
Rails doesn't store all the data on the client side. I don't remember the default store, but unless you choose the cookie store, Rails only sends data like the session_id to the client.
There are a few guidelines about sessions; ActiveRecord::SessionStore seems to be the best choice for performance purposes. And you shouldn't keep large objects or secret data in sessions. More on sessions here: http://guides.rubyonrails.org/security.html#what-are-sessions
Section 2.9 explains how to destroy sessions that have been unused for a certain time, for example along the lines of the sketch below.
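A minimal sketch, assuming the Rails 3 ActiveRecord::SessionStore (sessions kept in a sessions table with an updated_at column), which you could run from a cron'd rake task:

    # Delete sessions that haven't been touched for 20 minutes (adjust to taste)
    ActiveRecord::SessionStore::Session.delete_all(["updated_at < ?", 20.minutes.ago])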
Instead of storing objects in sessions, I suggest you store the URL that produces the search results. You could even store it in the database, offering your customers the possibility to save a few searches and/or load the last one used by default.
But at this stage we are still not totally sure that sessions are the culprit. To be sure, you could stress test your application on a test server with expiring sessions. Basically, you create a large number of sessions, and maybe 20 minutes later Rails has to remove them. If you see a difference in memory consumption, it will narrow things down.
First case: memory drops significantly when the sessions expire; you know it is session related.
Second case: memory increases at a faster rate but doesn't drop when the sessions expire; you know it is user related, but not session related.
Third case: nothing changes (memory increases as usual), so you know it does not depend on the number of users. But I don't know what could cause that.
When I say stress test, I mean a significant number of sessions, not really a stress test. The number of sessions you need depends on your average number of users. If you had 50 users before your app crashed, 20-30 sessions may be significant. If you have to create them by hand, configure a higher expiry time limit. We are just looking for differences in memory consumption.
Update 2:
So this is most likely a memory leak. Use ObjectSpace: it has a count_objects method which reports counts of all the objects currently allocated. It should narrow things down. Use it once memory has already increased a lot.
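Something along these lines (a rough sketch; run it in a console or a temporary debug hook on the leaking process):

    # Snapshot object counts, exercise the suspect code, then compare (Ruby 1.9+)
    GC.start
    before = ObjectSpace.count_objects

    # ... trigger a few requests / run the suspect code path here ...

    GC.start
    after = ObjectSpace.count_objects

    # Print the object types whose counts grew
    after.each do |type, count|
      delta = count - before[type].to_i
      puts "#{type}: +#{delta}" if delta > 0
    end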
Otherwise, there is bleak_house, a gem for finding memory leaks. Ruby tooling for memory leaks is still not as good as Java's, but it's worth a try.
Github : https://github.com/evan/bleak_house
Update 3:
This may be an explanation. It is not really a memory leak, but it does grow memory:
http://www.tricksonrails.com/2010/06/avoid-memory-leaks-in-ruby-rails-code-and-protect-against-denial-of-service/
In short, symbols are kept in memory until you restart Ruby. So if symbols are created with random names, memory will grow until your app crashes. This doesn't happen with strings; they are garbage collected.
A bit old, but still valid for Ruby 1.9.x. Try this: Symbol.all_symbols.size
Update 4:
So your symbols are probably the memory leak. Now we still have to find where it occurs. Use Symbol.all_symbols: it gives you the full list. You could store that somewhere and diff it against a later snapshot to see what was added.
It may be i18n, or it may be something else generating symbols implicitly the way i18n does. Either way, something is probably generating symbols with random data in their names, and those symbols are never used again.
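A rough sketch of that diffing approach (run it in a console against the same process, e.g. before and after hitting a few pages):

    # Snapshot the symbol table, exercise the app, then see what was added
    before = Symbol.all_symbols

    # ... hit the suspect pages / run the suspect code here ...

    new_symbols = Symbol.all_symbols - before
    puts "#{new_symbols.size} new symbols"
    puts new_symbols.first(50)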
Assuming category_from_feed returns a string (or perhaps a symbol), an increase on the order of 300 MB is quite unlikely. You can roughly verify this by profiling the following:
4_000_000.times {related_feed_categories << "Loooooooooooooong string" }
This snippet would shoot the memory usage up by about 110MB.
I'd look at DB connections or methods that read a file and then don't close it. I can see that it's related to feeds which probably means you might be using XML. That can be a starting point too.
Posting this as an answer because it looks bad in comments :/
I'm running a rails 3.0 application on Heroku and using the New Relic addon/service.
I have been looking at the transaction traces feature (available in the pro version) to understand a little more about the performance characteristics of the application. However, a significant portion of time (30-50%) is "uninstrumented time". After a few stabs at putting method tracers in some places, and going through the reasonably slow cycle of testing whether I get more info, I'm feeling this is going nowhere fast.
It seems in the PHP new relic agent they have a great feature to get very detailed traces without needing to guess where to put method tracers: http://newrelic.com/docs/php/php-agent-faq#top100
Is there anything similar to this for ruby?
Note: I'm already using rpm_contrib to get some more info and have garbage collection stats enabled. Also, this is not about fixing a performance problem, just understanding how to better use the performance tools available and scratch a niggling itch about that uninstrumented time.
There isn't currently anything similar for Ruby. I'll mention it to the Ruby engineer when I get a chance. My guess is unless a lot of requests come in for it, it won't be at the top of the list for a while, though. In the meantime, you can use the method tracers to figure out the uninstrumented time.
Hope that helps.
Method tracers can work well, but if you have a lot of code in your controller, try a binary search using trace_execution_scoped, which records the time spent in a block of code:
http://newrelic.github.com/rpm/NewRelic/Agent/MethodTracer/InstanceMethods/TraceExecutionScoped.html#method-i-trace_execution_scoped
Add a couple calls to this, give each metric a sensible name like "Custom/MySlowControllerAction/block0" (first argument to trace_execution_scoped), and repeat.
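A minimal sketch of what that can look like (the controller, action and metric names below are just placeholders; it assumes the newrelic_rpm gem is installed and the agent is running):

    class MySlowController < ApplicationController
      include NewRelic::Agent::MethodTracer

      def show
        trace_execution_scoped(['Custom/MySlowControllerAction/block0']) do
          # ...first half of the suspect code...
        end

        trace_execution_scoped(['Custom/MySlowControllerAction/block1']) do
          # ...second half of the suspect code...
        end
      end
    end

Narrow the named blocks on each iteration until the uninstrumented time is pinned down.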
The metrics you name will show up not just in Transaction Traces, but also in the Performance Breakdown for the controller action under the Web Transactions tab, so you'll see average time in that block of code across all requests, not just the slow ones.
We are experiencing some serious scaling challenges for our intelligent search engine/aggregator. Our database holds around 200k objects. From profiling and New Relic it seems most of our troubles may come from the database. We are using the smallest dedicated database Heroku provides (Ronin).
We have been looking into indexing and caching. So far we have managed to solve our problems by reducing database calls and caching content intelligently, but now even this seems to have reached its limit. We are constantly asking ourselves whether our code/configuration is good enough or whether we are simply not using enough "hardware".
We suspect that the database plan we buy from Heroku may be underperforming. For example, just doing a simple count (no joins, nothing fancy) on the 200k items takes around 250 ms. This seems like a long time, even though Postgres is known for its poor performance on counts.
We have also started to use geolocation lookups based on latitude/longitude. Both columns are indexed floats. Doing a distance calculation involves pretty complicated math, but we are using the well-recommended geocoder gem, which is supposed to run well-optimized queries. Even geocoder takes 4-10 seconds to perform a lookup on, say, 40,000 objects, returning only the 10 nearest. This again sounds like a long time, and all the experienced people we consult say it sounds very odd, again hinting at database performance.
So basically we wonder: What can we expect from the database? Might there be a problem? And what can we expect if we decide to upgrade?
An additional question I have is: I read here that we can improve performance by loading the entire database into memory. Are we supposed to configure this ourselves and if so how?
UPDATE ON THE LAST QUESTION:
I got this from the helpful people at Heroku support:
"What this means is having enough memory (a large enough dedicated database) to store your hot data set in memory. This isn't something you have to do manually; Postgres is configured automatically to use all available memory on our dedicated databases. I took a look at your database and it looks like you're currently using about 1.25 GB of RAM, so you haven't maxed your memory usage yet."
UPDATE ON THE NUMBERS AND FIGURES
Okay so now I've had time to look into the numbers and figures, and I'll try to answer the questions below as follows:
First of all, the db consists of around 29 tables with a lot of relations. But in reality most queries are done on a single table (some additional resources are joined in, to provide all needed information for the views).
The table has 130 columns.
Currently it holds around 200k records but only 70k are active - hence all indexes are made as partial-indexes on this "state".
All columns we search are indexed correctly and none is of text-type, and many are just booleans.
Answers to questions:
Hmm, the baseline performance is kind of hard to tell; we have so many different selects. The time typically varies from 90 ms to 250 ms when selecting a limit of 20 rows. We have a LOT of counts on the same table, all varying from 250 ms to 800 ms.
Hmm, well, that's hard to say because they won't give us a figure.
We have around 8-10 users/clients running requests at the same time.
Our query load: New Relic's database report says this about the last 24 hours: throughput: 9.0 cpm, total time: 0.234 s, avg time: 25.9 ms.
Yes, we have examined the query plans of our long-running queries. The count queries are especially slow, often over 500 ms for a pretty simple count on the 70k records, done on indexed columns, with a result of around 300.
I've tuned a few Rails apps hosted on Heroku, and also hosted on other platforms, and usually the problems fall into a few basic categories:
Doing too much in Ruby that could be done at the DB level (sorting, filtering, joining data, etc.) - see the sketch after this list
Slow queries
Inefficient use of indexes (not enough, or too many)
Trying too hard to do it all in the db (this is not as common in rails, but does happen)
Not optimizing cacheable data
Not effectively using background processing
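To illustrate the first point, a small sketch (the Item model and its columns are hypothetical):

    # Slow: loads every row into Ruby, then filters and sorts there
    items = Item.all.select { |i| i.active? }.sort_by(&:created_at).reverse.first(20)

    # Faster: let the database filter, sort and limit
    items = Item.where(:active => true).order("created_at DESC").limit(20)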
Right now it's hard to help you because your question doesn't contain any specifics. I think you'll get a better response if you pinpoint the biggest issue you need help with and then ask.
Some info that will help us help you:
What is the average response time of your actions? (from new relic, request-log-analyzer, logs)
What is the slowest request that you want help with?
What are the queries and code in that request?
Is the site's performance different when you run it locally vs. heroku?
In the end I think you'll find that it is not an issue specific to Heroku, and if you had your app deployed on amazon, engineyard, etc you'd have the same performance. The good news is I think that your problems are common, and shouldn't be too hard to fix once you've done some benchmarking and profiling.
-John McCaffrey
We are constantly asking...
...this seems a lot...
...that is suspected...
...What can we expect...
Good news! You can put an end to seeming, suspecting, wondering and expecting through the magic of measurement!
Seriously though, you've not mentioned any of the basic points you'd need to get a useful answer:
What's the baseline performance of the DB running a sequential scan and single-row index fetches? You say Heroku say your DB fits in RAM, so you shouldn't see disk I/O issues when you measure.
Does this performance match whatever Heroku say it should be?
How many concurrent clients?
What's your query load - what queries and how often?
Have you checked the query plans for any of your suspiciously long-running queries?
Once you've got this sort of information, maybe someone can say something useful. As it stands anything you read here is just guesswork.
First: you should check your Postgres configuration (run SHOW ALL from within psql or another client, or just look at postgresql.conf in the data directory). The parameter with the largest impact on performance is effective_cache_size, which should be set to roughly (total_physical_ram - memory_in_use_by_kernel_and_all_processes). For a 4 GB machine this often comes out around 3 GB (4-1). (This is very coarse tuning, but it gives the best results as a first step.)
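If you only have access through the app, a quick way to check the current values from a Rails console (a minimal sketch; the list of settings is just a starting point):

    # Print a few Postgres settings that matter for this kind of tuning
    %w[effective_cache_size shared_buffers work_mem random_page_cost].each do |setting|
      row = ActiveRecord::Base.connection.select_one("SHOW #{setting}")
      puts "#{setting}: #{row.values.first}"
    end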
Second: why do you want all the counts? Better to use a typical query: just ask for what is needed, not what is available. (Reason: there is no possible optimisation for a COUNT(*); either the whole table or a whole index needs to be scanned.)
Third: start gathering and analysing some query plans (for typical queries that perform badly). You can get a query plan by putting EXPLAIN ANALYZE before the actual query (another way is to increase the logging level and obtain the plans from the logfile). A bad query plan can point you at missing statistics or indexes, or even at bad data modelling.
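For example, from a Rails console (the table and column names here are hypothetical; substitute one of your real slow queries):

    # Print the plan for a suspect count query
    sql = "EXPLAIN ANALYZE SELECT COUNT(*) FROM items WHERE state = 'active'"
    ActiveRecord::Base.connection.select_all(sql).each do |row|
      puts row["QUERY PLAN"]
    end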
New Relic monitoring can be included as an add-on for Heroku (http://devcenter.heroku.com/articles/newrelic). At the very least this should give you a lot of insight into what is happening behind the scenes, and may help you pinpoint some issues.
I'm working on a web app (Rails 3 based), and I really don't like the time it takes to generate a page - depending on the displayed data it takes up to 2.5 and even 4 seconds.
So I was just wondering what a reasonable average page generation time is in your apps. Say you check the generation time, it's 750 ms, and you think "OK, that should be fine even without caching". Or when you see 1.5 s you think "Oh my God, the user won't wait that long and will leave the site".
There's a huge amount of research data regarding the time from query to rendering and the user's experience. I'd recommend reading this useit.com article. After all, Google integrated page speed into its results for a reason ;)
The 3 response-time limits are the same today as when I wrote about them in 1993 (based on 40-year-old research by human factors pioneers):
0.1 seconds gives the feeling of instantaneous response — that is, the outcome feels like it was caused by the user, not the computer. This level of responsiveness is essential to support the feeling of direct manipulation (direct manipulation is one of the key GUI techniques to increase user engagement and control — for more about it, see our Principles of Interface Design seminar).
1 second keeps the user's flow of thought seamless. Users can sense a delay, and thus know the computer is generating the outcome, but they still feel in control of the overall experience and that they're moving freely rather than waiting on the computer. This degree of responsiveness is needed for good navigation.
10 seconds keeps the user's attention. From 1–10 seconds, users definitely feel at the mercy of the computer and wish it was faster, but they can handle it. After 10 seconds, they start thinking about other things, making it harder to get their brains back on track once the computer finally does respond.
A 10-second delay will often make users leave a site immediately. And even if they stay, it's harder for them to understand what's going on, making it less likely that they'll succeed in any difficult tasks.
As a rule of thumb, you should always aim for a balance of optimization time vs. time gained. Don't spend days optimizing the hell out of one routine when your images aren't compressed correctly or your scripts/CSS aren't combined. Yes, faster is better, but a 90% gain in page generation from setting up a smart cache beats a 10% gain after a week of tweaking an algorithm.
Also, don't look too much at the first-render time when the framework has to load everything; use stress testing, cached or not, to simulate various situations.
Now, some data: some of the latest sites I worked on used DotNetNuke, a huge open-source CMS, and ASP.NET MVC, where you are nearer to the metal. Average page time with average DB queries was 600-700 milliseconds for DotNetNuke. For ASP.NET MVC it's 70-100 milliseconds... Users really like the second one :)
There's no 'right' answer to this - the faster the better. Personally I normally aim for < 200ms, although I know from experience that it can be quite difficult to achieve this in Rails on anything but simple apps. Try and figure out where your bottlenecks are and cache what you can.
Edit: There seems to be some confusion between page generation time and page render time. Obviously a quick page render is the goal, and on most sites things like reducing HTTP requests and gzipping CSS/JS are where you can get most of your quick wins. But if the page itself can take 4-5 seconds to generate, then you're probably right that your app is where you should start.
It depends on whether nothing is displayed for 2.5-4 seconds, or whether the user already sees (a part of) the page from the start and it merely finishes loading completely after 2.5-4 seconds. In the latter case the user doesn't experience a 2.5-4 second load. Take the http://www.nytimes.com/ website; I see most of it right away, but according to the Web Inspector it takes 1.94 seconds to load completely.
And keep in mind that the speed will also depend on the browser, computer, internet connection. What's fast for you might be slower for others.
Measure your Apdex score and see how the app is performing. That will give you a rough indication. From there, you can decide how you want to increase performance.
It also depends on what your site is: a system application for a business, or software as a service (SaaS)? If it's a system application, the users are forced to use it, so performance can be negotiated. If it is SaaS, then the lower your Apdex score, the more chance you have of losing your users' interest.
There are a few gems out there that measure performance and report on what your apdex is.
Here's a little more info: http://apdex.org/blog/?p=630
My personal rule - no page should take more than 0.05 seconds, or you are in trouble.
As long as you write proper code, you don't need to spend much time on optimization to stay under 0.05.
If you stick to giant frameworks, then you are out of luck.
I have a site with several pages for each company, and I want to show how each page is performing in terms of the number of people visiting the profile.
We have already made sure that bots are excluded.
Currently, we are recording each hit in the DB with either an insert (for the first request of the day to a profile) or an update (for subsequent requests that day to the profile). But given that requests have gone from a few thousand per day to tens of thousands per day, these inserts/updates are causing major performance issues.
Assuming no JS solution, what will be the best way to handle this?
I am using Ruby on Rails, MySQL, Memcache, Apache, HaProxy for running overall show.
Any help will be much appreciated.
Thx
http://www.scribd.com/doc/49575/Scaling-Rails-Presentation-From-Scribd-Launch
you should start reading from slide 17.
I think performance isn't a problem if it's possible to build a solution like this for a website as big as Scribd.
Here are 4 ways to address this, from easy estimates to complex and accurate:
Track only a percentage (10% or 1%) of users, then multiply to get an estimate of the count.
After the first 50 counts for a given page, start updating the count only 1/13th of the time, by a count of 13. This helps when a few pages do most of the counting, while keeping small counts accurate. (Use 13 because it's hard to notice that the increment isn't 1.)
Save exact counts in a cache layer like memcache or local server memory, and write them all to disk when they hit 10 counts or have been in the cache for a certain amount of time (see the sketch after this list).
Build a separate counting layer that 1) always has the current count available in memory, 2) persists the count to its own tables/database, and 3) has calls that adjust both places.
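A rough sketch of the third idea (assumptions: Rails.cache is backed by memcached, and ProfileHit is a hypothetical model with profile_id, date and hits columns):

    FLUSH_THRESHOLD = 10

    # Called on every profile view instead of writing to MySQL directly
    def record_hit(profile_id)
      key = "profile_hits:#{profile_id}:#{Date.today}"
      # increment returns nil if the key doesn't exist yet, so seed it with a raw value
      count = Rails.cache.increment(key) || (Rails.cache.write(key, 1, :raw => true) && 1)
      flush_hits(profile_id, key) if count && count >= FLUSH_THRESHOLD
    end

    # Move the buffered count into the database and reset the cache counter
    def flush_hits(profile_id, key)
      buffered = Rails.cache.read(key, :raw => true).to_i
      return if buffered.zero?
      hit = ProfileHit.find_or_initialize_by_profile_id_and_date(profile_id, Date.today)
      hit.hits = hit.hits.to_i + buffered
      hit.save!
      Rails.cache.write(key, 0, :raw => true)
    end

The same pattern works with local server memory or Redis; the point is that the database only sees one write per N hits instead of one per request.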