How to know if I have to do memory profiling too?

I currently do CPU sampling of an ASP.NET Core application while sending a huge number of requests (> 500K) to it. I see that the peak working set of the application is around ~300 MB, which in my opinion is not huge considering the number of requests being made. But what I have been observing is a huge drop in requests per second when I enable certain pieces of functionality in my application.
Question:
Should I do memory profiling too? I ask because, even though the peak working set is ~300 MB, there could be a large number of short-lived objects being created and collected by the GC, and since work done by the GC also counts as CPU time, should I do memory profiling to see whether I allocate too much?

I will answer this question myself based on new information that I found out.
This is based on the tool PerfView, which provides information about GC and allocations.
When you open the GCStats view and navigate the links to the process you care about, you will see a summary for that process.
That view includes the % CPU Time spent Garbage Collecting. If this is > 5%, it should be a cause for concern and you should start memory profiling.
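If you want a quick in-process sanity check before setting up a full trace, newer .NET runtimes expose some of this through the GC APIs. The sketch below is only an illustration, not part of PerfView: it assumes .NET 5 or later (for GCMemoryInfo.PauseTimePercentage, which reports pause time rather than exactly the CPU-time figure PerfView shows), and the 5% threshold is just the rule of thumb from above.

```csharp
using System;

static class GcPressureCheck
{
    // Call this after the load test has been running for a while,
    // e.g. from a diagnostics endpoint or on process shutdown.
    public static void Report()
    {
        // Total managed bytes allocated since process start (cheap, imprecise).
        long allocated = GC.GetTotalAllocatedBytes(precise: false);

        // Collection counts per generation; frequent gen 2 collections are the expensive case.
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        // Percentage of time spent paused for GC so far (.NET 5+).
        double pausePct = GC.GetGCMemoryInfo().PauseTimePercentage;

        Console.WriteLine($"Allocated: {allocated / (1024 * 1024)} MB, " +
                          $"GC counts gen0/1/2: {gen0}/{gen1}/{gen2}, " +
                          $"GC pause: {pausePct:F1} %");

        if (pausePct > 5)
            Console.WriteLine("More than ~5 % of time in GC - worth doing memory profiling.");
    }
}
```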

Related

Apple Instruments slows down app when analyzing memory allocations

When running my app in the simulator and analyzing its memory allocations using Instruments, the app runs very slowly, at less than 1/30 of its normal speed.
The app uses about 50 MB of RAM and has approximately 900,000 live objects (according to Instruments).
Could this be the reason for the slow performance?
When running the app on the device, or in the simulator without Instruments, it performs well (except for the memory issue I am trying to debug).
Do you have any idea on how to solve this issue?
Did you encounter slow performance using the Memory Allocation instrument?
Would you consider having more than 900,000 live objects "concerning"?
Considering your Analyzer performance issue
In your specific case monitoring the app over a long period of time will not be necessary, as you reach the state of high memory consumption very soon. You could simply stop recording at this point. Then you won't have problems navigating through the different views and statistics to find the cause of the memory issue.
Analyzing the memory issue
Slowing down is normal. 1/30 sounds quite alarming.
You should probably track how the number of live objects and the memory usage change while you use the app.
It is difficult to decide whether a certain number of live objects at a specific point in time is critical (though 900,000 seems very high).
In general: if live objects and memory usage grow continuously and don't shrink, that is a bad sign.
If you take a look at Statistics -> Object Summary, Live Bytes should be a lot smaller than Overall Bytes, and the number of #Living objects should be a lot smaller than the number of #Transitory objects.
The second thing you can look at is the Call Tree view.
It gives you a nice overview of which parts of the application are responsible for reserving large amounts of memory.
Possible solutions
Once you detect the parts of your code that are responsible for reserving the large amount of memory, you can look for retain cycles or try to use more autorelease pools in those spots.
Check that you have enough available disk space. I had 8 GB left and it seems that was too little: Instruments was extremely slow, took a minute just to start, and hardly got anywhere at all.
I cleared out more disk space and then it suddenly went back to being as fast as before.

Any good Profiler with CPU details for ASP.net MVC 3 Web application?

What are some good profilers for an ASP.NET MVC 3 application that give CPU usage details?
I have tried New Relic, SlimTune and MiniProfiler. All of them just show which process is taking longer, but none of them give any insight into what is causing the high CPU.
All of the above profilers did a good job and we improved the process; it no longer takes a long time to respond and loads pages in 300-500 ms. But now we're more concerned about CPU usage, because the application sometimes hits very high CPU quite randomly, and we are trying to find what is causing this behaviour.
High CPU spikes can be for a number of reasons.
The first thing I would do is determine whether the 'random' high CPU is due to Generation 2 garbage collections.
ASP.NET Case Study: High CPU in GC - Large objects and high allocation rates
Investigating Memory Issues
Memory Performance Counters
Garbage Collection and Performance
There are quite a few built-in performance counters that might be helpful (a sketch of reading one of them follows the link below):
ASP.NET Performance Monitoring, and When to Alert Administrators
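As a rough way to test the "% Time in GC" suspicion without attaching a profiler, the .NET CLR Memory counters can be read directly. This is only a sketch under assumptions: it targets the full .NET Framework, and the counter instance name ("w3wp" here) has to be adjusted to whatever process actually hosts your application.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class GcCounterSample
{
    static void Main(string[] args)
    {
        // Counter instance name is the process name as the CLR sees it,
        // e.g. "w3wp" for an IIS worker process - adjust as needed.
        string instance = args.Length > 0 ? args[0] : "w3wp";

        using (var timeInGc = new PerformanceCounter(
            ".NET CLR Memory", "% Time in GC", instance, readOnly: true))
        {
            // The first sample of a counter is usually not meaningful,
            // so read it in a loop and watch the trend.
            for (int i = 0; i < 10; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine($"% Time in GC: {timeInGc.NextValue():F1}");
            }
        }
    }
}
```

Sustained high values during the CPU spikes would point at the GC scenario described in the case study linked above.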
Sam Saffron (one of the Stack Overflow authors, by the way) created a great command-line tool a while ago, but unfortunately has abandoned it.
A friend of mine forked the code to make it work in 2015:
https://github.com/jitbit/cpu-analyzer
(the page has a link to Sam's post explaining how to use it)

Should a process always consume the same amount of memory if executed in the same way?

Hi folks and thanks for your time in advance.
I'm currently extending our C# test framework to monitor the memory consumed by our application. The intention is that a bug is potentially raised if the memory consumption jumps significantly on a new build, as resources are always tight.
I'm using System.Diagnostics.Process.GetProcessesByName and then checking the PrivateMemorySize64 value.
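For reference, a minimal sketch of that kind of check (the process name "MyApp" is a placeholder, and Refresh() is called so a cached value from an earlier read isn't reused):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class MemorySnapshot
{
    // Returns the private bytes of the first matching process, or null if it isn't running.
    public static long? GetPrivateBytes(string processName = "MyApp")
    {
        Process process = Process.GetProcessesByName(processName).FirstOrDefault();
        if (process == null)
            return null;

        process.Refresh();                 // discard cached counter values
        return process.PrivateMemorySize64;
    }

    static void Main()
    {
        long? bytes = GetPrivateBytes();
        Console.WriteLine(bytes.HasValue
            ? $"Private bytes: {bytes.Value / (1024 * 1024)} MB"
            : "Process not found");
    }
}
```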
While developing the new test, using the same build of the application for consistency, I've seen it consume differing amounts of memory despite supposedly executing exactly the same code.
So my question is: once an application has launched, fully loaded and, in this case, sitting in its idle state, and is hence in an identical state from run to run, can I expect the private bytes consumed to be identical from run to run?
I should clarify that I need memory usage to be consistent, because any degree of variance reduces the effectiveness of the test, as a tolerance would then have to be introduced, something I'd like to avoid.
So...
1) Should the memory usage be 100% consistent, presuming the application is behaving consistently? This was my expectation.
or
2) Is there any degree of variance in the private byte usage reported by Windows, or in the memory it allocates when requested by an app?
Currently, if the answer is that the memory consumed should be consistent, as I was expecting, then the issue lies in our app actually requesting differing amounts of memory.
Many thanks
H
Almost everything in .NET uses the runtime's garbage collector, and when exactly it runs and how much memory it frees depends on a lot of factors, many of which are out of your hands. For example, when another program needs a lot of memory, and you have a lot of collectable memory at hand, the GC might decide to free it now, whereas when your program is the only one running, the GC heuristics might decide it's more efficient to let collectable memory accumulate a bit longer. So, short answer: No, memory usage is not going to be 100% consistent.
OTOH, if you have really big differences between runs (say, a few megabytes on one run vs. half a gigabyte on another), you should get suspicious.
If the program is deterministic (like all embedded programs should be), then yes. In an OS environment you are very unlikely to get the same figures due to memory fragmentation and numerous other factors.
Update:
Just noted this is a C# app, so no, but the numbers should be relatively close (+/- 10% or less).

cooperative memory usage across threads?

I have an application that has multiple threads processing work from a todo queue. I have no influence over what gets into the queue and in what order (it is fed externally by the user). A single work item from the queue may take anywhere between a couple of seconds and several hours of runtime and should not be interrupted while processing. Also, a single work item may consume between a couple of megabytes and around 2 GB of memory. The memory consumption is my problem. I'm running as a 64-bit process on an 8 GB machine with 8 parallel threads. If each of them hits a worst-case work item at the same time, I run out of memory. I'm wondering about the best way to work around this.
plan conservatively and run 4 threads only. The worst case shouldn't be a problem anymore, but we waste a lot of parallelism, making the average case a lot slower.
make each thread check available memory (or rather the total memory allocated by all threads) before starting on a new item. Only start when more than 2 GB of memory is left. Recheck periodically, hoping that other threads will finish their memory hogs and we may start eventually.
try to predict how much memory items from the queue will need (hard) and plan accordingly. We could reorder the queue (overriding user choice) or simply adjust the number of running worker threads.
more ideas?
I'm currently tending towards number 2 because it seems simple to implement and solves most cases. However, I'm still wondering what standard ways exist of handling situations like this. The operating system must do something very similar at the process level, after all...
regards,
Sören
So your current worst-case memory usage is 16GB. With only 8GB of RAM, you'd be lucky to have 6 or 7GB left after the OS and system processes take their share. So on average you're already going to be thrashing memory on a moderately loaded system. How many cores does the machine have? Do you have 8 worker threads because it is an 8-core machine?
Basically you can either reduce memory consumption or increase available memory. Your option 1, running only 4 threads, under-utilises the CPU resources, which could halve your throughput - definitely sub-optimal.
Option 2 is possible, but risky. Memory management is very complex, and querying for available memory is no guarantee that you will be able to go ahead and allocate that amount (without causing paging). A burst of disk I/O could cause the system to increase the cache size, a background process could start up and swap in its working set, and any number of other factors. For these reasons, the smaller the available memory, the less you can rely on it. Also, over time memory fragmentation can cause problems too.
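One runtime facility worth mentioning in this context is System.Runtime.MemoryFailPoint, which asks the CLR up front whether an allocation of a given size is likely to succeed without excessive paging. It shares the fundamental limitation described above (it is an estimate, not a hard reservation of physical RAM), but it is at least designed for this gating scenario. A sketch, with the 2 GB worst case taken from the question and an arbitrary retry delay:

```csharp
using System;
using System.Runtime;
using System.Threading;

static class GatedWorker
{
    // Estimated worst-case footprint of one work item, in megabytes (from the question).
    const int WorstCaseMegabytes = 2048;

    public static void Process(Action workItem)
    {
        while (true)
        {
            try
            {
                // Throws InsufficientMemoryException if the runtime estimates that
                // this much memory cannot be obtained without heavy paging.
                using (new MemoryFailPoint(WorstCaseMegabytes))
                {
                    workItem();
                    return;
                }
            }
            catch (InsufficientMemoryException)
            {
                // Presumably other threads are holding large items; back off and retry.
                Thread.Sleep(TimeSpan.FromSeconds(30));
            }
        }
    }
}
```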
Option 3 is interesting, but could easily lead to under-loading the CPU. If you have a run of jobs that have high memory requirements, you could end up running only a few threads, and be in the same situation as option 1, where you are under-loading the cores.
So taking the "reduce consumption" strategy, do you actually need to have the entire data set in memory at once? Depending on the algorithm and the data access pattern (eg. random versus sequential) you could progressively load the data. More esoteric approaches might involve compression, depending on your data and the algorithm (but really, it's probably a waste of effort).
Then there's "increase available memory". In terms of price/performance, you should seriously consider simply purchasing more RAM. Sometimes, investing in more hardware is cheaper than the development time to achieve the same end result. For example, you could put in 32GB of RAM for a few hundred dollars, and this would immediately improve performance without adding any complexity to the solution. With the performance pressure off, you could profile the application to see just where you can make the software more efficient.
I have continued the discussion on Herb Sutter's blog and provoked some very helpful reader comments. Head over to Sutter's Mill if you are interested.
Thanks for all the suggestions so far!
Sören
Difficult to propose solutions without knowing exactly what you're doing, but how about considering:
See if your processing algorithm can access the data in smaller sections without loading the whole work item into memory.
Consider developing a service-based solution so that the work is carried out by another process (possibly a web service). This way you could scale the solution to run over multiple servers, perhaps using a load balancer to distribute the work.
Are you persisting the incoming work items to disk before processing them? If not, they probably should be anyway, particularly if it may be some time before the processor gets to them.
Is the memory usage proportional to the size of the incoming work item, or otherwise easy to calculate? Knowing this would help to decide how to schedule processing (see the sketch after this answer).
Hope that helps?!
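To illustrate that last point: if each item's footprint can be estimated, even roughly, the worker threads can share a memory budget and block until their item fits. This is only a sketch with invented names (MemoryBudget, EstimateBytes and ProcessItem are placeholders, and the estimates are assumed to be pessimistic):

```csharp
using System.Threading;

class MemoryBudget
{
    private readonly object _gate = new object();
    private long _available;

    public MemoryBudget(long totalBytes) => _available = totalBytes;

    // Blocks the calling worker thread until the estimated amount fits the remaining budget.
    public void Acquire(long estimatedBytes)
    {
        lock (_gate)
        {
            while (_available < estimatedBytes)
                Monitor.Wait(_gate);
            _available -= estimatedBytes;
        }
    }

    public void Release(long estimatedBytes)
    {
        lock (_gate)
        {
            _available += estimatedBytes;
            Monitor.PulseAll(_gate);       // wake up workers waiting for budget
        }
    }
}

// Usage inside a worker thread:
//
//   long estimate = EstimateBytes(item);   // your heuristic
//   budget.Acquire(estimate);
//   try { ProcessItem(item); }
//   finally { budget.Release(estimate); }
```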

What will happen if an application is too large to fit into the available RAM?

There is a chance that a heavyweight application needs to be launched on a low-configuration system (especially when the system has very little memory).
Also, when we have already opened a lot of applications on the system and keep trying to open new ones, what would happen?
I have only seen applications taking time to respond, or hanging for a while, when I operate them on a low-configuration system with little memory and an old processor.
How is the system able to accommodate many applications when memory is low (like 128 MB or less)?
Does it involve paging or something else?
Can someone please explain the theory behind this?
"Heavyweight" is a very vague term. When the OS loads your program, the EXE is mapped in your address space, but only the code pages that run (or data pages that are referenced) are paged in as necessary.
You will likely get horrible performance if pages need to constantly be swapped as the program runs (aka many hard page faults), but it should work.
Since your commit charge is near the commit limit, and the commit limit will likely have no room to grow, you will also likely receive many malloc()/VirtualAlloc(..., MEM_COMMIT)/HeapAlloc()/{Local|Global}Alloc() failures, so you need to watch the return codes in your program.
Some keywords for search engines are: paging, swapping, virtual memory.
Wikipedia's article on Paging (to which Swap space redirects) covers the topic.
Virtual memory is typically used: virtual memory pages are mapped to physical memory when they are in use. If a physical page is needed and none is free, another page is written out to disk. This is called swapping, and it explains why crowded systems get slow and why memory upgrades improve performance.
