Why is Application Insights Performance Counter Collection causing high CPU usage?

Logging performed by our Performance Team has indicated that this line specifically is killing our CPUs:
Microsoft.AI.PerfCounterCollector!Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.Implementation.PerformanceCounterUtility.ExpandInstanceName()
One theory was that the regexes used to identify Perf Counters in the library are recursing, as in this incident:
https://adtmag.com/blogs/dev-watch/2016/07/stack-overflow-crash.aspx
I've inspected the Perf Counter names and nothing looks particularly out of kilter; the regexes should have no trouble chewing over them. Certainly, for long periods of time there are no issues whatsoever.
I've now turned on Application Insights diagnostic logging in an attempt to observe the issue (in a test environment).
Has anyone else observed this, and how can we mitigate it?
We have ensured DeveloperMode is NOT set to on.

This answer most likely won't be applicable now, as the AppInsights 2 code is much improved, but in my case it was a double call to AddApplicationInsightsTelemetry(). It adds a collector each time the method is called, and due to the lack of synchronization inside the perf counter collector code this creates CPU spikes.
So avoid calling AddApplicationInsightsTelemetry multiple times, or use AppInsights 2.x (and the best thing is to do both).

Do the counters you're collecting utilize instance placeholders in their names? If the instance name is known at build time, getting rid of placeholders may significantly improve performance. For instance, instead of
\Process(??APP_WIN32_PROC??)\% Processor Time
try using
\Process(w3wp)\% Processor Time
Also, how many counters are you collecting overall?

Related

How do I figure out why a large chunk of memory is not being garbage collected in Rails?

I'm pretty new to Ruby, Rails, and everything else in that ecosystem. I've joined a team that has a Ruby 3.1.2 / Rails 6.1.7 app backed by a Postgres database.
We have a scenario where, sometimes, memory usage on one of our running instances jumps up significantly and is never relinquished (we've waited days). Until today, we didn't know what was causing it or how to reproduce it.
It turns out that it's caused by an internal tool which was running an unbounded ActiveRecord query -- no limit and no paging. When pointing this tool at a more active customer, it takes many seconds, returns thousands of records, and memory usage increases by tens of MB. No amount of waiting will lead to the memory usage going back down again.
Having discovered that, we recently added paging to that particular tool, and in the ~week since, we have not seen usage increasing in giant chunks anymore. However, there are other scenarios which have similar behavior but with smaller payloads; these cause memory usage to increase gradually over time. We deploy this application often enough that it hasn't been a big deal, but I am looking to gain a better understanding of what's happening and to determine if there's a problem here, because that's not what we should see from a stable application that's free of memory leaks.
My first suspicion was a memoized instance variable on the controller, but a quick check indicates that Rails controllers are discarded as soon as the request finishes processing, so I don't think that's it.
My next suspicion was that ActiveRecord was caching my resultset, but I've done a bunch of research on how this works and my understanding is that any cached queries/relations should be released when the request completes. Even if I have that wrong, a subsequent identical request takes just as long and causes another jump in memory usage, so either that's not it, or caching is broken on our system.
My Google searches turn up lots of results about various caching capabilities in Rails 2, 3, and 5 -- but not much about 6.x, so maybe something significant has changed and I just haven't found it.
I did find ruby-prof, memory-profiler, and get_process_mem -- these all seem like they are only suitable for high-level analysis and wouldn't help me here.
Can I explore the contents of the object graph currently in memory on an existing, live instance of my app? I'm imagining that this would happen in the Rails console, but that's not a constraint on the question. If not, is there some other way that I could find out what is currently in memory, and whether it's just a bunch of fragmented pages or if there's actually something that isn't getting garbage collected?
EDIT
#engineersmnky pointed out in the comments that maybe everything is fine and that perhaps Ruby is just still holding on to the OS page due to some other still-valid object therein. However, if this is the case, it strikes me as unlikely that memory usage would not go back down to the previous baseline after several days of production usage.
Loading tens of MB worth of resultset into memory should result in the allocation of >1000 16kb memory pages in just a handful of seconds. It seems reasonable to assume that the vast majority of those would contain exclusively this resultset, and could therefore be released as soon as the resultset is garbage collected.
Furthermore, I can reproduce the increased memory usage by running the same unbounded ActiveRecord query in the Rails console, and when I close that console, the memory goes down almost immediately -- exactly what I was expecting to see when the web request completes. I don't fully understand how the Rails console works when connecting to a running application, though, so this may not be relevant.

Ruby: What can cause execution of the same code block to slow down over time when run over and over again?

I have a background worker in my rails project that executes a lot of complicated data aggregation in-memory in Ruby. I'm seeing a strange behavior. When I boot up a process for executing the jobs (thousands), I see a strange performance decrease over time. In the beginning a job completes in around 300ms, but after processing around 10,000 jobs the execution time will gradually have increased to around 2000ms. This is a big problem for me and I'm puzzled about how this can possibly happen. I see no memory leaks (RAM usage is pretty stable), and I see no errors. What might cause this on a low level, and where should I start looking?
Background facts:
Among other things, the job does a lot of regexp comparisons on a lot of strings. There are no external database calls made except for read/write operations to a Redis instance.
I have tried executing the same thing on different servers/computers, and the symptoms are the same.
If I restart the process when it starts to perform too badly, the performance is good again immediately afterwards.
I'm running Ruby 1.9.3p194, Rails 3.2, and Sidekiq 2.9.0 as the job processor.
It is difficult to tell from the limited description of your service, but the behaviour is consistent with a small (i.e. not leaky) cache of data that either has poor lookup performance, or that you are relying on very heavily, and that is growing at just a modest rate. A contrived example might be a list of "jobs done so far by this worker" which is being sorted on demand at a few points in the code.
One such cache is out of your direct control: Ruby's symbol table. Finding a Symbol is something like O(log(n)) on number of symbols in the system, which is good. But this could still impact you if you handle a lot of symbols, and each iteration of your worker can generate new symbols (for instance if keys in an input hash can be arbitrary data, and you use a symbolize_keys method or call to_sym on a lot of varying strings). Symbols are cached permanently in the Ruby process. In theory a few million would not show up as a memory leak. But if your code can go from say 10,000 symbols to 1,000,000 in total, all the symbol generating and checking code would slow down by a small fixed amount. If you are doing that a lot, it could potentially explain a few hundred ms.
If hunting through suspect code is getting you nowhere, your best bet to find the problem is to use a profiler. You should collect a profile of the code behaving well, and behaving badly, and compare the two.

Memory related errors

I mostly work in the C language for my job. I have faced many issues and spent a lot of time debugging problems related to dynamically allocated memory being corrupted or overwritten, e.g. allocating A bytes with malloc(A) but then writing more than A bytes. While reading up on this, I came across the following:
1.) An approach wherein one allocates more memory than is needed and writes some known value/pattern into the extra locations. During program execution that pattern should remain untouched; otherwise it indicates memory corruption/overwriting. But how does this approach work in practice? Does it mean that for every write through a pointer allocated using malloc() I should be reading the additional sentinel pattern and checking its sanity? That would make my whole program very slow.
And to say that we can remove these checks from the release version of the code is also not fruitful, as memory-related issues tend to happen more often in real scenarios. So how can we handle this?
2.) I heard that there is something called a HEAP WALKER, which enables programs to detect memory-related issues. How can one enable this?
thank you.
-AD.
If you're working under Linux or OSX, have a look at Valgrind (free, available on OSX via Macports). For Windows, we're using Rational PurifyPlus (needs a license).
You can also have a look at Dmalloc or even at Paul Nettle's memory manager which helps tracking memory allocation related bugs.
If you're on Mac OS X, there's an awesome library called libgmalloc. libgmalloc places each memory allocation on a separate page. Any memory access/write beyond the page will immediately trigger a bus error. Note however that running your program with libgmalloc will likely result in a significant slowdown.
Memory guards can catch some heap corruption. They are slower (especially for deallocations), but they are just for debug purposes and your release build would not include them.
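To make the idea concrete, here is a minimal sketch of such a guard-byte ("canary") allocator in C++. The names guarded_malloc/guarded_free are invented for this example, alignment handling is ignored for brevity, and the check runs only at free time rather than on every write (which addresses the performance worry in the question); real tools such as Valgrind or libgmalloc do this far more robustly.

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    static const unsigned char GUARD_PATTERN = 0xAB; // arbitrary sentinel byte
    static const std::size_t GUARD_SIZE = 16;        // guard bytes placed after each block

    void* guarded_malloc(std::size_t size) {
        // Allocate room for a size header, the user data and the trailing guard.
        unsigned char* raw = (unsigned char*)std::malloc(sizeof(std::size_t) + size + GUARD_SIZE);
        if (!raw) return NULL;
        std::memcpy(raw, &size, sizeof(std::size_t));                              // remember the size
        std::memset(raw + sizeof(std::size_t) + size, GUARD_PATTERN, GUARD_SIZE);  // fill the guard
        return raw + sizeof(std::size_t);
    }

    void guarded_free(void* ptr) {
        if (!ptr) return;
        unsigned char* user = (unsigned char*)ptr;
        unsigned char* raw = user - sizeof(std::size_t);
        std::size_t size;
        std::memcpy(&size, raw, sizeof(std::size_t));
        // The guard is verified only here, at deallocation time, not on every write.
        for (std::size_t i = 0; i < GUARD_SIZE; ++i) {
            if (user[size + i] != GUARD_PATTERN) {
                std::fprintf(stderr, "heap corruption detected past block %p\n", ptr);
                std::abort();
            }
        }
        std::free(raw);
    }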
Heap walking is platform specific, but not necessarily too useful. The simplest check is simply to wrap your allocations and log them to a file with the __LINE__ and __FILE__ information in your debug mode, and most leaks will be apparent very quickly when you exit the program and the numbers don't tally up.
Search Google for __FILE__ and __LINE__ and I am sure lots of results will show up.
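A rough sketch of what that wrapping can look like is below; the names debug_malloc/debug_free and the DEBUG_MEMORY switch are made up for illustration, and a real version would also cover calloc/realloc.

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    void* debug_malloc(std::size_t size, const char* file, int line) {
        void* p = std::malloc(size);
        std::fprintf(stderr, "ALLOC %p %zu bytes at %s:%d\n", p, size, file, line);
        return p;
    }

    void debug_free(void* p, const char* file, int line) {
        std::fprintf(stderr, "FREE  %p at %s:%d\n", p, file, line);
        std::free(p);
    }

    // Defined after the wrappers so the wrappers themselves still call the real
    // functions; from here on, malloc/free calls are logged with their source location.
    #ifdef DEBUG_MEMORY
    #define malloc(sz) debug_malloc((sz), __FILE__, __LINE__)
    #define free(p)    debug_free((p), __FILE__, __LINE__)
    #endif

When the program exits, pairing up the ALLOC and FREE lines (even with a small script) shows exactly which file and line leaked.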

cooperative memory usage across threads?

I have an application that has multiple threads processing work from a todo queue. I have no influence over what gets into the queue and in what order (it is fed externally by the user). A single work item from the queue may take anywhere between a couple of seconds to several hours of runtime and should not be interrupted while processing. Also, a single work item may consume between a couple of megabytes to around 2GB of memory. The memory consumption is my problem. I'm running as a 64-bit process on an 8GB machine with 8 parallel threads. If each of them hits a worst-case work item at the same time, I run out of memory. I'm wondering about the best way to work around this.
plan conservatively and run 4 threads only. The worst case shouldn't be a problem anymore, but we waste a lot of parallelism, making the average case a lot slower.
make each thread check available memory (or rather total allocated memory by all threads) before starting with a new item. Only start when more than 2GB memory are left. Recheck periodically, hoping that other threads will finish their memory hogs and we may start eventually.
try to predict how much memory items from the queue will need (hard) and plan accordingly. We could reorder the queue (overriding user choice) or simply adjust the number of running worker threads.
more ideas?
I'm currently tending towards number 2 because it seems simple to implement and solves most cases. However, I'm still wondering what standard ways of handling situations like this exist? The operating system must do something very similar at the process level, after all...
regards,
Sören
So your current worst-case memory usage is 16GB. With only 8GB of RAM, you'd be lucky to have 6 or 7GB left after the OS and system processes take their share. So on average you're already going to be thrashing memory on a moderately loaded system. How many cores does the machine have? Do you have 8 worker threads because it is an 8-core machine?
Basically you can either reduce memory consumption, or increase available memory. Your option 1, running only 4 threads, under-utilises the CPU resources, which could halve your throughput - definitely sub-optimal.
Option 2 is possible, but risky. Memory management is very complex, and querying for available memory is no guarantee that you will be able to go ahead and allocate that amount (without causing paging). A burst of disk I/O could cause the system to increase the cache size, a background process could start up and swap in its working set, and any number of other factors. For these reasons, the smaller the available memory, the less you can rely on it. Also, over time memory fragmentation can cause problems too.
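For what it's worth, if you do go down the option 2 route, a shared budget that each thread reserves against before picking up an item might look roughly like this sketch (MemoryBudget and the estimate are invented names; it assumes you can put a cap or estimate on each item's memory use, rather than querying the OS for free memory):

    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    class MemoryBudget {
    public:
        explicit MemoryBudget(std::size_t total_bytes) : remaining_(total_bytes) {}

        // Block until the estimated amount can be reserved.
        void acquire(std::size_t bytes) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [&] { return bytes <= remaining_; });
            remaining_ -= bytes;
        }

        // Return the reservation once the work item has finished.
        void release(std::size_t bytes) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                remaining_ += bytes;
            }
            cv_.notify_all();
        }

    private:
        std::mutex mutex_;
        std::condition_variable cv_;
        std::size_t remaining_;
    };

    // Usage in a worker thread (estimate_bytes() is a placeholder for whatever
    // prediction or worst-case cap you settle on):
    //   budget.acquire(estimate_bytes(item));
    //   process(item);
    //   budget.release(estimate_bytes(item));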
Option 3 is interesting, but could easily lead to under-loading the CPU. If you have a run of jobs that have high memory requirements, you could end up running only a few threads, and be in the same situation as option 1, where you are under-loading the cores.
So taking the "reduce consumption" strategy, do you actually need to have the entire data set in memory at once? Depending on the algorithm and the data access pattern (eg. random versus sequential) you could progressively load the data. More esoteric approaches might involve compression, depending on your data and the algorithm (but really, it's probably a waste of effort).
Then there's "increase available memory". In terms of price/performance, you should seriously consider simply purchasing more RAM. Sometimes, investing in more hardware is cheaper than the development time to achieve the same end result. For example, you could put in 32GB of RAM for a few hundred dollars, and this would immediately improve performance without adding any complexity to the solution. With the performance pressure off, you could profile the application to see just where you can make the software more efficient.
I have continued the discussion on Herb Sutter's blog and provoked some very helpful reader comments. Head over to Sutter's Mill if you are interested.
Thanks for all the suggestions so far!
Sören
Difficult to propose solutions without knowing exactly what you're doing, but how about considering:
See if your processing algorithm can access the data in smaller sections without loading the whole work item into memory.
Consider developing a service-based solution so that the work is carried out by another process (possibly a web service). This way you could scale the solution to run over multiple servers, perhaps using a load balancer to distribute the work.
Are you persisting the incoming work items to disk before processing them? If not, they probably should be anyway, particularly if it may be some time before the processor gets to them.
Is the memory usage proportional to the size of the incoming work item, or otherwise easy to calculate? Knowing this would help to decide how to schedule processing.
Hope that helps?!

How to Test for Memory Leaks?

We have an application with hundreds of possible user actions, and we are thinking about how to enhance memory leak testing.
Currently, here's the way it happens: when manually testing the software, if it appears that our application consumes too much memory, we use a memory tool, find the cause and fix it. It's a rather slow and inefficient process: the problems are discovered late and it relies on the goodwill of one developer.
How can we improve that?
Internally check that some actions (like "close file") do recover some memory and log it?
Assert on memory state inside our unit tests (but it seems this would be a tedious task; see the sketch after this list)?
Manually check it from time to time?
Include that check each time a new user story is implemented?
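As a sketch of what "assert on memory state inside a unit test" could look like in C++ (assuming a debug-build allocation tracker along the lines of the one sketched further down; live_allocation_count, open_file and close_file are hypothetical names):

    #include <cassert>
    #include <cstddef>

    // Hypothetical hooks: a debug-build allocation counter and the application
    // actions under test.
    std::size_t live_allocation_count();
    void open_file(const char* path);
    void close_file();

    void test_close_file_releases_memory() {
        const std::size_t before = live_allocation_count();
        open_file("example.dat");
        close_file();
        // The open/close pair should leave the number of outstanding allocations
        // unchanged; a growing count points at a leak in this code path.
        assert(live_allocation_count() == before);
    }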
Which language?
I'd use a tool such as Valgrind, try to fully exercise the program and see what it reports.
First line of defense:
a checklist of common memory-allocation-related errors for developers
coding guidelines
Second line of defense:
code reviews
static code analysis (as part of the build process)
memory profiling tools
If you work with an unmanaged language (like C/C++), you can efficiently discover most memory leaks by hijacking the memory management functions. For example, you can track all memory allocations/deallocations.
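As an illustration, one way to do that in C++ is to replace the global operator new/delete in debug builds. This sketch only counts outstanding allocations (it could back a live_allocation_count() hook like the one used in the test sketch above); it omits the array forms and per-call-site logging for brevity.

    #include <atomic>
    #include <cstddef>
    #include <cstdlib>
    #include <new>

    static std::atomic<std::size_t> g_live_allocations{0};

    void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        g_live_allocations.fetch_add(1, std::memory_order_relaxed);
        return p;
    }

    void operator delete(void* p) noexcept {
        if (!p) return;
        g_live_allocations.fetch_sub(1, std::memory_order_relaxed);
        std::free(p);
    }

    // A non-zero value at process exit (or at the end of a test) means some
    // allocations were never released.
    std::size_t live_allocation_count() {
        return g_live_allocations.load(std::memory_order_relaxed);
    }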
It seems to me that the core of the problem is not so much finding memory leaks as knowing when to test for them. You say you have lots of user actions, but you don't say what sequences of user actions are meaningful. If you can generate meaningful sequences at random, I'd argue hard for random testing. On random tests you would measure
Code coverage (with gcov or valgrind)
Memory usage (with valgrind)
Coverage of the user actions themselves
By "coverage of user actions" I mean statements like the following:
For every pair of actions A and B, if there is a meaningful sequence of actions in which A is immediately followed by B, then we have tested such a sequence.
If that's not true, then you can ask for what fraction of pairs A and B it is true.
If you have the CPU cycles to afford it, you would probably also benefit from running valgrind or another memory-checking tool either before every commit to your source-code repository or during a nightly build.
Automate!
In my company we have programmed an endless action path for our application. The Java garbage collector should clean up all unused maps, lists, and the like. So we start the application on the endless action path and watch whether the memory usage is growing.
To check which fields are not deleted, you can use JProfiler for Java.
Replace new and delete with your custom versions and log every act of allocation/deallocation.
Speaking generally (not about testing, but rather about fighting the issue at its origin), smart pointers help to avoid this problem. Fortunately, the C++11 standard provides new convenient smart pointer classes (shared_ptr, unique_ptr).
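A brief illustration of that point, using a made-up Report type: with unique_ptr and shared_ptr the delete happens automatically when the last owner goes out of scope, so this whole category of leak never has to be hunted down in testing.

    #include <memory>
    #include <vector>

    struct Report { std::vector<int> rows; };

    void build_and_use_report() {
        // Exactly one owner; freed automatically at the end of the scope,
        // even if an exception is thrown in between.
        std::unique_ptr<Report> report(new Report());
        report->rows.push_back(42);

        // Shared ownership; the Report is freed when the last shared_ptr goes away.
        std::shared_ptr<Report> shared = std::make_shared<Report>();
        std::shared_ptr<Report> alias = shared;
    }   // no explicit delete anywhere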