Memory reported in Resource Monitor not showing in UMDH

Memory reported in Resource Monitor not showing in UMDH - working-set

I have a service which intermittently starts gobbling up server memory over time and needs to be restarted to free it. I turned +ust with gflags, restarted the service, and started taking scheduled UMDH snapshots. When the problem reoccurred, resource manager reported multiple GB under Working set and Private bytes, but the UMDH snapshots account only for a few MB allocations in the process' heaps.
At the top of UMDH snapshot files, it mentions "Only allocations for which the heap manager collected a stack are dumped".
How can an allocation in a process be without a trace when +ust flags were specified?
How can I find out where/how these GBs were allocated?

UMDH is short for User Mode Dump Heap. The term Heap is a key term here: it refers to the C++ heap manager only. This means that all memory which is allocated by other means than the C++ heap manager is not tracked by UMDH.
This can be
direct calls to VirtualAlloc()
memory used by .NET, since .NET has its own heap manager
But even for C++, there is the case that allocations larger than 512 kB are not efficiently manageable by the C++ heap manager, so it just redirects it to VirtualAlloc() and does not create a heap segment of such large allocations.
How can I find out where/how these GBs were allocated?
For direct calls to VirtualAlloc(), the WinDbg command !address -summary may give an answer. For .NET, the SOS extension and the !dumpheap -stat can give an answer.

Related

JVM Memory segments allocation

Alright so I have a question regarding the Memory segments of a JVM,
I know every JVM would choose to implement this a little bit different yet it is an overall concept that should remain the same within all JVM's
A standart C / C++ program that does not use a virtual machine to execute during runtime has four memory segments during runtime,
The Code / Stack / Heap / Data
all of these memory segments are automatically allocated by the Operating System during runtime.
However, When a JVM executes a Java compiled program, during runtime it has 5 Memory segments
The Method area / Heap / Java Stacks / PC Registers / Native Stacks
My question is this, who allocates and manages those memory segments?
The operating system is NOT aware of a java program running and thinks it is a part of the JVM running as a regular program on the computer, JIT compilation, Java stacks usage, these operations require run-time memory allocation, And what I'm failing to understand Is how a JVM divides it's memory into those memory segments.
It is definitely not done by the Operating System, and those memory segments (for example the java stacks) must be contiguous in order to work, so if the JVM program would simply use a command such as malloc in order to receive the maximum size of heap memory and divide that memory into segments, we have no promise for contiguous memory, I would love it if someone could help me get this straight in my head, it's all mixed up...

When the JVM starts it has hundreds if not thousand of memory regions. For example, there is a stack for every thread as well as a thread state region. There is a memory mapping for every shared library and jar. Note: Java 64-bit doesn't use segments like a 16-bit application would.
who allocates and manages those memory segments?
All memory mappings/regions are allocated by the OS.
The operating system is NOT aware of a java program running and thinks it is a part of the JVM running as a regular program on the computer,
The JVM is running as a regular program however memory allocation uses the same mechanism as a normal program would. The only difference is that in Java object allocation is managed by the JVM, but this is the only regions which work this way.
JIT compilation, Java stacks usage,
JIT compilation occurs in a normal OS thread and each Java stack is a normal thread stack.
these operations require run-time memory allocation,
It does and for the most part it uses malloc and free and map and unmap
And what I'm failing to understand Is how a JVM divides it's memory into those memory segments
It doesn't. The heap is for Java Objects only. The maximum heap for example is NOT the maximum memory usage, only the maximum amount of objects you can have at once.
It is definitely not done by the Operating System, and those memory segments (for example the java stacks) must be contiguous in order to work
You are right that they need to be continuous in virtual memory but the OS does this. On Linux at least there is no segments used, only one 32-bit or 64-bit memory region.
so if the JVM program would simply use a command such as malloc in order to receive the maximum size of heap memory and divide that memory into segments,
The heap is divided either into generations or in G1 multiple memory chunks, but this is for object only.
we have no promise for contiguous memory
The garbage collectors either defragment memory by copying it around or take steps to try to reduce it to ensure there is enough continuous memory for any object you allocate.
would love it if someone could help me get this straight in my head, it's all mixed up...
In short, the JVM runs like any other program except when Java code runs it's object are allocated in a managed region of memory. All other memory regions act just as they would in a C program, because the JVM is a C/C++ program.

Paged memory vs Pinned memory in memory copy [duplicate]

I observe substantial speedups in data transfer when I use pinned memory for CUDA data transfers. On linux, the underlying system call for achieving this is mlock. From the man page of mlock, it states that locking the page prevents it from being swapped out:
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully;
In my tests, I had a fews gigs of free memory on my system so there was never any risk that the memory pages could've been swapped out yet I still observed the speedup. Can anyone explain what's really going on here?, any insight or info is much appreciated.

CUDA Driver checks, if the memory range is locked or not and then it will use a different codepath. Locked memory is stored in the physical memory (RAM), so device can fetch it w/o help from CPU (DMA, aka Async copy; device only need list of physical pages). Not-locked memory can generate a page fault on access, and it is stored not only in memory (e.g. it can be in swap), so driver need to access every page of non-locked memory, copy it into pinned buffer and pass it to DMA (Syncronious, page-by-page copy).
As described here http://forums.nvidia.com/index.php?showtopic=164661
host memory used by the asynchronous mem copy call needs to be page locked through cudaMallocHost or cudaHostAlloc.
I can also recommend to check cudaMemcpyAsync and cudaHostAlloc manuals at developer.download.nvidia.com. HostAlloc says that cuda driver can detect pinned memory:
The driver tracks the virtual memory ranges allocated with this(cudaHostAlloc) function and automatically accelerates calls to functions such as cudaMemcpy().

CUDA use DMA to transfer pinned memory to GPU. Pageable host memory cannot be used with DMA because they may reside on the disk.
If the memory is not pinned (i.e. page-locked), it's first copied to a page-locked "staging" buffer and then copied to GPU through DMA.
So using the pinned memory you save the time to copy from pageable host memory to page-locked host memory.

If the memory pages had not been accessed yet, they were probably never swapped in to begin with. In particular, newly allocated pages will be virtual copies of the universal "zero page" and don't have a physical instantiation until they're written to. New maps of files on disk will likewise remain purely on disk until they're read or written.

A verbose note on copying non-locked pages to locked pages.
It could be extremely expensive if non-locked pages are swapped out by OS on a busy system with limited CPU RAM. Then page fault will be triggered to load pages into CPU RAM through expensive disk IO operations.
Pinning pages can also cause virtual memory thrashing on a system where CPU RAM is precious. If thrashing happens, the throughput of CPU can be degraded a lot.

How is disk memory being used/consumed by programs?

A dummy question:
Recently my disk ran out of memory:
I kept getting java.OutOfMemoryError, java heap space, later my Virtual Box encountered "Not Enough Free Space available on disk" error.
Then it turned out that my 256GB SSD had been almost all consumed/used.
So I was wondering how running the programs could consume my memory/disk usage?
How does this work?
I know the basics behind this, allocating space on a heap/stack, then deallocating them after use. (Correct me if I'm wrong.)
But if this is the case, then the disk should not be used up, right? (if I don't add anything else onto my desktop, only using it to run a definite number of programs)
I really wanted to understand how the disk/memory is being consumed/used by running programs.
If this question has been asked before, please relate it to that one.
I apologize for dummy question, but I believe it will be helpful to fellow programmers like me.
Thanks for making it clearer. Q1: Why do programs consume disk space? A2: How does "java.OutOfMemoryError, java heap space" occur? related to memory, is it?

Why do programs consume disk space?
I know the basics behind this, allocating space on a heap/stack, then deallocating them after use. But if this is the case, then the disk should not be used up, right?
In fact, it can be used up. Memory allocations can consume hard-disk space if the allocation in your process's virtual memory happens to be mapped to a pagefile on disk, and your pagefile size is set to be managed by the operating system.
If you want to know more about memory mapping there's a great question here:
Understanding Virtual Address, Virtual Memory and Paging
The page-file grow won't actually be a direct response to your allocation, more a response to the new current commit size being close to the reserved size. If you want to know more about this process (commit vs reserved, stack expansions, etc) I recommend reading Pushing the Limits of Windows: Physical Memory.
Why does java.OutOfMemoryError occur?
http://docs.oracle.com/javase/7/docs/api/java/lang/OutOfMemoryError.html
Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector.
Generally this happens because your pagefile is too small or your disk is too full.
See also:
How to deal with "java.lang.OutOfMemoryError: Java heap space" error (64MB heap size)
java.lang.OutOfMemoryError: Java heap space

what is All heap Allocations and All Anonymous Allocations in Xcode Instruments allocations?

I hava an application . when I repeat some action , anonymous allocations memory continuously increase a lot while heap allocations increase a little. can some one help me ? Thanks

Focus on the Live Bytes column for All Heap Allocations to see how much memory your application is using. You cannot control your application's Anonymous VM size.
Focus on the heap allocations because your app has more control over
heap allocations. Most of the memory allocations your app makes are
heap allocations.
The VM in anonymous VM stands for virtual memory.
When your app launches, the operating system reserves a block of
virtual memory for your application. This block is usually much larger
than the amount of memory your app needs. When your app allocates
memory, the operating system allocates the memory from the block it
reserved.
Remember the second sentence in the previous paragraph. The operating
system determines the size of the virtual memory block, not your app.
That’s why you should focus on the heap allocations instead of
anonymous VM. Your app has no control over the size of the anonymous
VM.
Source: http://meandmark.com/blog/2014/01/instruments-heap-allocations-and-anonymous-vm/

How much memory your program takes? (FastMM vs Borland MM)

I have seen recently a strange behavior in my program. After creating large amounts of objects (500MB of RAM) then releasing them, the program's memory footprint does not return to its original size. It still shows a footprint of 160MB (Private working set).
Normal behavior?
Borland's memory manager does not behave like this, so if possible please confirm (or infirm) this is a normal behavior for FastMM: If you have a handy program in which you create a rather complex MDI child (containing several controls/objects), can you create in a loop 250 instances of that MDI child in memory (at the same time) then release them all and check the memory footprint. Please make sure that you consume at least 200-300MB or RAM with those MDI childs.
Especially those that still using Delphi 7 can see the difference by temporary disabling FastMM.
Thanks
If anybody is interested, especially if you want some proof this is not a memory leak (I hope it is not a mem leak in my code - this is also one of the points of this post: to check if it is my fault), here are the original discussions:
My program never releases the memory back. Why?
How to convince the memory manager to release unused memory

Dear Altar, I'm dazzled at how off the point you are in your guesses and how you don't listen to what people told you many times before.
Let's set some things straight. Memory management 101. Please read thoroughly.
When you allocate memory in Delphi, there are two memory managers involved.
System memory manager
First one is a system memory manager. This one is built into Windows and it gives memory in 4kb sized pages.
But it doesn't always give you memory in RAM (or physical memory). Your data can be kept on the hard drive, and read back every time you need to access it. This is awfully slow.
In other words, imagine you have 512Mb of physical memory. You run two programs, each requesting 1Gb of memory. What does OS do?
It grants both requests. Both apps get 1Gb of memory each. Both think all the memory is "in memory". But in fact, only 512Mb can be kept in RAM. The rest is stored in page file, although your app does not know that. It just works slow.
Working set size
Now, what is a "working set size" you are measuring?
It's the part of the allocated memory that is kept in RAM.
If you have an application which allocates 1Gb of memory, and you only have 512 Mb of RAM, then it's working set size will be 512Mb. Although it "uses" 1Gb of memory!
When you run another application which needs memory, OS will automatically free some RAM by moving rarely used blocks of "memory" to the hard drive.
Your virtual memory allocation will stay the same, but more pages will be on the hard drive and less in RAM. Working set size will decrease.
From this, you should have understood by this point, that it's pointless to try and minimize the working set size. You're achieving nothing. You're not freeing memory in any sense. You're just offloading the data to the hard drive.
But the system will do that automatically when it needs to. And there's no point making room in RAM until it's needed. You're just slowing down your application, that's all.
TLDR: "Working set size" is not "how much memory application uses". It's "how much is ready right now". Don't try to minimize it, you're just making things worse.
Delphi memory manager
OS gives you virtual memory in pages of 4Kb. But often you need it in much smaller chunks. For instance, 4 bytes for your integer, or 32 bytes for some structure. The solution?
Application memory manager, such as FastMM or BorlandMM or others.
It's job is to allocate memory in pages from the operating system, then give you small chunks of those pages when you need it.
In other words, when you ask for 14 bytes of memory, this is what happens:
You ask FastMM for 14 bytes of memory.
FastMM asks OS for 1 page of memory (4096 bytes).
OS grants one page of memory, backing it up with RAM (it's stored in actual RAM).
FastMM saves that page, cuts 14 bytes of it and gives to you.
When you ask for another 14 bytes, FastMM just cuts another 14 bytes from the same page.
What happens when you release memory? The same thing backwards:
You release 14 bytes to FastMM. Nothing happens.
You release another 14 bytes. FastMM sees that the 4096 byte page it allocated is now completely unused.
Therefore it releases the page, returning it to the system.
It's worth noting that FastMM cannot release just 14 bytes to the system. It has to release memory in pages. Until the whole page is free, FastMM cannot do a thing. Nobody can.
So, why is my working set size so big, even though I released everything?
First, your working set size is not what you should be measuring. Virtual memory consumption is. But if you have big working set size, your virtual memory consumption will be high too.
What's the problem? You should be able to figure out by this point.
Let's say you allocate 1kb, then 3kb of memory. How much virtual memory have you allocated? 4kb, 1 page.
Now you release 3Kb. How much virtual memory do you use now? 1Kb? No, it's still 1 page. You cannot allocate less than 1 page from the system. You're still using 4096 bytes of virtual memory.
Imagine if you do that 1000 times. 1kb, 3kb, 1kb, 3kb, 1kb, 3kb and so on. You allocate 1000 * 4kb = 4 mb like that, and then you release all the 3kb parts. How much virtual memory do you use now?
Still 4 mb. Because you allocated 1000 pages at first. Of every page you took 1kb and 3kb chunks. Even if you release 3kb chunks, 1kb chunks will continue to keep every single page you allocated in memory. And every page takes 4kb of virtual memory.
Memory manager cannot magically "move" all of your 1kb chunks together. This is impossible, because their virtual addresses can be referenced from somewhere in code. It's not a trait of FastMM.
But why with BorlandMM everything works better?
Coincidence. Maybe it just so happens that BorlandMM gives you memory in a slightly different way than FastMM does. Next thing you know, you change something in your app and BorlandMM acts just like FastMM did. It's impossible for a memory manager to completely prevent this effect, called memory fragmentation.
So what do I do?
Short answer is, not much until this bothers you.
You see, with modern operating systems, you're not really eating anyone's RAM. Per above, OS will automatically swap your pages out when it needs RAM for other applications. This should not be a concern.
And the "excessive" memory isn't lost. Although pages are allocated, 3kb of each is marked as "free". Next time your app needs memory, memory manager will use that space.
But if you really want to help it, you should reorganize your allocations so that the ones you're planning on keeping are done first, and the ones you will soon release are all allocated after that.
Like this: 1kb, 1kb, 1kb, ..., 3kb, 3kb, 3kb...
If you now release all the 3kb chunks, your virtual memory consumption will drop significantly.
This is not always possible. If it's impossible, then just do nothing. It's more or less alright like it is.
And P.S.
You shouldn't be allocating 500 forms in the first place. This is clearly not a way to go. Fix this, and you won't even have a need to think about memory allocation and releasing.
I hope this clears things up, because four posts on the same topic, frankly, is a bit too much.

IIRC, the Delphi memory manager does not immediately return free'd memory to the OS.
Memory is allocated in chunks of small, medium and large sizes, called blocks.
These blocks are kept for a while after their contents have been disposed to have them readyly available when another allocation is requested afterwards.
This limits the amount of system calls required for succesive allocation of multiple objects, and helps avoiding heap fragmentation.

Infirming: Delphi 2007, default memory manager (should be FastMM variation). Several tests on heavy objects:
Initial memory 2Mb, peak memory 30Mb, final memory 4Mb.
Initial memory 2Mb, peak memory 1Gb, final memory 5.5Mb.

What are the heapmanager stats (GetHeapStatus) on the point that 160MB is still allocated?

SOLVED
To confirm that this behavior is generated by FastMM (as suggested by Barry Kelly) I created a second program that allocated A LOT of RAM. As soon as Windows ran out of RAM, my program memory utilization returned to its original value.
Problem solved. Special thanks to Barry Kelly, the only person that pointed to the real "problem".

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart