"Mem Usage" higher than "VM Size" in WinXP Task Manager - memory

In my Windows XP Task Manager, some processes display a higher value in the Mem Usage column than the VMSize. My Firefox instance, for example shows 111544 K as mem usage and 100576 K as VMSize.
According to the help file of Task Manager Mem Usage is the working set of the process and VMSize is the committed memory in the Virtual address space.
My question is, if the number of committed pages for a process is A and the number of pages in physical memory for the same process is B, shouldn't it always be B ≤ A? Isn't the number of pages in physical memory per process a subset of the committed pages?
Or is this something to do with sharing of memory among processes? Please explain. (Perhaps my definition of 'Working Set' is off the mark).
Thanks.

Virtual Memory
Assume that your program (eg Oracle) allocated 100 MB of memory upon startup - your VM size goes up by 100 MB though no additional physical / disk pages are touched. ie VM is nothing but memory book keeping.
The total available physical memory + paging file memory is the maximum memory that ALL the processes in the system can allocate. The system does this so that it can ensure that at any point time if the processes actually start consuming all that memory it allocated the OS can supply the actual physical pages required.
Private Memory
If the program copies 10 MB of data into that 100 MB, OS senses that no pages have been allocated to the process corresponding to those addresses and assigns 10 MB worth of physical pages into your process's private memory. (This process is called page fault)
Working Set
Definition : Working set is the set of memory pages that have been recently touched by a program.
At this point these 10 pages are added to the working set of the process. If the process then goes and copies this data into another 10 MB cache previously allocated, everything else remains the same but the Working Set goes up again by 10 Mb if those old pages where not in the working set. But if those pages where already in the working set, then everything is good and the programs working set remains the same.
Working Set behaviour
Imagine your process never touches the first 10 pages ever again, in which case these pages are trimmed off from your process's working set and possibly sent to the page file so that the OS can bring in other pages that are more frequently used. However if there are no urgent low memory requirements, then this act of paging need not be done and OS can act as if its rich in memory. In this case the working set simply lets these pages remain.
When is Working Set > Virtual Memory
Now imagine the same program de-allocates all the 100 Mb of memory. The programs VM size is immediately reduced by 100 MB (remember VM = book keeping of all memory allocation requests)
The working set need not be affected by this, since that doesn't change the fact that those 10 Mb worth of pages where recently touched. Therefore those pages still remain in the working set of the process though the OS can reclaim them whenever it requires.
This would effectively make the VM < working set. However this will rectify if you start another process that consumes more memory and the working set pages are reclaimed by the OS.

XP's Task Manager is simply wrong. EDIT: If you don't believe me (and someone doesn't, because they voted this down), read Firefox 3 Memory Usage. I quote:
If you’re looking at Memory Usage
under Windows XP, your numbers aren’t
going to be so great. The reason:
Microsoft changed the meaning of
“private bytes” between XP and Vista
(for the better).
Sounds like MS got confused. You only change something like that if it's broken.
Try Process Explorer instead. What Task Manager labels "VM Size", Process Explorer (more correctly) labels "Private Bytes". And in Process Explorer, Working Set (and Private Bytes) are always less than or equal to Virtual Size, as you would expect.

File mapping
Very common way how Mem Usage can be higher than VM Size is by using file mapping objects (hence it can be related to shared memory, as file mapping is used to share memory). With file mapping you can have a memory which is committed (either in page file or in physical memory, you do not know), but has no virtual address assigned to it. The committed memory appears in Mem Usage, while used virtual addresses usage is tracked by VM Size.
See also:
What does “VM Size” mean in the Windows Task Manager? on Stackoverflow
Breaking the 32 bit Barrier in my developer blog
Usenet discussion Still confused why working set larger than virtual memory

Memory usage is the amount of electronic memory currently allocated to the process.
VM Size is the amount of virtual memory currently allocated to the process.
so ...
A page that exists only electronically will increase only Memory Usage.
A page that exists only on disk will increase only VM Size.
A page that exists both in memory and on disk will increase both.
Some examples to illustrate:
Currently on my machine, iexplore has 16,000K Memory Usage and 194,916 VM Size. This means that most of the memory used by Internet Explorer is idle and has been swapped out to disk, and only a fraction is being kept in main memory.
Contrast with mcshield.exe with has 98,984K memory usage and 98,168K VM Size. My conclusion here is that McAfee AntiVirus is active, with at lot of memory in use. Since it's been running for quite some time (all day, since booting), I expect that most of the 98,168K VM Size is copies of the electronic memory - though there's nothing in Task Manager to confirm this.

You might find some explaination in The Memory Shell Game
Working Set (A) – This is a set of virtual memory pages (that are committed) for a process and are located in physical RAM. These pages fully belong to the process. A working set is like a "currently/recently working on these pages" list.
Virtual Memory – This is a memory that an operating system can address. Regardless of the amount of physical RAM or hard drive space, this number is limited by your processor architecture.
Committed Memory – When an application touches a virtual memory page (reads/write/programmatically commits) the page becomes a committed page. It is now backed by a physical memory page. This will usually be a physical RAM page, but could eventually be a page in the page file on the hard disk, or it could be a page in a memory mapped file on the hard disk. The memory manager handles the translations from the virtual memory page to the physical page. A virtual page could be in located in physical RAM, while the page next to it could be on the hard drive in the page file.
BUT: PF (Page File) Usage - This is the total number of committed pages on the system. It does not tell you how many are actually written to the page file. It only tells you how much of the page file would be used if all committed pages had to be written out to the page file at the same time.
Hence B > A...
If we agree that B represents "mem usage" or also PF usage, the problem comes from the fact it actually represents potential page usages: in Xp, this potential file space can be used as a place to assign those virtual memory pages that programs have asked for, but never brought into use...

Memory fragmentation is probably the reason:
If the process allocates 1 octet, it counts for 1 octet in the VMSize, but this 1 octet requires a physical page (4K on windows operating system).
If after allocating/freeing memory, the process has a second octet that is separated by more than 4K from the first one, this second octet will always be stored on a separate physical page than the 1 one.
So the VM Size count is 2 octets but the Memory Usage is 2 pages== 8K
So the fact that MemUsage is greater than VMSize shows that process does a lot of allocation and deallocation and fragments the memory.
This could be because the process is started a long time ago.
Or else there is place for optimization ;-)

Related

Paged memory vs Pinned memory in memory copy [duplicate]

I observe substantial speedups in data transfer when I use pinned memory for CUDA data transfers. On linux, the underlying system call for achieving this is mlock. From the man page of mlock, it states that locking the page prevents it from being swapped out:
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully;
In my tests, I had a fews gigs of free memory on my system so there was never any risk that the memory pages could've been swapped out yet I still observed the speedup. Can anyone explain what's really going on here?, any insight or info is much appreciated.
CUDA Driver checks, if the memory range is locked or not and then it will use a different codepath. Locked memory is stored in the physical memory (RAM), so device can fetch it w/o help from CPU (DMA, aka Async copy; device only need list of physical pages). Not-locked memory can generate a page fault on access, and it is stored not only in memory (e.g. it can be in swap), so driver need to access every page of non-locked memory, copy it into pinned buffer and pass it to DMA (Syncronious, page-by-page copy).
As described here http://forums.nvidia.com/index.php?showtopic=164661
host memory used by the asynchronous mem copy call needs to be page locked through cudaMallocHost or cudaHostAlloc.
I can also recommend to check cudaMemcpyAsync and cudaHostAlloc manuals at developer.download.nvidia.com. HostAlloc says that cuda driver can detect pinned memory:
The driver tracks the virtual memory ranges allocated with this(cudaHostAlloc) function and automatically accelerates calls to functions such as cudaMemcpy().
CUDA use DMA to transfer pinned memory to GPU. Pageable host memory cannot be used with DMA because they may reside on the disk.
If the memory is not pinned (i.e. page-locked), it's first copied to a page-locked "staging" buffer and then copied to GPU through DMA.
So using the pinned memory you save the time to copy from pageable host memory to page-locked host memory.
If the memory pages had not been accessed yet, they were probably never swapped in to begin with. In particular, newly allocated pages will be virtual copies of the universal "zero page" and don't have a physical instantiation until they're written to. New maps of files on disk will likewise remain purely on disk until they're read or written.
A verbose note on copying non-locked pages to locked pages.
It could be extremely expensive if non-locked pages are swapped out by OS on a busy system with limited CPU RAM. Then page fault will be triggered to load pages into CPU RAM through expensive disk IO operations.
Pinning pages can also cause virtual memory thrashing on a system where CPU RAM is precious. If thrashing happens, the throughput of CPU can be degraded a lot.

How much memory your program takes? (FastMM vs Borland MM)

I have seen recently a strange behavior in my program. After creating large amounts of objects (500MB of RAM) then releasing them, the program's memory footprint does not return to its original size. It still shows a footprint of 160MB (Private working set).
Normal behavior?
Borland's memory manager does not behave like this, so if possible please confirm (or infirm) this is a normal behavior for FastMM: If you have a handy program in which you create a rather complex MDI child (containing several controls/objects), can you create in a loop 250 instances of that MDI child in memory (at the same time) then release them all and check the memory footprint. Please make sure that you consume at least 200-300MB or RAM with those MDI childs.
Especially those that still using Delphi 7 can see the difference by temporary disabling FastMM.
Thanks
If anybody is interested, especially if you want some proof this is not a memory leak (I hope it is not a mem leak in my code - this is also one of the points of this post: to check if it is my fault), here are the original discussions:
My program never releases the memory back. Why?
How to convince the memory manager to release unused memory
Dear Altar, I'm dazzled at how off the point you are in your guesses and how you don't listen to what people told you many times before.
Let's set some things straight. Memory management 101. Please read thoroughly.
When you allocate memory in Delphi, there are two memory managers involved.
System memory manager
First one is a system memory manager. This one is built into Windows and it gives memory in 4kb sized pages.
But it doesn't always give you memory in RAM (or physical memory). Your data can be kept on the hard drive, and read back every time you need to access it. This is awfully slow.
In other words, imagine you have 512Mb of physical memory. You run two programs, each requesting 1Gb of memory. What does OS do?
It grants both requests. Both apps get 1Gb of memory each. Both think all the memory is "in memory". But in fact, only 512Mb can be kept in RAM. The rest is stored in page file, although your app does not know that. It just works slow.
Working set size
Now, what is a "working set size" you are measuring?
It's the part of the allocated memory that is kept in RAM.
If you have an application which allocates 1Gb of memory, and you only have 512 Mb of RAM, then it's working set size will be 512Mb. Although it "uses" 1Gb of memory!
When you run another application which needs memory, OS will automatically free some RAM by moving rarely used blocks of "memory" to the hard drive.
Your virtual memory allocation will stay the same, but more pages will be on the hard drive and less in RAM. Working set size will decrease.
From this, you should have understood by this point, that it's pointless to try and minimize the working set size. You're achieving nothing. You're not freeing memory in any sense. You're just offloading the data to the hard drive.
But the system will do that automatically when it needs to. And there's no point making room in RAM until it's needed. You're just slowing down your application, that's all.
TLDR: "Working set size" is not "how much memory application uses". It's "how much is ready right now". Don't try to minimize it, you're just making things worse.
Delphi memory manager
OS gives you virtual memory in pages of 4Kb. But often you need it in much smaller chunks. For instance, 4 bytes for your integer, or 32 bytes for some structure. The solution?
Application memory manager, such as FastMM or BorlandMM or others.
It's job is to allocate memory in pages from the operating system, then give you small chunks of those pages when you need it.
In other words, when you ask for 14 bytes of memory, this is what happens:
You ask FastMM for 14 bytes of memory.
FastMM asks OS for 1 page of memory (4096 bytes).
OS grants one page of memory, backing it up with RAM (it's stored in actual RAM).
FastMM saves that page, cuts 14 bytes of it and gives to you.
When you ask for another 14 bytes, FastMM just cuts another 14 bytes from the same page.
What happens when you release memory? The same thing backwards:
You release 14 bytes to FastMM. Nothing happens.
You release another 14 bytes. FastMM sees that the 4096 byte page it allocated is now completely unused.
Therefore it releases the page, returning it to the system.
It's worth noting that FastMM cannot release just 14 bytes to the system. It has to release memory in pages. Until the whole page is free, FastMM cannot do a thing. Nobody can.
So, why is my working set size so big, even though I released everything?
First, your working set size is not what you should be measuring. Virtual memory consumption is. But if you have big working set size, your virtual memory consumption will be high too.
What's the problem? You should be able to figure out by this point.
Let's say you allocate 1kb, then 3kb of memory. How much virtual memory have you allocated? 4kb, 1 page.
Now you release 3Kb. How much virtual memory do you use now? 1Kb? No, it's still 1 page. You cannot allocate less than 1 page from the system. You're still using 4096 bytes of virtual memory.
Imagine if you do that 1000 times. 1kb, 3kb, 1kb, 3kb, 1kb, 3kb and so on. You allocate 1000 * 4kb = 4 mb like that, and then you release all the 3kb parts. How much virtual memory do you use now?
Still 4 mb. Because you allocated 1000 pages at first. Of every page you took 1kb and 3kb chunks. Even if you release 3kb chunks, 1kb chunks will continue to keep every single page you allocated in memory. And every page takes 4kb of virtual memory.
Memory manager cannot magically "move" all of your 1kb chunks together. This is impossible, because their virtual addresses can be referenced from somewhere in code. It's not a trait of FastMM.
But why with BorlandMM everything works better?
Coincidence. Maybe it just so happens that BorlandMM gives you memory in a slightly different way than FastMM does. Next thing you know, you change something in your app and BorlandMM acts just like FastMM did. It's impossible for a memory manager to completely prevent this effect, called memory fragmentation.
So what do I do?
Short answer is, not much until this bothers you.
You see, with modern operating systems, you're not really eating anyone's RAM. Per above, OS will automatically swap your pages out when it needs RAM for other applications. This should not be a concern.
And the "excessive" memory isn't lost. Although pages are allocated, 3kb of each is marked as "free". Next time your app needs memory, memory manager will use that space.
But if you really want to help it, you should reorganize your allocations so that the ones you're planning on keeping are done first, and the ones you will soon release are all allocated after that.
Like this: 1kb, 1kb, 1kb, ..., 3kb, 3kb, 3kb...
If you now release all the 3kb chunks, your virtual memory consumption will drop significantly.
This is not always possible. If it's impossible, then just do nothing. It's more or less alright like it is.
And P.S.
You shouldn't be allocating 500 forms in the first place. This is clearly not a way to go. Fix this, and you won't even have a need to think about memory allocation and releasing.
I hope this clears things up, because four posts on the same topic, frankly, is a bit too much.
IIRC, the Delphi memory manager does not immediately return free'd memory to the OS.
Memory is allocated in chunks of small, medium and large sizes, called blocks.
These blocks are kept for a while after their contents have been disposed to have them readyly available when another allocation is requested afterwards.
This limits the amount of system calls required for succesive allocation of multiple objects, and helps avoiding heap fragmentation.
Infirming: Delphi 2007, default memory manager (should be FastMM variation). Several tests on heavy objects:
Initial memory 2Mb, peak memory 30Mb, final memory 4Mb.
Initial memory 2Mb, peak memory 1Gb, final memory 5.5Mb.
What are the heapmanager stats (GetHeapStatus) on the point that 160MB is still allocated?
SOLVED
To confirm that this behavior is generated by FastMM (as suggested by Barry Kelly) I created a second program that allocated A LOT of RAM. As soon as Windows ran out of RAM, my program memory utilization returned to its original value.
Problem solved. Special thanks to Barry Kelly, the only person that pointed to the real "problem".

40 million page faults. How to fix this?

I have an application that loads 170 files (let’s say they are text files) from disk in individual objects and kept in memory all the time. The memory is allocated once when I load those files from disk. So, there is no memory fragmentation involved. I also use FastMM to make sure my applications never leaks memory.
The application compares all these files with each other to find similarities. Over-simplified we can say that we compare text strings but the algorithm is way more complex as I have to allow some differences between strings. Each file is about 300KB. Loaded in memory (the object that holds it) it takes about 0.4MB of RAM. So, the running app takes about 60MB or RAM (working set). It processes the data for about 15 minutes. The thing is that it generates over 40 million page faults.
Why? I have about 2GB of free RAM. From what I know Page Faults are slow. How much they are slowing down my program?
How can I optimize the program to reduce these page faults? I guess it has something to do with data locality. Does anybody know some example algorithms for this (Delphi)?
Update:
But looking at the number of page faults (no other application in Task Manager comes close to mine, not even by far) I guess that I could increase the speed of my application IF I manage to optimize memory layout (reduce the page faults).
Delphi 7, Win 7 32 bit, RAM 4GB (3GB visible, 2GB free).
Caveat - I'm only addressing the page faulting issue.
I cannot be sure but have you considered using Memory Mapped files? In this way windows will use the files themselves as the paging file (rather than the main paging file pagrefile.sys). If the files are read only then the number of page faults should theoretically decrease as the pages won't need to written out to disk via the paging file as windows will just load the data from the file itself as needed.
Now to reduce files from paging in and out you need to try and go through the data in one direction so that as new data is read, older pages can be discarded for ever. Here is where you trade off going over the files again and caching data - the cache has to be stored somewhere.
Note that Memory Mapped files is how windows loads .dlls and .exes amongst other things. I've used them to scan though gigabyte files without hitting memory limits (we had MBs in those days and not GBs of ram).
However from the data you describe I'd suggest the ability to not go back ovver files will reduce the amount of repaging going on.
On my machine most pagefaults are reported for developer studio which is reported to have 4M page faults after 30+ minutes total CPU time. You get 10 times more, in half the time. And memory is scarce on my system. So 40M faults seems like a lot.
It could just maybe be you have a memory leak.
the working set is only the physical memory in use for your application. If you leak memory, and don't touch it, it will get paged out. You will see the virtual memory useage (or page file use) increase. These pages might be swapped back in when the heap memory walks the heap, to get swapped out again by windows.
Because you have a lot of RAM, the swapped out pages will stay in physical memory, as nobody else needs them. (a page recovered from RAM counts as a soft fault, from disk as a hard one)
Do you use an exponential resize system ?
If you grow the block of memory in too small increments while loading, it might constantly request large blocks from the system, copy the data over, and then release the old block (assuming that fastmm (de)allocates very large blocks directly from the OS).
Maybe somehow this causes a loop where the OS releases memory from your app's process, and then adds it again, causing page faults on first write.
Also avoid Tstringlist.load* methods for very large files, IIRC these consume twice the space needed.

Virtual memory size

I have virtual memory size set to 756 MB on windows xp. but when reading on msdn it says virtual memory for each process on 32 bit OS is 4 GB by default. how it is different from the size of virtual memory that i set?
**Memory** **range** **Usage**
Low 2GB (0x00000000 through 0x7FFFFFFF) Used by the process.
High 2GB (0x80000000 through 0xFFFFFFFF) Used by the system.
also, how is the range is same for each process?
Your page file is set to 756 Mb. The page file is like extra RAM, but backed by the disk.
Virtual Memory, however, is different, and kind of complex.
Every process gets an address space of 4 Gb. This is the range of a 32-bit pointer, so that' works out nicely. Half of that is reserved for the Kernel (Operating System), and is the same in every process. The other half is for the process itself, and is unique to that process.
The operating system allocates "pages" to the private portion of memory as the process asks for it. The pages get a slot in the process's address space which has nothing at all to do with where they are in physical RAM. In fact, they may not even be in RAM if they're not currently being used. The operating system will "swap" pages out to the page file if it wants some physical RAM for something else.
An important thing to remember is that address 0x10000 in your process is totally different from 0x10000 in another process.
Fortunately, the operating system juggles all this around so you don't have to.
This is way too big a subject to cover adequately in an answer here. You almost certainly need to read a book (I recommend Jeffrey Richter's books for this kind of subject matter).
The 4 Gb is about address space. The 756 Mb is about backing store.
Quite a few things (especially the contents of executable files) use address space without using backing storage. When you execute a program, the executable file for that program (and all the DLLs it uses) are mapped to address space. Then, on a page-by-page basis, pieces of that executable are brought into physical memory as needed.
The 756 Mb is extra storage to "extend" the RAM space -- but this is normally used only for data, not code; code is already stored in the executable file, so the system reads the data directly from the executable file when it's needed. The 756 Mb is used primarily for data you've created or modified as the computer is running (though the definition of "modified/created" can be fuzzy -- for example, the contents of a web page you've loaded would be included because you caused it to come into memory, even though you didn't create it or change it at all).
The virtual memory setting in windows only affect the size of the virtual memory paging file, not the total size of virtual memory allocated to processes.

What will happen if a application is large enough to be loaded into the available RAM memory?

There is chance were a heavy weight application that needs to be launched in a low configuration system.. (Especially when the system has too less memory)
Also when we have already opened lot of application in the system & we keep on trying opening new new application what would happen?
I have only seen applications taking time to process or hangs up for sometime when I try operating with it in low config. system with low memory and old processors..
How it is able to accomodate many applications when the memory is low..? (like 128 MB or lesser..)
Does it involves any paging or something else..?
Can someone please let me know the theory behind this..!
"Heavyweight" is a very vague term. When the OS loads your program, the EXE is mapped in your address space, but only the code pages that run (or data pages that are referenced) are paged in as necessary.
You will likely get horrible performance if pages need to constantly be swapped as the program runs (aka many hard page faults), but it should work.
Since your commit charge is near the commit limit, and the commit limit will likely have no room to grow, you will also likely recieve many malloc()/VirtualAlloc(..., MEM_COMMIT)/HeapAlloc()/{Local|Global}Alloc() failures so you need to watch the return codes in your program.
Some keywords for search engines are: paging, swapping, virtual memory.
Wikipedia has an article called Paging (Redirected from Swap space).
There is often the use of virtual memory. Virtual memory pages are mapped to physical memory if they are used. If a physical page is needed and no page is available, another is written to disk. This is called swapping and that explains why crowded systems get slow and memory upgrades have positive effects on performance.

Resources