Paged memory vs Pinned memory in memory copy

I observe substantial speedups in data transfer when I use pinned memory for CUDA data transfers. On Linux, the underlying system call for achieving this is mlock. The man page for mlock states that locking a page prevents it from being swapped out:
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully;
In my tests, I had a few gigabytes of free memory on my system, so there was never any risk that the memory pages could have been swapped out, yet I still observed the speedup. Can anyone explain what's really going on here? Any insight or info is much appreciated.

The CUDA driver checks whether the memory range is locked and uses a different code path accordingly. Locked memory is guaranteed to be resident in physical memory (RAM), so the device can fetch it without help from the CPU (DMA, also known as asynchronous copy; the device only needs the list of physical pages). Non-locked memory can generate a page fault on access and may not be in RAM at all (for example, it can be in swap), so the driver has to access every page of non-locked memory, copy it into a pinned buffer, and pass that buffer to DMA (a synchronous, page-by-page copy).
Host memory used by the asynchronous memory copy call needs to be page-locked through cudaMallocHost or cudaHostAlloc.
The driver tracks the virtual memory ranges allocated with this function (cudaHostAlloc) and automatically accelerates calls to functions such as cudaMemcpy().

CUDA uses DMA to transfer pinned memory to the GPU. Pageable host memory cannot be used with DMA because it may reside on disk.
If the memory is not pinned (i.e. page-locked), it's first copied to a page-locked "staging" buffer and then copied to the GPU through DMA.
So by using pinned memory you save the time needed to copy from pageable host memory to page-locked host memory.
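To make the extra copy concrete, here is a toy Python model of the two paths. It is only a sketch: no GPU or DMA engine is involved, the "device" is a plain bytearray, and `STAGING_BYTES`, `copy_pageable` and `copy_pinned` are names I made up for illustration. The real driver's staging buffer size and strategy are not documented in this thread.

```python
# Toy model of a host-to-device copy; no real GPU or DMA is involved.
STAGING_BYTES = 4096  # hypothetical size of the driver's page-locked staging buffer


def copy_pageable(src: bytes, device: bytearray) -> int:
    """Copy src to device through a staging buffer; return bytes moved by the CPU."""
    staging = bytearray(STAGING_BYTES)
    cpu_bytes = 0
    for offset in range(0, len(src), STAGING_BYTES):
        chunk = src[offset:offset + STAGING_BYTES]
        # Step 1: the CPU copies from pageable memory into the pinned staging buffer.
        staging[:len(chunk)] = chunk
        cpu_bytes += len(chunk)
        # Step 2: the simulated DMA engine moves the staging contents to the device.
        device[offset:offset + len(chunk)] = staging[:len(chunk)]
    return cpu_bytes


def copy_pinned(src: bytes, device: bytearray) -> int:
    """Copy src to device directly; the CPU moves no bytes itself."""
    # Single step: the simulated DMA engine reads the pinned source pages directly.
    device[:len(src)] = src
    return 0
```

Both functions leave the same bytes in `device`; the difference is that the pageable path makes the CPU touch every byte once before the transfer can happen.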

If the memory pages had not been accessed yet, they were probably never swapped in to begin with. In particular, newly allocated pages will be virtual copies of the universal "zero page" and don't have a physical instantiation until they're written to. New maps of files on disk will likewise remain purely on disk until they're read or written.
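A small Python sketch of the zero-fill behaviour, assuming an anonymous mapping created with the standard `mmap` module. It only shows that never-written pages read back as zeros; whether the OS backs those reads with a shared zero page, and when physical frames are actually assigned, is OS-dependent and not observable from this script. The 16 MiB size is arbitrary.

```python
import mmap

SIZE = 16 * 1024 * 1024  # 16 MiB of anonymous memory (arbitrary size for illustration)


def fresh_mapping_is_zero(size: int = SIZE) -> bool:
    """Map anonymous memory and check that untouched pages read back as zeros."""
    with mmap.mmap(-1, size) as buf:
        page = mmap.PAGESIZE
        # Nothing has been written to the mapping yet, so every byte reads as zero.
        first = buf[:page]
        last = buf[size - page:size]
        return first == bytes(page) and last == bytes(page)
```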

A verbose note on copying non-locked pages to locked pages.
The copy can be extremely expensive if the non-locked pages have been swapped out by the OS on a busy system with limited CPU RAM: each access then triggers a page fault, and the pages are loaded back into RAM through expensive disk I/O.
Pinning pages can also cause virtual memory thrashing on a system where CPU RAM is scarce. If thrashing happens, CPU throughput can degrade considerably.


Why can't a process have contiguous memory addresses in physical memory?

According to the Microsoft documentation:
A program can use a contiguous range of virtual addresses to access a
large memory buffer that is not contiguous in physical memory.
So my question is: why can't a process have contiguous memory in physical memory?
I also have another question about the documentation, which includes a picture of the virtual address spaces for user and system space:
Is the system virtual address space unique across the whole of memory, while each process has its own virtual address space?
When a process is first loaded into memory, the OS may be able to place its pages contiguously in physical memory. The pages can't always stay contiguous, though, because of swapping: other processes and data occupy memory too, so when some of the process's pages become less used they are swapped out to disk, and when they are needed again they are not guaranteed to be loaded into the same spot, because another process's page may be sitting there by then. You should read about virtual memory to gain a good understanding of all of this.
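A minimal sketch of how a contiguous virtual range can sit on scattered physical frames. The page table contents and the 4 KiB page size are made-up values for illustration, not taken from any real system.

```python
PAGE_SIZE = 4096  # assumed page size

# Hypothetical page table: virtual page number -> physical frame number.
# Virtual pages 0..3 are contiguous; the frames backing them are not.
page_table = {0: 7, 1: 2, 2: 9, 3: 4}


def translate(vaddr: int) -> int:
    """Translate a virtual address to a physical address via the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset
```

With this table, virtual addresses 4095 and 4096 are neighbours for the program, but they translate to physical addresses in frames 7 and 2, which are nowhere near each other.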
Your question is simple: you are asking why we can have a large memory buffer in virtual memory but not in physical memory. That's because we are limited by the hardware. If every program could have as large a buffer as it wanted in physical memory, manufacturers would have to build something like 1024 GB of memory to satisfy us, yet we use 8 GB and are satisfied. Virtual memory exists to meet our needs and make the hardware much more efficient.
Why do we need virtual memory?

So my understanding is that every process has its own virtual memory space ranging from 0x0 to 0xFF....F. These virtual addresses correspond to addresses in physical memory (RAM). Why is this level of abstraction helpful? Why not just use the direct addresses?
I understand why paging is beneficial, but not virtual memory.
There are many reasons to do this:
If you have a compiled binary, each function has a fixed address in memory and the assembly instructions to call functions have that address hardcoded. If virtual memory didn't exist, two programs couldn't be loaded into memory and run at the same time, because they'd potentially need to have different functions at the same physical address.
If two or more programs are running at the same time (or are being context-switched between) and use direct addresses, a memory error in one program (for example, reading a bad pointer) could destroy memory being used by the other process, taking down multiple programs due to a single crash.
On a similar note, there's a security issue where a process could read sensitive data in another program by guessing what physical address it would be located at and just reading it directly.
If you try to combat the two above issues by paging out all the memory for one process when switching to a second process, you incur a massive performance hit because you might have to page out all of memory.
Depending on the hardware, some memory addresses might be reserved for physical devices (for example, video RAM, external devices, etc.) If programs are compiled without knowing that those addresses are significant, they might physically break plugged-in devices by reading and writing to their memory. Worse, if that memory is read-only or write-only, the program might write bits to an address expecting them to stay there and then read back different values.
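The first few points can be sketched with a toy model in which each process gets its own page table. Everything here (`AddressSpace`, `SegmentationFault`, the frame numbers, the 0x400000 address) is hypothetical and only illustrates the idea; a real MMU does this in hardware.

```python
PAGE_SIZE = 4096  # assumed page size


class SegmentationFault(Exception):
    """Raised when a process touches a virtual page it has no mapping for."""


class AddressSpace:
    """One process's private mapping of virtual pages to physical frames."""

    def __init__(self, page_table):
        self.page_table = dict(page_table)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise SegmentationFault(f"no mapping for address {vaddr:#x}")
        return self.page_table[vpn] * PAGE_SIZE + offset


physical_memory = bytearray(16 * PAGE_SIZE)  # shared by all simulated processes


def write(space, vaddr, value):
    physical_memory[space.translate(vaddr)] = value


def read(space, vaddr):
    return physical_memory[space.translate(vaddr)]


# Both processes use virtual address 0x400000, backed by different frames.
proc_a = AddressSpace({0x400: 3})
proc_b = AddressSpace({0x400: 5})
```

Writing through `proc_a` at 0x400000 never changes what `proc_b` reads at the same address, and an access outside a process's own mappings is stopped before it reaches physical memory.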
Short answer: Program code and data required for execution of a process must reside in main memory to be executed, but main memory may not be large enough to accommodate the needs of an entire process.
Two proposals
(1) Using a very large main memory to alleviate any need for storage allocation: it's not feasible due to very high cost.
(2) Virtual memory: It allows processes that may not be entirely in the memory to execute by means of automatic storage allocation upon request. The term virtual memory refers to the abstraction of separating LOGICAL memory--memory as seen by the process--from PHYSICAL memory--memory as seen by the processor. Because of this separation, the programmer needs to be aware of only the logical memory space while the operating system maintains two or more levels of physical memory space.
Early computer programmers divided programs into sections that were transferred into main memory for a period of processing time. As higher level languages became popular, the efficiency of complex programs suffered from poor overlay systems. The problem of storage allocation became more complex.
Two theories for solving the problem of inefficient memory management emerged -- static and dynamic allocation. Static allocation assumes that the availability of memory resources and the memory reference string of a program can be predicted. Dynamic allocation relies on memory usage increasing and decreasing with actual program needs, not on predicting memory needs.
Program objectives and machine advancements in the '60s made the predictions required for static allocation difficult, if not impossible. Therefore, the dynamic allocation solution was generally accepted, but opinions about implementation were still divided.
One group believed the programmer should continue to be responsible for storage allocation, which would be accomplished by system calls to allocate or deallocate memory. The second group supported automatic storage allocation performed by the operating system, because of increasing complexity of storage allocation and emerging importance of multiprogramming.
In 1961, two groups proposed a one-level memory store. One proposal called for a very large main memory to alleviate any need for storage allocation. This solution was not possible due to very high cost. The second proposal is known as virtual memory.
To execute a process its data is needed in the main memory (RAM). This might not be possible if the process is large.
Virtual memory provides an idealized abstraction of the physical memory which creates the illusion of a larger virtual memory than the physical memory.
Virtual memory combines active RAM and inactive memory on disk to form a large range of virtual contiguous addresses. Implementations usually require hardware support, typically in the form of a memory management unit built into the CPU.
The main purposes of virtual memory are multitasking and running large programs. It would be great to keep everything in physical memory, because it would be a lot faster, but RAM is a lot more expensive than disk storage.
Use of Virtual Memory

What happens if a page is present in Virtual Memory, but not in main memory?
How is it executed?
Is the program loaded into main memory from virtual memory? If so, that would be an I/O operation, since it is on disk. Then what is the use of virtual memory if we have to do an I/O operation to execute it anyway?
And when a user program generates a logical address and the MMU maps it to a physical address, and that address is not present in main memory, does the OS then check in virtual memory?
Let me start by saying that this is a very simplified explanation, not the definitive guide to virtual memory.
Virtual memory basically gives your process the illusion that it's the only thing running in the memory space of the computer. When the process accesses a virtual memory page, the MMU translates it into a physical memory access. If the physical memory page does not yet exist (or isn't in physical memory), the process is suspended and the operating system is notified and can add the page to memory (for example by fetching it from disk) before resuming the process again.
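As a rough sketch of that fault-and-resume cycle, here is a toy pager in Python. The `DemandPager` class, the dict standing in for the disk, and the least-recently-used eviction policy are all my own assumptions for illustration; real operating systems use more elaborate replacement policies.

```python
from collections import OrderedDict


class DemandPager:
    """Toy demand pager: a few physical frames in front of a backing store."""

    def __init__(self, num_frames, backing_store):
        self.num_frames = num_frames
        self.backing_store = backing_store  # page number -> contents (stands in for disk)
        self.resident = OrderedDict()       # pages currently in "RAM", least recently used first
        self.faults = 0

    def read(self, page):
        if page in self.resident:
            # Hit: the page is in physical memory; mark it as most recently used.
            self.resident.move_to_end(page)
            return self.resident[page]
        # Page fault: the "OS" takes over while the process waits.
        self.faults += 1
        if len(self.resident) >= self.num_frames:
            # No free frame: evict the least recently used page and write it back.
            victim, contents = self.resident.popitem(last=False)
            self.backing_store[victim] = contents
        # Fetch the page from the backing store, then let the access complete.
        self.resident[page] = self.backing_store[page]
        return self.resident[page]
```

With two frames and the access sequence 0, 1, 0, 2, 0, 1, only four of the six accesses fault; the repeated reads of page 0 are served from memory.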
One reason for virtual memory is that the process doesn't have to worry too much about how much memory it uses and doesn't have to change if you, for example, expand physical memory on the machine. It can just work as if it had all the memory it can address and let the operating system work out how the actual memory is used.
The reason it doesn't (usually) slow the computer to a crawl is that many processes don't use big parts of their memory at all times. If a memory page isn't accessed for an hour, the physical memory can be put to much better use during that hour than being kept for that page. Of course, the more memory your processes actively and continuously use, the slower they will appear to run.

Virtual memory without secondary storage

Can you have virtual memory without a secondary storage ( hard disk ) ?
In a pure sense, yes you can: Virtual Memory
What makes memory virtual is the fact that all memory accesses by the process are intercepted at the CPU level and a hardware Memory Management Unit is used to manage a mapping of the process address space onto the physical memory, no matter where that storage is presently really located.
You can have computing systems with virtual memory that have no backing storage (which is what people call it when you can move pages of memory out to disk for later retrieval).
In this case, the virtual memory system is used to allow the OS to intercept and prevent illegal memory references, but not in order to increase the working-set size of processes beyond the amount of installed physical memory.

"Mem Usage" higher than "VM Size" in WinXP Task Manager

In my Windows XP Task Manager, some processes display a higher value in the Mem Usage column than the VMSize. My Firefox instance, for example shows 111544 K as mem usage and 100576 K as VMSize.
According to the help file of Task Manager Mem Usage is the working set of the process and VMSize is the committed memory in the Virtual address space.
My question is, if the number of committed pages for a process is A and the number of pages in physical memory for the same process is B, shouldn't it always be B ≤ A? Isn't the number of pages in physical memory per process a subset of the committed pages?
Or is this something to do with sharing of memory among processes? Please explain. (Perhaps my definition of 'Working Set' is off the mark).
Virtual Memory
Assume that your program (e.g. Oracle) allocated 100 MB of memory upon startup: your VM size goes up by 100 MB even though no additional physical or disk pages are touched, i.e. VM is nothing but memory bookkeeping.
The total available physical memory plus paging file memory is the maximum memory that ALL the processes in the system can allocate. The system enforces this so that, if at any point in time the processes actually start consuming all the memory they allocated, the OS can supply the actual physical pages required.
Private Memory
If the program copies 10 MB of data into that 100 MB, the OS senses that no pages have been allocated to the process for those addresses and assigns 10 MB worth of physical pages to your process's private memory. (The event that triggers this is called a page fault.)
Working Set
Definition : Working set is the set of memory pages that have been recently touched by a program.
At this point these 10 MB worth of pages are added to the working set of the process. If the process then copies this data into another 10 MB cache it allocated previously, everything else remains the same, but the working set goes up by another 10 MB if those old pages were not in the working set. If those pages were already in the working set, the program's working set remains the same.
Working Set behaviour
Imagine your process never touches the first 10 MB of pages ever again. In that case these pages are trimmed from your process's working set and possibly sent to the page file, so that the OS can bring in other pages that are used more frequently. However, if memory is not urgently needed, this paging need not be done and the OS can act as if it's rich in memory. In this case the working set simply lets these pages remain.
When is Working Set > Virtual Memory
Now imagine the same program deallocates all 100 MB of memory. The program's VM size is immediately reduced by 100 MB (remember, VM is the bookkeeping of all memory allocation requests).
The working set need not be affected by this, since it doesn't change the fact that those 10 MB worth of pages were recently touched. Those pages therefore remain in the working set of the process, though the OS can reclaim them whenever it needs to.
This effectively makes VM < working set. It will be rectified, however, if you start another process that consumes more memory and the working set pages are reclaimed by the OS.
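The sequence described in this answer can be written down as a toy bookkeeping model. To be clear, this only mirrors the counters as this answer describes them (sizes in MB); `ToyProcess` is a made-up class and not how Windows actually accounts for memory.

```python
class ToyProcess:
    """Toy counters following the description in this answer (sizes in MB)."""

    def __init__(self):
        self.vm_size = 0      # bookkeeping of allocation requests
        self.working_set = 0  # recently touched pages still in physical memory

    def allocate(self, mb):
        # Allocation is bookkeeping only; no pages are touched yet.
        self.vm_size += mb

    def touch(self, mb):
        # Page faults bring the touched pages into the working set.
        self.working_set += mb

    def free(self, mb):
        # Per the answer, freeing shrinks VM size at once but leaves the working set alone.
        self.vm_size -= mb

    def trim(self):
        # The OS reclaims the pages when it needs the memory.
        self.working_set = 0
```

Running allocate(100), touch(10), free(100) leaves `vm_size` at 0 and `working_set` at 10 until `trim()` is called, which is the "working set > VM" window the answer describes.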
XP's Task Manager is simply wrong. EDIT: If you don't believe me (and someone doesn't, because they voted this down), read Firefox 3 Memory Usage. I quote:
If you’re looking at Memory Usage under Windows XP, your numbers aren’t going to be so great. The reason: Microsoft changed the meaning of “private bytes” between XP and Vista (for the better).
Sounds like MS got confused. You only change something like that if it's broken.
Try Process Explorer instead. What Task Manager labels "VM Size", Process Explorer (more correctly) labels "Private Bytes". And in Process Explorer, Working Set (and Private Bytes) are always less than or equal to Virtual Size, as you would expect.
File mapping
A very common way for Mem Usage to be higher than VM Size is the use of file mapping objects (so it can be related to shared memory, since file mapping is used to share memory). With file mapping you can have memory which is committed (either in the page file or in physical memory, you do not know which) but has no virtual address assigned to it. The committed memory appears in Mem Usage, while the use of virtual addresses is tracked by VM Size.
Memory usage is the amount of electronic memory currently allocated to the process.
VM Size is the amount of virtual memory currently allocated to the process.
so ...
A page that exists only electronically will increase only Memory Usage.
A page that exists only on disk will increase only VM Size.
A page that exists both in memory and on disk will increase both.
Some examples to illustrate:
Currently on my machine, iexplore has 16,000K Memory Usage and 194,916K VM Size. This means that most of the memory used by Internet Explorer is idle and has been swapped out to disk, and only a fraction is being kept in main memory.
Contrast this with mcshield.exe, which has 98,984K Memory Usage and 98,168K VM Size. My conclusion here is that McAfee AntiVirus is active, with a lot of memory in use. Since it's been running for quite some time (all day, since booting), I expect that most of the 98,168K VM Size is copies of the electronic memory, though there's nothing in Task Manager to confirm this.
You might find some explanation in The Memory Shell Game
Working Set (A) – This is a set of virtual memory pages (that are committed) for a process and are located in physical RAM. These pages fully belong to the process. A working set is like a "currently/recently working on these pages" list.
Virtual Memory – This is a memory that an operating system can address. Regardless of the amount of physical RAM or hard drive space, this number is limited by your processor architecture.
Committed Memory – When an application touches a virtual memory page (reads, writes, or programmatically commits it), the page becomes a committed page. It is now backed by a physical memory page. This will usually be a physical RAM page, but could eventually be a page in the page file on the hard disk, or it could be a page in a memory-mapped file on the hard disk. The memory manager handles the translation from the virtual memory page to the physical page. A virtual page could be located in physical RAM, while the page next to it could be on the hard drive in the page file.
BUT: PF (Page File) Usage - This is the total number of committed pages on the system. It does not tell you how many are actually written to the page file. It only tells you how much of the page file would be used if all committed pages had to be written out to the page file at the same time.
Hence B > A...
If we agree that B represents "mem usage" or also PF usage, the problem comes from the fact that it actually represents potential page usage: in XP, this potential file space can be used as a place to assign those virtual memory pages that programs have asked for but never brought into use...
Memory fragmentation is probably the reason:
If the process allocates 1 octet, it counts for 1 octet in the VM Size, but this 1 octet requires a physical page (4K on Windows).
If, after allocating and freeing memory, the process has a second octet that is separated by more than 4K from the first one, this second octet will always be stored on a different physical page from the first one.
So the VM Size count is 2 octets but the Memory Usage is 2 pages == 8K.
So the fact that Mem Usage is greater than VM Size shows that the process does a lot of allocation and deallocation and fragments the memory.
This could be because the process was started a long time ago.
Or else there is place for optimization ;-)
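The arithmetic behind this argument, as a short Python sketch. The 4K page size comes from the answer; the function names and the example addresses are made up for illustration.

```python
PAGE_SIZE = 4096  # 4K pages, as assumed in the answer


def pages_touched(addresses):
    """Return the set of page numbers that contain at least one of the addresses."""
    return {addr // PAGE_SIZE for addr in addresses}


def resident_bytes(addresses):
    """Physical memory needed to hold every page that contains a live octet."""
    return len(pages_touched(addresses)) * PAGE_SIZE
```

Two live octets at 0x1000 and 0x3000 are more than 4K apart, so they need two pages (8192 octets of physical memory) even though only 2 octets are in use; at 0x1000 and 0x1001 they would share a single page.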
