If processes don’t fit in memory, What moves them in and out of memory to run?
this question is based on Operating System Memory management theory.
I have checked about the purpose of memory management unit. Is this related to swapping?
The operating system will use a memory management technique called virtual memory.
This is when a computer compensates for shortages of physical memory by temporarily transferring pages (segments of memory) of data from RAM to backing store. RAM is much faster than secondary storage and when a computer needs to use secondary storage over primary the user will feel the computer running slower.
The operating systems virtual memory manager is responsible for managing this. It will use techniques such as placing pages that have not been referenced for in a while into secondary memory (you hard disk for example) and if a page in secondary storage is required it will move the page from secondary to primary memory.
Another point is that most modern apps will page themselves, such as when they are minimised for example, to reduce the amount of memory they're using for other applications running.
Related
An operating system itself has resources it needs to access, like block I/O cache and process control blocks. Does it use virtual memory addresses or physical memory addresses?
I feel like it should be the former since it prevents the need to keep a large area of physical memory for a purpose, even when it is mostly empty. The mechanism of page tables/virtual memory would do a much better job at keeping those resources that the OS really needs.
So which is it?
10 randomly selected operating systems will do virtual memory management in 10 different ways. There's no answer that applies to all operating systems.
Some (e.g. MS-DOS) don't support or use virtual memory management for anything, some (e.g. Linux) just map all of physical memory into kernel space and don't bother using virtual memory management tricks for the kernel itself (it's almost as if the kernel is in physical memory even though it's technically both), and some may do any number of virtual memory tricks in kernel space.
I am learning the concept of virtual memory, but this question has been confusing me for a while. Since most modern computers use virtual memory, when a program is in execution, the os is supposed to page data in and out between RAM and disk. But why do we still encounter "out of memory" issue? Could you please correct me if I misunderstood the concept? I really appreciate your explanation.
PS: For example, I was analyzing a large amount of data (>100G) output from simulation on a computing cluster, and read in the data to an C array. Very often the system crashed and complained a memory error.
First: Modern computer do indeed use virtual memory, however there is no magic here. Memory is not created out of nothing. Virtual memory schemes typically allow a portion of the mass storage sub-system (aka hard disk) to be used to hold portions of the process that are (hopefully) less frequently used.
This technique allows processes to use more memory than is available as RAM. However nothing is infinite. Eventually all RAM and Hard Drive resources will be used up and the process will get an out of memory error.
Second: It is not unheard of for operating systems to place a cap on the memory that a process may use. Hit that cap and again, the process gets an out of memory error.
Even with virtual memory the memory available is not unlimited.
Limit 1) Architectural limits. The processor and operating system will place some maximum virtual memory limit.
Limit 2) System Parameters. Many operating systems configure the maximum virtual memory size.
Limit 3) Process quotas. Many operating system have process quotas that limit the maximum virtual memory size.
Limit 4) System resources. Notably page file space.
So I know the operating system must be in control of giving an application a certain amount of Ram. But I'm curious how does it know how much to give to the application, and how does it know how much the said application is using? Like who and what is keeping track of that usage? And how does it know which memory is safe to use? I assume that some memory is reserved for critical systems. I must admit that I don't have very much knowledge of operating systems.
The operating system divides memory into "pages". They're typically 4KB in size.
The operating system keeps track of those pages in a table. By counting them we can determine how much memory is used or free.
Userland programs request memory by a system call. It depends on the system, and mmap() is used for Linux. This will request the OS to give an empty page to use for the program. Freeing the memory is basically the reverse.
In CPU world one can do it via memory map. Can similar things done for GPU?
If two process can share a same CUDA context, I think it will be trivial - just pass GPU memory pointer around. Is it possible to share same CUDA context between two processes?
Another possibility I could think of is to map device memory to a memory mapped host memory. Since it's memory mapped, it can be shared between two processes. Does this make sense / possible, and are there any overhead?
CUDA MPS effectively allows CUDA activity emanating from 2 or more processes to behave as if they share the same context on the GPU. (For clarity: CUDA MPS does not cause two or more processes to share the same context. However the work scheduling behavior appears similar to what you would observe if the work were emanating from the same process and therefore the same context.) However this won't provide for what you are asking for:
can two processes share the same GPU memory?
One method to achieve this is via CUDA IPC (interprocess communication) API.
This will allow you to share an allocated device memory region (i.e. a memory region allocated via cudaMalloc) between multiple processes. This answer contains additional resources to learn about CUDA IPC.
However, according to my testing, this does not enable sharing of host pinned memory regions (e.g. a region allocated via cudaHostAlloc) between multiple processes. The memory region itself can be shared using ordinary IPC mechanisms available for your particular OS, but it cannot be made to appear as "pinned" memory in 2 or more processes (according to my testing).
I'm a bit confused between about the difference between shared memory and distributed memory. Can you clarify?
Is shared memory for one processor and distributed for many (for network)?
Why do we need distributed memory, if we have shared memory?
Short answer
Shared memory and distributed memory are low-level programming abstractions that are used with certain types of parallel programming. Shared memory allows multiple processing elements to share the same location in memory (that is to see each others reads and writes) without any other special directives, while distributed memory requires explicit commands to transfer data from one processing element to another.
Detailed answer
There are two issues to consider regarding the terms shared memory and distributed memory. One is what do these mean as programming abstractions, and the other is what do they mean in terms of how the hardware is actually implemented.
In the past there were true shared memory cache-coherent multiprocessor systems. The systems communicated with each other and with shared main memory over a shared bus. This meant that any access from any processor to main memory would have equal latency. Today these types of systems are not manufactured. Instead there are various point-to-point links between processing elements and memory elements (this is the reason for non-uniform memory access, or NUMA). However, the idea of communicating directly through memory remains a useful programming abstraction. So in many systems this is handled by the hardware and the programmer does not need to insert any special directives. Some common programming techniques that use these abstractions are OpenMP and Pthreads.
Distributed memory has traditionally been associated with processors performing computation on local memory and then once it using explicit messages to transfer data with remote processors. This adds complexity for the programmer, but simplifies the hardware implementation because the system no longer has to maintain the illusion that all memory is actually shared. This type of programming has traditionally been used with supercomputers that have hundreds or thousands of processing elements. A commonly used technique is MPI.
However, supercomputers are not the only systems with distributed memory. Another example is GPGPU programming which is available for many desktop and laptop systems sold today. Both CUDA and OpenCL require the programmer to explicitly manage sharing between the CPU and the GPU (or other accelerator in the case of OpenCL). This is largely because when GPU programming started the GPU and CPU memory was separated by the PCI bus which has a very long latency compared to performing computation on the locally attached memory. So the programming models were developed assuming that the memory was separate (or distributed) and communication between the two processing elements (CPU and GPU) required explicit communication. Now that many systems have GPU and CPU elements on the same die there are proposals to allow GPGPU programming to have an interface that is more like shared memory.
In modern x86 terms, for example, all the CPUs in one physical computer share memory. e.g. 4-socket system with four 18-core CPUs. Each CPU has its own memory controllers, but they talk to each other so all the CPUs are part of one coherency domain. The system is NUMA shared memory, not distributed.
A room full of these machines form a distributed-memory cluster which communicates by sending messages over a network.
Practical considerations are one major reasons for distributed memory: it's impractical to have thousands or millions of CPU cores sharing the same memory with any kind of coherency semantics that make it worth calling it shared memory.