I understand how a process is stored in memory in the form of pages. I also know that the memory layout of a program generally contains four segments: code, data, stack, and heap. But I have some questions about how these two ideas fit together.
Are the stack and heap also stored in pages? How is it determined that one particular region is the stack and another is the heap?
If the stack and heap (and also the data segment) are stored in pages, how are those pages linked to a particular process?
Is all of physical memory divided into page frames, or only the part that stores the code segment? (I am confused because an address is split into a page number and an offset, and I associated the offset with instructions, i.e. the program's code.)
Also, when we say there is 4 GB of virtual memory in the system, does that mean 4 GB for each process or 4 GB in total? And if it is 4 GB in total, isn't that just the same as physical memory (RAM)?
Just going off Wikipedia:
The page table, generally stored in main memory, keeps track of where the virtual pages are stored in the physical memory. This method uses two memory accesses (one for the page table entry, one for the byte) to access a byte. First, the page table is looked up for the frame number. Second, the frame number with the page offset gives the actual address. Thus any straightforward virtual memory scheme would have the effect of doubling the memory access time. Hence, the TLB is used to reduce the time taken to access the memory locations in the page table method.
So given that, what I'm curious about is why the TLB is actually faster, because from what I know it's just a smaller copy of (part of) the page table.
You still need to access the TLB to find the physical address, and then once you have that, you still need to actually access the data at the physical address, which is two lookups just like with the page table.
I can only think of two reasons why the TLB is faster:
Looking up an address in the TLB or page table is not O(1) (I assumed it was O(1), like a hash table); thus, since the TLB is much smaller, it's faster to do a lookup in it. Also, in that case, why not just use a hash table instead of a TLB?
I incorrectly interpreted how the TLB works, and it's not actually doing two accesses.
I realize it has been three years since this question was asked, but since it is still just as relevant, and it still shows up in search engines, I'll try my best to produce a complete answer.
Accessing the main memory through the TLB rather than the page table is faster primarily for two reasons:
1. The TLB is faster than main memory (which is where the page table resides).
Typical access times are on the order of < 1 ns for the TLB and 100 ns for main memory.
A TLB access is part of an L1 cache hit, and modern CPUs can do 2 loads per clock if they both hit in L1d cache.
The reasons for this are twofold:
The TLB is located within the CPU, while main memory - and thus the page table - is not.
The TLB - like other caches - is made of fast and expensive SRAM, whereas main memory usually consists of slow and inexpensive DRAM (read more here).
Thus, if the supposition that both the TLB and the page table require only one memory access were correct, a TLB hit would still, roughly speaking, halve memory access time. However, as we shall see next, the supposition is not correct, and the benefit of having a TLB is even greater.
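To put rough numbers on that supposition (a sketch using the ballpark latencies quoted above; the values are illustrative, not measurements):

program SingleLevelWithAndWithoutTlb;
{ Rough sketch using the ballpark latencies quoted above: about 1 ns for a
  TLB lookup and 100 ns for a main-memory access. Illustrative values only. }
const
  TlbNs = 1.0;    // assumed TLB lookup time, ns
  MemNs = 100.0;  // assumed main-memory access time, ns
begin
  // Single-level paging, no TLB: one access for the page-table entry,
  // then one access for the data itself.
  WriteLn('Without TLB : ', MemNs + MemNs:0:0, ' ns');
  // TLB hit: the translation comes from the TLB, so only the data access
  // goes to main memory.
  WriteLn('With TLB hit: ', TlbNs + MemNs:0:0, ' ns');  // roughly half
end.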
2. Accessing the page table usually requires multiple memory accesses.
This really is the crux of the issue.
Modern CPUs tend to use multilevel page tables in order to save memory. Most notably, x86-64 page tables currently consist of up to four levels (and a fifth may be coming). This means that accessing a single byte in memory through the page table requires up to five memory accesses: four for the page table and one for the data. Obviously the cost would be unbearably high if not for the TLB; it is easy to see why CPU and OS engineers put in a lot of effort to minimize the frequency of TLB misses.
Finally, do note that even this explanation is somewhat of a simplification, as it ignores, among other things, data caching. The detailed mechanics of modern desktop CPUs are complex and, to a degree, undisclosed. For a more detailed discussion on the topic, refer to this thread, for instance.
Page-table accesses can be, and are, cached by data caches on modern CPUs, but each access in a page walk depends on the result of the previous one (a pointer to the next level of the page table), so a 4-level page walk would have about 4 × 4 cycles = 16 cycles of latency even if every access hit in L1d cache. That would be a lot more for the pipeline to hide than the ~3 to 4 cycle TLB latency that's part of an L1d cache hit load in a modern Intel CPU (which of course uses TLBs for data and instruction accesses).
You are right in your assumption that the approach with a TLB still requires 2 accesses. But the approach with a TLB is faster because:
The TLB is made of faster memory called associative memory.
Usually we make 2 accesses to physical memory (one for the page table entry and one for the data), but with a TLB there is 1 access to the TLB and only the other access goes to physical memory.
Associative memory is faster because it is content-addressable memory, but it is also expensive because of the extra logic circuits required.
You can read about content-addressable memory here.
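As a loose software analogy only (a real TLB is a hardware CAM that compares all of its entries in parallel, which a hash map does not do), the idea of a lookup keyed by content, i.e. by the virtual page number, can be pictured like this; the numbers are made up:

program TlbLookupAnalogy;
{ Software analogy only: a real TLB is a hardware CAM, not a hash map.
  This just illustrates "lookup keyed by virtual page number". }
uses
  Generics.Collections;
var
  Tlb: TDictionary<UInt64, UInt64>;  // virtual page number -> physical frame number
  Frame: UInt64;
begin
  Tlb := TDictionary<UInt64, UInt64>.Create;
  try
    Tlb.Add(42, 1337);                    // entry cached by an earlier page-table walk
    if Tlb.TryGetValue(42, Frame) then
      WriteLn('Hit: frame ', Frame)       // 1 TLB access + 1 physical-memory access
    else
      WriteLn('Miss: walk the page table in main memory, then cache the result');
  finally
    Tlb.Free;
  end;
end.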
It depends upon the specific implementation. In general, the TLB is a cache that exists within the CPU.
You still need to access the TLB to find the physical address, and then once you have that, you still need to actually access the data at the physical address, which is two lookups just like with the page table.
The CPU can access the cache much faster than it can access data through the memory bus. It is making two accesses to two different places (one faster and one slower). Also, it is possible for the memory location to be cached within the CPU as well, in which case no accesses need to go through the memory bus.
I think #ihohen pretty much said it, but as a student, for future students who may come here, a simple explanation is:
Without a TLB, with single-level paging you need 2 accesses to main memory: 1 to find the translation of the logical address in the page table (which is kept in main memory) and 1 more to actually access the memory block.
Now with a TLB, you reduce the above to only one main-memory access (the second one), because the step of finding the translation will (hopefully) take place without going to main memory: the translation will be found in the TLB, which is located in the CPU.
So when we say that a TLB reduces access time by a factor of 2, we mean that, approximately, if we ignore TLB misses and consider the simplest model of paging (single-level), it is fair to say that a TLB speeds up the process by 2.
There will be many variations in practice, first and foremost because today's computers use more advanced paging techniques (multilevel paging, demand paging, etc.), but this is an intuitive explanation of why the idea of a TLB is much more helpful than a plain page table.
The book "Operating Systems " by Silberschatz states another (a little bit more detailed) math type to measure access time with a TLB:
Consider:
h : TLB hit ratio
τ : time to access main memory
e : time spent searching the TLB for an entry
t : effective memory access time

t = h * (e + τ) + (1 - h) * (e + 2τ)
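Plugging illustrative numbers into that formula (the particular values of h, τ and e below are assumptions chosen for the example, not figures from the book):

program EffectiveAccessTime;
{ Evaluates the formula above; h, tau and e are assumed example values. }
const
  h   = 0.98;   // TLB hit ratio (assumed)
  tau = 100.0;  // time to access main memory, ns (assumed)
  e   = 1.0;    // time spent searching the TLB, ns (assumed)
var
  t: Double;
begin
  t := h * (e + tau) + (1 - h) * (e + 2 * tau);
  WriteLn('Effective access time: ', t:0:2, ' ns');  // prints 103.00
end.

With a high hit ratio the effective access time stays close to the cost of a single memory access rather than two.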
LINK 1: If size of the physical memory is 2^32-1, then what is the size of virtual memory?
The above link gives me an answer, but I still have some doubts.
Please answer the questions in the order they are posted here so that I don't get confused.
1. Virtual memory is often described in terms of demand paging: whenever a page fault occurs, the operating system swaps the required page in from virtual memory, where "virtual memory" here means the hard disk or secondary storage. So how much space can be allocated for a process in virtual memory? Can this size (the space allocated for each process in virtual memory) exceed our RAM size? I mean, if our RAM is 4 GB, then what is the maximum size of virtual memory we can have for a process? Can we have 4 GB of virtual memory for every process, or can we have more than 4 GB for every process (if it needs it)?
2. Is the virtual memory size fixed or dynamic? How much space is allocated for this memory? In the above link it is said that 2^48 is the size of virtual memory on a 64-bit machine; why is it only 2^48, and how can one arrive at a number like that?
Thank you.
If size of the physical memory is 2^32-1, then what is the size of virtual memory?
The size of the virtual address space is independent of the size of the physical address space. There is no answer.
So how much space can be allocated for a process in virtual memory?
That depends upon hardware limits, system parameters, and process quotas.
Can this size (the space allocated for each process in virtual memory) exceed our RAM size?
Yes, and it frequently does.
I mean, if our RAM is 4 GB, then what is the maximum size of virtual memory you can have for a process?
It can be anything. The RAM size does not control it.
Can we have 4 GB of virtual memory for every process, or can we have more than 4 GB for every process?
Both
Is the virtual memory size fixed or dynamic?
Dynamic
How much space is allocated for this memory? In the above link it is said that 2^48 is the size of virtual memory on a 64-bit machine; why is it only 2^48, and how can one arrive at a number like that?
It could be a hardware limit for a specific processor.
Paging is the way that virtual addresses are converted into physical addresses. This is done via page tables.
On x86 in Long Mode (64 bit mode), the page tables allow for 48 bit virtual address spaces (as in, 2^48 max size). This limitation is due to the design of x86's long mode page tables. Paging uses a few bits at a time from pointers to determine where to go next in the page tables. Basically, page tables are a relatively shallow b-tree style tree that let you look up the physical address corresponding to a virtual address.
To convert a virtual address to a physical address, Long Mode page tables (for small pages) first extract 9 bits from the virtual address, then 9 more, then 9 more, then 9 more to find the right page, and finally use the low 12 bits to find the precise byte being accessed, for 48 bits in total.
(For large and huge pages, x86 skips the last 1 and 2 steps of paging respectively, to find the address of the large or huge page, and the unused low 21 or 30 bits are used to find the precise byte in that page)
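Here is a small sketch of that 9+9+9+9+12 split for small (4 KiB) pages; the sample address is arbitrary, and the level names in the output are just the conventional x86-64 ones:

program SplitVirtualAddress;
{ Sketch of the bit split described above for 4 KiB pages. }
var
  VA: UInt64;
begin
  VA := $00007F1234567ABC;                                  // arbitrary canonical address
  WriteLn('Level 4 (PML4) index: ', (VA shr 39) and $1FF);  // bits 47..39
  WriteLn('Level 3 (PDPT) index: ', (VA shr 30) and $1FF);  // bits 38..30
  WriteLn('Level 2 (PD)   index: ', (VA shr 21) and $1FF);  // bits 29..21
  WriteLn('Level 1 (PT)   index: ', (VA shr 12) and $1FF);  // bits 20..12
  WriteLn('Page offset         : ', VA and $FFF);           // bits 11..0
end.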
Virtual address spaces aren't necessarily dynamic, depending on what is meant by dynamic. The address space is always 48 bits (as long as you aren't switching between modes, like from long mode to protected mode with paging enabled, i.e. 32-bit mode). Virtual address spaces are almost always sparse, in that most canonical (valid) addresses don't point to anything useful. The page tables don't have mappings for most addresses (accesses to those addresses generate page faults, which on Linux are often bounced back to userspace as the SIGSEGV you know and love).
That said, virtual memory can be dynamic in that when a page fault occurs the kernel could map in that page. To implement swap, OSes will use extra space on disk to give the illusion of more RAM by writing infrequently used pages back to disk, and lazily pulling pages back into RAM.
Fun fact: page tables have no restriction preventing the same physical page from being mapped in multiple times. You could build a monstrous page table with every virtual address pointing to exactly the same page, which is crazy but doable. This means address spaces aren't necessarily sparse, just very likely to be. (Note that this page table would be huge. I'm sure someone has done the calculation, but my first guess would be on the order of terabytes.)
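A back-of-the-envelope check of that guess, assuming 4 KiB pages and 8-byte page-table entries, comes out at roughly half a terabyte of paging structures, so the order of magnitude is about right:

program FullyMappedPageTables;
{ Rough size of the paging structures needed to map every page of a
  48-bit address space with 4 KiB pages and 8-byte entries. }
var
  Pages, PTBytes, PDBytes, PDPTBytes, Total: UInt64;
begin
  Pages     := UInt64(1) shl 36;        // 2^48 bytes / 4 KiB per page
  PTBytes   := Pages * 8;               // last-level page tables: 512 GiB
  PDBytes   := PTBytes div 512;         // one entry per page table: 1 GiB
  PDPTBytes := PDBytes div 512;         // one entry per page directory: 2 MiB
  Total     := PTBytes + PDBytes + PDPTBytes + 4096;   // plus the single top-level table
  WriteLn(Total div (UInt64(1024) * 1024 * 1024), ' GiB');  // prints 513
end.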
I have a question from an exam, but I don't understand the solution. Can someone explain it to me?
Memory access time = 2.5*10^-7 sec
Secondary memory access time = 3*10^-6 sec
TLB access time = 10^-8 sec
Given a virtual address, a value x, and a 3-level page table, how much time does it take to read the value of x from memory in the worst case?
The solution is: 10^-8 + 2.5*10^-7 + 3*(3*10^-6 + 2*2.5*10^-7) + 10^-8 = 107.7*10^-7
It's pretty obvious that the solution is performing 2 TLB lookups, 7 memory accesses, and 3 secondary memory accesses.
Here are the steps in the process:
1) The CPU accesses the TLB to find the memory location that the virtual address maps to.
2) The CPU accesses main memory to look for the virtual address. This step fails.
3) The CPU accesses the page file (1 memory access to get the page file, 1 more to access the page file entry).
4) The CPU reads from secondary memory to get the page referred to in the page file.
5) Repeat steps 3 & 4 for each level in the page table.
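In case it helps to see the arithmetic, here is a minimal sketch that just re-evaluates that breakdown with the times given in the question (the constant names are mine):

program WorstCaseAccessTime;
{ Re-evaluates the quoted worst case: 2 TLB lookups, 7 main-memory accesses
  and 3 secondary-memory accesses, using the times from the question. }
const
  TlbTime  = 1e-8;    // TLB access time, s
  MemTime  = 2.5e-7;  // main-memory access time, s
  DiskTime = 3e-6;    // secondary-memory access time, s
begin
  WriteLn(2 * TlbTime + 7 * MemTime + 3 * DiskTime:0:10, ' s');  // 0.0000107700 s = 107.7*10^-7 s
end.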
As far as I know, there is no formula for calculating the best and worst memory access times. However, there are various factors that influence them:
The width of the access. On 32-bit x86, 8-bit and 32-bit accesses tend to be faster than 16-bit ones.
Whether the access is aligned or not. Unaligned accesses tend to be slower than aligned accesses.
Whether accessed memory is cached. Accesses to cached memory are faster than accesses to uncached memory.
The NUMA domain of the accessed memory. Accessing memory belonging to a close NUMA domain is faster than accessing memory belonging to a far NUMA domain.
Whether paging is enabled. Accessing memory when paging is enabled involves traversing paging structures and therefore is slower.
The type of memory. For example, writing to video memory is slower than writing to "normal" memory, and reading from video memory is much, much slower than reading from "normal" memory.
Other factors that I have forgotten to mention; it's hard to remember them all.
Furthermore, the influence of each of these factors depends on the underlying hardware, so it would be really hard to devise even an approximate formula for the best and worst memory access times.
I've written a 32-bit program that uses a dynamic array to store a list of triangles with an unknown count. My current strategy is to estimate a very large number of triangles and then trim the list once all the triangles have been created. In some cases I'll only allocate memory once; in others I'll need to grow the allocation.
With a very large data set I'm running out of memory when my application's memory usage is about 1.2 GB, and since the allocation step is so large I feel I may be fragmenting memory.
Looking at FastMM (the memory manager) I see these constants, which would suggest one of them as a good size to grow by:
ChunkSize = 64 * 1024;
MaximumSmallBlockSize = 32752;
LargeBlockGranularity = 64 * 1024;
Would one of these be an optimal size for increasing the size of an array?
Eventually this program will become 64-bit, but we're not quite ready for that step.
Your real problem here is not that you are running out of memory, but that the memory allocator cannot find a large enough block of contiguous address space. Some simple things you can do to help include:
Execute the code in a 64 bit process.
Add the LARGEADDRESSAWARE PE flag so that your process gets a 4GB address space rather than 2GB.
Beyond that, the best you can do is allocate smaller blocks so that you avoid the requirement to store your large data structure in contiguous memory. Allocate memory in blocks: if you need 1 GB of memory, allocate 64 blocks of 16 MB each, for instance. The exact block size that you use can be tuned to your needs. Larger blocks give better allocation performance, but smaller blocks let you make use of more of the available address space.
Wrap this up in a container that presents an array-like interface to the consumer but internally stores the data in non-contiguous blocks, as sketched below.
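For illustration, a minimal sketch of such a container might look like this (the unit and type names, the chunk size, and the triangle layout are assumptions made up for the example, not taken from your code):

unit ChunkedTriangles;

interface

const
  TrianglesPerChunk = 262144;  // ~9 MB per chunk at 36 bytes per triangle

type
  TTriangle = record
    V: array[0..2, 0..2] of Single;  // 3 vertices, x/y/z each
  end;

  { Presents an array-like interface but stores the triangles in fixed-size
    chunks, so no single huge contiguous allocation is ever needed. }
  TChunkedTriangleList = class
  private
    FChunks: array of array of TTriangle;  // one modest block per chunk
    FCount: Integer;
    function GetItem(Index: Integer): TTriangle;
  public
    procedure Add(const T: TTriangle);
    property Count: Integer read FCount;
    property Items[Index: Integer]: TTriangle read GetItem; default;
  end;

implementation

procedure TChunkedTriangleList.Add(const T: TTriangle);
var
  Chunk: Integer;
begin
  Chunk := FCount div TrianglesPerChunk;
  if Chunk > High(FChunks) then
  begin
    SetLength(FChunks, Chunk + 1);
    SetLength(FChunks[Chunk], TrianglesPerChunk);  // allocate one new ~9 MB block
  end;
  FChunks[Chunk][FCount mod TrianglesPerChunk] := T;
  Inc(FCount);
end;

function TChunkedTriangleList.GetItem(Index: Integer): TTriangle;
begin
  Result := FChunks[Index div TrianglesPerChunk][Index mod TrianglesPerChunk];
end;

end.

Each chunk is a separate dynamic array, so the allocator only ever has to find a modest contiguous block at a time, which is far easier to satisfy in a fragmented 32-bit address space.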
As far as I know, dynamic arrays in Delphi use contiguous address space (at least within the virtual address space).
Since you are running out of memory at 1.2 GB, I guess that's the point where the memory manager can't find a contiguous block of memory large enough to hold a larger array.
One way to work around this limitation would be to implement your array as a collection of smaller arrays of (let's say) 200 MB each. That should give you some more headroom before you hit the memory cap.
From the 1.2 GB value, I would guess your program isn't compiled to be "large address aware". You can see here how to compile your application that way.
One last trick would be to actually save the array data to a file. I use this trick in one of my applications where I needed to load a few GB of images to be displayed in a grid. What I did was create a file with the flags FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE and save/load the images from the resulting file. From the CreateFile documentation:
A file is being used for temporary storage. File systems avoid writing data back to mass storage if sufficient cache memory is available, because an application deletes a temporary file after a handle is closed. In that case, the system can entirely avoid writing the data. Otherwise, the data is written after the handle is closed.
Since it makes use of cache memory, I believe it lets an application use memory beyond the 32-bit limitation, since the cache is managed by the OS and (as far as I know) is not mapped inside the process's virtual address space. After making this change, performance was still pretty good, but I can't say whether it would be good enough for your needs.
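For what it's worth, here is a minimal sketch of creating such a file (the path is just an example, and the actual WriteFile/ReadFile plumbing is omitted):

program TempBackingFileSketch;
{ Sketch of the temporary-file trick described above. }
uses
  Windows, SysUtils;
var
  h: THandle;
begin
  h := CreateFile('C:\Temp\triangles.tmp',
                  GENERIC_READ or GENERIC_WRITE,
                  0,                         // no sharing
                  nil,
                  CREATE_ALWAYS,
                  FILE_ATTRIBUTE_TEMPORARY or FILE_FLAG_DELETE_ON_CLOSE,
                  0);
  if h = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // ... WriteFile / SetFilePointer / ReadFile the array data here ...
  finally
    CloseHandle(h);  // with DELETE_ON_CLOSE the file disappears at this point
  end;
end.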
In my Windows XP Task Manager, some processes display a higher value in the Mem Usage column than in the VM Size column. My Firefox instance, for example, shows 111544 K as Mem Usage and 100576 K as VM Size.
According to the Task Manager help file, Mem Usage is the working set of the process and VM Size is the committed memory in the virtual address space.
My question is, if the number of committed pages for a process is A and the number of pages in physical memory for the same process is B, shouldn't it always be B ≤ A? Isn't the number of pages in physical memory per process a subset of the committed pages?
Or is this something to do with sharing of memory among processes? Please explain. (Perhaps my definition of 'Working Set' is off the mark).
Thanks.
Virtual Memory
Assume that your program (e.g. Oracle) allocated 100 MB of memory upon startup: your VM size goes up by 100 MB even though no additional physical or disk pages are touched. That is, VM size is nothing but memory bookkeeping.
The total available physical memory plus paging file space is the maximum amount of memory that ALL the processes in the system can allocate. The system tracks this so that it can ensure that if, at any point in time, the processes actually start consuming all the memory they allocated, the OS can supply the physical pages required.
Private Memory
If the program copies 10 MB of data into that 100 MB, the OS notices that no physical pages have yet been assigned to the process for those addresses and assigns 10 MB worth of physical pages to your process's private memory. (Each such event is called a page fault.)
Working Set
Definition: the working set is the set of memory pages that have recently been touched by a program.
At this point those 10 MB worth of pages are added to the working set of the process. If the process then copies this data into another previously allocated 10 MB buffer, everything else remains the same, but the working set goes up by another 10 MB if those destination pages were not already in the working set. If those pages were already in the working set, then nothing changes and the program's working set remains the same.
Working Set behaviour
Imagine your process never touches those first 10 MB of pages again. In that case these pages are trimmed from your process's working set and possibly sent to the page file, so that the OS can bring in other pages that are used more frequently. However, if there is no pressing low-memory situation, this paging need not be done and the OS can act as if it is rich in memory; in that case the pages simply remain in the working set.
When is Working Set > Virtual Memory
Now imagine the same program deallocates all 100 MB of memory. The program's VM size is immediately reduced by 100 MB (remember, VM size = bookkeeping of all memory allocation requests).
The working set need not be affected by this, since deallocation doesn't change the fact that those 10 MB worth of pages were recently touched. Those pages therefore remain in the working set of the process, though the OS can reclaim them whenever it needs to.
This effectively makes VM size < working set. However, this will correct itself if you start another process that consumes more memory and the working set pages are reclaimed by the OS.
XP's Task Manager is simply wrong. EDIT: If you don't believe me (and someone doesn't, because they voted this down), read Firefox 3 Memory Usage. I quote:
If you’re looking at Memory Usage under Windows XP, your numbers aren’t going to be so great. The reason: Microsoft changed the meaning of “private bytes” between XP and Vista (for the better).
Sounds like MS got confused. You only change something like that if it's broken.
Try Process Explorer instead. What Task Manager labels "VM Size", Process Explorer (more correctly) labels "Private Bytes". And in Process Explorer, Working Set (and Private Bytes) are always less than or equal to Virtual Size, as you would expect.
File mapping
A very common way for Mem Usage to be higher than VM Size is the use of file mapping objects (so it can indeed be related to sharing of memory, as file mapping is used to share memory). With file mapping you can have memory that is committed (either in the page file or in physical memory, you do not know which) but has no virtual address assigned to it in your process. The committed memory appears in Mem Usage, while the use of virtual addresses is what VM Size tracks.
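As an illustration, here is a minimal sketch of the kind of object described above: a pagefile-backed file mapping, whose pages are committed when the mapping is created but which consumes virtual addresses in the process only once a view is mapped (the 64 MB size is arbitrary):

program FileMappingSketch;
{ Pagefile-backed file mapping: committed at creation, mapped on demand. }
uses
  Windows, SysUtils;
const
  MappingSize = 64 * 1024 * 1024;  // 64 MB, arbitrary
var
  Mapping: THandle;
  View: Pointer;
begin
  // Backed by the page file (no real file); committed, but not yet mapped:
  Mapping := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE,
                               0, MappingSize, nil);
  if Mapping = 0 then
    RaiseLastOSError;
  try
    // Only now do the committed pages get virtual addresses in this process:
    View := MapViewOfFile(Mapping, FILE_MAP_WRITE, 0, 0, 0);
    if View = nil then
      RaiseLastOSError;
    ZeroMemory(View, MappingSize);  // touch the pages
    UnmapViewOfFile(View);
  finally
    CloseHandle(Mapping);
  end;
end.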
See also:
What does “VM Size” mean in the Windows Task Manager? on Stackoverflow
Breaking the 32 bit Barrier in my developer blog
Usenet discussion Still confused why working set larger than virtual memory
Memory Usage is the amount of electronic memory (physical RAM) currently allocated to the process.
VM Size is the amount of virtual memory currently allocated to the process.
so ...
A page that exists only electronically will increase only Memory Usage.
A page that exists only on disk will increase only VM Size.
A page that exists both in memory and on disk will increase both.
Some examples to illustrate:
Currently on my machine, iexplore has 16,000K Memory Usage and 194,916 VM Size. This means that most of the memory used by Internet Explorer is idle and has been swapped out to disk, and only a fraction is being kept in main memory.
Contrast that with mcshield.exe, which has 98,984K Memory Usage and 98,168K VM Size. My conclusion here is that McAfee AntiVirus is active, with a lot of memory in use. Since it's been running for quite some time (all day, since booting), I expect that most of the 98,168K VM Size is copies of the electronic memory, though there's nothing in Task Manager to confirm this.
You might find some explanation in The Memory Shell Game.
Working Set (A) – This is the set of a process's virtual memory pages that are committed and located in physical RAM. These pages fully belong to the process. A working set is like a "currently/recently working on these pages" list.
Virtual Memory – This is the memory that an operating system can address. Regardless of the amount of physical RAM or hard drive space, this number is limited by your processor architecture.
Committed Memory – When an application touches a virtual memory page (reads/writes/programmatically commits it), the page becomes a committed page. It is now backed by a physical memory page. This will usually be a physical RAM page, but it could eventually be a page in the page file on the hard disk, or a page in a memory-mapped file on the hard disk. The memory manager handles the translations from the virtual memory page to the physical page. A virtual page could be located in physical RAM, while the page next to it could be on the hard drive in the page file.
BUT: PF (Page File) Usage - This is the total number of committed pages on the system. It does not tell you how many are actually written to the page file. It only tells you how much of the page file would be used if all committed pages had to be written out to the page file at the same time.
Hence B > A...
If we agree that B represents "Mem Usage", or equivalently PF Usage, the problem comes from the fact that it actually represents potential page usage: in XP, this potential page file space can be used as a place to assign virtual memory pages that programs have asked for but never brought into use...
Memory fragmentation is probably the reason:
If the process allocates 1 octet (byte), it counts as 1 octet in the VM Size, but this 1 octet requires a whole physical page (4 KB on the Windows operating system).
If, after allocating and freeing memory, the process has a second octet that is separated by more than 4 KB from the first one, this second octet will be stored on a separate physical page from the first.
So the VM Size count is 2 octets, but the Memory Usage is 2 pages = 8 KB.
So the fact that Mem Usage is greater than VM Size suggests that the process does a lot of allocation and deallocation and fragments the memory.
This could be because the process was started a long time ago.
Or else there is room for optimization ;-)