How is memory loaded from virtual memory? - memory

I know that Operating systems usually keep Page Tables to map a chunk of virtual memory to a chunk of physical memory.
My question is, does the CPU load the whole chunk when it's loading a given byte?
lets say I have:
ld %r0, 0x4(%r1)
Assuming my page size is 4 KB, does the the CPU load all 4KB at once or it manages to
Load only a byte given the offset properly?
Is the page size mandated by the hardware or configurable by software and the OS?
Edit:
Figured that page size is mandated by hardware:
The available page sizes depend on the instruction set architecture, processor type, and operating (addressing) mode. The operating system selects one or more sizes from the sizes supported by the architecture

your question touches so many levels of cpu/memory architecture ... without knowing the exact cpu-architecture/memory-architecture/version you have in mind: while the cpu-command targets only one byte, it will trigger the memory-controller to locate the whole physical-page and load that one (atleast, prefetch might kick in) completely to second/first-level cache. your data is transfered after filling the cache.

On a typical modern CPU, yes, it loads the whole page.
It couldn't really work any other way, since there are only two states in the page tables for a given page: present and not present. If the page is present, it must be mapped to some page in physical memory. If not present, every access to that page produces a page fault. There is no "partially present" state.
In order for it to be safe for the OS to mark the page present, it has to load the entire page into physical memory and update the page tables to point the virtual page to the physical page. If it only loaded a single byte or a smaller amount, the application might later try to access some other byte on the same page that hadn't been loaded, and it'd read garbage. There's no way for the CPU to generate another page fault in that case to let the OS fix things up, unless the page were marked not present, in which case the original access wouldn't be able to complete either.
The page size is fixed in hardware, though some architectures offer a few different choices that the OS can select from. For instance, recent x86-64 CPUs allow pages to be either 4 KB, 2 MB or 1 GB. The OS can mix-and-match these at runtime; there are bits in the page tables to indicate the size of each page.

Related

x86 dirty bit in page table entry

The Intel Architecture manual says when there is first write access against a memory page, the CPU sets the dirty bit of the page table entry. I have questions regarding this issue.
1. The 'dirty bit' in this context is used for guaranteeing the correctness of disk swapping in, out of memory pages. is this correct?
2. Is this automatically performed by the hardware? or is this an implementation of operating system?
3. If it is automatically performed by the hardware, is there any noteworthy difference compared to the usual memory updates which are performed by software instructions?
Thank you in advance.
1 The 'dirty bit' in this context is used for guaranteeing the correctness of disk swapping in, out of memory pages. is this correct?
This hardware part of paging support. This bit helps OS determine in very fast and efficcient way to determine which page must be dumped to disk. Because if memory page will page out to disk and there is already allocated space in page file we can don`t dump this page to disk if this flag are cleared. This is just example of way how OS can use this flag in paging.
2 Is this automatically performed by the hardware? or is this an implementation of operating system?
Software clears this flag. Hardware sets this flag:
3.7.6 Page-Directory and Page-Table Entries
Dirty (D) flag, bit 6
Indicates whether a page has been written to when set. (This flag is
not used in page-directory entries that point to page tables.) Memory
management software typically clears this flag when a page is
initially loaded into physical memory. The processor then sets this
flag the first time a page is accessed for a write operation.
.
3 If it is automatically performed by the hardware, is there any noteworthy difference compared to the usual memory updates which are performed by software instructions?
They have LOCK semantics and atomic.

What is paging?

Paging is explained here, slide #6 :
http://www.cs.ucc.ie/~grigoras/CS2506/Lecture_6.pdf
in my lecture notes, but I cannot for the life of me understand it. I know its a way of translating virtual addresses to physical addresses. So the virtual addresses, which are on disks are divided into chunks of 2^k. I am really confused after this. Can someone please explain it to me in simple terms?
Paging is, as you've noted, a type of virtual memory. To answer the question raised by #John Curtsy: it's covered separately from virtual memory in general because there are other types of virtual memory, although paging is now (by far) the most common.
Paged virtual memory is pretty simple: you split all of your physical memory up into blocks, mostly of equal size (though having a selection of two or three sizes is fairly common in practice). Making the blocks equal sized makes them interchangeable.
Then you have addressing. You start by breaking each address up into two pieces. One is an offset within a page. You normally use the least significant bits for that part. If you use (say) 4K pages, you need 12 bits for the offset. With (say) a 32-bit address space, that leaves 20 more bits.
From there, things are really a lot simpler than they initially seem. You basically build a small "descriptor" to describe each page of memory. This will have a linear address (the address used by the client application to address that memory), and a physical address for the memory, as well as a Present bit. There will (at least usually) be a few other things like permissions to indicate whether data in that page can be read, written, executed, etc.
Then, when client code uses an address, the CPU starts by breaking up the page offset from the rest of the address. It then takes the rest of the linear address, and looks through the page descriptors to find the physical address that goes with that linear address. Then, to address the physical memory, it uses the upper 20 bits of the physical address with the lower 12 bits of the linear address, and together they form the actual physical address that goes out on the processor pins and gets data from the memory chip.
Now, we get to the part where we get "true" virtual memory. When programs are using more memory than is actually available, the OS takes the data for some of those descriptors, and writes it out to the disk drive. It then clears the "Present" bit for that page of memory. The physical page of memory is now free for some other purpose.
When the client program tries to refer to that memory, the CPU checks that the Present bit is set. If it's not, the CPU raises an exception. When that happens, the CPU frees up a block of physical memory as above, reads the data for the current page back in from disk, and fills in the page descriptor with the address of the physical page where it's now located. When it's done all that, it returns from the exception, and the CPU restarts execution of the instruction that caused the exception to start with -- except now, the Present bit is set, so using the memory will work.
There is one more detail that you probably need to know: the page descriptors are normally arranged into page tables, and (the important part) you normally have a separate set of page tables for each process in the system (and another for the OS kernel itself). Having separate page tables for each process means that each process can use the same set of linear addresses, but those get mapped to different set of physical addresses as needed. You can also map the same physical memory to more than one process by just creating two separate page descriptors (one for each process) that contain the same physical address. Most OSes use this so that, for example, if you have two or three copies of the same program running, it'll really only have one copy of the executable code for that program in memory -- but it'll have two or three sets of page descriptors that point to that same code so all of them can use it without making separate copies for each.
Of course, I'm simplifying a lot -- quite a few complete (and often fairly large) books have been written about virtual memory. There's also a fair amount of variation among machines, with various embellishments added, minor changes in parameters made (e.g., whether a page is 4K or 8K), and so on. Nonetheless, this is at least a general idea of the core of what happens (and it's still at a high enough level to apply about equally to an ARM, x86, MIPS, SPARC, etc.)
Simply put, its a way of holding far more data than your address space would normally allow. I.e, if you have a 32 bit address space and 4 bit virtual address, you can hold (2^32)^(2^4) addresses (far more than a 32 bit address space).
Paging is a storage mechanism that allows OS to retrieve processes from the secondary storage into the main memory in the form of pages. In the Paging method, the main memory is divided into small fixed-size blocks of physical memory, which is called frames. The size of a frame should be kept the same as that of a page to have maximum utilization of the main memory and to avoid external fragmentation.

40 million page faults. How to fix this?

I have an application that loads 170 files (let’s say they are text files) from disk in individual objects and kept in memory all the time. The memory is allocated once when I load those files from disk. So, there is no memory fragmentation involved. I also use FastMM to make sure my applications never leaks memory.
The application compares all these files with each other to find similarities. Over-simplified we can say that we compare text strings but the algorithm is way more complex as I have to allow some differences between strings. Each file is about 300KB. Loaded in memory (the object that holds it) it takes about 0.4MB of RAM. So, the running app takes about 60MB or RAM (working set). It processes the data for about 15 minutes. The thing is that it generates over 40 million page faults.
Why? I have about 2GB of free RAM. From what I know Page Faults are slow. How much they are slowing down my program?
How can I optimize the program to reduce these page faults? I guess it has something to do with data locality. Does anybody know some example algorithms for this (Delphi)?
Update:
But looking at the number of page faults (no other application in Task Manager comes close to mine, not even by far) I guess that I could increase the speed of my application IF I manage to optimize memory layout (reduce the page faults).
Delphi 7, Win 7 32 bit, RAM 4GB (3GB visible, 2GB free).
Caveat - I'm only addressing the page faulting issue.
I cannot be sure but have you considered using Memory Mapped files? In this way windows will use the files themselves as the paging file (rather than the main paging file pagrefile.sys). If the files are read only then the number of page faults should theoretically decrease as the pages won't need to written out to disk via the paging file as windows will just load the data from the file itself as needed.
Now to reduce files from paging in and out you need to try and go through the data in one direction so that as new data is read, older pages can be discarded for ever. Here is where you trade off going over the files again and caching data - the cache has to be stored somewhere.
Note that Memory Mapped files is how windows loads .dlls and .exes amongst other things. I've used them to scan though gigabyte files without hitting memory limits (we had MBs in those days and not GBs of ram).
However from the data you describe I'd suggest the ability to not go back ovver files will reduce the amount of repaging going on.
On my machine most pagefaults are reported for developer studio which is reported to have 4M page faults after 30+ minutes total CPU time. You get 10 times more, in half the time. And memory is scarce on my system. So 40M faults seems like a lot.
It could just maybe be you have a memory leak.
the working set is only the physical memory in use for your application. If you leak memory, and don't touch it, it will get paged out. You will see the virtual memory useage (or page file use) increase. These pages might be swapped back in when the heap memory walks the heap, to get swapped out again by windows.
Because you have a lot of RAM, the swapped out pages will stay in physical memory, as nobody else needs them. (a page recovered from RAM counts as a soft fault, from disk as a hard one)
Do you use an exponential resize system ?
If you grow the block of memory in too small increments while loading, it might constantly request large blocks from the system, copy the data over, and then release the old block (assuming that fastmm (de)allocates very large blocks directly from the OS).
Maybe somehow this causes a loop where the OS releases memory from your app's process, and then adds it again, causing page faults on first write.
Also avoid Tstringlist.load* methods for very large files, IIRC these consume twice the space needed.

What will happen if a application is large enough to be loaded into the available RAM memory?

There is chance were a heavy weight application that needs to be launched in a low configuration system.. (Especially when the system has too less memory)
Also when we have already opened lot of application in the system & we keep on trying opening new new application what would happen?
I have only seen applications taking time to process or hangs up for sometime when I try operating with it in low config. system with low memory and old processors..
How it is able to accomodate many applications when the memory is low..? (like 128 MB or lesser..)
Does it involves any paging or something else..?
Can someone please let me know the theory behind this..!
"Heavyweight" is a very vague term. When the OS loads your program, the EXE is mapped in your address space, but only the code pages that run (or data pages that are referenced) are paged in as necessary.
You will likely get horrible performance if pages need to constantly be swapped as the program runs (aka many hard page faults), but it should work.
Since your commit charge is near the commit limit, and the commit limit will likely have no room to grow, you will also likely recieve many malloc()/VirtualAlloc(..., MEM_COMMIT)/HeapAlloc()/{Local|Global}Alloc() failures so you need to watch the return codes in your program.
Some keywords for search engines are: paging, swapping, virtual memory.
Wikipedia has an article called Paging (Redirected from Swap space).
There is often the use of virtual memory. Virtual memory pages are mapped to physical memory if they are used. If a physical page is needed and no page is available, another is written to disk. This is called swapping and that explains why crowded systems get slow and memory upgrades have positive effects on performance.

"Mem Usage" higher than "VM Size" in WinXP Task Manager

In my Windows XP Task Manager, some processes display a higher value in the Mem Usage column than the VMSize. My Firefox instance, for example shows 111544 K as mem usage and 100576 K as VMSize.
According to the help file of Task Manager Mem Usage is the working set of the process and VMSize is the committed memory in the Virtual address space.
My question is, if the number of committed pages for a process is A and the number of pages in physical memory for the same process is B, shouldn't it always be B ≤ A? Isn't the number of pages in physical memory per process a subset of the committed pages?
Or is this something to do with sharing of memory among processes? Please explain. (Perhaps my definition of 'Working Set' is off the mark).
Thanks.
Virtual Memory
Assume that your program (eg Oracle) allocated 100 MB of memory upon startup - your VM size goes up by 100 MB though no additional physical / disk pages are touched. ie VM is nothing but memory book keeping.
The total available physical memory + paging file memory is the maximum memory that ALL the processes in the system can allocate. The system does this so that it can ensure that at any point time if the processes actually start consuming all that memory it allocated the OS can supply the actual physical pages required.
Private Memory
If the program copies 10 MB of data into that 100 MB, OS senses that no pages have been allocated to the process corresponding to those addresses and assigns 10 MB worth of physical pages into your process's private memory. (This process is called page fault)
Working Set
Definition : Working set is the set of memory pages that have been recently touched by a program.
At this point these 10 pages are added to the working set of the process. If the process then goes and copies this data into another 10 MB cache previously allocated, everything else remains the same but the Working Set goes up again by 10 Mb if those old pages where not in the working set. But if those pages where already in the working set, then everything is good and the programs working set remains the same.
Working Set behaviour
Imagine your process never touches the first 10 pages ever again, in which case these pages are trimmed off from your process's working set and possibly sent to the page file so that the OS can bring in other pages that are more frequently used. However if there are no urgent low memory requirements, then this act of paging need not be done and OS can act as if its rich in memory. In this case the working set simply lets these pages remain.
When is Working Set > Virtual Memory
Now imagine the same program de-allocates all the 100 Mb of memory. The programs VM size is immediately reduced by 100 MB (remember VM = book keeping of all memory allocation requests)
The working set need not be affected by this, since that doesn't change the fact that those 10 Mb worth of pages where recently touched. Therefore those pages still remain in the working set of the process though the OS can reclaim them whenever it requires.
This would effectively make the VM < working set. However this will rectify if you start another process that consumes more memory and the working set pages are reclaimed by the OS.
XP's Task Manager is simply wrong. EDIT: If you don't believe me (and someone doesn't, because they voted this down), read Firefox 3 Memory Usage. I quote:
If you’re looking at Memory Usage
under Windows XP, your numbers aren’t
going to be so great. The reason:
Microsoft changed the meaning of
“private bytes” between XP and Vista
(for the better).
Sounds like MS got confused. You only change something like that if it's broken.
Try Process Explorer instead. What Task Manager labels "VM Size", Process Explorer (more correctly) labels "Private Bytes". And in Process Explorer, Working Set (and Private Bytes) are always less than or equal to Virtual Size, as you would expect.
File mapping
Very common way how Mem Usage can be higher than VM Size is by using file mapping objects (hence it can be related to shared memory, as file mapping is used to share memory). With file mapping you can have a memory which is committed (either in page file or in physical memory, you do not know), but has no virtual address assigned to it. The committed memory appears in Mem Usage, while used virtual addresses usage is tracked by VM Size.
See also:
What does “VM Size” mean in the Windows Task Manager? on Stackoverflow
Breaking the 32 bit Barrier in my developer blog
Usenet discussion Still confused why working set larger than virtual memory
Memory usage is the amount of electronic memory currently allocated to the process.
VM Size is the amount of virtual memory currently allocated to the process.
so ...
A page that exists only electronically will increase only Memory Usage.
A page that exists only on disk will increase only VM Size.
A page that exists both in memory and on disk will increase both.
Some examples to illustrate:
Currently on my machine, iexplore has 16,000K Memory Usage and 194,916 VM Size. This means that most of the memory used by Internet Explorer is idle and has been swapped out to disk, and only a fraction is being kept in main memory.
Contrast with mcshield.exe with has 98,984K memory usage and 98,168K VM Size. My conclusion here is that McAfee AntiVirus is active, with at lot of memory in use. Since it's been running for quite some time (all day, since booting), I expect that most of the 98,168K VM Size is copies of the electronic memory - though there's nothing in Task Manager to confirm this.
You might find some explaination in The Memory Shell Game
Working Set (A) – This is a set of virtual memory pages (that are committed) for a process and are located in physical RAM. These pages fully belong to the process. A working set is like a "currently/recently working on these pages" list.
Virtual Memory – This is a memory that an operating system can address. Regardless of the amount of physical RAM or hard drive space, this number is limited by your processor architecture.
Committed Memory – When an application touches a virtual memory page (reads/write/programmatically commits) the page becomes a committed page. It is now backed by a physical memory page. This will usually be a physical RAM page, but could eventually be a page in the page file on the hard disk, or it could be a page in a memory mapped file on the hard disk. The memory manager handles the translations from the virtual memory page to the physical page. A virtual page could be in located in physical RAM, while the page next to it could be on the hard drive in the page file.
BUT: PF (Page File) Usage - This is the total number of committed pages on the system. It does not tell you how many are actually written to the page file. It only tells you how much of the page file would be used if all committed pages had to be written out to the page file at the same time.
Hence B > A...
If we agree that B represents "mem usage" or also PF usage, the problem comes from the fact it actually represents potential page usages: in Xp, this potential file space can be used as a place to assign those virtual memory pages that programs have asked for, but never brought into use...
Memory fragmentation is probably the reason:
If the process allocates 1 octet, it counts for 1 octet in the VMSize, but this 1 octet requires a physical page (4K on windows operating system).
If after allocating/freeing memory, the process has a second octet that is separated by more than 4K from the first one, this second octet will always be stored on a separate physical page than the 1 one.
So the VM Size count is 2 octets but the Memory Usage is 2 pages== 8K
So the fact that MemUsage is greater than VMSize shows that process does a lot of allocation and deallocation and fragments the memory.
This could be because the process is started a long time ago.
Or else there is place for optimization ;-)

Resources