Purpose of address-space identifiers (ASIDs) - memory

I am currently studying Operating System Concepts by A. Silberschatz, P. Galvin, and G. Gagne.
I am studying memory management strategies and am on the section where they introduce the Translation Look-aside Buffer (TLB).
Some TLBs store address-space identifiers (ASIDs) in each TLB entry. An ASID uniquely identifies each process and is used to provide address-space protection for that process. When the TLB attempts to resolve virtual page numbers, it ensures that the ASID for the currently running process matches the ASID associated with the virtual page. If the ASIDs do not match, the attempt is treated as a TLB miss.
Above is a quote from the textbook explaining ASID.
I am a bit confused, as a TLB miss means the logical address couldn't be matched in the TLB, so it has to be looked up in the page table to reach physical memory.
That being said, the ASID is a set of extra bits in each TLB entry used to check whether the entry belongs to the process that is accessing it.
What I am wondering is: when the ASID is used to refuse the access, shouldn't it trap instead of causing a TLB miss? A TLB miss will forward the lookup to the page table, where the logical address of the process can be mapped to some address in main memory.
Please help me see where my understanding is incorrect.
Thanks!

Let's say you have two processes running on a system. Process A has its 2nd page mapped to the 100th page frame and Process B has its 2nd page mapped to the 200th page frame.
So now the MMU needs to find page #2, but does not want to read the page tables again. Does it go to page frame 100 or page frame 200?
There are at least two ways of handling that problem. One is to flush the TLB whenever there is a process switch.
The other is to assign some unique identifier for each process and include that in the TLB cache entries.
I am a bit confused, as a TLB miss means the logical address couldn't be matched in the TLB, so it has to be looked up in the page table to reach physical memory.
To translate logical page #X to a physical page frame (sketched in code below):
Look in the TLB for #X. If it's not there, go to the page table.
[#X exists] Is there an entry for #X with an ASID that matches the current process? If not, go to the page table.
Use the page mapping in the TLB.
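A minimal sketch of that lookup in C (the entry layout and names here are my own illustration, not from any particular MMU):

    /* One TLB entry: virtual page number, physical frame number, ASID. */
    typedef struct {
        unsigned vpn;
        unsigned pfn;
        unsigned asid;
        int      valid;
    } tlb_entry_t;

    #define TLB_SIZE 64
    static tlb_entry_t tlb[TLB_SIZE];

    /* Returns 1 and stores the frame number on a hit; 0 means "TLB miss:
       walk the page table". An entry whose VPN matches but whose ASID
       doesn't is treated exactly like a miss, not like an error. */
    int tlb_lookup(unsigned vpn, unsigned cur_asid, unsigned *pfn)
    {
        for (int i = 0; i < TLB_SIZE; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == cur_asid) {
                *pfn = tlb[i].pfn;
                return 1;   /* hit: both VPN and ASID match */
            }
        }
        return 0;           /* miss: fall back to the page table */
    }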
What I am wondering is: when the ASID is used to refuse the access, shouldn't it trap instead of causing a TLB miss?
Then you'd get a trap the first time a process touched a page whose TLB slot still held another process's entry, and the program would crash even though the access was perfectly legal. Treating the mismatch as a miss instead lets the MMU fall back to the page table and load the correct translation.

Though a year has passed, I happened to have the same problem as you.
And I found a detailed explanation of TLB miss:
For a software-managed TLB, when the machine encounters a TLB miss, the hardware raises an exception (a trap) to the OS (switching to kernel mode), and the trap handler for the TLB miss then looks up the page table and updates the TLB.
After that, the handler returns to the interrupted instruction (retrying the instruction that caused the exception), which yields a TLB hit this time.
See Operating Systems: Three Easy Pieces; the explanation is in section 19.3.
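Roughly, the flow that section describes looks like this (a sketch with hypothetical helper names, assuming a single-level page table):

    /* Page-table entry: valid bit plus physical frame number. */
    typedef struct { unsigned valid : 1; unsigned pfn : 20; } pte_t;

    extern pte_t *current_page_table;   /* per-process, single-level table */
    extern unsigned current_asid;
    extern void tlb_insert(unsigned vpn, unsigned pfn, unsigned asid);
    extern void raise_page_fault(unsigned vpn);

    /* Invoked by the trap the hardware raises on a TLB miss. */
    void tlb_miss_handler(unsigned vpn)
    {
        pte_t pte = current_page_table[vpn];    /* walk the page table */
        if (!pte.valid) {
            raise_page_fault(vpn);              /* a genuine page fault */
            return;
        }
        tlb_insert(vpn, pte.pfn, current_asid); /* refill the TLB */
        /* Returning from the trap retries the faulting instruction,
           which now hits in the TLB. */
    }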

I would think that a TLB miss should trap to the OS (or virtual memory manager) only when that TLB is the final TLB before physical/real memory; but there are also L1, L2, and L3 TLBs. When one of those TLBs misses, there may be a hardware page-table walker that can resolve the miss much faster than a context switch to the OS (which would also pollute the caches and TLBs).
Multiple processes share an L3 TLB on a multicore processor, and two processes share the L1 and L2 TLBs when hyperthreading is available and enabled. Each process has its own independent virtual address space, which the TLBs must distinguish.

Related

A problem about virtual memory management in OS

Here is the context of this problem. I am confused about why I don't need to know how many entries are in the TLB.
For the first question:
When I access data at 0x2330, I find it in main memory since the TLB is empty now, so I need 10 + 100 = 110 (ns).
When I access data at 0x0565, I hit a page fault, so I need 500 (ns); then I load it into the TLB and main memory (now I should replace one page in main memory because the resident set contains 2 frames, but which page should I replace? The problem just says we use the LRU replacement policy in the TLB).
When I access data at 0x2345, what might happen? I'm not sure ;w;
I am confused about why I don't need to know how many entries are in the TLB.
For an answer to be 100% correct, you do need to know how many TLB entries there are. There are 2 possibilities:
a) There are 2 or more TLB entries. In this case you have no reason to care what the replacement algorithm is, and the question's "And the TLB is replaced with LRU replacement policy" is an unnecessary distraction. In practice, for real hardware (but not necessarily in theory for academia), this is extremely likely.
b) There is only 1 TLB entry. In this case you have no reason to care what the replacement algorithm is for a different reason - any TLB miss must cause the previous contents to be evicted (and all of those accesses will be "TLB miss" because no page is accessed 2 or more times in a row).
To work around this problem I'd make the most likely assumption (that there are 2 or more TLB entries) and then clearly state that assumption in my answer (e.g. "Assuming there are 2 or more TLB entries (not stated in question), the answers are:").
When I access data at 0x2330, I find it in main memory since the TLB is empty now, so I need 10 + 100 = 110 (ns).
That's not quite right. The CPU would have to access the TLB to determine that it's a "TLB miss", then fetch the translation from memory into the TLB, then either:
(if there are no cache/s) fetch the data from memory (at physical address 0x27330) to do the final access; or
(if there are cache/s) check if the data is already cached and either:
(if "cache hit") fetch the data from cache, or
(if "cache miss") fetch the data from memory
Sadly, the question doesn't mention anything about cache/s. To work around that problem I'd make the most likely assumption (that there are no caches - see note) and then clearly state that assumption in my answer (e.g. "Assuming there are 2 or more TLB entries (not stated in the question) and that there are also no cache/s (also not stated in the question), the answers are:").
Note: "No cache/s" is the most likely assumption for the context (academia focusing on teaching virtual memory in isolation); but is also the least likely assumption in the real world.
When I access data at 0x0565, I hit a page fault, so I need 500 (ns); then I load it into the TLB and main memory (now I should replace one page in main memory because the resident set contains 2 frames, but which page should I replace?
You're right again - the question only says "The OS adopts fixed allocation and local replacement policy" (and doesn't say if it uses LRU or something else).
To work around this problem I'd make a sane assumption (that the OS uses an LRU replacement policy) and then clearly state that assumption in my answer (e.g. "Assuming there are 2 or more TLB entries (not stated in the question), and that there are also no cache/s (also not stated in the question), and that the OS is using an LRU page replacement policy (also not stated in the question); the answers are:").

How To Calculate Process Size from TLB size and mean memory access time

In a paged memory system where the entire process is loaded in memory (not demand paging), you have a 35-entry TLB. Assume that the probability of a memory access being in any one page is the same (unlike normal). If the TLB search time is 5 nsec and the memory access time is 50 nsec, how large is the process in pages if the effective access time is 70 nsec?
How do I calculate the size of the process??
This system uses virtual memory for its processes, and to access memory, a virtual address has to be translated into a physical one. This translation is stored in the same physical memory using some variant of a "page table" (https://en.wikipedia.org/wiki/Page_table).
In the variant without a TLB (a TLB of zero entries), every program access needs to read a translation entry from the page table before the real access can be made. So the effective (average) memory access time will be equal to 2 * main memory access time:
eff_time(TLB_size_0) = 2 * main_memory_access_time
The TLB is an optimization (check the http://ostep.org book for more details on real-world TLBs) which caches several recently used translations (and every translation entry describes 1 page). In the ideal case, all virtual addresses used by the program hit the TLB, and only the TLB latency is added to the memory access time. With a 35-entry TLB this will be true for programs (or periods of time) in which no more than 35 pages are accessed.
But when the program does uniform memory accesses and uses more pages (has a bigger size in page count) than can be stored in the TLB, some accesses will need to do a "page table walk" (in the case of a 1-level page table, 1 additional memory access) to refill some TLB entry. If 1/5 of the program's memory accesses miss the TLB (and 4/5 don't), the mean effective access time will be
eff_time (TLB_miss_rate_of_1_over_5) = (1-1/5) * full_access_time_with_TLB_hit + 1/5 * full_access_time_with_TLB_miss
where full_access_time_with_TLB_hit is the time to do a successful search in the TLB plus 1 main memory access, and full_access_time_with_TLB_miss is the time to do an unsuccessful TLB search, access the page table (by reading main memory), possibly refill the TLB and redo the search (if your MMU is not optimized), and then do the memory access that the application required.
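Plugging the question's numbers into that scheme gives the answer (a sketch, assuming a 1-level page table, that a hit costs the 5 ns TLB search plus one 50 ns memory access, and that a miss adds exactly one extra 50 ns page-table read; N is the process size in pages):

    hit_rate = 35 / N                                   (uniform accesses over N pages)
    70 = (35/N) * (5 + 50) + (1 - 35/N) * (5 + 50 + 50)
    70 = 105 - (35/N) * 50
    35/N = 0.7   =>   N = 50 pages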

Do modern OSes use paging and segmentation?

I was reading about memory architecture and I got a bit confused about paging and segmentation. I read that modern OSes use only paging to manage memory access, but looking at disassembled code I can see segment registers like "ds" and "fs". Does it mean that the OS (I saw this on Windows and Linux) is using both segmentation and paging, or is it just mapping all the segments onto the same pages (making segments irrelevant)?
OK, based on the book Modern Operating Systems, 3rd Edition, by Andrew S. Tanenbaum and the materials from Open Security Training (opensecuritytraining.info), I managed to understand segmentation and paging, and the answer to my question is:
Concepts:
1.1. Segmentation:
Segmentation is the division of the memory into pieces (sections) called segments. These segments are independent of each other, have variable sizes, and may grow as needed.
1.2. Virtual Memory:
Virtual memory is an abstraction of real memory. This means that it maps a virtual address (used by programs) onto a physical address (used by the hardware). If a program wants to access address 2000 (mov eax, 2000), the virtual address (2000) will be converted into the real physical address (for example, 1422), and then the program will access address 1422 thinking that it's accessing address 2000.
So, if virtual memory is being used by the system, programs no longer access real memory directly; instead, they use the corresponding virtual addresses.
1.3. Paging:
Paging is a virtual memory management scheme. As explained before, it maps a virtual address onto a physical address. Paging divides the virtual memory into pieces called "pages" and also divides the physical memory into pieces called "page frames". A page can be bound to one or more page frames (keep in mind that a page can map to different page frames, but only one at a time).
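For example, with hypothetical 4 KiB pages the translation amounts to splitting the virtual address and substituting the frame number (a sketch; the names and the frame_table are illustrative, not any real OS's structures):

    #define PAGE_SHIFT 12                     /* 4 KiB pages */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)

    extern unsigned frame_table[];            /* page -> frame, filled by the OS */

    unsigned translate(unsigned vaddr)
    {
        unsigned vpn    = vaddr >> PAGE_SHIFT;        /* which page */
        unsigned offset = vaddr & (PAGE_SIZE - 1);    /* where inside it */
        return (frame_table[vpn] << PAGE_SHIFT) | offset;
    }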
Advantages and Disadvantages
2.1. Segmentation:
The main advantage of using segmentation is to divide the memory into different linear address spaces. With this, programs can have one address space for their code, another for the stack, another for dynamic variables (the heap), etc. The disadvantage of segmentation is fragmentation of the memory.
Consider this example - one portion of the memory is divided into 4 segments allocated sequentially -> seg1(in use) --- seg2(in use) --- seg3(in use) --- seg4(in use). Now if we free the memory from seg2, we will have -> seg1(in use) --- seg2(FREE) --- seg3(in use) --- seg4(in use). If we want to allocate some data, we can do it by using seg2, but if the size of the data is bigger than the size of seg2, we won't be able to do so, and the space will be wasted and the memory fragmented. Another problem is that some segments could have a large size, and since a segment can't be "broken" into smaller pieces, it must be fully allocated in memory.
2.2. Paging:
The main advantage of using paging is that it abstracts the physical memory; hence, programs (and programmers) don't need to bother with physical memory addresses. You can have two programs that access (virtual) address 2000, because it will be mapped onto two different physical addresses. Also, it's possible to use the hard disk to ensure that only the necessary pages are allocated in memory. The main disadvantage is that paging uses only one linear address space: paging maps virtual addresses from 0 to the max addressable value for all programs. Now consider this example - two chunks of memory, "chk1" and "chk2", are next to each other in virtual memory (chk1 is allocated from address 1000 to 2000 and chk2 from 2001 to 3000). If chk1 needs to grow, the OS needs to make room in memory for the new size of chk1, and if that happens again, the OS will have to do the same thing; or, if no space can be found, an exception will be raised. The routines for managing this kind of operation are very bad for performance if this growing and shrinking of memory happens many times.
Segmentation and Paging
3.1. Why combine both?
An operating system can combine both segmentation and paging. Why? Because combining them makes it possible to have the best of both without their disadvantages. So, the memory is divided into many segments and each one of those has one or more pages. When a program tries to access a memory address, first the OS goes to the corresponding segment; there it will find the corresponding page, so it goes to the page, and there it will find the page frame that the program wanted to access. Using this approach, the OS divides the memory into various segments, including segments for the kernel and for user programs. These segments have variable sizes and access-protection features. To solve the fragmentation and "big segment" problems, paging is used. With paging, a larger segment is broken into several pages and only the necessary pages remain in memory (and more pages can be allocated in memory if needed). So, basically, modern OSes have two memory abstractions, where segmentation is used more for "handling the programs" and paging for "managing the physical memory".
3.2. How does it really work?
An OS running segmentation and paging will have the following structures:
3.2.1. Segment Selector:
This represents an index into the Global/Local Descriptor Table. It contains 3 fields: the index into the descriptor table, a bit identifying whether the segment is in the Global or the Local descriptor table, and a privilege level (see the sketch after this list).
3.2.2. Segment Register:
A CPU register used to store a segment selector. Usually (on an x86 machine) there are at least the registers CS (Code Segment) and DS (Data Segment).
3.2.3. Segment descriptors:
This structure contains the data regarding a segment, like its base address, size (in pages or in bytes), access privileges, whether the segment is present in memory or not, etc. (for all the fields, search for segment descriptors on Google).
3.2.4. Global/Local Descriptor Table:
The table that contains several segment descriptors. So this structure holds all the segment descriptors for the system. If the table is Global, it can hold other things like Local Descriptor Table descriptors, Task State Segment descriptors, and Call Gate descriptors (I'll not explain these here; please google them). If the table is Local, it will only (as far as I know) hold user-related segment descriptors.
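On x86 the segment selector mentioned in 3.2.1 is a 16-bit value whose three fields can be picked apart like this (a sketch; the field layout is Intel's, the variable names are mine):

    /* Decode an x86 segment selector (field layout per the Intel SDM). */
    void decode_selector(unsigned short selector)
    {
        unsigned rpl   =  selector       & 0x3; /* bits 0-1: privilege level (RPL) */
        unsigned ti    = (selector >> 2) & 0x1; /* bit 2: 0 = GDT, 1 = LDT         */
        unsigned index =  selector >> 3;        /* bits 3-15: table index          */
        (void)rpl; (void)ti; (void)index;       /* just illustrating the fields    */
    }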
3.3. How does it work?
So, to access memory, the program first needs to access a segment. To do this, a segment selector is loaded into a segment register, and then the Global or Local Descriptor Table is consulted (depending on a field of the segment selector). This is the reason a full memory address is written SEGMENT REGISTER:ADDRESS, like CS:ADDRESS -> 001B:0044BF7A. Now the OS goes to the G/LDT and (using the index field of the segment selector) finds the segment descriptor for the address being accessed. Then it checks that the segment is present, checks the protection, and if everything is OK, it goes to the address stated in the descriptor's "base" field plus the address offset.
If paging is not enabled, the system goes directly to real memory; but with paging on, the address is treated as a virtual address and goes through the page directory. The base address + the offset is called the linear address and is interpreted as 3 fields: directory + page + offset. In the page directory, the hardware looks up the directory entry given by the "directory" field of the linear address; this entry points to a page table, and the "page" field of the linear address is used to find the page-table entry; that entry points to a page frame, and the offset is used to find the exact address that the program wants to access.
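In the classic 32-bit x86 case (4 KiB pages, two-level paging), the linear address splits like this (a sketch; the bit widths are the architecture's, the names are mine):

    /* Split a 32-bit linear address, classic two-level x86 paging (4 KiB pages). */
    void split_linear(unsigned linear,
                      unsigned *dir, unsigned *page, unsigned *off)
    {
        *dir  = (linear >> 22) & 0x3FF;  /* top 10 bits: page-directory index */
        *page = (linear >> 12) & 0x3FF;  /* next 10 bits: page-table index    */
        *off  =  linear        & 0xFFF;  /* low 12 bits: offset in the frame  */
    }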
3.4. Modern Operating Systems
Modern OSes "do not use" segmentation. It's in quotes because they use 4 segments: Kernel Code Segment, Kernel Data Segment, User Code Segment and User Data Segment. What this means is that all user processes have the same code and data segments (and so the same segment selectors). The segments only change when going from user mode to kernel mode.
So the whole path explained in section 3.3 still occurs, but all processes use the same segments and, since the page tables are individual per process, protection between processes is still enforced.
Hope this helps, and if there are any mistakes or missing details (I was a bit generic in some parts), please feel free to comment and reply.
Thank you guys
Danilo PC
They only use paging for memory protection, while they use segmentation for other purposes (like storing thread-local data).

Can Intel processors delay TLB invalidations?

This is in reference to Intel's Software Developer's Manual (Order Number: 325384-039US, May 2011); section 4.10.4.4, "Delayed Invalidation", describes a potential delay in the invalidation of TLB entries which can cause unpredictable results when accessing memory whose paging-structure entry has been changed.
The manual says ...
"Required invalidations may be delayed under some circumstances. Software devel-
opers should understand that, between the modification of a paging-structure entry
and execution of the invalidation instruction recommended in Section 4.10.4.2, the
processor may use translations based on either the old value or the new value of the
paging-structure entry. The following items describe some of the potential conse-
quences of delayed invalidation:
If a paging-structure entry is modified to change the R/W flag from 0 to 1, write
accesses to linear addresses whose translation is controlled by this entry may or
may not cause a page-fault exception."
Let us suppose a simple case, where a paging-structure entry is modified (the R/W flag is flipped from 0 to 1) for a linear address, and after that the corresponding TLB invalidation instruction is called immediately. My question is: as a consequence of delayed invalidation of the TLB, is it possible that even after calling the TLB invalidation, a write access to the linear address in question doesn't fault (page fault)?
Or can the "delayed invalidation" only cause unpredictable results when the "invalidate" instruction for the linear address whose paging-structure entry has changed has not yet been issued?
TLBs are, transparently and optimistically, not flushed by CR3 changes. TLB entries are marked with a unique identifier for the address space and are left in the TLB until they are either touched by the wrong process (in which case the TLB entry is trashed) or the address space is restored (in which case the TLB contents were preserved across the address-space change).
This all happens transparently to the software. Your program (or OS) shouldn't be able to tell the difference between this behaviour and the TLB actually being invalidated by a TLB invalidation, except via:
A) Timing - i.e., optimistically not flushing the TLB is faster (which is why they do it).
B) There are edge cases where this behaviour is somewhat undefined. If you modify the code page on which you're sitting, or touch memory you've just changed, the old value might still be in the TLB (even across a CR3 change).
The solution to this is one of the following:
1) Force a TLB update via an invlpg instruction. This purges the TLB entry, triggering a TLB read-in on the next touch of the page.
2) Disable and re-enable paging via the CR0 register.
3) Mark all pages as uncacheable, via the cache-disable bit in CR0 or on all of the pages in the TLB (TLB entries marked uncacheable are auto-purged after use).
4) Change the mode of the code segment.
Note that this is genuinely undefined behaviour. Transitioning to SMM can invalidate the TLB, or might not, leaving this open to a race-condition. Don't depend on this behaviour.
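For reference, option 1 usually looks like this in kernel code (a sketch using GCC inline assembly; the invlpg instruction is real, the wrapper name is mine):

    /* Flush the TLB entry for one linear address (must run in ring 0). */
    static inline void flush_tlb_page(void *addr)
    {
        __asm__ __volatile__("invlpg (%0)" : : "r"(addr) : "memory");
    }

    /* Typical sequence: modify the PTE first, then invalidate:       */
    /*   pte->rw = 1;            -- change R/W from 0 to 1            */
    /*   flush_tlb_page(vaddr);  -- discard any stale translation     */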

4 questions about processor architecture. (Computer engineering)

Our teacher has asked us around 50 true-or-false questions in preparation for our final exam. I could find an answer for most of them online or by asking relatives. However, these 4 questions are driving me crazy. Most of these questions aren't that hard; I just can't get a satisfying answer anywhere. Sorry, the original questions were not written in English; I had to translate them myself. If you don't understand something, please tell me.
Thanks!
True or false
The size of the address manipulated by the processor determines the size of the virtual memory. However, the size of the memory cache is independent.
For a long time, DRAM technology remained incompatible with the CMOS technology used for the standard logic in processors. This is the reason DRAM memory is (most of the time) placed outside of the processor (on a different chip).
Paging lets multiple virtual address spaces correspond to the same physical address space.
An associative cache memory with sets of 1 line is an entirely associative cache memory, because one memory block can go in any set, since each set is the same size as a block.
"Manipulated address" is not a term of the art. You have an m-bit virtual address mapping to an n-bit physical address. Yes, a cache may be of any size up to the physical address size, but typically is much smaller. Note that cache lines are tagged with virtual or more typically physical address bits corresponding to the maximum virtual or physical address range of the machine.
Yes, DRAM processes and logic processes are each tuned for different objectives and involve different process steps (different materials and thicknesses to lay down DRAM capacitor stacks/trenches, for example), and historically you haven't built processors in DRAM processes (except the Mitsubishi M32RD) nor DRAM in logic processes. An exception is the so-called eDRAM that IBM likes to use in its SOI processes, which is used as the last-level cache in IBM microprocessors such as the POWER7.
"Pagination" is what we call issuing a form feed so that text output begins at the top of the next page. "Paging" on the other hand is sometimes a synonym for virtual memory management, by which a virtual address is mapped (on a page by page basis) to a physical address. If you set up your page tables just so it allows multiple virtual addresses (indeed, virtual addresses from different processes' virtual address spaces) to map to the same physical address and hence the same location in real RAM.
"An associative cache memory with sets of 1 line is an entierly associative cache memory, because one memory block can go in any set since each sets are of the same size that of the block."
Hmm, that's a strange question. Let's break it down. 1) You can have a direct-mapped cache, in which an address maps to only one cache line. 2) You can have a fully associative cache, in which an address can map to any cache line; there is something like a CAM (content-addressable memory) tag structure to find which line, if any, matches the address. Or 3) you can have an n-way set-associative cache, in which you have, essentially, n sets of direct-mapped caches, and a given address can map to one of n lines. There are other, more esoteric cache organizations, but I doubt you're being taught them.
So let's parse the statement. "An associative cache memory" - well, that rules out direct-mapped caches. So we're left with "fully associative" and "n-way set associative". It has sets of 1 line. OK, so if it is set associative, then instead of something traditional like 4 ways x 64 lines/way, it is n ways x 1 line/way. In other words, it is fully associative. I would say this is a true statement, except the term of the art is "fully associative", not "entirely associative".
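The index computation makes the cases concrete (a sketch with example parameters; with NUM_SETS = 1 the index is always 0, so any block can go anywhere, which is the fully associative case):

    /* Which set does an address map to? Line size and set count are examples. */
    #define LINE_SIZE 64
    #define NUM_SETS  64      /* set this to 1 and the cache is fully associative */

    unsigned set_index(unsigned addr)
    {
        return (addr / LINE_SIZE) % NUM_SETS;
    }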
Makes sense?
Happy hacking!
True, more or less (it depends on the accuracy of your translation, I guess :) ). The number of bits in an address sets an upper limit on the virtual memory space; you could, of course, choose not to use all the bits. The size of the memory cache depends on how much actual memory is installed, which is independent; but of course if you had more memory than you can address, then the excess can't be used.
Almost certainly false. We have RAM on separate chips so that we can install more without building a whole new computer or replacing the CPU.
There is no a-priori upper or lower limit to the cache size, though in a real application certain sizes make more sense than others, of course.
I don't know of any incompatibility. The reason why we use SRAM as on-die cache is because it's faster.
Maybe you can force an MMU to map different virtual addresses to the same physical location, but usually it's used the other way around.
I don't understand the question.
