A problem about virtual memory management in an OS

Here is the context of this problem. I am confused about why I don't need to know how many entries are in the TLB.
For the first question:
When I access data at 0x2330, I find it in main memory since the TLB is empty, so I need 10 + 100 = 110 ns.
When I access data at 0x0565, I hit a page fault, so I need 500 ns; then I load it into the TLB and main memory (now I should replace one page in main memory because the resident set contains 2 frames, but which page should I replace? The problem only says we use an LRU replacement policy in the TLB).
When I access data at 0x2345, what might happen? I'm not sure ;w;

I am confused about why I don't need to know how many entries are in the TLB.
For an answer to be 100% correct, you do need to know how many TLB entries there are. There are 2 possibilities:
a) There are 2 or more TLB entries. In this case you have no reason to care what the replacement algorithm is, and the question's "And the TLB is replaced with LRU replacement policy" is an unnecessary distraction. In practice, for real hardware (but not necessarily in theory, for academia), this is extremely likely.
b) There is only 1 TLB entry. In this case you have no reason to care what the replacement algorithm is for a different reason - any TLB miss must cause the previous contents to be evicted (and all of those accesses will be "TLB miss" because no page is accessed 2 or more times in a row).
To work around this problem I'd make the most likely assumption (that there are 2 or more TLB entries) and then clearly state that assumption in my answer (e.g. "Assuming there are 2 or more TLB entries (not stated in question), the answers are:").
When I access data at 0x2330, I find it in main memory since the TLB is empty, so I need 10 + 100 = 110 ns.
That's not quite right. The CPU would have to access the TLB to determine that it's a "TLB miss", then fetch the translation from memory into the TLB, and then either:
- (if there are no caches) fetch the data from memory (at physical address 0x27330) to do the final access; or
- (if there are caches) check whether the data is already cached, and either:
  - (if a "cache hit") fetch the data from the cache, or
  - (if a "cache miss") fetch the data from memory.
Sadly, the question doesn't mention anything about caches. To work around that problem I'd make the most likely assumption (that there are no caches - see the note below) and then clearly state that assumption in my answer (e.g. "Assuming there are 2 or more TLB entries (not stated in the question) and that there are also no caches (also not stated in the question), the answers are:").
Note: "no caches" is the most likely assumption for the context (academia teaching virtual memory in isolation), but it is also the least likely assumption in the real world.
When I access data at 0x0565, I hit a page fault, so I need 500 ns; then I load it into the TLB and main memory (now I should replace one page in main memory because the resident set contains 2 frames, but which page should I replace?
You're right again - the question only says "The OS adopts fixed allocation and local replacement policy" (and doesn't say if it uses LRU or something else).
To work around this problem I'd make a sane assumption (that the OS uses an LRU replacement policy) and then clearly state that assumption in my answer (e.g. "Assuming there are 2 or more TLB entries (not stated in the question), that there are also no caches (also not stated in the question), and that the OS is using an LRU page replacement policy (also not stated in the question); the answers are:").
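If it helps, here is a minimal LRU sketch under those assumptions. With 4 KiB pages (a page size the question excerpt doesn't give), 0x2330 and 0x2345 are both page 2 and 0x0565 is page 0; the empty initial state is also my simplification, since in the actual question the first page is already resident:

```c
#include <stdio.h>

#define FRAMES 2   /* the resident set from the question */

int main(void)
{
    int page_of[FRAMES]  = { -1, -1 };   /* which page each frame holds */
    int last_use[FRAMES] = { -1, -1 };   /* time of last access         */
    int refs[] = { 2, 0, 2 };            /* pages of 0x2330, 0x0565, 0x2345 */
    int faults = 0;

    for (int t = 0; t < (int)(sizeof refs / sizeof refs[0]); t++) {
        int frame = -1;
        for (int f = 0; f < FRAMES; f++)
            if (page_of[f] == refs[t]) frame = f;
        int fault = (frame < 0);
        if (fault) {                      /* evict the LRU frame */
            frame = 0;
            for (int f = 1; f < FRAMES; f++)
                if (last_use[f] < last_use[frame]) frame = f;
            page_of[frame] = refs[t];
            faults++;
        }
        last_use[frame] = t;
        printf("page %d -> frame %d (%s)\n", refs[t], frame,
               fault ? "fault" : "hit");
    }
    printf("faults: %d\n", faults);
    return 0;
}
```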

Related

T or F: If a machine using paged virtual memory has a 24-bit logical address and a 32-bit physical address, a page fault will never occur

I'm working on a practice final exam and I can't seem to figure out the answer to this question.
My understanding is that every initial page being brought in counts as a page fault, so even without the address lengths this question should be false, correct? If we set that aside for a second, is the answer true? My thinking is that since the logical address has only 24 bits while the physical address has 32 bits, there would never be a case where a page has to go into a frame that is already occupied. Is more information (such as the page size) required for this line of reasoning?
every initial page being brought in counts as a page fault
Just as a note, this is true only if you create the process (populate the PCB, the process control block) but don't actually assign any frames. The first reference (basically, the first instruction), and some of the later ones, will generate a page fault.
This is why you (you as the OS) have to assign a sufficient number of frames to avoid early page faults (and, with a pinch of luck and a good pager, even later ones in the execution of the process).
Back to your question: the answer is false ("it depends" is more correct).
The reason is simple: if you don't know the size of the memory, you can't actually know how many frames you have at hand. So the address size is totally useless in this specific context.
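For the sake of the arithmetic, assuming 4 KiB pages (a page size not given in the question), the address widths only bound the counts; they say nothing about installed RAM:

```c
#include <stdio.h>

int main(void)
{
    unsigned long long vspace = 1ULL << 24;   /* 16 MiB virtual space */
    unsigned long long pspace = 1ULL << 32;   /* 4 GiB addressable    */
    unsigned long long page   = 1ULL << 12;   /* assumed 4 KiB page   */

    printf("virtual pages:      %llu\n", vspace / page);   /* 4096    */
    printf("addressable frames: %llu\n", pspace / page);   /* 1048576 */
    /* The frame count is only an upper bound: the frames you actually
     * have depend on how much RAM is installed, which the address
     * width does not tell you. */
    return 0;
}
```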

Purpose of address-space identifiers (ASIDs)

I am currently studying Operating Systems by A Silberschatz, P Galvin, G Gagne.
I am studying memory management strategies, and I am on the section where they introduce the Translation Look-aside Buffer (TLB).
Some TLBs store address-space identifiers (ASIDs) in each TLB entry. An ASID uniquely identifies each process and is used to provide address-space protection for that process. When the TLB attempts to resolve virtual page numbers, it ensures that the ASID for the currently running process matches the ASID associated with the virtual page. If the ASIDs do not match, the attempt is treated as a TLB miss.
Above is a quote from the textbook explaining ASID.
I am a bit confused, as a TLB miss means the logical address couldn't be matched in the TLB, so it has to be checked against the page table to get to physical memory.
That being said, the ASID is an extra set of bits in each TLB entry used to check whether the entry belongs to the process accessing it.
What I am wondering is: when the ASID is used to refuse the access, shouldn't it trap instead of being a TLB miss? A TLB miss will forward the process to the page table, where the logical address for the process will be mapped to some address in main memory.
Please help me where I am understanding incorrectly.
Thanks!
Let's say you have two processes running on a system. Process A has its 2nd page mapped to the 100th page frame, and Process B has its 2nd page mapped to the 200th page frame.
So now the MMU needs to find page #2, but does not want to read the page tables again. Does it go to page frame 100 or page frame 200?
There are at least two ways of handling that problem. One is to flush the TLB whenever there is a process switch.
The other is to assign some unique identifier for each process and include that in the TLB cache entries.
I am a bit confused, as a TLB miss means the logical address couldn't be matched in the TLB, so it has to be checked against the page table to get to physical memory.
To translate logical page #X to a physical page frame (a sketch follows these steps):
1. Look in the TLB for #X. If it's not there, go to the page table.
2. [#X exists] Is there an entry for #X with an ASID that matches the current process? If not, go to the page table.
3. Use the page mapping in the TLB.
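A minimal sketch of that lookup, with an invented entry layout (not any particular architecture), using the answer's page-2/frame-100 example:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64

struct tlb_entry {
    bool     valid;
    uint16_t asid;   /* which address space this translation belongs to */
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a hit; a mismatched ASID is treated exactly like a
 * missing entry, so the caller falls back to the page table. */
static bool tlb_lookup(uint16_t cur_asid, uint32_t vpn, uint32_t *pfn)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == cur_asid) {
            *pfn = tlb[i].pfn;
            return true;
        }
    return false;   /* miss: go to the page table */
}

int main(void)
{
    uint32_t pfn;
    tlb[0] = (struct tlb_entry){ .valid = true, .asid = 1, .vpn = 2, .pfn = 100 };
    printf("process A (asid 1): %s\n", tlb_lookup(1, 2, &pfn) ? "hit" : "miss");
    printf("process B (asid 2): %s\n", tlb_lookup(2, 2, &pfn) ? "hit" : "miss");
    return 0;
}
```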
What I am wondering is: when the ASID is used to refuse the access, shouldn't it trap instead of being a TLB miss?
Then you'd get traps the first time the process accessed a page and the program would crash.
Though a year has passed, I happened to have the same question as you, and I found a detailed explanation of the TLB miss:
For software-managed TLB, when the machine encounters a TLB miss, the hardware would raise an exception (trap) to the OS (switched to kernel mode), and the trap handler for TLB miss would then look up the page table and update the TLB.
After that, the handler would return to the interrupted instruction (try the instruction that caused the exception again), which would yield TLB hit this time.
(Operating Systems: Three Easy Pieces; the explanation is in Section 19.3.)
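Here is a toy simulation of that software-managed path. The 1-entry TLB and all names are invented for brevity; a real handler runs in the kernel after a hardware trap:

```c
#include <stdio.h>
#include <stdint.h>

#define PAGES 16

static int      tlb_vpn = -1;    /* a 1-entry TLB, for brevity */
static uint32_t tlb_pfn;
static uint32_t page_table[PAGES] = { 7, 3, 9, 1 };  /* vpn -> pfn */

static uint32_t translate(uint32_t vpn)
{
    if ((int)vpn == tlb_vpn)     /* TLB hit */
        return tlb_pfn;
    /* TLB miss: in hardware this raises a trap to the OS; the handler
     * walks the page table, refills the TLB, and retries the
     * interrupted instruction. */
    tlb_vpn = vpn;
    tlb_pfn = page_table[vpn];
    return translate(vpn);       /* the retry now hits */
}

int main(void)
{
    printf("vpn 2 -> pfn %u (miss, refill, then hit)\n", translate(2));
    printf("vpn 2 -> pfn %u (hit)\n", translate(2));
    return 0;
}
```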
I would think that a TLB miss should trap to the OS (or virtual memory manager) when that TLB is the final TLB for physical/real memory, but there are also TLBs for the L1, L2, and L3 caches. When a cache TLB misses, there may be a hardware page-table walker that can resolve the miss much faster than a context switch to the OS (which would also pollute the caches and TLBs).
Multiple processes share an L3 TLB on a multicore processor, and two processes share the L1 and L2 TLBs when hyperthreading is available and enabled. Each process has its own independent virtual address space, which the TLBs must distinguish.

How to learn the associativity (number of ways) of the TLB?

I have a task to learn the number of ways in the TLB cache. Which algorithm should I use?
The question is a bit unclear as to what you need help with, so this is a summary of information related to the topics you mention.
There are two "ways" to get to memory - direct mapping, where the page table is kept in memory and is indexed by virtual page number. To translate from virtual page number to real page number the OS goes to the base address of the page table and adds the virtual page number. The value at this location gives the real address of the page.
The other way is associative mapping. Associative mapping keeps the page table in content-addressed memory, so when a virtual address is looked up, all the process's pages are searched in parallel giving O(1) lookup time complexity. Another advantage is that this stores only the pages that have been actually allocated.
The problem is that associative mapping requires special hardware to accomplish the content-addressed memory.
So the trade-off is that a small amount of content-addressed memory is used (a TLB = translation lookaside buffer which you refer to in your question), with the majority using direct mapping.
Then the big consideration is when to place an address in the TLB and which old one to evict from the TLB. For this there are many choices: most likely it will be Least Recently Used (LRU) to exploit temporal locality. Other choices could be Least Frequently Used, Round Robin (probably not very good here), WS_Clock, etc.
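Since the question asks for an algorithm and the above is background: one common empirical approach (my sketch, with guessed constants, not anything stated in the question) is to repeatedly touch K pages that should all collide in one TLB set and watch for the jump in average latency once K exceeds the number of ways. In practice you would sweep STRIDE_PAGES across plausible set counts and expect noisy results (prefetching, huge pages, and hardware walkers all interfere):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PAGE_SIZE    4096
#define STRIDE_PAGES 64      /* guess at the number of TLB sets */
#define MAX_WAYS     32
#define ROUNDS       100000

static double measure(volatile char *buf, int k)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < ROUNDS; r++)
        for (int i = 0; i < k; i++)          /* touch one byte per page */
            buf[(size_t)i * STRIDE_PAGES * PAGE_SIZE] = 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / ((double)ROUNDS * k);        /* ns per access */
}

int main(void)
{
    size_t len = (size_t)MAX_WAYS * STRIDE_PAGES * PAGE_SIZE;
    volatile char *buf = malloc(len);
    if (!buf) return 1;
    for (size_t i = 0; i < len; i += PAGE_SIZE)
        buf[i] = 0;                          /* pre-fault every page */
    for (int k = 1; k <= MAX_WAYS; k++)
        printf("%2d pages in one set: %.2f ns/access\n", k, measure(buf, k));
    /* The k at which ns/access jumps suggests the number of ways. */
    free((void *)buf);
    return 0;
}
```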

Can Intel processors delay TLB invalidations?

This is in reference to Intel's Software Developer's Manual (Order Number: 325384-039US, May 2011); section 4.10.4.4, "Delayed Invalidation", describes a potential delay in the invalidation of TLB entries which can cause unpredictable results when accessing memory whose paging-structure entry has been changed.
The manual says ...
"Required invalidations may be delayed under some circumstances. Software devel-
opers should understand that, between the modification of a paging-structure entry
and execution of the invalidation instruction recommended in Section 4.10.4.2, the
processor may use translations based on either the old value or the new value of the
paging-structure entry. The following items describe some of the potential conse-
quences of delayed invalidation:
If a paging-structure entry is modified to change the R/W flag from 0 to 1, write
accesses to linear addresses whose translation is controlled by this entry may or
may not cause a page-fault exception."
Let us suppose a simple case where a paging-structure entry is modified (the R/W flag is flipped from 0 to 1) for a linear address, and after that the corresponding TLB invalidation instruction is called immediately. My question is: as a consequence of delayed invalidation of the TLB, is it possible that even after calling the TLB invalidation, a write access to the linear address in question doesn't fault (page fault)?
Or can the "delayed invalidation" only cause unpredictable results when the "invalidate" instruction for the linear address whose paging structure has changed has not been issued?
TLBs are transparently, and optimistically, not flushed by CR3 changes. TLB entries are marked with a unique identifier for the address space and are left in the TLB until they are either touched by the wrong process (in which case the TLB entry is trashed) or the address space is restored (in which case the TLB was preserved across the address-space change).
This all happens transparently; your program (or OS) shouldn't be able to tell the difference between this behaviour and the TLB actually being invalidated, except via:
A) Timing - i.e. optimistically not flushing the TLB is faster (which is why they do it).
B) Edge cases where this behaviour is somewhat undefined. If you modify the code page you're sitting on, or touch memory you've just changed, the old value in the TLB might still be there (even across a CR3 change).
The solution to this is to either:
1) Force a TLB update via an invlpg instruction (a sketch follows this list). This purges the TLB entry, triggering a TLB read-in on the next touch of the page.
2) Disable and re-enable paging via the CR0 register.
3) Mark all pages as uncacheable via the cache-disable bit in CR0, or mark the relevant pages uncacheable in their page-table entries (TLB entries marked uncacheable are auto-purged after use).
4) Change the mode of the code segment.
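For option 1, a minimal sketch using GCC/Clang inline assembly on x86 (the wrapper name is mine; invlpg is a privileged instruction, so this only runs in ring 0, e.g. inside a kernel):

```c
/* Invalidate the TLB entry covering the given linear address. */
static inline void flush_tlb_entry(void *linear_addr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(linear_addr) : "memory");
}
```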
Note that this is genuinely undefined behaviour. Transitioning to SMM can invalidate the TLB, or might not, leaving this open to a race condition. Don't depend on this behaviour.

4 questions about processor architecture (computer engineering)

Our teacher has asked us around 50 true-or-false questions in preparation for our final exam. I could find an answer for most of them online or by asking relatives. However, these 4 questions are driving me crazy. Most of them aren't that hard; I just can't get a satisfying answer anywhere. Sorry, the original questions were not written in English; I had to translate them myself. If you don't understand something, please tell me.
Thanks!
True or false
The size of the address manipulated by the processor determines the size of the virtual memory. However, the size of the memory cache is independent of it.
For a long time, DRAM technology stayed incompatible with the CMOS technology used for the standard logic in processors. This is the reason DRAM memory is (most of the time) placed outside of the processor (on a different chip).
Paging lets multiple virtual address spaces correspond to the same physical address space.
An associative cache memory with sets of 1 line is an entirely associative cache memory, because one memory block can go in any set, since each set is the same size as a block.
"Manipulated address" is not a term of the art. You have an m-bit virtual address mapping to an n-bit physical address. Yes, a cache may be of any size up to the physical address size, but typically is much smaller. Note that cache lines are tagged with virtual or more typically physical address bits corresponding to the maximum virtual or physical address range of the machine.
Yes, DRAM processes and logic processes are each tuned for different objectives, and involve different process steps (different materials and thicknesses to lay down DRAM capacitor stacks/trenches, for example) and historically you haven't built processors in DRAM processes (except the Mitsubishi M32RD) nor DRAM in logic processes. Exception is so-called eDRAM that IBM likes to use for their SOI processes, and which is used as last level cache in IBM microprocessors such as the Power 7.
"Pagination" is what we call issuing a form feed so that text output begins at the top of the next page. "Paging" on the other hand is sometimes a synonym for virtual memory management, by which a virtual address is mapped (on a page by page basis) to a physical address. If you set up your page tables just so it allows multiple virtual addresses (indeed, virtual addresses from different processes' virtual address spaces) to map to the same physical address and hence the same location in real RAM.
"An associative cache memory with sets of 1 line is an entierly associative cache memory, because one memory block can go in any set since each sets are of the same size that of the block."
Hmm, that's a strange question. Let's break it down. 1) You can have a direct-mapped cache, in which an address maps to only one cache line. 2) You can have a fully associative cache, in which an address can map to any cache line; there is something like a CAM (content-addressable memory) tag structure to find which line, if any, matches the address. Or 3) you can have an n-way set-associative cache, in which you have, essentially, n sets of direct-mapped caches, and a given address can map to one of n lines. There are other, more esoteric cache organizations, but I doubt you're being taught them.
So let's parse the statement. "An associative cache memory". Well that rules out direct mapped caches. So we're left with "fully associative" and "n-way set associative". It has sets of 1 line. OK, so if it is set associative, then instead of something traditional like 4-ways x 64 lines/way, it is n-ways x 1 lines/way. In other words, it is fully associative. I would say this is a true statement, except the term of the art is "fully associative" not "entirely associative."
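To make the three organizations concrete, here is a small sketch of the index math (sizes are illustrative, not from the question):

```c
#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE 64

/* Direct mapped: the address selects exactly one line. */
static unsigned dm_index(uintptr_t addr, unsigned num_lines)
{
    return (addr / LINE_SIZE) % num_lines;
}

/* n-way set associative: the address selects one set; the block may
 * then sit in any of the n ways of that set. */
static unsigned sa_set(uintptr_t addr, unsigned num_sets)
{
    return (addr / LINE_SIZE) % num_sets;
}

int main(void)
{
    uintptr_t addr = 0x12345;
    printf("direct mapped line:    %u\n", dm_index(addr, 256));
    /* Fully associative is the num_sets == 1 case: the set index is
     * always 0, so a block can go in any way, i.e. any line. */
    printf("fully associative set: %u\n", sa_set(addr, 1));
    return 0;
}
```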
Makes sense?
Happy hacking!
True, more or less (it depends on the accuracy of your translation, I guess). The number of bits in addresses sets an upper limit on the virtual memory space; you could, of course, choose not to use all the bits. The size of the memory cache depends on how much actual memory is installed, which is independent; but of course, if you had more memory than you can address, it still couldn't be used.
Almost certainly false. We have RAM on separate chips so that we can install more without building a whole new computer or replacing the CPU.
There is no a priori upper or lower limit to the cache size, though in a real application certain sizes make more sense than others, of course.
I don't know of any incompatibility. The reason why we use SRAM as on-die cache is because it's faster.
Maybe you can force an MMU to map different virtual addresses to the same physical location, but usually it's used the other way around.
I don't understand the question.
