Indirect/Indexed Addressing Mode - instruction-set

When the instruction LOAD 800 is fed I understand how the other values are loaded into the accumulator but I don't know how you get the results for indexed and indirect addressing.

Not sure which architecture you're discussing so I'll just explain generically as best I can (based on experience wtih more concrete architectures, and investigative analysis of the stuff shown in the graphic you posted).
Immediate mode means use the immediate value, so something like load r2, #800 would put the immediate value 800 into register 2.
Direct means direct memory access, so something like load r2, 800 loads the value from memory address 800, and that value is 900.
Indirect means indirect memory access, so something like load r2, (800) loads the value from the memory address at memory address 800. The memory address at 800 is 900 and the value at 900 is 1000.
This one is a register/base-address combination like load r2, (r1,#800). What that would do would be to add register 1 and the immediate value 800 (to get 1600) then grab the value from that memory location, giving 700.

Related

T or F: If a mahine using paged virutal memory has a 24-bit logical address and a 32-bit physical address, a page fault will never occur

I'm working on a practice final exam and I can't seem to figure out the answer to this question.
My understanding is that every initial page being brought in counts as a page fault, so even without the address lengths, this question should be false, correct? If we forget about this for a second, is the answer true? My thought behind this is that since the logical address only has 24 bits while the physical address has 32 bits, there would never be a case where the page has to be in a frame that is already occupied. Is more information required (such as page size) for this realm of reasoning?
every initial page being brought in counts as a page fault
Just as a note, this is true only if you create the process (populate the PCB, process control block) but you don't actually assign any frame. The first (and some of the other) reference (basically, the first istruction) will generate a page fault.
This is why you (you as the OS) have to assign a sufficent number of frame to avoid early page fault (and, with a pinch of luck and a good pager, even later in the execution of the process).
Back at your question: the answer is false (depends is more correct).
The reason is simple: if you don't know the size of the memory, you can't actually know how many frame do you have at hand. So the address size is totally useless in this specific context.

nand2tetris. Memory implementation

I realized Data memory implementation in nand2tetris course. But I really don't understand some parts of my implementation:
CHIP Memory {
IN in[16], load, address[15];
OUT out[16];
PARTS:
DMux4Way(in=load, sel=address[13..14], a=RAM1, b=RAM2, c=scr, d=kbr);
Or(a=RAM1, b=RAM2, out=RAM);
RAM16K(in=in, load=RAM, address=address[0..13], out=RAMout);
Screen(in=in, load=scr, address=address[0..12], out=ScreenOut);
Keyboard(out=KeyboardOut);
Mux4Way16(a=RAMout, b=RAMout, c=ScreenOut, d=KeyboardOut, sel=address[13..14], out=out);
}
Is responsible for what load here. I understand that if load is 0 - out of Dmux4Way in any case will be 0 0 0 0. But i don't understand how it works in that case after that. Namely how it allows don't load data in Memory.
At least incomprehensible why in Screen we fed address[0..12] instead address[0..14] - full address. In my opinion we should use second because Screen memory map stay after RAM memory map and if we want to request for Screen memory map - we should use range (16 384 - 24 575) - decimal or (100000000000000 - 101111111111111) - binary. But how we can represent that range use just 13 width buss (address[0..12]) ??? It's impossible.
Therefore if we want to represent Screen memory map we should use range which was presented above. And that range has 15 width or address[0..14] BUT not address[0..12] (width 13). But why works just address[0..12] and doesn't work address[0..14](full address)
DMux4Way(in=load, sel=address[13..14], a=RAM1, b=RAM2, c=scr, d=kbr);
I'm sorry to criticize you at the beginning, but questions you ask suggest that you didn't do this exercise yourself or didn't start the whole course from the beginning.
To answer your questions:
Ad.1.
You demultiplex a single bit (load bit) to the correct memory part. Thereafter, you then feed the input data to all memory parts at the same time.
It's easier and neater than doing it the other way around, namely, to direct 16-bit input to the correct part (RAM16K, screen, or keyboard) while having a load bit that is connected and active at every register in all the parts.
To clarify. You have 2 possible destinations when writing data: RAM and Screen. The smallest demultiplexer you have is a 4-way multiplexer and that's what you're using. When you write into memory, you need to provide 2 pieces of information: the data and destination, both at the same time.
You might demultiplex the input data with DMux4Way16 and separately single load bit with DMux4Way but that would take 2 demultiplexers, and we can do better than that. That's what's done here, you direct data input to both RAM and Screen and then only use one demultiplexer : DMux4Way to select one of 2 possible destinations, only the one selected will be loaded with new data, on the other data input will be ignored. Knowing that, you need to study A-instruction format: when bit 14 and 13 of A-instruction (or data residing in A-register) have the binary value 00 or 01, the destination is RAM. When bit 14 and 13 have the binary value 10, it means the screen is the destination.
When you notice that you choose these 2 bits as sel for your demultiplexer. Selections 0 and 1 have the same meaning, so you can OR them and feed the output as load to RAM. Selection 2 means Screen will be loaded with a, new value, so load bit goes there. Selection 3 is never used so we don't care about it - output d of demultiplexer will not be connected anywhere. We make use of the demultiplexer's feature: The selected output will have value 1 and all other outputs will yield 0 as a result. It means only 1 memory destination will be loaded.
Ad.2.
Screen is separate device, it has nothing to do with RAM, ROM or Keyboard memory devices here. You, and only you, give meaning to what bits mean what to this specific device. To answer your question, when you address some register in Screen you address it in its own internal address space. In its internal address space first address will be 0, but from whole Memory it will be 16384. It's your job to make this transition. In this particular case, size of Screen memory device it is not necessary to use 14-bit address bus, 13 bits is all you need. What would 14th bit mean in this case? It wouldn't add any value. Also, you are user and not designer of Screen, you only look at and follow its interface description.
Hope it answers your questions, if not I urge you to go back and study more carefully previous hardware related chapters from course.

Calculate hash of the BIOS

I've read that BIOS is mapped to memory at f000:. At f000:fff0 I see JMP to f000:e05b. At e05b another jump. So, the code jumps many times within f000 segment. So, the questions:
1) If I calculate hash of the segment f000:0000 - f000:ffff will I get the hash of the BIOS code?
2) Whether the all bytes of the segment are constant during warm reboot?
Not necessarily. The BIOS ROM may map to a larger or smaller area than that (though some early BIOSes did map to exactly that memory range).
Probably, but again, not necessarily.

How is RAM able to acess any place in memory at O(1) speed

We are taught that the abstraction of the RAM memory is a long array of bytes. And that for the CPU it takes the same amount of time to access any part of it. What is the device that has the ability to access any byte out of the 4 gigabytes (on my computer) in the same time? As this does not seem as a trivial task for me.
I have asked colleagues and my professors, but nobody can pinpoint to the how this task can be achieved with simple logic gates, and if it isn't just a tricky combination of logic gates, then what is it?
My personal guess is that you could achieve the access of any memory in O(log(n)) speed, where n would be the size of memory. Because each gate would split the memory in two and send you memory access instruction to the next split the memory in two gate. But that requires ALOT of gates. I can't come up with any other educated guess, and I don't even know the name of the device that I should look up in Google.
Please help my anguished curiosity, and thanks in advance.
edit<
This is what I learned!
quote from yours "the RAM can send the value from cell addressed X to some output pins", here is where everyone skip (again) the thing that is not trivial for me. The way that I see it, In order to build a gate that from 64 pins decides which byte out of 2^64 to get, each pin needs to split the overall possible range of memory into two. If bit at index 0 is 0 -> then the address is at memory 0-2^64/2, else address is at memory 2^64/2-2^64. And so on, However the amout of gates (lets call them) that the memory fetch will go through will be 64, (a constant). However the amount of gates needed is N, where N is the number of memory bytes there are.
Just because there is 64 pins, it doesn't mean that you can still decode it into a single fetch from a range of 2^64. Does 4 gigabytes memory come with a 4 gigabytes gates in the memory control???
now this can be improved, because as I read furiously more and more about how this memory is architectured, if you place the memory into a matrix with sqrt(N) rows and sqrt(N) columns, the amount of gates that a fetch memory will need to go through is O(log(sqrt(N)*2) and the amount of gates that will be required will be 2*sqrt(N), which is much better, and I think that its probably a trade secret.
/edit<
What the heck, I might as well make this an answer.
Yes, in the physical world, memory access cannot be constant time.
But it cannot even be logarithmic time. The O(log n) circuit you have in mind ultimately involves some sort of binary (or whatever) tree, and you cannot make a binary tree with constant-length wires in a 3D universe.
Whatever the "bits per unit volume" capacity of your technology is, storing n bits requires a sphere with radius O(n^(1/3)). Since information can only travel at the speed of light, accessing a bit at the other end of the sphere requires time O(n^(1/3)).
But even this is wrong. If you want to talk about actual limitations of our universe, our physics friends say the absolute maximum number of bits you can store in any sphere is proportional to the sphere's surface area, not its volume. So the actual radius of a minimal sphere containing n bits of information is O(sqrt(n)).
As I mentioned in my comment, all of this is pretty much moot. The models of computation we generally use to analyze algorithms assume constant-access-time RAM, which is close enough to the truth in practice and allows a fair comparison of competing algorithms. (Although good engineers working on high-performance code are very concerned about locality and the memory hierarchy...)
Let's say your RAM has 2^64 cells (places where it is possible to store a single value, let's say 8-bit). Then it needs 64 pins to address every cell with a different number. When at the input pins of your RAM there 'appears' a binary number X the RAM can send the value from cell addressed X to some output pins, and your CPU can get the value from there. In hardware the addressing can be done quite easily, for example by using multiple NAND gates (such 'addressing device' from some logic gates is called a decoder).
So it is all happening at the hardware-level, this is just direct addressing. If the CPU is able to provide 64 bits to 64 pins of your RAM it can address every single memory cell (as 64 bit is enough to represent any number up to 2^64 -1). The only reason why you do not get the value immediately is a kind of 'propagation time', so time it takes for the signal to go through all the logic gates in the circuit.
The component responsible for dealing with memory accesses is the memory controller. It is used by the CPU to read from and write to memory.
The access time is constant because memory words are truly layed out in a matrix form (thus, the "byte array" abstraction is very realistic), where you have rows and columns. To fetch a given memory position, the desired memory address is passed on to the controller, which then activates the right column.
From http://computer.howstuffworks.com/ram1.htm:
Memory cells are etched onto a silicon wafer in an array of columns
(bitlines) and rows (wordlines). The intersection of a bitline and
wordline constitutes the address of the memory cell.
So, basically, the answer to your question is: the memory controller figures it out. Of course that, given a memory address, the mapping to column and row must be easy to calculate in a constant time.
To fully understand this topic, I recommend you to read this guide on how memory works: http://computer.howstuffworks.com/ram.htm
There are so many concepts to master that it is difficult to explain it all in one answer.
I've been reading your comments and questions until I answered. I think you are on the right track, but there is some confusion here. The random access in which you are implying doesn't exist in the same way you think it does.
Reading, writing, and refreshing are done in a continuous cycle. A particular cell in memory is only read or written in a certain interval if a signal is detected to do so in that cycle. There is going to be support circuitry that includes "sense amplifiers to amplify the signal or charge detected on a memory cell."
Unless I am misunderstanding what you are implying, your confusion is in how easy it is to read/write to a cell. It's different dependent on chip design but there IS a minimum number of cycles it takes to read or write data to a cell.
These are my sources:
http://www.doc.ic.ac.uk/~dfg/hardware/HardwareLecture16.pdf
http://www.electronics.dit.ie/staff/tscarff/memory/dram_cycles.htm
http://www.ece.cmu.edu/~ece548/localcpy/dramop.pdf
To avoid a humungous answer, I left most of the detail out but all three of these will describe the process you are looking for.

4 questions about processor architecture. (Computer engineering)

Our teachers has asked us around 50 true of false questions in preparation for our final exam. I could find an answer for most of them online or by asking relative. How ever, those 4 questions adrive driving me crazy. Most of those question aren't that hard, I just cant get any satisfying answer anywhere. Sorry, the original question are not written in english, i had to translate them myself. If you don't understand something, please tell me.
Thanks!
True or false
The size of the manipulated address by the processor determines the size of the virtual memory. How ever, the size of the memory cache is independent.
For long, DRAM technology stayed imcompatible with CMOS technology used to do the standard logic in processor. This is the reason DRAM memory is (most of the time) used outside of the processor (on a different chip).
Pagination let correspond multiple virtual addressing space to a same space of physical addressing.
An associative cache memory with sets of 1 line is an entierly associative cache memory, because one memory block can go in any set since each sets are of the same size that of the block.
"Manipulated address" is not a term of the art. You have an m-bit virtual address mapping to an n-bit physical address. Yes, a cache may be of any size up to the physical address size, but typically is much smaller. Note that cache lines are tagged with virtual or more typically physical address bits corresponding to the maximum virtual or physical address range of the machine.
Yes, DRAM processes and logic processes are each tuned for different objectives, and involve different process steps (different materials and thicknesses to lay down DRAM capacitor stacks/trenches, for example) and historically you haven't built processors in DRAM processes (except the Mitsubishi M32RD) nor DRAM in logic processes. Exception is so-called eDRAM that IBM likes to use for their SOI processes, and which is used as last level cache in IBM microprocessors such as the Power 7.
"Pagination" is what we call issuing a form feed so that text output begins at the top of the next page. "Paging" on the other hand is sometimes a synonym for virtual memory management, by which a virtual address is mapped (on a page by page basis) to a physical address. If you set up your page tables just so it allows multiple virtual addresses (indeed, virtual addresses from different processes' virtual address spaces) to map to the same physical address and hence the same location in real RAM.
"An associative cache memory with sets of 1 line is an entierly associative cache memory, because one memory block can go in any set since each sets are of the same size that of the block."
Hmm, that's a strange question. Let's break it down. 1) You can have a direct mapped cache, in which an address maps to only one cache line. 2) You can have a fully associative cache, in which an address can map to any cache line; there is something like a CAM (content addressible memory) tag structure to find which if any line matches the address. Or 3) you can have an n-way set associative cache, in which you have, essentially, n sets of direct mapped caches, and a given address can map to one of n lines. There are other more esoteric cache organizations, but I doubt you're being taught them.
So let's parse the statement. "An associative cache memory". Well that rules out direct mapped caches. So we're left with "fully associative" and "n-way set associative". It has sets of 1 line. OK, so if it is set associative, then instead of something traditional like 4-ways x 64 lines/way, it is n-ways x 1 lines/way. In other words, it is fully associative. I would say this is a true statement, except the term of the art is "fully associative" not "entirely associative."
Makes sense?
Happy hacking!
True, more or less (it depends on the accuracy of your translation I guess :) ) The number of bits in addresses sets an upper limit on the virtual memory space; you could, of course, choose not to use all the bits. The size of the memory cache depends on how much actual memory is installed, which is independent; but of course if you had more memory than you can address, then it still can't be used.
Almost certainly false. We have RAM on separate chips so that we can install more without building a whole new computer or replacing the CPU.
There is no a-priori upper or lower limit to the cache size, though in a real application certain sizes make more sense than others, of course.
I don't know of any incompatibility. The reason why we use SRAM as on-die cache is because it's faster.
Maybe you can force an MMUs to map different virtual addresses to the same physical location, but usually it's used the other way around.
I don't understand the question.

Resources