nand2tetris. Memory implementation - memory

I realized Data memory implementation in nand2tetris course. But I really don't understand some parts of my implementation:
CHIP Memory {
IN in[16], load, address[15];
OUT out[16];
PARTS:
DMux4Way(in=load, sel=address[13..14], a=RAM1, b=RAM2, c=scr, d=kbr);
Or(a=RAM1, b=RAM2, out=RAM);
RAM16K(in=in, load=RAM, address=address[0..13], out=RAMout);
Screen(in=in, load=scr, address=address[0..12], out=ScreenOut);
Keyboard(out=KeyboardOut);
Mux4Way16(a=RAMout, b=RAMout, c=ScreenOut, d=KeyboardOut, sel=address[13..14], out=out);
}
Is responsible for what load here. I understand that if load is 0 - out of Dmux4Way in any case will be 0 0 0 0. But i don't understand how it works in that case after that. Namely how it allows don't load data in Memory.
At least incomprehensible why in Screen we fed address[0..12] instead address[0..14] - full address. In my opinion we should use second because Screen memory map stay after RAM memory map and if we want to request for Screen memory map - we should use range (16 384 - 24 575) - decimal or (100000000000000 - 101111111111111) - binary. But how we can represent that range use just 13 width buss (address[0..12]) ??? It's impossible.
Therefore if we want to represent Screen memory map we should use range which was presented above. And that range has 15 width or address[0..14] BUT not address[0..12] (width 13). But why works just address[0..12] and doesn't work address[0..14](full address)
DMux4Way(in=load, sel=address[13..14], a=RAM1, b=RAM2, c=scr, d=kbr);

I'm sorry to criticize you at the beginning, but questions you ask suggest that you didn't do this exercise yourself or didn't start the whole course from the beginning.
To answer your questions:
Ad.1.
You demultiplex a single bit (load bit) to the correct memory part. Thereafter, you then feed the input data to all memory parts at the same time.
It's easier and neater than doing it the other way around, namely, to direct 16-bit input to the correct part (RAM16K, screen, or keyboard) while having a load bit that is connected and active at every register in all the parts.
To clarify. You have 2 possible destinations when writing data: RAM and Screen. The smallest demultiplexer you have is a 4-way multiplexer and that's what you're using. When you write into memory, you need to provide 2 pieces of information: the data and destination, both at the same time.
You might demultiplex the input data with DMux4Way16 and separately single load bit with DMux4Way but that would take 2 demultiplexers, and we can do better than that. That's what's done here, you direct data input to both RAM and Screen and then only use one demultiplexer : DMux4Way to select one of 2 possible destinations, only the one selected will be loaded with new data, on the other data input will be ignored. Knowing that, you need to study A-instruction format: when bit 14 and 13 of A-instruction (or data residing in A-register) have the binary value 00 or 01, the destination is RAM. When bit 14 and 13 have the binary value 10, it means the screen is the destination.
When you notice that you choose these 2 bits as sel for your demultiplexer. Selections 0 and 1 have the same meaning, so you can OR them and feed the output as load to RAM. Selection 2 means Screen will be loaded with a, new value, so load bit goes there. Selection 3 is never used so we don't care about it - output d of demultiplexer will not be connected anywhere. We make use of the demultiplexer's feature: The selected output will have value 1 and all other outputs will yield 0 as a result. It means only 1 memory destination will be loaded.
Ad.2.
Screen is separate device, it has nothing to do with RAM, ROM or Keyboard memory devices here. You, and only you, give meaning to what bits mean what to this specific device. To answer your question, when you address some register in Screen you address it in its own internal address space. In its internal address space first address will be 0, but from whole Memory it will be 16384. It's your job to make this transition. In this particular case, size of Screen memory device it is not necessary to use 14-bit address bus, 13 bits is all you need. What would 14th bit mean in this case? It wouldn't add any value. Also, you are user and not designer of Screen, you only look at and follow its interface description.
Hope it answers your questions, if not I urge you to go back and study more carefully previous hardware related chapters from course.

Related

Using HERE in Forth for temporary space

I'm writing a game in Forth (for learning purposes).
The game is played on a "10 cell board". I'm trying new stuff so I did
here 10 [char] - fill
to set up the space for the board.
Then, to play 'X' in position 3
[char] X here 3 + c!
This has been working fine, but raises the question
Is this OK?
What if the board was a million cells wide?
Thanks
The described approach has the certain environmental dependencies, so your program just have to match the environmental restriction on programs of your Forth system (i.e. that you use).
1. Size of the data space
The word UNUSED returns "the amount of space remaining in the region addressed by HERE". So, a program can check the available space.
Also, according to the subsection 4.1.3 Other system documentation of the Forth Standard:
A system shall provide the following information:
[...]
program data space available, in address units;
So, you have just to check whether your Forth system provides enough data space for you program, and how the available data space can be configured (if any).
2. Transient regions
In the general case, it is not safe for a portable program to use the data space without reserving it.
According to the section 3.3.3.6 Other transient regions of the Forth Standard, the contents of the data space regions identified by PAD, WORD, and #> may become invalid after the data space is allocated. Сonsequently, the contents of the region identified by HERE may become invalid after the contents of the regions identified by PAD, WORD, and #> are changed.
See also A.3.3.3.6 Other transient regions:
In many existing Forth systems, these areas are at HERE or just beyond it, hence the many restrictions.
Moreover, some Forth systems may use the region identified by HERE in internal purposes during translation. E.g. Gforth 0.7.9 uses this region when decoding escaped strings.
The phrase:
s\" test\-passed" cr here over type cr type cr
outputs:
test-passed
test-passed
So, you have to check the restrictions of your Forth system whether you may use the region identified by HERE without reserving the space (and in what conditions).

T or F: If a mahine using paged virutal memory has a 24-bit logical address and a 32-bit physical address, a page fault will never occur

I'm working on a practice final exam and I can't seem to figure out the answer to this question.
My understanding is that every initial page being brought in counts as a page fault, so even without the address lengths, this question should be false, correct? If we forget about this for a second, is the answer true? My thought behind this is that since the logical address only has 24 bits while the physical address has 32 bits, there would never be a case where the page has to be in a frame that is already occupied. Is more information required (such as page size) for this realm of reasoning?
every initial page being brought in counts as a page fault
Just as a note, this is true only if you create the process (populate the PCB, process control block) but you don't actually assign any frame. The first (and some of the other) reference (basically, the first istruction) will generate a page fault.
This is why you (you as the OS) have to assign a sufficent number of frame to avoid early page fault (and, with a pinch of luck and a good pager, even later in the execution of the process).
Back at your question: the answer is false (depends is more correct).
The reason is simple: if you don't know the size of the memory, you can't actually know how many frame do you have at hand. So the address size is totally useless in this specific context.

How is RAM able to acess any place in memory at O(1) speed

We are taught that the abstraction of the RAM memory is a long array of bytes. And that for the CPU it takes the same amount of time to access any part of it. What is the device that has the ability to access any byte out of the 4 gigabytes (on my computer) in the same time? As this does not seem as a trivial task for me.
I have asked colleagues and my professors, but nobody can pinpoint to the how this task can be achieved with simple logic gates, and if it isn't just a tricky combination of logic gates, then what is it?
My personal guess is that you could achieve the access of any memory in O(log(n)) speed, where n would be the size of memory. Because each gate would split the memory in two and send you memory access instruction to the next split the memory in two gate. But that requires ALOT of gates. I can't come up with any other educated guess, and I don't even know the name of the device that I should look up in Google.
Please help my anguished curiosity, and thanks in advance.
edit<
This is what I learned!
quote from yours "the RAM can send the value from cell addressed X to some output pins", here is where everyone skip (again) the thing that is not trivial for me. The way that I see it, In order to build a gate that from 64 pins decides which byte out of 2^64 to get, each pin needs to split the overall possible range of memory into two. If bit at index 0 is 0 -> then the address is at memory 0-2^64/2, else address is at memory 2^64/2-2^64. And so on, However the amout of gates (lets call them) that the memory fetch will go through will be 64, (a constant). However the amount of gates needed is N, where N is the number of memory bytes there are.
Just because there is 64 pins, it doesn't mean that you can still decode it into a single fetch from a range of 2^64. Does 4 gigabytes memory come with a 4 gigabytes gates in the memory control???
now this can be improved, because as I read furiously more and more about how this memory is architectured, if you place the memory into a matrix with sqrt(N) rows and sqrt(N) columns, the amount of gates that a fetch memory will need to go through is O(log(sqrt(N)*2) and the amount of gates that will be required will be 2*sqrt(N), which is much better, and I think that its probably a trade secret.
/edit<
What the heck, I might as well make this an answer.
Yes, in the physical world, memory access cannot be constant time.
But it cannot even be logarithmic time. The O(log n) circuit you have in mind ultimately involves some sort of binary (or whatever) tree, and you cannot make a binary tree with constant-length wires in a 3D universe.
Whatever the "bits per unit volume" capacity of your technology is, storing n bits requires a sphere with radius O(n^(1/3)). Since information can only travel at the speed of light, accessing a bit at the other end of the sphere requires time O(n^(1/3)).
But even this is wrong. If you want to talk about actual limitations of our universe, our physics friends say the absolute maximum number of bits you can store in any sphere is proportional to the sphere's surface area, not its volume. So the actual radius of a minimal sphere containing n bits of information is O(sqrt(n)).
As I mentioned in my comment, all of this is pretty much moot. The models of computation we generally use to analyze algorithms assume constant-access-time RAM, which is close enough to the truth in practice and allows a fair comparison of competing algorithms. (Although good engineers working on high-performance code are very concerned about locality and the memory hierarchy...)
Let's say your RAM has 2^64 cells (places where it is possible to store a single value, let's say 8-bit). Then it needs 64 pins to address every cell with a different number. When at the input pins of your RAM there 'appears' a binary number X the RAM can send the value from cell addressed X to some output pins, and your CPU can get the value from there. In hardware the addressing can be done quite easily, for example by using multiple NAND gates (such 'addressing device' from some logic gates is called a decoder).
So it is all happening at the hardware-level, this is just direct addressing. If the CPU is able to provide 64 bits to 64 pins of your RAM it can address every single memory cell (as 64 bit is enough to represent any number up to 2^64 -1). The only reason why you do not get the value immediately is a kind of 'propagation time', so time it takes for the signal to go through all the logic gates in the circuit.
The component responsible for dealing with memory accesses is the memory controller. It is used by the CPU to read from and write to memory.
The access time is constant because memory words are truly layed out in a matrix form (thus, the "byte array" abstraction is very realistic), where you have rows and columns. To fetch a given memory position, the desired memory address is passed on to the controller, which then activates the right column.
From http://computer.howstuffworks.com/ram1.htm:
Memory cells are etched onto a silicon wafer in an array of columns
(bitlines) and rows (wordlines). The intersection of a bitline and
wordline constitutes the address of the memory cell.
So, basically, the answer to your question is: the memory controller figures it out. Of course that, given a memory address, the mapping to column and row must be easy to calculate in a constant time.
To fully understand this topic, I recommend you to read this guide on how memory works: http://computer.howstuffworks.com/ram.htm
There are so many concepts to master that it is difficult to explain it all in one answer.
I've been reading your comments and questions until I answered. I think you are on the right track, but there is some confusion here. The random access in which you are implying doesn't exist in the same way you think it does.
Reading, writing, and refreshing are done in a continuous cycle. A particular cell in memory is only read or written in a certain interval if a signal is detected to do so in that cycle. There is going to be support circuitry that includes "sense amplifiers to amplify the signal or charge detected on a memory cell."
Unless I am misunderstanding what you are implying, your confusion is in how easy it is to read/write to a cell. It's different dependent on chip design but there IS a minimum number of cycles it takes to read or write data to a cell.
These are my sources:
http://www.doc.ic.ac.uk/~dfg/hardware/HardwareLecture16.pdf
http://www.electronics.dit.ie/staff/tscarff/memory/dram_cycles.htm
http://www.ece.cmu.edu/~ece548/localcpy/dramop.pdf
To avoid a humungous answer, I left most of the detail out but all three of these will describe the process you are looking for.

Cache and memory

First of all, this is not language tag spam, but this question not specific to one language in particulary and I think that this stackexchange site is the most appropriated for my question.
I'm working on cache and memory, trying to understand how it works.
What I don't understand is this sentence (in bold, not in the picture) :
In the MIPS architecture, since words are aligned to multiples of four
bytes, the least significant two bits are ignored when selecting a
word in the block.
So let's say I have this two adresses :
[1........0]10
[1........0]00
^
|
same 30 bits for boths [31-12] for the tag and [11-2] for the index (see figure below)
As I understand the first one will result in a MISS (I assume that the initial cache is empty). So one slot in the cache will be filled with the data located in this memory adress.
Now, we took the second one, since it has the same 30 bits, it will result in a HIT in the cache because we access the same slot (because of the same 10 bits) and the 20 bits of the adress are equals to the 20 bits stored in the Tag field.
So in result, we'll have the data located at the memory [1........0]10 and not [1........0]00 which is wrong !
So I assume this has to do with the sentence I quote above. Can anyone explain me why my reasoning is wrong ?
The cache in figure :
In the MIPS architecture, since words are aligned to multiples of four
bytes, the least significant two bits are ignored when selecting a
word in the block.
It just mean that in memory, my words are aligned like that :
So when selecting a word, I shouldn't care about the two last bits, because I'll load a word.
This two last bits will be useful for the processor when a load byte (lb) instruction will be performed, to correctly shift the data to get the one at the correct byte position.

Memory Locations of Variables when Using IA-32 Assembly Language

Quick question on memory locations in IA-32 assembly language that i cannot seem to find the answer for anywhere else.
On IA-32 each memory address is 4 bytes long (e.g. 0x0040120e). Each of these addresses points to a 1 byte value (or in the case of a larger value, the first byte of it). Now look at these two simple IA-32 assembly language statements:
var1 db 2
var2 db 3
This will place var1 and var2 in adjacent memory cells (let's say 0x0040120e and 0f). Now I realize that the define directive db allocates 1 byte to the value. But, in the case above I have two values (2 and 3) that in fact only requires two bits each, to be stored.
Questions:
When using the db directive, do these two values still consume a full byte, even though they are smaller than 1 byte?
Is using a full byte for values that could get away with less, still the common way to go (as we have so much memory that we don't care)?
Does integers 0 to 255 then generally take up 1 byte and integers 256 to (2^16 - 1) take up 2 bytes (a word), etc.?
Thank you,
Magnus
EDIT 1: Made questions more clear (apologies for the back and forth)
EDIT 2: Added a structured reply below, based on other posters' input
yes. the B in DB is for Byte.
You could use a nibble for each, like so:
combined db 0x23
but you'd have to
a) shift the result for 4 bits right if you need the "2".
b) mask the leftmost 4 bits if you need the "3".
Hardly worth the effort these days ;-)
Yes, since the architecture is byte-addressable and cannot address anything smaller than a byte.
This means that data requiring less than one byte will need to share its address with other data.
In practice this means that you're going to have to know which bits in the pointed-out byte are used for this particular value.
For hardware registers this sort of mapping is very common.
EDIT: Ah, you seem to mean "values of the same variable" when you said "2 and 3". I thought you meant 2-bit and 3-bit values. You need to decide how many bits are needed at most for a particular variable, for all the values you need that variable to be able to store. There are variable-length encodings for integers of course, but that's generally rarely used in assembly and not what you'd typically use for some general-purpose variable.
You generally should expect to reserve all bits required for all values that a variable need to hold, up front. Otherwise, if you're worried about "wasting memory", you would need to move all other variables as soon as you get some "vacant bits" somewhere. That would end up costing fantastically much. Also, knowing the size of a variable is constant makes it possible to generate (or write) the proper code to handle it, otherwise you would of course also need to explicitly store somewhere "the size of the value held in variable x is now y bits". That becomes extremely painful very very quickly.
My initial question was a bit unstructured, so for the benefit of other searchers stopping by here I will use the answers received from #unwind and #geert3 to create a structured response. Again, this was my fault due to the initial poor structuring and creds for the answers goes to #unwind and #geert3.
When using the db directive you allocate 1 byte to the variable, and even if the variable takes up less space than 1 byte, it will still consume that full 1-byte address spot. As one might guess, that wastes a few bits of memory, but that is okay as you have enough memory and not too bothered about wasting a couple of bits. The reason you want to use the full 1-byte memory location is that it is easier to reference the variable when it is alone in the address slot (see #geert3's note on how to access it if you use less than a byte), and additionally, in case you want to reuse the variable later, it is great to know you have space for any number up to 255.
Yes, see answer to 1
Yes, you would normally allocate multiples of a byte to a variable, in a byte-addressable system

Resources