What's the purpose of aligned data for memory addresses?

I understand that physical memory is accessed in aligned chunks of 4 bytes (32-bit) or 8 bytes (64-bit).
But why do we need aligned data at memory addresses? Let's say, on a 32-bit machine:
I have a char c starting at address 0 (a char takes one byte), then an integer i starting at address 1. When I want to access i, the computer gets the address of i, which is 1, and then reads 4 bytes starting at address 1 directly.
So if it works this way, why do we need to pad 3 bytes after char c?

As you said, "memory can be accessed by aligned chunks of 4 or 8 bytes" depending on the architecture of the computer. This means the processor accesses memory only at addresses divisible by 4 or 8 (which, I suspect, is largely a matter of hardware cost and design complexity).
Let's illustrate your example :
struct foo {
    char c;
    int  i;
};
Say foo is aligned in memory at address 0x100. If you access foo.c, you are accessing one byte only, but behind the scenes the CPU has read an entire 4-byte word from memory and discarded the next 3 bytes of that word.
Now if you read foo.i (which is 4 bytes long) at memory location 0x101, the CPU needs two memory transactions: one at address 0x100, where it gets the first three bytes of i, and another at address 0x104 to fetch the remaining byte.
In the end, aligned data in memory saves unnecessary memory transactions.
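To see the padding concretely, here is a minimal sketch; the offsets and size in the comments are what a typical 32-bit compiler with default alignment produces, not something guaranteed by the C standard:

#include <stdio.h>
#include <stddef.h>

struct foo {
    char c;
    int  i;
};

int main(void) {
    /* On a typical 32-bit target, offsetof(struct foo, i) is 4 and
       sizeof(struct foo) is 8: three padding bytes follow c so that i
       starts on a 4-byte boundary. */
    printf("offset of c: %zu\n", offsetof(struct foo, c));
    printf("offset of i: %zu\n", offsetof(struct foo, i));
    printf("sizeof(struct foo): %zu\n", sizeof(struct foo));
    return 0;
}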

Related

How long is a memory address, typically, in bits?

I am confused by the many terms my instructor uses, such as word, byte addressing, and memory location.
I was under the impression that a 32-bit processor
can address up to 2^32 bits, which is about 4.29 × 10^9 bits (NOT bytes).
The way I think of it now:
The memory is like an array of buckets, each 1 byte long.
When we say byte addressing (which I guess is the most common scheme), each char is 1 byte and is retrieved from, say, the first bucket.
For an int, the next 4 bytes are put together in little-endian order to compute the integer value.
So I see each memory location as 8 bits, or 1 byte, which can give up to 2^8 locations; this is far less than what the CPU can address.
There is some very basic misunderstanding on my part here, which I hope an expert can explain in simple terms so that a prospective CS-major student gets it once and for all.
I have read various pages, including one on the term "word", where the unit of address resolution is given as 8 bits for ARM, which adds to my confusion.
The processor uses 32 bits to store an address. With 32 bits, you can store 2^32 distinct numbers, ranging from 0 to 2^32 - 1. "Byte addressing" means that each byte in memory is individually addressable, i.e. there is an address x which points to that specific byte. Since there are 2^32 different numbers you can put into a 32-bit address, we can address up to 2^32 bytes, or 4 GB.
It sounds like the key misconception is the meaning of "byte addressing." That only means that each individual byte has its own address. Addresses themselves are still composed of multiple bytes (4, in this case, since four 8-bit bytes are taken together and interpreted as a single 32-bit number).
I was under the impression that a 32-bit processor can address up to 2^32 bits, which is about 4.29 × 10^9 bits (NOT bytes).
This is typically not the case -- bit-level addressing is quite rare. Byte addressing is far more common. You could design a CPU that worked this way, though. In that case as you said, you would be able to address up to 2^32 bits = 2^29 bytes (512 MiB).
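A quick way to see the byte-addressing arithmetic is the sketch below; it assumes a build where pointers are 4 bytes (a 32-bit target), so on a 64-bit build the first line would print 8 instead:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* On a 32-bit target a pointer occupies 4 bytes (32 bits). */
    printf("sizeof(void *) = %zu bytes\n", sizeof(void *));

    /* 2^32 distinct byte addresses -> 4 GiB of addressable memory. */
    uint64_t addressable_bytes = 1ULL << 32;
    printf("addressable bytes = %llu (%llu GiB)\n",
           (unsigned long long)addressable_bytes,
           (unsigned long long)(addressable_bytes >> 30));
    return 0;
}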
With one bit you can represent 2 values (0 and 1); with two bits, 4 values (00, 01, 10, 11); with 8 bits, 2^8 = 256 distinct address values.
Address and data are separate concepts: the address is the location, and the data is the content stored at that location. Data width is how many bits one memory cell (one address) can store; data depth is how many addresses there are. Think of an apartment building: each apartment has a certain number of bedrooms (data width), and the building has a certain number of apartments, say #1 through #1400 (data depth). An address held in a CPU register picks out one individual memory cell, just as an apartment number picks out one apartment.
SIMM memory modules had a 32-bit data width, and DIMM modules have a 64-bit data width, which means one memory address on a DIMM stores 64 bits of data. With two address wires (two-bit addressing) you can select 4 different addresses, each of which could hold 64 bits on a DIMM. 32-bit addressing means 32 address wires, giving 2^32 possible addresses.
Even though 64-bit processors have 64-bit registers and internal buses, the external address bus is narrower in practice (see http://www.tech-faq.com/address-bus.html): a 44-bit address bus, for example, gives a maximum of 2^44 addresses, as on Intel's Itanium 2 server CPUs.
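As a small numeric illustration of depth times width (the two-wire and 64-bit figures below are simply the ones used in this answer, not properties of any particular product):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Capacity = number of addresses (depth) x bits per address (width). */
    unsigned address_wires   = 2;    /* two address wires                 */
    unsigned data_width_bits = 64;   /* DIMM-style 64-bit data width      */

    uint64_t depth         = 1ULL << address_wires;     /* 4 addresses  */
    uint64_t capacity_bits = depth * data_width_bits;   /* 256 bits     */

    printf("%llu addresses x %u bits = %llu bits total\n",
           (unsigned long long)depth, data_width_bits,
           (unsigned long long)capacity_bits);
    return 0;
}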

Double-byte memory access granularity

I am attempting to learn about memory alignment, without much success admittedly. I am using this article from IBM.
Can someone please explain to me what this excerpt means from the double byte memory access granularity section:
However, notice what happens when reading from address 1. Because the address doesn't fall evenly on the processor's memory access boundary, the processor has extra work to do. Such an address is known as an unaligned address. Because address 1 is unaligned, a processor with two-byte granularity must perform an extra memory access, slowing down the operation.
Why is another memory access needed? What is meant by a memory access boundary, and by an address falling evenly on it?
I have VERY limited knowledge of CPUs, as I have only dealt with higher-level programming (Objective-C and C++). Any help is greatly appreciated!
Thanks!
The example is describing what happens when you try to read a block of 4 consecutive bytes on a CPU with double-byte access granularity. On this type of CPU, memory is accessed in pairs of bytes, always starting at an even-numbered byte.
If you try to read the block starting with byte 0, it has to perform 2 reads: bytes 0-1 and bytes 2-3.
If you try to read the block starting with byte 1, it has to perform 3 reads: bytes 0-1 (to get byte 1), bytes 2-3, and bytes 4-5 (to get byte 4).
Memory access granularity is the number of bytes the CPU accesses at a time, and a memory access boundary is where each of these groups of bytes begins. The groups of bytes always start at addresses that are multiples of the granularity: with double-byte granularity they start at even addresses; with quad-byte granularity, at multiples of 4.
As an analogy, consider an apartment building with 4 units on each floor. Units 0-3 are on floor 0, units 4-7 are on floor 1, and so on. If you want to slip a flyer under the doors of units 0-3, you only have to visit one floor. But if you want to slip a flyer under the doors of units 1-4, you have to visit two floors: floor 0 for units 1-3, and floor 1 for unit 4.
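Here is a small sketch of that counting, modelling the behaviour described above (it illustrates the arithmetic only; it is not real memory-controller code):

#include <stdio.h>

/* Number of aligned chunks a CPU with the given access granularity must
   read to cover `size` bytes starting at `addr`. */
static unsigned accesses_needed(unsigned addr, unsigned size, unsigned granularity) {
    unsigned first_chunk = addr / granularity;
    unsigned last_chunk  = (addr + size - 1) / granularity;
    return last_chunk - first_chunk + 1;
}

int main(void) {
    /* Reading 4 bytes on a CPU with 2-byte granularity. */
    printf("read 4 bytes at address 0: %u accesses\n", accesses_needed(0, 4, 2)); /* 2 */
    printf("read 4 bytes at address 1: %u accesses\n", accesses_needed(1, 4, 2)); /* 3 */
    return 0;
}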

Address space and byte addressability

A microprocessor is byte addressable with a 24-bit address bus and a 16-bit data bus, and one word contains two bytes. I was asked a question about attaching peripherals, adding memory, and address space, and there are a few general concepts I don't understand.
Why is it that to calculate the address space you use the address bus, not the data bus? Is the address space a function of the address bus, or does it have to do with the microprocessor? How is it relevant that one word contains two bytes?
Why is it that to calculate the address space you use the address bus not the data bus?
Because it's the address bits that go out to the memory subsystem to tell them which memory location you want to read or write. The data bits just carry the data being read or written.
Is the address space a function of the address bus or does it have to do with the microprocessor?
Yes, the address space is a function of the address bus though there are tricks you can use to expand how much memory you can use.
An example of that is bank switching which gives you more accessible memory but no more address space (multiple blocks of memory co-exist at the same address, one at a time).
Another example is shown below where you can effectively double the usable memory, provided you're willing to only read and write words.
How is it relevant that one word contains two bytes?
The data bus size generally dictates the size of a memory cell. Larger memory cells mean more total memory can be available to you, but not more memory cells.
With your example, assuming you can only access whole words, you could get 16 megawords, which is 32 megabytes.
This depends, of course, on how the memory is put together. It may be that you are able to access memory on individual byte boundaries (e.g., bytes 0/1, 1/2, or 2/3) rather than just on word boundaries, in which case you don't actually get that full 32 MB but only 16 MB (plus maybe one extra byte when you read the word at address FFFFFF).
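As a quick sketch of the arithmetic, using the bus widths from the question:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    unsigned address_bits   = 24;  /* 24-bit address bus            */
    unsigned bytes_per_word = 2;   /* 16-bit data bus = 2-byte word */

    uint64_t addresses = 1ULL << address_bits;   /* 16 M distinct addresses */

    /* If each address selects a whole word: 16 megawords = 32 MB. */
    printf("word-addressed: %llu bytes\n",
           (unsigned long long)(addresses * bytes_per_word));

    /* If each address selects a single byte: 16 MB. */
    printf("byte-addressed: %llu bytes\n",
           (unsigned long long)addresses);
    return 0;
}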

Difference between word addressable and byte addressable

Can someone explain the difference between word addressable and byte addressable? How is it related to memory size, etc.?
A byte is a memory unit for storage
A memory chip is full of such bytes.
Memory units are addressable. That is the only way we can use memory.
In reality, memory is only byte addressable. It means:
A binary address always points to a single byte only.
A word is just a group of bytes – 2, 4, 8 depending upon the data bus size of the CPU.
To understand the memory operation fully, you must be familiar with the various registers of the CPU and the memory ports of the RAM. I assume you know their meaning:
MAR(memory address register)
MDR(memory data register)
PC(program counter register)
MBR(memory buffer register)
RAM has two kinds of memory ports:
a 32-bit port for data/addresses
an 8-bit port for opcodes
Suppose the CPU wants to read a word (say 4 bytes) starting at address xyz. The CPU puts the address into the MAR and sends a memory-read signal to the memory controller chip. On receiving the address and the read signal, the memory controller connects the data bus to the 32-bit port, and the 4 bytes starting at address xyz flow out of that port into the MDR.
If the CPU wants to fetch the next instruction, it puts the address from the PC register onto the address bus and sends a fetch signal to the memory controller. On receiving the address and the fetch signal, the memory controller connects the data bus to the 8-bit port, and the single-byte opcode located at that address flows out of the RAM into the CPU's MDR.
So that is what it means to say memory is word addressable or byte addressable. Now, what happens when you put, say, decimal 2 (in binary) into the MAR, intending to read word number 2 rather than byte number 2?
Word number 2 means bytes 8, 9, 10, and 11 on a 32-bit machine. But real physical memory is byte addressable only, so there is a trick to handle word addressing.
When the MAR is placed on the address bus, its 32 bits do not map onto address lines 0-31 directly. Instead, MAR bit 0 is wired to address bus line 2, MAR bit 1 to address bus line 3, and so on. The upper 2 bits of the MAR are discarded, since they would only be needed for word addresses beyond 2^32, none of which are legal on our 32-bit machine.
Using this mapping, when the MAR holds 1, address 4 is put on the bus; when it holds 2, address 8 is put on the bus; and so forth.
It is a bit difficult to understand at first. I learnt it from Andrew Tanenbaum's Structured Computer Organization.
This image should make it easy to understand:
http://i.stack.imgur.com/rpB7N.png
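In software terms, that wiring is just a left shift by two bit positions. Here is a tiny illustrative sketch of the mapping (the helper function is made up for illustration; in the real machine the mapping is done by wiring, as described above):

#include <stdio.h>
#include <stdint.h>

/* A word address becomes a byte address by shifting left 2 bits,
   i.e. multiplying by 4 (the word size in bytes). */
static uint32_t word_to_byte_address(uint32_t word_addr) {
    return word_addr << 2;
}

int main(void) {
    for (uint32_t w = 0; w < 4; w++)
        printf("word %u -> byte address %u\n", w, word_to_byte_address(w));
    return 0;
}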
Simply put,
• In the byte addressing scheme, the first word starts at address 0, and the second word starts at address 4.
• In the word addressing scheme, all bytes of the first word are located at address 0, and all bytes of the second word are located at address 1.
The advantages of byte addressability are clear when we consider applications that process data one byte at a time. Accessing a single byte in a byte-addressable system requires issuing only a single address. In a 16-bit word-addressable system, it is necessary first to compute the address of the word containing the byte, fetch that word, and then extract the byte from the two-byte word. Although the process of byte extraction is well understood, it is less efficient than accessing the byte directly. For this reason, many modern machines are byte addressable.
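Here is a sketch of that extraction for a hypothetical 16-bit word-addressable memory (the memory array and the load_byte helper are invented for illustration, and a little-endian byte order within each word is assumed):

#include <stdio.h>
#include <stdint.h>

static uint16_t memory[1024];  /* word-addressable: one address per 16-bit word */

/* Read byte `byte_addr` when only whole 16-bit words can be fetched. */
static uint8_t load_byte(uint32_t byte_addr) {
    uint16_t word  = memory[byte_addr / 2];   /* fetch the containing word      */
    unsigned shift = (byte_addr % 2) * 8;     /* which half of the word?        */
    return (uint8_t)(word >> shift);          /* extract the byte               */
}

int main(void) {
    memory[0] = 0xBBAA;  /* byte 0 = 0xAA, byte 1 = 0xBB in this layout */
    printf("byte 0 = 0x%02X, byte 1 = 0x%02X\n", load_byte(0), load_byte(1));
    return 0;
}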
Addressability is the size of a unit of memory that has its own address. It's also the smallest chunk of memory that you can modify without affecting its neighbours.
For example, consider a machine where bytes are the normal 8 bits and the word size is 4 bytes. If it's a word-addressable machine, there's no such thing as the address of the second byte of an int. Dealing with strings (e.g. an array like char str[]) becomes inconvenient, because you still store characters packed together. Modifying just str[1] means loading the word that contains it, doing some shift/and/or operations to apply the change, and then doing a word store.
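A sketch of that read-modify-write sequence, for a hypothetical machine with 4-byte words and no byte-store instruction (the names are again invented for illustration):

#include <stdio.h>
#include <stdint.h>

static uint32_t memory[1024];  /* word-addressable: one address per 4-byte word */

/* Store one byte when the hardware can only store whole 32-bit words. */
static void store_byte(uint32_t byte_addr, uint8_t value) {
    uint32_t word  = memory[byte_addr / 4];   /* load the containing word     */
    unsigned shift = (byte_addr % 4) * 8;     /* position of the byte         */
    word &= ~(0xFFu << shift);                /* clear the old byte (and)     */
    word |= (uint32_t)value << shift;         /* insert the new byte (or)     */
    memory[byte_addr / 4] = word;             /* write the whole word back    */
}

int main(void) {
    store_byte(1, 0xCD);                        /* modify byte 1 of word 0    */
    printf("word 0 = 0x%08X\n", memory[0]);     /* prints 0x0000CD00          */
    return 0;
}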
Note that this is different from a machine that doesn't allow unaligned word load/stores (where the low 2 bits of a word address have to be 0). Such machines usually have a byte load/store instruction. We're talking about machines without even that.
CPU addresses might actually still include the low bits but require them to always be zero (or ignore them). However, after checking that they are zero, they could be discarded, so the rest of the memory system only sees the word address, where two adjacent words have addresses that differ by 1 (not 4). However, on a 16-bit CPU where a register can only hold 64k different addresses, you wouldn't likely do this. Instead of discarding the low bit, each separate CPU address would refer to a different 2 bytes of memory. Word-addressable memory with 2-byte words would let you address 128 KiB of memory, instead of just 64 KiB with byte-addressable memory.
Fun fact: ARM used to use the low 2 bits of an address as a shuffle control for unaligned word loads. (But it always had byte load/store instructions.)
See also:
https://en.wikipedia.org/wiki/Word-addressable
https://en.wikipedia.org/wiki/Byte_addressing
Note that bit-addressable memory could exist, but doesn't. 8-bit bytes are nearly universally standard now. (Ancient computers sometimes had larger bytes, see the history section of wikipedia's Byte article.)

Memory assignment of local variables

void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
}
We must remember that memory can only be addressed in multiples of the word size. A word in our case is 4 bytes, or 32 bits. So our 5-byte buffer is really going to take 8 bytes (2 words) of memory, and our 10-byte buffer is going to take 12 bytes (3 words) of memory. That is why SP is being subtracted by 20.
Why isn't it ceil((5+10)/4)*4 = 16?
Because individual variables should be aligned. With your proposed formula, you'd align only the first variable on the stack, leaving following variables unaligned, which is bad for performance.
This is also known as "packing" and can be done in C/C++ with pragmas, but is only useful in very specific cases and can be dangerous both for performance and as a cause of potential runtime traps. Some processors will generate faults on unaligned accesses at runtime, which will crash your program.
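If you do want packing, it usually looks roughly like this; #pragma pack is supported by GCC, Clang, and MSVC, but the sizes shown in the comments assume a typical compiler with 4-byte int alignment:

#include <stdio.h>

struct padded {      /* default alignment: int is padded to a 4-byte boundary */
    char c;
    int  i;
};

#pragma pack(push, 1)
struct packed {      /* packed: no padding, but i may end up misaligned */
    char c;
    int  i;
};
#pragma pack(pop)

int main(void) {
    printf("sizeof(struct padded) = %zu\n", sizeof(struct padded)); /* typically 8 */
    printf("sizeof(struct packed) = %zu\n", sizeof(struct packed)); /* 5 */
    return 0;
}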
The variables on your architecture are aligned individually. buffer1 gets rounded up to 8 and buffer2 to 12 so that both of their starting addresses are 4-byte aligned. So 8+12 = 20.
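The rounding described here is just "round up to the next multiple of the word size". A minimal sketch of that calculation (the round_up helper is made up for illustration; real compilers may also reserve stack space for saved registers, canaries, and so on, so actual frames can be larger):

#include <stdio.h>

/* Round `size` up to the next multiple of `align` (align must be a power of 2). */
static unsigned round_up(unsigned size, unsigned align) {
    return (size + align - 1) & ~(align - 1);
}

int main(void) {
    unsigned b1 = round_up(5, 4);    /* 8  */
    unsigned b2 = round_up(10, 4);   /* 12 */
    printf("buffer1 -> %u, buffer2 -> %u, total -> %u\n", b1, b2, b1 + b2); /* 8, 12, 20 */
    return 0;
}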

Resources