Unable to access Znyq AXI BRAM from Linux - memory

In my project, data is written to a BRAM (generated through the Block Ram IP generator) from a custom IP. Then, I use an AXI BRAM controller to interface the memory with the AXI bus and make it accessible to the Linux running on the ARM.
The base address for the controller is 0x4200_0000 with a range of 8K (up to 0x4200_1FFF). The memory has 8K positions too, each with a width of 32 bits.
To make sure the access problem isn't in the data generated in my custom IP, I initialize the memory simply numbering each of the 8K address (so address 1 contains 0x01, etc, up to 0x1fff).
The problem comes when attempting to read those values from Linux. Using devmem 0x42000001 on command line returns 0x04000000 and the following:
Alignment trap: devmem (1257) PC=0x0001ca94 Instr=0xe7902005 Address=0xb6f9d2fd FSR 0x011
Which seems to indicate Linux is expecting each address value to map to a byte, not a 32bits word. The alignment traps happen until devmem 0x42000004, which returns 0x00000004, the correct value for the fourth direction, but the values in addresses not multiple of 4 can't be accessed. devmem 0x42000002 returns 0x00040000 (notice the 0x04 shifting) as well as the alignment trap. I found the problem with my original python script which uses mmap to map /dev/mem: I have to read each 4 address values since each individual address seems to map to a byte, but that means I only get one of each four values.
Any ideas on how to properly interface with the AXI controller and the memory behind it?
******* Edit to clarify the issue I have. When in doubt, add a picture:

Which seems to indicate Linux is expecting each address value to map to a byte
That is the standard mapping in all modern CPUs. When you use AXI with a data bus wider then 8 bits, the bottom address bits select a byte from the AXI data bus. Go to the ARM website and download the AXI specification.
The base address for the controller is 0x4200_0000 with a range of 8K (up to 0x4200_1FFF). The memory has 8K positions too, each with a width of 32 bits.
That is wrong 8K of 32 bits has an address range of 8K*4 = 0x0000 .. 0x7FFF.
I suggest you re-build the BRAM but use different parameters for the Block Ram IP generator.
I changed the RAM so that the port exposed to the AXI controller operates with 8 bits. .....
Your Zynq AXI bus is probably 32 bits wide. Thus a standard connected memory should be 32 bits wide, where you should have byte-write enables.
If you connect an 8-bit memory to a 32-bit bus and do not, or wrongly adapt the address you may lose 3 out of 4 bytes.
What is not clear to me is which behavior you exactly want.
Standard 8Kx32 bit memory with byte access
or
8kx8 bit memory where you have a byte at 0x0, 0x4, 0x8 etc.
In case 2 you should use the AXI address different: you should shift the address bits up two positions so each byte occupies 4 address locations.
You also have to decide where to place the byte:
LS position only: tie the MS 24 bits to zero
MS position only: tie the LS 24 bits to zero
Repeated over all 4 locations: replicate the byte four times over the 32 bits.
Whatever else you fancy. (It's your hardware, you can do what you want.)
Beware that for any module you connect to an AXI bus, the preceding AXI splitters should be set up to cover the correct address range. But I assume you have none of those.

Related

How byte addressing works?

I am new to computer architecture. So correct me if I am wrong.
If a memory module consists of 8 memory chips and if each chip stores 4bits per address then by applying an address to the address pin of the module I can get (8 x 4=) 32 bit from that address in the module. But byte addressing tells that every byte has an address. But here I am accessing 32bits using an address. So how is it possible?
I think if each chip stores 1bit per address then by applying an address to the module I can access 8bit or one byte.
You say each chip stores 4bits per address and you have 8 on the same address bus. It is the address bus that is the limiting factor. The address bus must have 32 lines for each byte in a 32 bit architecture to be addressable. If you have 8 chips each producing 4 bits in response to the same address, then you have 32bits per address. The advantage of such an arrangement would be the address bus lines could be reduced by 2 without decreasing the addressable range (only the resolution).
You are correct in thinking that each chip would need to produce 1 bit per address to allow byte addressing.
That's the theory, in practice I would suspect a solution could be architected where the 4 bits could be time division multiplexed making each individually accessible.
I have heard for a long time not to address less than 32 bits at a time, as that may be the smallest unit addressable. Certainly it would make sense when 2Gb-4Gb was the physical limit of 32 bit byte addressing.

How many bytes can be stored in a memory unit that uses 8 address bits and a 16 bit architecture?

I need help understanding memory. How many bytes can be stored in a memory unit that uses 8 address bits and a 16 bit architecture?
I think it's 2^8 = 256. Is this correct?
Edit: I mean 256
It depends.
Firstly "16 bit architecture" is too vague to be a helpful characterization for this problem. A characterization like that generally refers to the width of registers and data paths (e.g. in the ALU), not how memory is addressed.
Secondly, the answer actually depends on whether the addresses are byte addresses or "word" addresses. AFAIK "almost all" new processor / instruction set architectures designed since the 1980's have used byte addresses. But prior to that it was common for addresses to address words of up to 60 bits (or possibly more).
But assuming byte addressing, then an 8 bit address allows you to address 2^8 bytes; i.e. 256 bytes.
On the other hand, if we assume word addressing with a 16 bit word, then 8 bit addresses will address 256 words ... which is 512 bytes.
The base answer is the 2^number of bits. However,long ago sixteen bit systems came up with means for accessing more than 2^16 memory though segments. While an application can only access 2^16 bytes at a time, changing the values in hardware registers allows the application to change which 2^16 bytes of a larger address space are being accessed.
You typically do things like
Map buffer 1 to the address space.
Queue a read operation to the buffer.
Map another buffer 2 to the address space.
Read operation completes-map Buffer 1 back so that the data can be accessed.

Why does 20 address space with on a 16 bit machine give access to 1 Megabyte and not 2 Megabytes?

OK, this question sounds simple but I am taken by surprise. In the ancient days when 1 Megabyte was a huge amount of memory, Intel was trying to figure out how to use 16 bits to access 1 Megabyte of memory. They came up with the idea of using segment and offset address values to generate a 20 bit address.
Now, 20 bits gives 2^20 = 1,048,576 locations that can be addressed. Now assuming that we access 1 byte per address location we get 1,048,576/(1024*1024) = 2^20/2^20 Megabytes = 1 Megabyte. Ok understood.
The confusion comes here, we have 16 bit data bus in the ancient 8086 and can access 2 bytes at a time rather than 1, this equate 20 bit address to being able to access a total of 2 Megabyte of data right? Why do we assume that each address only has 1 byte stored in it when the data bus is 2 bytes wide? I am confused here.
It is very important to consider the bus when trying to understand this. This is probably more of an electrical question than a software one, but here is the answer:
For 8086, when reading from ROM, The least significant address line (A0) is not used, reducing the number of address lines to 19 right then and there.
In the case where the CPU needs to read 16 bits from an odd address, say, bytes at 0x3 and 0x4, it will actually do two 16-bit reads: One from 0x2 and one from 0x4, and discard bytes 0x2 and 0x5.
For 8-bit ROM reads, the read on the bus is still 16-bits but the unneeded byte is discarded.
But for RAM there is sometimes a need to write just a single byte, this gets a little more complex. There is an extra output signal on the processor called BHE# (Bus high enable). The combination of A0 and BHE# are used to determine if the write is an 8 or 16-bits wide, and whether or not it is at an odd or even address.
Understanding these two signals is key to answering your question. Stating it simply as possible:
8-bit even access: A0 OFF, BHE# OFF
8-bit odd access: A0 ON, BHE# ON
16-bit access (must be even): A0 OFF, BHE# ON
And we cannot have a bus cycle with A0 ON and BHE# OFF because an odd access to the even byte of the bus is meaningless.
Relating this back to your original understanding: You are completely correct in the case of memory devices. A 1 megabyte 16-bit memory chip will indeed only have 19 address lines, to that chip, 16 bits is a byte, and in effect, they do not physically have an A0 address input.
... almost. 16-bit writable memory devices have two extra signals (BHE# and BLE#) which are connected to the CPU's BHE# and A0 respectively. This so they know to ignore part of the bus when an 8-bit access is under way, making them hybrid 8/16 bit devices. ROM chips do not have these signals.
For the hardware unenlightened, this is a fairly complex area we're touching on here, and it does get very complex indeed in terms of performance considerations and in large systems with mixed 8 and 16 bit hardware.
It's is all explained in fantastic detail in the 8086 datasheet
It's because a byte is the 'atom' in memory addressing and the code must be able to access all the individual bytes in the address space. really a matter of software and compatibility with 8-bit existing software back then.
This too may interest you: How a single byte of memory is accessed by CPU in a 32-bit memory and 32-bit processor

address space and byte adressability

A microprocessor is byte addressable with 24bit address bus and 16bit data bus and one word contains two bytes. I was asked a question regarding attaching peripherals, adding memory, and address space and there's a few general concepts I don't see why they work.
Why is it that to calculate the address space you use the address bus not the data bus? Is the address space a function of the address bus or does it have to do with the microprocessor? How is it relevant that one word contains two bytes?
Why is it that to calculate the address space you use the address bus not the data bus?
Because it's the address bits that go out to the memory subsystem to tell them which memory location you want to read or write. The data bits just carry the data being read or written.
Is the address space a function of the address bus or does it have to do with the microprocessor?
Yes, the address space is a function of the address bus though there are tricks you can use to expand how much memory you can use.
An example of that is bank switching which gives you more accessible memory but no more address space (multiple blocks of memory co-exist at the same address, one at a time).
Another example is shown below where you can effectively double the usable memory, provided you're willing to only read and write words.
How is it relevant that one word contains two bytes?
The data bus size generally dictates the size of a memory cell. Larger memory cells can mean you can have more memory available to you but not more memory cells.
With your example, assuming you can only access words, you could get 16 megawords which is 32 megabytes.
This depends, of course, on how the memory is put together. It may be that you are able to access memory on individual byte boundaries (e.g., bytes 0/1 or 1/2 or 2/3) rather than just word boundaries, which would mean you don't actually get that full 32MB but only 16MB plus maybe one extra byte when you read the word at address FFFFFF).

Difference between word addressable and byte addressable

Can someone explain what's the different between Word and Byte addressable? How is it related to memory size etc.?
A byte is a memory unit for storage
A memory chip is full of such bytes.
Memory units are addressable. That is the only way we can use memory.
In reality, memory is only byte addressable. It means:
A binary address always points to a single byte only.
A word is just a group of bytes – 2, 4, 8 depending upon the data bus size of the CPU.
To understand the memory operation fully, you must be familiar with the various registers of the CPU and the memory ports of the RAM. I assume you know their meaning:
MAR(memory address register)
MDR(memory data register)
PC(program counter register)
MBR(memory buffer register)
RAM has two kinds of memory ports:
32-bits for data/addresses
8-bit for OPCODE.
Suppose CPU wants to read a word (say 4 bytes) from the address xyz onwards. CPU would put the address on the MAR, sends a memory read signal to the memory controller chip. On receiving the address and read signal, memory controller would connect the data bus to 32-bit port and 4 bytes starting from the address xyz would flow out of the port to the MDR.
If the CPU wants to fetch the next instruction, it would put the address onto the PC register and sends a fetch signal to the memory controller. On receiving the address and fetch signal, memory controller would connect the data bus to 8-bit port and a single byte long opcode located at the address received would flow out of the RAM into the CPU's MDR.
So that is what it means when we say a certain register is memory addressable or byte addressable. Now what will happen when you put, say decimal 2 in binary on the MAR with an intention to read the word 2, not (byte no 2)?
Word no 2 means bytes 4, 5, 6, 7 for 32-bit machine. In real physical memory is byte addressable only. So there is a trick to handle word addressing.
When MAR is placed on the address bus, its 32-bits do not map onto the 32 address lines(0-31 respectively). Instead, MAR bit 0 is wired to address bus line 2, MAR bit 1 is wired to address bus line 3 and so on. The upper 2 bits of MAR are discarded since they are only needed for word addresses above 2^32 none of which are legal for our 32 bit machine.
Using this mapping, when MAR is 1, address 4 is put on the bus, when MAR is 2, address 8 is put on the bus and so forth.
It is a bit difficult in the beginning to understand. I learnt it from Andrew Tanenbaums's structured computer organisation.
This image should make it easy to understand:
http://i.stack.imgur.com/rpB7N.png
Simply put,
• In the byte addressing scheme, the first word starts at address 0, and
the second word starts at address 4.
• In the word addressing scheme, all bytes of the first word are located
in address 0, and all bytes of the second word are located in address 1.
The advantage of byte-addressability are clear when we consider applications that process data one byte at a time. Access of a single byte in a byte-addressable system requires only the issuing of a single address. In a 16–bit word addressable system, it is necessary first to compute the address of the word containing the byte, fetch that word, and then extract the byte from the two-byte word. Although the processes for byte extraction are well understood, they are less efficient than directly accessing the byte. For this reason, many modern machines are byte addressable.
Addressability is the size of a unit of memory that has its own address. It's also the smallest chunk of memory that you can modify without affecting its neighbours.
For example: a machine where bytes are the normal 8 bits, and the word-size = 4 bytes. If it's a word-addressable machine, there's no such thing as the address of the second byte of an int. Dealing with strings (e.g. an array like char str[]) becomes inconvenient, because you still store characters packed together. Modifying just str[1] means loading the word that contains it, doing some shift/and/or operations to apply the change, then doing a word store.
Note that this is different from a machine that doesn't allow unaligned word load/stores (where the low 2 bits of a word address have to be 0). Such machines usually have a byte load/store instruction. We're talking about machines without even that.
CPU addresses might actually still include the low bits, but require them to always be zero (or ignore them). However, after checking that they're zero, the could be discarded, so the rest of the memory system only sees the word address, where two adjacent words have an address that differs by 1 (not 4). However, on a 16-bit CPU where a register can only hold 64k different addresses, you wouldn't likely do this. Each separate CPU address would refer to a different 2 bytes of memory, instead of discarding the low bit. 2B word-addressable memory would let you address 128kiB of memory, instead of just 64kiB with byte-addressable memory.
Fun fact: ARM used to use the low 2 bits of an address as a shuffle control for unaligned word loads. (But it always had byte load/store instructions.)
See also:
https://en.wikipedia.org/wiki/Word-addressable
https://en.wikipedia.org/wiki/Byte_addressing
Note that bit-addressable memory could exist, but doesn't. 8-bit bytes are nearly universally standard now. (Ancient computers sometimes had larger bytes, see the history section of wikipedia's Byte article.)

Resources