What does code memory refer to in Harvard architecture? - memory

Harvard architecture is a computer architecture with separate buses for code and data memory. Does the code memory in that architecture refer to RAM or to ROM (for microcontrollers)? I was confused by what the architecture says about code memory. As far as I know, in small-scale embedded systems code always executes from ROM, whereas in medium-scale and sophisticated embedded systems code can be copied from ROM to RAM for faster execution. If that is the case, is the RAM connected to two buses, one for code and the other for data memory? Can anyone please help me understand this?

You might want to see this
https://en.wikipedia.org/wiki/Modified_Harvard_architecture
The first time I came across this Harvard architecture thing was on PICs, and they do have their RAM and ROM separated into two different address spaces. But it seems this is not the only way to do it. Having the data and code accessible at the same time is the key. For example, you could have a single RAM memory space virtually partitioned to store code and data separately, but accessible by the processor at the same time. It's not a pure Harvard architecture system, but close enough.

Harvard architecture is for the most part an academic exercise. First you have to ask how they determine the split to the four busses. An internal von Neumann that splits by address? Many von Neumann implementations, if not all, split by address, and if you draw a bigger box you see many separate busses; sometimes data and instruction are joined, sometimes not.
Because you can't use pure Harvard for a bootloader or operating system, it is really just a mental exercise. It is a label, like von Neumann, that folks like to toss about, if for no other reason than to create confusion. The real world is somewhere in between. AMBA/AXI and other busses are labelled modified Harvard because they tag the data and instruction transactions as such but share the same busses (there isn't a single bus on a number of these; there are separate read address, read data, write address, and write data busses). The processor has not been the bottleneck in a long time. The processor and these busses can be and are idle, so you have room for instructions, data, and peripherals on the same set of busses, particularly if you separate read address, read data, write address, and write data into separate busses, with ID tags used to connect the dots and complete transactions.
As mentioned on Wikipedia, the closest you are really going to see in the real world is something like a microcontroller. And when they talk about memory they really just mean address space; what is out there on the other end of the bus can be SRAM, DRAM, flash, EEPROM, etc., or a combination, on either side, as well as all the peripherals on that bus. So in a microcontroller, in this model, the instructions are in flash and the SRAM holds the data. In a pure Harvard architecture there is no way to load code into SRAM and run it there; likewise you can't use the data bus to program the flash or to buffer up data to be flashed. The ROM/flash gets magically loaded by a path not shown in the Harvard architecture diagram, likely a crossover between the I/O bus resources and the instruction bus resources, which begs to be called modified Harvard.
For von Neumann, you have early address decoders that split the bus into instructions, data, I/O, and subdivisions of those. Perhaps the data and instruction stay combined, but you don't have a pure single bus from end to end; that is not practical.
Look at the pictures on Wikipedia and understand that one has separate busses for things and the other is combined. Pass the test and forget the terms; you won't need them after that, as they are not really relevant.

Harvard has almost nothing to do with RAM or ROM. It just says that, in principle, instruction fetches and data reads/writes are done over separate buses.
That simply implies that at least some ROM (bootstrap code) needs to be found on the instruction memory bus - the rest can be RAM. The non-instruction bus can access RAM or ROM as well - ROM could hold constant data.
On "real" implementations like the AVR MCUs, however, the instruction bus addresses Flash ROM, while the non-instruction bus (I'm deliberately not writing "data bus", that's something different) addresses SRAM. You don't even "see" these buses on an AVR - They are purely internal to most of these MCUs.

Related

Are data and instructions segregated in the data bus in modified Harvard architectures?

In a modified Harvard architecture, both data and instructions (code) are stored together in DRAM and in L2 cache, while being separate at the L1 level. They are also both transferred from DRAM to cache through the data bus. I read that there can be separate memory controllers for data/instructions. But is there a subdivision of the bus lines into data and instructions?
And if they are separate, what are the trade-offs of having split bus lines vs unified lines? Are they physically implemented differently or are they fungible?
External bus lines, no. That would be full Harvard, not just split L1.
"Modified Harvard" is just a speed hack for a von Neumann architecture, with the only visible effect being that you need to run a cache-sync instruction for self-modifying code / JIT code-gen to work reliably.
(Unless you have an ISA like x86 that requires L1i + the pipeline to be coherent with data caches, in which case stores have to snoop in-flight code addresses...)
See:
What does a 'Split' cache means. And how is it useful(if it is)?
Why is the size of L1 cache smaller than that of the L2 cache in most of the processors? - re: multi-level caches with unified L2.
There are some chips with separate busses for instructions and data.
You typically find that true-Harvard setup in cache-less microcontrollers where instructions are fetched from ROM (or NOR flash or something), while data load/store uses SRAM or DRAM.
Some, like AVR, have a load-program-memory (LPM) instruction, so you can have read-only constant data (like lookup tables) in EEROM / flash. Different external busses can be hooked up to different types of memory. If supported, SPM (Store program memory) can even write to that memory, for persistence across power cycles, although limited write-endurance of such memory means you don't want to do that as part of normal operation. Having an LPM instruction means it's a bit less pure Harvard, but it's apparently not available on all AVR devices. But if you do have it, it's more like separate busses for flash vs. RAM, with ability to load/store data to either. But you can only execute from program memory, so it's not full von Neumann unless SPM is also supported.
Two separate address-spaces also means you can address e.g. 64k of code and 64k of data, instead of 64k total, with 16-bit addresses.
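To make the two-address-space point concrete, here is a toy Python model. It is only a sketch and does not describe any real MCU: the class name `ToyHarvardMachine` and its methods are made up for illustration, and the `load_program_memory` method is just a loose analogue of what LPM does on AVR.

```python
class ToyHarvardMachine:
    """Toy model: separate program and data memories, each with its own 16-bit address space."""

    ADDRESS_BITS = 16
    SPACE_SIZE = 1 << ADDRESS_BITS  # 65536 locations per address space

    def __init__(self):
        self.program_memory = bytearray(self.SPACE_SIZE)  # stands in for flash/ROM
        self.data_memory = bytearray(self.SPACE_SIZE)     # stands in for SRAM

    def fetch_instruction(self, address):
        # Instruction fetches only ever see program memory.
        return self.program_memory[address]

    def load_data(self, address):
        # Ordinary loads only ever see data memory.
        return self.data_memory[address]

    def store_data(self, address, value):
        # Ordinary stores only ever reach data memory.
        self.data_memory[address] = value & 0xFF

    def load_program_memory(self, address):
        # Loose analogue of LPM: a data read that targets program memory instead.
        return self.program_memory[address]


machine = ToyHarvardMachine()
machine.program_memory[0x0010] = 0xAA  # pretend this byte was flashed earlier
machine.store_data(0x0010, 0x55)       # same numeric address, different memory

print(hex(machine.fetch_instruction(0x0010)))    # 0xaa
print(hex(machine.load_data(0x0010)))            # 0x55
print(hex(machine.load_program_memory(0x0010)))  # 0xaa
print(ToyHarvardMachine.SPACE_SIZE * 2)          # 131072 addressable bytes in total
```

The same 16-bit address `0x0010` names two different bytes depending on which bus the access goes out on, which is why the total addressable storage is 2 x 64k rather than 64k.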

ARM: memory address ... why is it 0x04030201...?

Can someone explain to me why we represent the memory address itself in this way:
"Word on address =0x00":
0x04030201,
I know each of the 01, 02, 03, 04 is one byte, but can someone explain to me where that byte is and what it represents? A memory cell in a register? I am totally confused...
An address, memory or otherwise, is really no different than an address on a building. Sometimes it is systematically and well chosen, sometimes haphazardly. In any case some buildings are fire stations, some are grocery stores, some are apartments, some are houses, and so on. But the addressing system used by a city or nation can get your item directly to that building.
When we talk about addresses in software it is no different. The processor at the lowest level doesn't know or care what is at an address; there is an address bus, an interface where the address is projected outside the processor. As you add each layer of logic, like an onion, around the processor, and eventually other chips, memory, USB controllers, hard drive controllers, etc., portions of that address are extracted, just like the parts of an address on an envelope, and the read or write command is delivered to the individual logic that wears that address on the side of its building.
You can't simply ask what address 0x04030201 is without any context. Address schemes are fairly specific to their system. There are hundreds or thousands or tens of thousands of ARM-based systems, all of which have different address schemes, and that address could point to nothing, in which case, with nobody to answer, the request dies and possibly hangs the processor; or it could be some RAM, or it could be a register in a USB controller, video controller, or disk drive controller.
Generally you have read and write operations. In this analogy, once the letter makes it to the individual at the address on the envelope, the contents of the letter contain instructions: do this (write), or get this and mail it back (read). And the individual, in the case of hardware, just does what it is told without asking. If it is a read, then it performs a read within the context of whatever that device is. For a hard disk controller, a read of a particular address might return a temperature sensor value, or a status register that contains the speed at which the motor is spinning, or some data that had recently been read from the hard disk. In the simple case of memory it is likely just some memory, some bytes.
How much stuff is being read is another item that is specified on the processor's bus, and what is available to the programmer varies from processor to processor. Sometimes you can request to read or write individual bytes, sometimes 16-bit items, or 32 or 64, etc.
Then you get into address translation. Using the mail analogy, this is kind of like having your mail forwarded to another address. You write one address on the letter, the post office has a forwarding request for that address, so they change your address to the new address and then complete the delivery of the letter. When you hear of a memory management unit (MMU), and in some uses of the term virtual memory, that is the kind of thing that is going on. Let's say that we want to make the programmer's life simple and we tell everyone that RAM starts at address 0x00000000. That makes it much easier for a compiler to choose the memory locations where our variables, arrays, and programs live; it can compile every program the same way based on that address. But how can I have many programs running at once if they all share the same memory? Well, they don't. One program thinks it is writing to address 0x00000000, but in reality there is some unique address, which can be completely different, that belongs only to that program, let's say address 0x10000000. The MMU is like the mail carrier at the post office that changes the address; the processor knows, from information about which task is running, that it needs to convert 0x00000000 to 0x10000000. When another program accesses what it thinks is 0x00000000 it might have that address changed to 0x11000000, and for yet another program 0x00000000 might map to physical address 0x12000000. The address that actually hits the memory is called the physical address; the address that the program uses is called the virtual address. It isn't real; it is destined to be changed.
This MMU stuff not only gives compilers and programmers an easier job, it also allows us to protect one program or the operating system from another. Application programs run at a certain protection level, which the MMU uses to know what that user is allowed to do. Suppose a program generates a virtual address that is outside of its address space; say the system has 1 gig of memory and the program tries to address 1 gig plus a little bit more. Instead of converting that to a physical address, the MMU generates an interrupt to the processor, which switches the processor into a mode that has more permissions, basically the operating system. The operating system can then decide to use that other kind of virtual memory and allow the program to have more memory, or it may kill the program and put up a warning message telling the user that such and such program has had a protection fault and was killed.
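The forwarding and protection ideas above can be sketched in a few lines of Python. This is a deliberately simplified base-plus-limit model, not how a real MMU is programmed (real ones generally use page tables). The task names, the `ProtectionFault` exception, and the 16 MiB limit are assumptions made up for the example; only the three base addresses come from the text.

```python
class ProtectionFault(Exception):
    """Raised when a task uses a virtual address outside its allowed range."""


# Hypothetical per-task mappings. The bases are the example addresses from the
# text; the 16 MiB limit is an assumption for illustration.
TASK_MAPPINGS = {
    "task_a": {"base": 0x10000000, "limit": 0x01000000},
    "task_b": {"base": 0x11000000, "limit": 0x01000000},
    "task_c": {"base": 0x12000000, "limit": 0x01000000},
}


def translate(task, virtual_address):
    """Turn a task's virtual address into a physical address, or fault."""
    mapping = TASK_MAPPINGS[task]
    if not 0 <= virtual_address < mapping["limit"]:
        # A real MMU would interrupt the processor here and let the OS decide
        # whether to grant more memory or kill the program.
        raise ProtectionFault(f"{task}: bad virtual address {virtual_address:#010x}")
    return mapping["base"] + virtual_address


print(hex(translate("task_a", 0x00000000)))  # 0x10000000
print(hex(translate("task_b", 0x00000000)))  # 0x11000000
print(hex(translate("task_c", 0x00000000)))  # 0x12000000

try:
    translate("task_a", 0x01000000)  # one byte past the end of task_a's range
except ProtectionFault as fault:
    print("fault:", fault)
```

Every task uses virtual address 0x00000000, yet each access lands in a different piece of physical memory, and an out-of-range access never reaches memory at all.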
Address schemes for computers are generally a lot more thought out than those of developers who number houses in new neighborhoods, but not always, and they are not that far removed from an address on an envelope. You pick apart bits in the address, and those chunks of bits mean something and deliver the processor's request to the individual at that address. How the bits are parsed is very specific to the processor and platform, and in some cases is dynamic or programmable on the fly, so if your next question is what is 0xabcd on my system, we may still not be able to help you. You may have to do more research or give us a lot of info...
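As a rough illustration of how an address gets "delivered", here is a sketch of an address decoder in Python. The memory map is entirely hypothetical (the ranges and device names are invented and do not describe any particular ARM system), which is exactly the point: the same number means different things on different systems.

```python
# Hypothetical memory map: (first address, last address, what answers there).
MEMORY_MAP = [
    (0x00000000, 0x0FFFFFFF, "flash"),
    (0x20000000, 0x2000FFFF, "sram"),
    (0x40000000, 0x40000FFF, "uart registers"),
]


def decode(address):
    """Return which device would answer a request to this address, if any."""
    for first, last, device in MEMORY_MAP:
        if first <= address <= last:
            return device
    return None  # nobody lives at this address


print(decode(0x04030201))  # flash (in this made-up map only)
print(decode(0x20000010))  # sram
print(decode(0x40000004))  # uart registers
print(decode(0x80000000))  # None
```

With a different `MEMORY_MAP`, 0x04030201 could just as easily be a peripheral register or nothing at all.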
Think of memory as an array of bytes. 'Word on address' can mean different things depending on what the CPU designers consider a word. In your case it seems a word is 32 bits long.
So 'Word on address=0x00: 0x04030201' means:
'Beginning at memory cell 0x00 (inclusive), the 'next' four bytes are 0x04 0x03 0x02 0x01'.
Also, depending on the endianness of your CPU, the meaning of 'next' changes. It could be that 0x04 is stored in cell 0x00, or that 0x01 is stored there.
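If it helps to see the two byte orders side by side, Python's standard `struct` module can lay out the same 32-bit word both ways. This is just a sketch on the host machine, not ARM code; the variable names are mine.

```python
import struct

word = 0x04030201

# "<I" = little-endian unsigned 32-bit, ">I" = big-endian unsigned 32-bit.
little_endian_cells = list(struct.pack("<I", word))
big_endian_cells = list(struct.pack(">I", word))

# Index 0 of each list is memory cell 0x00, index 1 is cell 0x01, and so on.
print(little_endian_cells)  # [1, 2, 3, 4]  -> 0x01 sits in cell 0x00
print(big_endian_cells)     # [4, 3, 2, 1]  -> 0x04 sits in cell 0x00

# Reading four cells holding 01 02 03 04 back as a little-endian word:
value = struct.unpack("<I", bytes([0x01, 0x02, 0x03, 0x04]))[0]
print(hex(value))           # 0x4030201
```

So if a dump of bytes 01 02 03 04 starting at address 0x00 is shown as the word 0x04030201, the memory is being interpreted as little-endian.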

Speed comparison eeprom-flash-sram

I am currently coding for an Atmel ATtiny45 microcontroller and I use several lookup tables. Where is the best place to store them? Could you give me a general idea about the memory speed differences between sram-flash-eeprom?
EEPROM is by far the slowest alternative, with write access times in the area of 10ms. Read access is about as fast as FLASH access, plus the overhead of address setup and triggering. Because there's no auto-increment in the EEPROM's address registers, every byte read will require at least four instructions.
SRAM access is the fastest possible (except for direct register access).
FLASH is a little slower than SRAM and needs indirect addressing in every case (Z-pointer), which may or may not be needed for SRAM access, depending on the structure and access pattern of your table.
For execution times of instructions see AVR Instruction Set, especially the LPM vs. the LDS, LD, and LDD instructions.

What is paging?

Paging is explained here, slide #6 :
http://www.cs.ucc.ie/~grigoras/CS2506/Lecture_6.pdf
in my lecture notes, but I cannot for the life of me understand it. I know it's a way of translating virtual addresses to physical addresses. So the virtual addresses, which are on disk, are divided into chunks of 2^k. I am really confused after this. Can someone please explain it to me in simple terms?
Paging is, as you've noted, a type of virtual memory. To answer the question raised by @John Curtsy: it's covered separately from virtual memory in general because there are other types of virtual memory, although paging is now (by far) the most common.
Paged virtual memory is pretty simple: you split all of your physical memory up into blocks, mostly of equal size (though having a selection of two or three sizes is fairly common in practice). Making the blocks equal sized makes them interchangeable.
Then you have addressing. You start by breaking each address up into two pieces. One is an offset within a page. You normally use the least significant bits for that part. If you use (say) 4K pages, you need 12 bits for the offset. With (say) a 32-bit address space, that leaves 20 more bits.
From there, things are really a lot simpler than they initially seem. You basically build a small "descriptor" to describe each page of memory. This will have a linear address (the address used by the client application to address that memory), and a physical address for the memory, as well as a Present bit. There will (at least usually) be a few other things like permissions to indicate whether data in that page can be read, written, executed, etc.
Then, when client code uses an address, the CPU starts by breaking up the page offset from the rest of the address. It then takes the rest of the linear address, and looks through the page descriptors to find the physical address that goes with that linear address. Then, to address the physical memory, it uses the upper 20 bits of the physical address with the lower 12 bits of the linear address, and together they form the actual physical address that goes out on the processor pins and gets data from the memory chip.
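The split-and-recombine step can be written out as a short Python sketch, assuming the 4K pages and 32-bit addresses used as the example above. The page table contents are invented numbers, and a real CPU does this lookup in hardware rather than with a dictionary.

```python
PAGE_OFFSET_BITS = 12                  # 4K pages -> 12 offset bits
PAGE_SIZE = 1 << PAGE_OFFSET_BITS      # 4096
OFFSET_MASK = PAGE_SIZE - 1            # 0xFFF

# Hypothetical page descriptors: upper 20 bits of the linear address
# -> upper 20 bits of the physical address.
page_table = {
    0x00000: 0x12345,
    0x00001: 0x00042,
}


def translate_linear(linear_address):
    page_number = linear_address >> PAGE_OFFSET_BITS  # upper 20 bits
    offset = linear_address & OFFSET_MASK             # lower 12 bits, unchanged
    frame_number = page_table[page_number]            # the descriptor lookup
    return (frame_number << PAGE_OFFSET_BITS) | offset


print(hex(translate_linear(0x00000010)))  # 0x12345010
print(hex(translate_linear(0x00001ABC)))  # 0x42abc
```

Note that the low 12 bits (0x010 and 0xABC) pass through untouched; only the upper 20 bits are replaced.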
Now, we get to the part where we get "true" virtual memory. When programs are using more memory than is actually available, the OS takes the data for some of those descriptors, and writes it out to the disk drive. It then clears the "Present" bit for that page of memory. The physical page of memory is now free for some other purpose.
When the client program tries to refer to that memory, the CPU checks that the Present bit is set. If it's not, the CPU raises an exception. When that happens, the OS frees up a block of physical memory as above, reads the data for the current page back in from disk, and fills in the page descriptor with the address of the physical page where it's now located. When it's done all that, it returns from the exception, and the CPU restarts execution of the instruction that caused the exception to start with -- except now, the Present bit is set, so using the memory will work.
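Here is a toy simulation of that fault-and-reload cycle in Python. It is a sketch under several assumptions that are not in the explanation above: the `ToyPager` class is invented, "disk" is just a dictionary, and the page to evict is chosen oldest-first (FIFO), whereas a real OS uses more elaborate replacement policies.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


class ToyPager:
    def __init__(self, frame_count):
        self.free_frames = list(range(frame_count))
        self.frames = {}               # frame number -> page contents
        self.resident = OrderedDict()  # page -> frame; being listed here means "Present"
        self.disk = {}                 # page -> contents written out to disk
        self.fault_count = 0

    def _handle_page_fault(self, page):
        self.fault_count += 1
        if not self.free_frames:
            # Free a frame: write the oldest resident page out and clear its Present state.
            victim, frame = self.resident.popitem(last=False)
            self.disk[victim] = self.frames.pop(frame)
            self.free_frames.append(frame)
        frame = self.free_frames.pop()
        if page in self.disk:
            contents = self.disk.pop(page)   # read the page back in from disk
        else:
            contents = bytearray(PAGE_SIZE)  # first touch: a fresh zeroed page
        self.frames[frame] = contents
        self.resident[page] = frame          # fill in the descriptor, now Present

    def _page_contents(self, page):
        if page not in self.resident:        # Present bit clear -> fault
            self._handle_page_fault(page)
        return self.frames[self.resident[page]]

    def read(self, page, offset):
        return self._page_contents(page)[offset]

    def write(self, page, offset, value):
        self._page_contents(page)[offset] = value


pager = ToyPager(frame_count=2)  # only two physical frames
pager.write(0, 0, 11)
pager.write(1, 0, 22)
pager.write(2, 0, 33)            # no free frame left: page 0 goes to disk
print(pager.read(0, 0))          # 11, after page 0 is faulted back in
print(pager.fault_count)         # 4
```

The program touches three pages with only two frames available, and the value written to page 0 survives the round trip through "disk".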
There is one more detail that you probably need to know: the page descriptors are normally arranged into page tables, and (the important part) you normally have a separate set of page tables for each process in the system (and another for the OS kernel itself). Having separate page tables for each process means that each process can use the same set of linear addresses, but those get mapped to different sets of physical addresses as needed. You can also map the same physical memory to more than one process by just creating two separate page descriptors (one for each process) that contain the same physical address. Most OSes use this so that, for example, if you have two or three copies of the same program running, it'll really only have one copy of the executable code for that program in memory -- but it'll have two or three sets of page descriptors that point to that same code so all of them can use it without making separate copies for each.
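A small Python sketch of per-process page tables may make the sharing clearer. The process names, page numbers, and frame numbers are all made up; the only things taken from the explanation are the 4K page size and the idea that two descriptors can point at the same physical page.

```python
PAGE_OFFSET_BITS = 12
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1

SHARED_CODE_FRAME = 0x00100  # hypothetical frame holding the program's code

# One page table per process: linear page number -> physical frame number.
page_tables = {
    "process_1": {0x00400: SHARED_CODE_FRAME, 0x00800: 0x00200},
    "process_2": {0x00400: SHARED_CODE_FRAME, 0x00800: 0x00300},
}


def translate_for(process, linear_address):
    table = page_tables[process]
    page_number = linear_address >> PAGE_OFFSET_BITS
    offset = linear_address & OFFSET_MASK
    return (table[page_number] << PAGE_OFFSET_BITS) | offset


# Code page: same linear address, same physical memory for both processes.
print(hex(translate_for("process_1", 0x00400010)))  # 0x100010
print(hex(translate_for("process_2", 0x00400010)))  # 0x100010

# Data page: same linear address, different physical memory.
print(hex(translate_for("process_1", 0x00800010)))  # 0x200010
print(hex(translate_for("process_2", 0x00800010)))  # 0x300010
```

Both processes run the same code from one physical copy, while their data at the same linear address stays private.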
Of course, I'm simplifying a lot -- quite a few complete (and often fairly large) books have been written about virtual memory. There's also a fair amount of variation among machines, with various embellishments added, minor changes in parameters made (e.g., whether a page is 4K or 8K), and so on. Nonetheless, this is at least a general idea of the core of what happens (and it's still at a high enough level to apply about equally to an ARM, x86, MIPS, SPARC, etc.)
Simply put, it's a way of holding far more data than your physical memory would normally allow. For example, with a 32-bit virtual address space each process can address up to 2^32 bytes (4 GiB), even if the machine has less RAM than that; the pages that don't fit in RAM are kept on disk.
Paging is a storage mechanism that allows the OS to retrieve processes from secondary storage into main memory in the form of pages. In the paging method, main memory is divided into small fixed-size blocks of physical memory, which are called frames. The size of a frame should be kept the same as that of a page to get maximum utilization of main memory and to avoid external fragmentation.

How to find number of memory accesses

Can anybody tell me a Unix command that can be used to find the number of memory accesses that took place in a given interval? vmstat, top and sar only give the amount of physical memory space occupied/available, but do not give the number of memory accesses in a given interval.
If I understand what you're asking, such a feature would almost certainly require hardware support at a very low level (e.g. a counter of some sort that monitors memory bus activity).
I don't think such support is available for the common architectures supported by Unix or Linux, so I'm going to go out on a limb and say that no such Unix command exists.
The situation is somewhat different when considering memory in units of pages, because most architectures that support virtual memory have dedicated MMU hardware which operates at that level of granularity, and can be accessed by the operating system. But as far as I know, the sorts of counter data you'd get from the MMU would represent events like page faults, allocations, and releases, rather than individual reads or writes.
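To show the kind of page-level counters that are available, here is a small sketch using Python's standard `resource` module (Unix only), which wraps `getrusage`. It reports page faults for the current process, not individual reads or writes. The helper name `page_fault_counts` and the 4 KiB stride are my own assumptions, and the numbers printed will vary from run to run and system to system.

```python
import resource


def page_fault_counts():
    """Return the page-fault counters the OS keeps for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "minor": usage.ru_minflt,  # faults serviced without disk I/O
        "major": usage.ru_majflt,  # faults that required disk I/O
    }


before = page_fault_counts()

# Allocate 16 MiB and touch one byte in every 4 KiB (assumed page size),
# so that the OS has to back the allocation with real pages.
scratch = bytearray(16 * 1024 * 1024)
for offset in range(0, len(scratch), 4096):
    scratch[offset] = 1

after = page_fault_counts()

print("minor faults during the interval:", after["minor"] - before["minor"])
print("major faults during the interval:", after["major"] - before["major"])
```

Sampling the counters before and after gives a per-interval figure, but only at page granularity, which matches the limitation described above.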

Resources