Speed comparison: EEPROM vs. FLASH vs. SRAM

I'm currently coding for an Atmel ATtiny45 microcontroller and I use several lookup tables. Where is the best place to store them? Could you give me a general idea of the speed differences between SRAM, flash, and EEPROM?

EEPROM is by far the slowest alternative, with write access times on the order of 10 ms. Read access is about as fast as FLASH access, plus the overhead of address setup and triggering. Because there's no auto-increment in the EEPROM's address registers, every byte read will require at least four instructions.
SRAM access is the fastest possible (except for direct register access).
FLASH is a little slower than SRAM and always needs indirect addressing through the Z-pointer, which may or may not be needed for SRAM access, depending on the structure and access pattern of your table.
For the execution times of the individual instructions, see the AVR Instruction Set manual, especially the LPM versus the LDS, LD, and LDD instructions.
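If it helps to see it concretely, here is a minimal sketch (table name and contents invented) of the three placements with avr-gcc/avr-libc on a part like the ATtiny45. The flash table costs one LPM (3 cycles) per byte, the SRAM table a plain LD (2 cycles), and the EEPROM table goes through avr-libc's eeprom_read_byte(), which performs the per-byte address setup described above:

    /* Lookup-table placement sketch for avr-gcc/avr-libc. */
    #include <avr/pgmspace.h>   /* PROGMEM, pgm_read_byte: flash access   */
    #include <avr/eeprom.h>     /* EEMEM, eeprom_read_byte: EEPROM access */
    #include <stdint.h>

    /* Invented 16-entry table, stored three different ways. */
    static const uint8_t table_flash[16] PROGMEM =
        { 0, 49, 90, 117, 127, 117, 90, 49, 0, 7, 17, 37, 57, 77, 97, 107 };
    static uint8_t       table_sram[16];    /* fastest: plain LD/LDD      */
    static uint8_t EEMEM table_eeprom[16];  /* slowest, several insns/byte */

    uint8_t lookup(uint8_t i)
    {
        i &= 0x0F;
        uint8_t f = pgm_read_byte(&table_flash[i]);     /* LPM, 3 cycles  */
        uint8_t s = table_sram[i];                      /* LD,  2 cycles  */
        uint8_t e = eeprom_read_byte(&table_eeprom[i]); /* address setup
                                                           plus triggering */
        return f ^ s ^ e;  /* combine just so nothing is optimized away   */
    }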

Related

What does code memory in Harvard architecture refer to?

Harvard architecture is a computer architecture with separate buses for code and data memory. Does "code memory" in that architecture refer to RAM or ROM (for microcontrollers)? I was confused by what the architecture means by code memory. As far as I know, in small-scale embedded systems code always executes from ROM, whereas in medium-scale and sophisticated embedded systems code can be copied from ROM to RAM for faster execution. If that is the case, is RAM connected to two buses, one for code and one for data? Can anyone please help me understand this?
You might want to see this: https://en.wikipedia.org/wiki/Modified_Harvard_architecture
The first time I came across this Harvard architecture thing was on PICs, and they do have their RAM and ROM separated into two different address spaces. But it seems this is not the only way to do it. Having the data and code accessible at the same time is the key. For example, a single RAM memory space can be virtually partitioned to store code and data separately, yet remain accessible by the processor at the same time. That's not a pure Harvard architecture system, but close enough.
Harvard architecture is for the most part an academic exercise. First you have to ask: how do they determine the split to the four buses? An internal von Neumann decoder that splits by address? Many von Neumann implementations, if not all, split by address, and if you draw a bigger box you see many separate buses - sometimes data and instruction are joined, sometimes not.
Because you can't use pure Harvard for a bootloader or an operating system, it is really just a mental exercise - a label, like von Neumann, that folks like to toss about, if for no other reason than to create confusion. The real world is somewhere in between. AMBA/AXI and other buses are labelled modified Harvard because they tag the data and instruction transactions as such but share the same buses (there isn't a single bus on a number of these; there are separate read-address, read-data, write-address, and write-data channels). The processor hasn't been the bottleneck in a long time; the processor and these buses can be and are idle, so you have room for instructions, data, and peripherals on the same set of buses, particularly if you separate read address, read data, write address, and write data into separate channels, with ID tags used to connect the dots and complete transactions.
As mentioned on Wikipedia, the closest you are really going to see in the real world is something like a microcontroller. And when they talk about memory they really just mean address space; what is out there on the other end of the bus can be SRAM, DRAM, flash, EEPROM, etc., or a combination - on either side, along with all the peripherals on that bus. So in a microcontroller the instructions are in flash in this model and the SRAM is the data, and in a pure Harvard architecture there is no way to load code into SRAM and run it there; likewise you can't use the data bus to program the flash, or to buffer up data to be flashed. The ROM/flash gets magically loaded by a path not shown on the Harvard architecture diagram - likely a crossover between the I/O bus resources and the instruction bus resources, which begs to be called modified Harvard.
For von Neumann, you have early address decoders that split the bus into instructions, data, I/O, and subdivisions of those; perhaps the data and instructions stay combined, but you don't have a pure single bus from end to end. It's not practical.
Look at the pictures on Wikipedia and understand that one has separate buses for things while the other is combined. Pass the test and forget the terms; you won't need them after that, as they are not really relevant.
Harvard has almost nothing to do with RAM or ROM - it just says that, in principle, instruction fetches and data reads/writes are done over separate buses.
That simply implies that at least some ROM (bootstrap code) needs to be found on the instruction memory bus - the rest can be RAM. The non-instruction bus can access RAM or ROM as well - ROM could hold constant data.
On "real" implementations like the AVR MCUs, however, the instruction bus addresses Flash ROM, while the non-instruction bus (I'm deliberately not writing "data bus", that's something different) addresses SRAM. You don't even "see" these buses on an AVR - They are purely internal to most of these MCUs.
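To make the AVR case concrete: because flash and SRAM are separate address spaces, a bare pointer value alone doesn't say which memory it names, and data placed in code memory needs its own load instruction (LPM, wrapped by avr-libc's pgm_read_byte). That LPM data path into instruction memory is precisely the crossover that makes the AVR modified rather than pure Harvard. A tiny sketch with invented identifiers:

    #include <avr/pgmspace.h>

    const char in_flash[] PROGMEM = "code-space string"; /* instruction bus side */
    char       in_sram[]          = "data-space string"; /* SRAM side            */

    char first_flash(void) { return pgm_read_byte(&in_flash[0]); } /* LPM */
    char first_sram(void)  { return in_sram[0]; }                  /* LD  */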

Does simulating memory-mapped I/O using VMX require instruction decoding?

I am wondering how a hypervisor using Intel's VMX / VT technology would simulate memory-mapped I/O (so that the guest would think it was performing memory-mapped I/O against a device).
I think the basic principle would be to set up the EPT page tables such that the memory addresses in question cause an EPT violation (i.e. a VM exit) because they can be neither read nor written. However, the next question is how to process the VM exit. Such a VM exit fills out all the exit-qualification fields, including the guest-linear and guest-physical address, etc. But what I am missing in these exit-qualification fields is one indicating - in the case of a write instruction - the value that was being written and the size of the write. Likewise, for a read instruction it would be nice to have bit fields indicating the destination of the read, say a register or a memory location (in the case of memory-to-memory string operations). That would make it very easy for the hypervisor to figure out what the guest was trying to do and then simulate the device behavior towards the guest.
But the trouble is, I can't find such fields among the exit qualifications. I can see an instruction pointer to the faulting instruction, so I could walk the page tables to read in the instruction, decode it, and then simulate the I/O behavior. However, this requires the hypervisor to have a fairly complete picture of all x86 instructions and be able to decode them. That seems to be quite a heavy burden on the hypervisor, and it will also have to stay current with later instruction-set additions. And the CPU should already have this information.
There's a chance that I am missing the relevant fields because the documentation is quite extensive, but I have searched carefully and have not been able to find them. Maybe someone can point me in the right direction, or confirm that the hypervisor will need to contain an instruction decoder.
I believe most VMs decode the instruction. It's not actually that hard, and most VMs have software emulators to fall back on when the CPU's VM extensions aren't available or up to the task. You don't need to handle every instruction, just those that can take memory operands, and you can probably ignore everything that isn't a 1-, 2-, or 4-byte memory operand, since you're not likely to be emulating device registers of other sizes. (For memory-mapped device buffers, like video memory, you don't want to be trapping every memory access because that's too slow, so you'll have to take a different approach.)
However, there is one way you can let the CPU do the work for you, but it's much slower than decoding the instruction yourself, and it's not entirely perfect. You can single-step the instruction while temporarily mapping in a valid page of RAM. The VM exit will tell you the guest-physical address accessed and whether it was a read or a write. Unfortunately it doesn't reliably tell you whether it was a read-modify-write instruction; those may just set the write flag, and with some device registers that can make a difference. It might be easier to copy the instruction (it can be at most 15 bytes, but watch out for page boundaries) and execute it in the host, but that requires that you can map the page at the same virtual address in the host as in the guest.
You could combine these techniques: decode the common instructions that are actually used to access memory-mapped device registers, while using single-stepping for the instructions you don't recognize.
Note that by choosing to write your own hypervisor you've put a heavy burden on yourself. Having to decode instructions in software is a pretty minor burden compared to the task of emulating an entire IBM PC compatible computer. The Intel virtualisation extensions aren't designed to make this easier; they're just designed to make it more efficient. It would be easier to write a pure software emulator that interpreted the instructions; handling memory-mapped I/O would then just be a matter of dispatching the reads and writes to the correct function.
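For what it's worth, here is a toy sketch of the decoding step, assuming the hypervisor has already copied the instruction bytes from the guest RIP after the EPT violation. The struct and the two handled opcodes are invented for illustration; a real decoder must also deal with prefixes, REX bytes, SIB/displacement forms, other operand sizes, and far more opcodes, which is exactly the burden discussed above:

    /* Decode the two simplest MMIO-capable forms:
     *   89 /r  mov [mem], r32   (guest write)
     *   8B /r  mov r32, [mem]   (guest read)
     * The EPT exit qualification's read/write bits (bits 0 and 1,
     * per the Intel SDM) can be used to cross-check the result. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        int is_write;  /* 1 = guest stored to the device, 0 = load       */
        int reg;       /* general-purpose register index from ModRM.reg  */
    } mmio_op;

    static int decode_mmio(const uint8_t *insn, mmio_op *op)
    {
        uint8_t opcode = insn[0];
        uint8_t modrm  = insn[1];

        if (opcode != 0x89 && opcode != 0x8B)
            return -1;                    /* punt to a fuller decoder    */
        if ((modrm >> 6) == 3)
            return -1;                    /* register-to-register: not MMIO */

        op->is_write = (opcode == 0x89);
        op->reg      = (modrm >> 3) & 7;  /* the register operand        */
        return 0;
    }

    int main(void)
    {
        /* 89 18 = mov [rax], ebx, as fetched from guest memory */
        const uint8_t bytes[] = { 0x89, 0x18 };
        mmio_op op;
        if (decode_mmio(bytes, &op) == 0)
            printf("%s, reg=%d\n", op.is_write ? "write" : "read", op.reg);
        return 0;
    }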
I don't know in detail how VT-x works, but I think I see a flaw in the way you wish it could work:
Remember that x86 is not a load/store machine. The load part of add [rdi], 2 doesn't have an architecturally-visible destination, so your proposed solution of telling the hypervisor where to find or put the data doesn't really work, unless there's some temporary location that isn't part of the guest's architectural state, used only for communication between the hypervisor and the VMX hardware.
To handle a read-modify-write instruction with a memory destination efficiently, the hypervisor should do the whole thing with one VM exit, so you can't just provide separate load and store interfaces.
More importantly, handling atomic read-modify-writes is a special case. lock add [rdi], 2 can't just be done as a separate load and store.

Does AArch64 support unaligned access?

Does AArch64 support unaligned access natively? I am asking because currently ocamlopt assumes "no".
Provided the hardware bit for strict alignment checking is not turned on (which, as on x86, no general-purpose OS is realistically going to do), AArch64 does permit unaligned data accesses to Normal (not Device) memory with the regular load/store instructions.
However, there are several reasons why a compiler would still want to maintain aligned data:
Atomicity of reads and writes: naturally-aligned loads and stores are guaranteed to be atomic, i.e. if one thread reads an aligned memory location simultaneously with another thread writing the same location, the read will only ever return the old value or the new value. That guarantee does not apply if the location is not aligned to the access size - in that case the read could return some unknown mixture of the two values. If the language has a concurrency model which relies on that not happening, it's probably not going to allow unaligned data.
Atomic read-modify-write operations: If the language has a concurrency model in which some or all data types can be updated (not just read or written) atomically, then for those operations the code generation will involve using the load-exclusive/store-exclusive instructions to build up atomic read-modify-write sequences, rather than plain loads/stores. The exclusive instructions will always fault if the address is not aligned to the access size.
Efficiency: On most cores, an unaligned access at best still takes at least 1 cycle longer than a properly-aligned one. In the worst case, a single unaligned access can cross a cache line boundary (which has additional overhead in itself), and generate two cache misses or even two consecutive page faults. Unless you're in an incredibly memory-constrained environment, or have no control over the data layout (e.g. pulling packets out of a network receive buffer), unaligned data is still best avoided.
Necessity: If the language has a suitable data model, i.e. no pointers, and any data from external sources is already marshalled into appropriate datatypes at a lower level, then there's really no need for unaligned accesses anyway, and it makes the compiler's life that much easier to simply ignore the idea altogether.
I have no idea what concerns OCaml in particular, but I certainly wouldn't be surprised if it were "all of the above".
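As a footnote to the efficiency and necessity points: portable code typically expresses a potentially-unaligned load through memcpy, and on AArch64 (alignment checking off, Normal memory) a compiler is free to lower that memcpy to a single plain ldr. A small self-contained sketch, with made-up buffer contents:

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Defined behavior at any alignment; on AArch64 this usually
     * compiles to one unaligned ldr rather than a byte-by-byte loop. */
    static uint32_t load_u32(const void *p)
    {
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }

    int main(void)
    {
        uint8_t packet[8] = { 0, 0x78, 0x56, 0x34, 0x12, 0, 0, 0 };
        /* offset 1 is misaligned for a 4-byte load */
        printf("0x%08x\n", (unsigned)load_u32(packet + 1)); /* 0x12345678,
                                                               little-endian */
        return 0;
    }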

Write-Only Memory

I know there exist read-only values in many languages (final in Java, const in C++, etc.), but does such a thing as a "write-only" value exist? I've heard variations of this in jokes, such as write-only code, but I'm wondering if this is actually a legitimate concept in computer science. To be honest, I can't see how it would be helpful in any situation, but I'm just wondering.
In Unix shell scripting there is a concept of write-only memory. But it's not part of any shell or scripting language; it's a device: /dev/null.
The write-only device /dev/null is used to discard output you don't want, generally by letting the caller redirect stdout and/or stderr to it.
There is other write-only memory on a computer. One example is your sound card, which on some (older) Unix machines is mapped to /dev/audio or /dev/dsp. Writing values to it makes your speaker produce sound, but reading from it gets you nothing.
At the lower level of the device drivers themselves, these hardware devices are often connected to a specific memory or I/O address (some CPU architectures don't have separate memory and I/O address spaces - just a single address space shared by RAM and all other hardware). So in a real sense these memory locations really are write-only.
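In C driver code that often looks like the fragment below - a sketch with an invented address and register name. The datasheet would tell you the register is write-only, so the driver never reads it back:

    #include <stdint.h>

    /* Hypothetical write-only data register of a memory-mapped audio
     * device; the address 0x4000A000 is invented for this example. */
    #define AUDIO_DATA_REG (*(volatile uint32_t *)0x4000A000u)

    /* Push one sample to the device. Reading AUDIO_DATA_REG back would
     * return garbage (or fault), so we only ever store to it. */
    static inline void play_sample(uint32_t sample)
    {
        AUDIO_DATA_REG = sample;
    }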
There were certainly some FPUs for PCs that used a somewhat odd setup, existing as memory-mapped devices. To perform an operation, you would simply write the value you wanted to operate on to a memory address indicating the operation you wanted performed, and the result would then (eventually) be available at another address.
I don't know if you would define this, strictly, as "write-only memory", it is rather memory where (part of) the address is used as an opcode.

How to find number of memory accesses

Can anybody tell me a Unix command that can be used to find the number of memory accesses that took place in a given interval? vmstat, top, and sar only give the amount of physical memory occupied/available, but do not give the number of memory accesses in a given interval.
If I understand what you're asking, such a feature would almost certainly require hardware support at a very low level (e.g. a counter of some sort that monitors memory bus activity).
I don't think such support is available for the common architectures supported by Unix or Linux, so I'm going to go out on a limb and say that no such Unix command exists.
The situation is somewhat different when considering memory in units of pages, because most architectures that support virtual memory have dedicated MMU hardware which operates at that level of granularity and can be accessed by the operating system. But as far as I know, the sorts of counter data you'd get from the MMU would represent events like page faults, allocations, and releases, rather than individual reads or writes.
