ARM: memory address ... why is it 0x04030201...?

Can someone explain to me why we represent the contents of a memory address this way:
"Word on address = 0x00": 0x04030201
I know each of 01, 02, 03, 04 is one byte, but can someone explain where each of those bytes is and what it represents? A memory cell? A register? I am totally confused...

An address, memory or otherwise, is really no different from an address on a building. Sometimes addresses are chosen systematically and well, sometimes haphazardly. In any case, some buildings are fire stations, some are grocery stores, some are apartments, and some are houses. But the addressing system used by a city or nation can get your item directly to the right building.
When we talk about addresses in software it is no different. At the lowest level the processor doesn't know or care that there is an address bus, an interface where the address is projected outside the processor. Layers of logic are added, like an onion, around the processor, and eventually other chips: memory, USB controllers, hard drive controllers, etc. Just like the parts of an address on an envelope, portions of that address are extracted, and the read or write command is delivered to the individual piece of logic that wears that address on the side of its building.
You can't simply ask what address 0x04030201 is without any context. Address schemes are fairly specific to their system; there are hundreds, thousands, or tens of thousands of ARM-based systems, all with different address schemes. That address could point to nothing, in which case the request dies with nobody to answer it, possibly hanging the processor; or it could be some RAM; or it could be a register in a USB controller, video controller, or disk drive controller.
Generally you have read and write operations. In this example, once the letter makes it to the individual at the address on the envelope, the contents of the letter contain instructions: do this (write), or get this and mail it back (read). And in the case of hardware, the individual just does what it is told without asking. If it is a read, it performs the read within the context of whatever that device is. On a hard disk controller, a read of a particular address might hit a temperature sensor, or a status register that contains the speed at which the motor is spinning, or it might return some data recently read from the hard disk. In the simple case of memory, it is likely just some bytes.
How much data is being read is another item specified on the processor's bus, and what is available to the programmer varies from processor to processor. Sometimes you can read or write individual bytes, sometimes 16-bit items, or 32, or 64, etc.
Then you get into address translation. Using the mail analogy, this is like having your mail forwarded to another address: you write one address on the letter, the post office has a forwarding request for that address, so they change it to the new address and then complete the delivery. When you hear of a memory management unit, MMU, and in some uses of the phrase virtual memory, that is the kind of thing that is going on. Say we want to make the programmer's life simple, so we tell everyone that RAM starts at address 0x00000000. That makes it much easier for a compiler to choose the memory locations where our variables, arrays, and programs live; it can compile every program the same way based on that address. But how can many programs run at once if they all share the same memory? Well, they don't. One program thinks it is writing to address 0x00000000, but in reality there is some unique address, which can be completely different, that belongs only to that program, say address 0x10000000. The MMU is like the mail carrier at the post office who changes the address: the processor knows, from information about which task is running, that it needs to convert 0x00000000 to 0x10000000. When another program accesses what it thinks is 0x00000000, that address might be changed to 0x11000000, and yet another program's 0x00000000 might map to physical address 0x12000000. The address that actually hits the memory is called the physical address; the address the program uses is called the virtual address. It isn't real; it is destined to be changed.
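To make that concrete, here is a minimal sketch in C of the remapping idea, using a simple base-register scheme rather than the page tables a real MMU uses; the task numbers and physical bases are just the illustrative values from above:

#include <stdint.h>

/* one entry per running task: where that task's virtual 0x00000000 really lives */
static const uint32_t task_base[] = {
    0x10000000,  /* task 0 */
    0x11000000,  /* task 1 */
    0x12000000,  /* task 2 */
};

/* translate a task's virtual address into the physical address that hits memory */
uint32_t to_physical(int task, uint32_t virt)
{
    return task_base[task] + virt;
}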
The MMU not only makes life easier for compilers and programmers, it also lets us protect one program, or the operating system, from another. Application programs run at a certain protection level, which the MMU uses to know what that user is allowed to do. If a program generates a virtual address outside its address space, say the system has 1 gig of memory and the program tries to address 1 gig plus a little bit more, the MMU, instead of converting that to a physical address, generates an interrupt to the processor, which switches the processor into a mode with more permissions, basically the operating system. The operating system can then decide to use the other kind of virtual memory and give the program more memory, or it may kill the program and put up a warning that such-and-such program had a protection fault and was killed.
Address schemes for computers are generally a lot more thought out than the way developers number houses in new neighborhoods, but not always. Still, it is not that far removed from an address on an envelope: you pick apart bits in the address, those chunks of bits mean something, and they deliver the processor's request to the individual at that address. How the bits are parsed is very specific to the processor and platform, and in some cases is dynamic or programmable on the fly. So if your next question is "what is 0xabcd on my system", we may still not be able to help you; you may have to do more research or give us a lot of info...

Think of memory as an array of bytes. 'Word on address' can mean different things depending on what the CPU designers consider a word. In your case it seems a word is 32 bits long.
So 'Word on address = 0x00: 0x04030201' means:
'Beginning at memory cell 0x00 (inclusive), the "next" four bytes are 0x04 0x03 0x02 0x01.'
Also, depending on the endianness of your CPU, the meaning of 'next' changes: it could be that 0x04 is stored in cell 0x00, or that 0x01 is stored there.
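A minimal C sketch of both interpretations: store the 32-bit word 0x04030201 somewhere and look at its individual bytes:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t word = 0x04030201;
    uint8_t *bytes = (uint8_t *)&word;  /* view the same memory cell by cell */

    /* On a little-endian CPU (x86, most ARM configurations) this prints
       01 02 03 04: the least significant byte sits at the lowest address.
       On a big-endian CPU it prints 04 03 02 01. */
    for (int i = 0; i < 4; i++)
        printf("byte at offset %d: 0x%02X\n", i, bytes[i]);
    return 0;
}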

Committed vs Reserved Memory

According to "Windows Internals, Part 1" (7th Edition, Kindle version):
Pages in a process virtual address space are either free, reserved, committed, or shareable.
Focusing only on the reserved and committed pages, the first type is described in the same book:
Reserving memory means setting aside a range of contiguous virtual addresses for possible future use (such as an array) while consuming negligible system resources, and then committing portions of the reserved space as needed as the application runs. Or, if the size requirements are known in advance, a process can reserve and commit in the same function call.
Both reserving and committing will initially get you entries in the VADs (virtual address descriptors), but neither operation will touch the PTE (page table entry) structures. Reserving used to cost PTEs before Windows 8.1, but it no longer does.
As described above, reserved means blocking a range of virtual addresses, NOT blocking physical memory or paging-file space at the OS level. The OS doesn't include reserved memory in the commit limit, so when the time comes to actually allocate it, you might get a surprise. It's important to note that reserving happens from the perspective of the process address space. No physical resource is reserved - there's no stamping of "no vacancy" against RAM space or page file(s).
The analogy with plots of land might be missing something: take reserved as an area of land surrounded by wooden poles, letting others know that the land is taken. But what about committed? It can't be land on which structures (e.g. houses) have already been built, since those would require PTEs, and there are none yet, because we haven't accessed anything. It's only when touching committed data that the PTEs get built, which makes the pages available to the process.
The main problem is that committed memory - at least in its initial state - is functionally very much like reserved memory. It's just an area blocked off within the VADs. Try to touch an address in either, and you'll get an access violation exception for a reserved address:
Attempting to access free or reserved memory results in an access violation exception because the page isn’t mapped to any storage that can resolve the reference
...and an initial page fault for a committed one (immediately followed by the required PTE entries being created).
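To make the sequence concrete, here is a minimal sketch using the documented Win32 VirtualAlloc/VirtualFree calls, with error handling omitted for brevity:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* reserve 1 MiB of address space: a VAD entry only, no PTEs, no commit charge */
    char *p = VirtualAlloc(NULL, 1 << 20, MEM_RESERVE, PAGE_NOACCESS);

    /* touching *p here would raise an access violation: reserved, not committed */

    /* commit the first page: it now counts against the commit limit */
    VirtualAlloc(p, 4096, MEM_COMMIT, PAGE_READWRITE);

    p[0] = 1;  /* first touch: a page fault builds the PTE and maps a zeroed page */
    printf("committed page usable at %p\n", (void *)p);

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}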
Back to the land analogy: once houses are built, that patch of land is still committed. Yet this is a bit peculiar, since it was also committed when the original grass was there, before the first shovelful of earth was dug for construction - when it looked exactly like a reserved patch. Maybe it would be better to think of it as terrain eligible for construction: you have a permit to build, albeit you might never build as much as a wall on that patch of land.
What would be the reasons for using one type of memory versus the other? There's at least one: the OS guarantees that there will be room to allocate committed memory, should that ever be needed in the future, but guarantees nothing for reserved memory aside from blocking off that range of the process' address space. The only downside of committed memory is that one or more paging files might need to grow so that the commit limit can take the newly allocated block into account; that way, should the requester demand part or all of the data in the future, the OS can provide access to it.
I can't really see how the land analogy captures this detail of "guarantee". After all, the reserved patch also physically existed, covered by the same grass as a committed one in its pristine state.
The stack is another scenario where reserved and committed memory are used together:
When a thread is created, the memory manager automatically reserves a predetermined amount of virtual memory, which by default is 1 MB.[...] Although 1 MB is reserved, only the first page of the stack will be committed [...]
along with a guard page. When a thread's stack grows large enough to touch the guard page, an exception occurs, causing an attempt to allocate another guard. Through this mechanism, a user stack doesn't immediately consume all 1 MB of committed memory but instead grows with demand.
There is an answer here that deals with why one would want to use reserved memory as opposed to committed. It involves storing continuously expanding data - which is actually the stack model described above - and having specific absolute address ranges available when needed (although I'm not sure why one would want that within a process).
Ok, what am I actually asking?
What would be a good analogy for the reserved/committed concept?
Is there any reason, aside from those depicted above, that would mandate the use of reserved memory? Are there any interesting use cases where resorting to reserved memory is a smart move?
Your question hits upon the difference between logical memory translation and virtual memory translation. While CPU documentation likes to conflate these two concepts, they are different in practice.
If you look at logical memory translation, there are only two states for a page. Using your terminology, they are FREE and COMMITTED. A free page is one that has no mapping to a physical page frame, and a COMMITTED page has such a mapping.
In a virtual memory system, the operating system has to maintain a copy of the address space in secondary storage. How this is done depends upon the operating system. Typically, a process will have its mapping to several different files for secondary storage. The operating system divides the address space into what is usually called a SECTION.
For example, the code and read-only data could be stored virtually as one or more SECTIONS in the executable file. Code and static data in shared libraries could each be in a different section that is paged to the shared libraries. You might also have a shared file mapped into the process, providing memory that can be accessed by multiple processes; that forms another section. Most of the read/write data is likely to be in a page file in one or more sections. How the operating system tracks where it virtually stores each section of data is system dependent.
For Windows, that gives the definition of one of your terms: shareable. A shareable section is one where a range of addresses can be mapped into different processes, at different (or possibly the same) logical addresses.
Your last term is then RESERVED. If you look at the Windows VirtualAlloc function documentation, you can see that (among your options) you can RESERVE or COMMIT. If you reserve, you are creating a section of VIRTUAL MEMORY that has no mapping to physical memory.
This RESERVE/COMMIT model is Windows-specific (although other operating systems may do the same). The likely reason was to save disk space. When Windows NT was developed, 600 MB drives the size of a washing machine were still in use.
In these days of 64-bit address spaces, this system works well for (as you say) expanding data. In theory, an exception handler for a stack overrun can simply expand the stack. Reserving 4GB of memory takes no more resources than reserving a single page (which would not be practicable in a 32-bit system—see above). If you have 20 threads, this makes reserving stack space efficient.
What would be a good analogy for the reserved/committed concept?
One could say RESERVE is like buying an option to buy, and COMMIT is exercising that option.
Is there any reason, aside from those depicted above, that would mandate the use of reserved memory? Are there any interesting use cases where resorting to reserved memory is a smart move?
IMHO, the most likely places to RESERVE without COMMITTING are for creating stacks and heaps with the former being the most important.
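As a sketch of that pattern (the arena names are illustrative, and error handling is trimmed): reserve a large contiguous range once, then commit pages as the data grows, so pointers handed out earlier never move:

#include <windows.h>

typedef struct {
    char  *base;       /* start of the reserved range */
    size_t committed;  /* bytes committed so far */
    size_t reserved;   /* total bytes reserved */
} arena_t;

int arena_init(arena_t *a, size_t reserve_bytes)
{
    a->base = VirtualAlloc(NULL, reserve_bytes, MEM_RESERVE, PAGE_NOACCESS);
    a->committed = 0;
    a->reserved = reserve_bytes;
    return a->base != NULL;
}

int arena_grow(arena_t *a, size_t new_size)
{
    if (new_size > a->reserved)
        return 0;                          /* would overflow the reservation */
    if (new_size > a->committed) {
        /* commit only the newly needed tail; VirtualAlloc rounds to whole pages */
        if (!VirtualAlloc(a->base + a->committed, new_size - a->committed,
                          MEM_COMMIT, PAGE_READWRITE))
            return 0;
        a->committed = new_size;
    }
    return 1;
}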

Why is the memory address printed with {:p} much bigger than my RAM specs?

I want to print the memory location (address) of a variable with:
let x = 1;
println!("{:p}", &x);
This prints the hex value 0x7fff51ef6380 which in decimal is 140734568031104.
My computer has 16GB of RAM, so why this huge number? Does the x64 architecture use some large stride instead of simply incrementing by 1 for each memory location?
In x86, usually the first location starts at 0, then 1, 2, etc., so the highest number you can have is around 4 billion; the address number was always equal to or less than 4 billion.
Why is this not the case with x64?
What you see here is an effect of virtual memory. Memory management is hard, and it becomes even harder when the operating system and hundreds of processes have to share the memory. In order to handle this huge complexity, the concept of virtual memory is used. I'll just briefly explain the basics here; the topic is far more complex and you should read about it somewhere else, too.
On most modern computers, each process thinks that it owns (almost) the complete memory space. But processes never deal with physical addresses, only with virtual ones. These virtual addresses are mapped to physical ones each time the process actually reads from memory. This translation of addresses is done by the so-called MMU (memory management unit). The rules for how to map the addresses are set up by the operating system.
When you boot your PC, the operating system creates an initial mapping. Every time you start a process, the operating system adds a few slices of physical memory to the process and modifies the mapping appropriately. That way, the process has memory to play with.
On x86_64, the address space is 64 bit wide, so each process thinks it owns all of those 2^64 addresses. This is not true, of course:
There isn't a single PC in the world with that much memory. (In fact, most CPUs today can merely use about 280 TB of RAM, since they internally use only 48 bits for addressing. And even those 280 TB are enough for now, apparently.)
Even if you had that much memory, there are other processes which use part of that memory, too.
So what happens when you try to read an address which isn't mapped (which, in 64-bit land, is the vast majority of addresses)? The MMU triggers a page fault, which makes the CPU notify the operating system to handle it.
What I mean is that in x86, usually first location starts at 0, then 1, 2, etc. so the highest number you can have is around 4 billion.
That is true, but it is also true if your x86 system has less than 4GB of RAM. Virtual memory has existed for quite some time already.
So that's a short summary of why you see such big addresses. Again, please note that I glossed over many details here.
The pointers your program works with are in virtual address space. x86-64 uses 64-bit pointers. This was one of the major goals of AMD64, along with adding more integer and XMM registers. You are correct that i386 only has 32-bit pointers which only cover 4GB of address space in each process.
0x7fff51ef6380 looks like a stack pointer, which I guess makes sense for that code.
Linux on x86-64 (for example) puts the stack near the top of the lower canonical address range: current x86-64 hardware only implements 48-bit virtual addresses and this is the mechanism to prevent software from depending on it. This allows the address space to be extended in the future without breaking software.
The amount of physical RAM in your system has nothing to do with this. You'd see (approximately) the same number on an x86-64 system with 128MB of RAM, +/- stack address space layout randomization (ASLR).
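You can reproduce the effect yourself (shown here in C; the Rust snippet above behaves the same way). On a Linux x86-64 box the stack address prints near the top of the 47-bit user range regardless of installed RAM, while the heap address is much lower; exact values shift with ASLR:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int stack_var = 1;
    int *heap_var = malloc(sizeof *heap_var);

    printf("stack: %p\n", (void *)&stack_var);  /* e.g. 0x7ffc... */
    printf("heap:  %p\n", (void *)heap_var);    /* much lower, e.g. 0x55... */

    free(heap_var);
    return 0;
}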

What does "code memory" in a Harvard architecture refer to?

Harvard architecture is a computer architecture with separate buses for code and data memory. Does the code memory the architecture refers to live in RAM or in ROM (for microcontrollers)? I was confused when the architecture talked about code memory. As far as I know, in small-scale embedded systems code always executes from ROM, whereas in medium-scale and sophisticated embedded systems code can be copied from ROM to RAM for faster execution. If that is the case, is RAM connected to two buses, one for code and the other for data memory? Can anyone please help me understand this?
You might want to see this
https://en.wikipedia.org/wiki/Modified_Harvard_architecture
The first time I came across this Harvard architecture thing was on PICs, and they do have their RAM and ROM separated into two different address spaces. But it seems this is not the only way to do it. Having the data and code accessible at the same time is the key. For example, a single RAM address space could be virtually partitioned to store code and data separately, yet be accessible by the processor at the same time. It's not a pure Harvard architecture system, but close enough.
Harvard architecture is for the most part an academic exercise. First you have to ask how the split into the separate buses is determined. An internal von Neumann design that splits by address? Many von Neumann implementations, if not all, split by address, and if you draw a bigger box you see many separate buses; sometimes data and instruction are joined, sometimes not.
Because you can't use pure Harvard for a bootloader or operating system, it is really just a mental exercise, a label like von Neumann that folks like to toss about, if for no other reason than to create confusion. The real world is somewhere in between. AMBA/AXI and other buses are labelled modified Harvard because they tag the data and instruction transactions as such but share the same buses (and on a number of these there isn't a single bus; there are separate read address, read data, write address, and write data buses). The processor has not been the bottleneck in a long time; the processor and these buses can be and are idle, so there is room for instructions, data, and peripherals on the same set of buses, particularly if you separate read address, read data, write address, and write data into separate buses, with ID tags used to connect the dots and complete transactions.
As mentioned on Wikipedia, the closest you are really going to get in the real world is something like a microcontroller. And when they talk about memory they really just mean address space; what is out there on the other end of the bus can be SRAM, DRAM, flash, EEPROM, etc., or a combination, on either side, as well as all the peripherals on that bus. So in this model, in a microcontroller, the instructions are in flash and the SRAM is the data, and in a pure Harvard architecture there is no way to load code into SRAM and run it there. Likewise you can't use the data bus to program the flash, or to buffer up data to be flashed; the ROM/flash gets magically loaded over a path not shown in the Harvard architecture diagram, likely a crossover between the I/O bus resources and the instruction bus resources, which begs to be called modified Harvard.
For von Neumann, you have early address decoders that split the bus into instructions, data, I/O, and subdivisions of those. Perhaps the data and instructions stay combined, but you don't have a pure single bus from end to end; that's not practical.
Look at the pictures on Wikipedia and understand that one has separate buses for things and the other is combined. Pass the test and then forget the terms; you won't need them after that, as they are not really relevant.
Harvard has almost nothing to do with RAM or ROM - it just says that, in principle, instruction fetches and data reads/writes are done over separate buses.
That simply implies that at least some ROM (bootstrap code) needs to be found on the instruction memory bus - the rest can be RAM. The non-instruction bus can access RAM or ROM as well - ROM could hold constant data.
On "real" implementations like the AVR MCUs, however, the instruction bus addresses flash ROM, while the non-instruction bus (I'm deliberately not writing "data bus", that's something different) addresses SRAM. You don't even "see" these buses on an AVR - they are purely internal to most of these MCUs.

I/O-mapped I/O - are port addresses a part of the RAM

In I/O-mapped I/O (as opposed to memory-mapped I/O), a certain set of addresses is fixed for I/O devices. Are these addresses part of the RAM, making that much physical address space unusable? Does this correspond to the 'Hardware Reserved' memory in the attached picture?
If yes, how is it decided which bits of an address are used for addressing I/O devices? (The I/O address space would be much smaller than the actual memory; I have read this helps reduce the number of pins/bits used by the decoding circuit.)
What would happen if one tried to access, in assembly, an address that belongs to this address space?
I/O-mapped I/O doesn't use the same address space as memory-mapped I/O. The latter does use part of the address space normally used by RAM, and therefore "steals" addresses that no longer belong to RAM.
The set of address ranges used by the various memory-mapped I/O devices is what you see as "Hardware reserved".
As for how memory-mapped devices get their addresses, this is largely covered by the PnP subsystem, either in the BIOS or in the OS. Memory-mapped devices, with few exceptions, are PnP devices, which means that the base address of each can be changed (for PCI devices, the base address of the memory-mapped registers, if any, is contained in a BAR - Base Address Register - which is part of the PCI configuration space).
Saving pins for decoding devices (lazy decoding) is (or was) done on early 8-bit systems, to save decoders and reduce costs. It has nothing to do with memory-mapped versus I/O-mapped devices; lazy decoding may be used in both situations. For example, a designer could decide that the 16-bit address range C000-FFFF is going to be reserved for memory-mapped devices. To decide whether to enable some memory chip or some device, it's enough to look at the values of A15 and A14: if both address lines are high, the addressed block is C000-FFFF, and the memory chip enables will be deasserted. On the other hand, a designer could decide that 8-bit I/O port 254 is going to be assigned to a device, and to decode this address only the state of A0 is examined, needing no decoders to find out the port address (this is, for example, what the ZX Spectrum does for addressing the ULA).
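In C terms, the decode logic just described amounts to looking at one or two address lines (a sketch only; the hardware does this with gates, not code):

#include <stdbool.h>
#include <stdint.h>

/* both A15 and A14 high -> block C000-FFFF -> device, not RAM */
bool device_selected(uint16_t addr)
{
    return (addr & 0xC000) == 0xC000;
}

/* ZX Spectrum style: only A0 is decoded, so any even port selects the ULA */
bool ula_selected(uint16_t port)
{
    return (port & 0x0001) == 0;
}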
If a program (written in whatever language allows you to access and write to arbitrary memory locations) tries to access a memory address reserved for a device, and assuming the paging and protection mechanisms allow such access, what happens depends solely on what the device does when that address is accessed. A well-known memory-mapped device in PCs is the frame buffer. If the graphics card is configured to display color text mode with its default base address, an 8-bit write to any even physical address between B8000 and B8F9F will cause the character whose ASCII code was written to show on screen, in a location that depends on the address chosen.
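For instance, under a legacy real-mode DOS compiler such as Turbo C (where MK_FP builds a far pointer and the text-mode frame buffer is visible at physical B8000), writing a character cell directly looks like this sketch:

#include <dos.h>  /* MK_FP, available in old DOS compilers such as Turbo C */

void put_char_top_left(char c)
{
    /* each text-mode cell is two bytes: ASCII code, then colour attribute */
    unsigned char far *vram = (unsigned char far *)MK_FP(0xB800, 0);
    vram[0] = (unsigned char)c;  /* character at row 0, column 0 */
    vram[1] = 0x07;              /* light grey on black */
}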
I/O-mapped devices don't collide with memory, as they use a different address space, with different instructions to read and write values at addresses (ports). These devices cannot be addressed using machine code instructions that target memory.
Memory-mapped devices share the address space with RAM. Depending on the system configuration, memory-mapped registers can be present all the time, using some addresses and thus preventing the system from using them for RAM; or memory-mapped devices may "shadow" memory at times, allowing the program to change the I/O configuration to choose whether a certain memory region is decoded as belonging to a device or used as regular RAM (this is, for example, what the Commodore 64 does to let the user have 64KB of RAM while still being able to access device registers at times, by temporarily disabling access to the RAM that sits "behind" the device currently being accessed at that very same address).
At the hardware level, what happens is that there are two different signals: MREQ and IOREQ. The first one is asserted on every memory access, the second one on every I/O access. So in this code...
MOV DX,1234h
MOV BX,DX
MOV AL,[BX]   ;reads memory address 1234h (memory address space)
IN  AL,DX     ;reads I/O port 1234h (I/O address space)
Both reads put the value 1234h on the CPU address bus, and both assert the RD pin, but the first one asserts MREQ to indicate that the address belongs to the memory address space, while the second asserts IOREQ to indicate that it belongs to the I/O address space. The I/O device at port 1234h is connected to the system bus so that it is enabled only if the address is 1234h, RD is asserted, and IOREQ is asserted. This way, it cannot collide with a RAM chip addressed at 1234h, because the latter will be enabled only if MREQ is asserted (the CPU ensures that IOREQ and MREQ cannot be asserted at the same time).
These two address spaces don't exist on all CPUs. In fact, the majority of them don't have this, and therefore they have to memory-map all their devices.

What is paging?

Paging is explained here, slide #6:
http://www.cs.ucc.ie/~grigoras/CS2506/Lecture_6.pdf
in my lecture notes, but I cannot for the life of me understand it. I know it's a way of translating virtual addresses to physical addresses. So the virtual addresses, which may live on disk, are divided into chunks of size 2^k. I am really confused after this. Can someone please explain it to me in simple terms?
Paging is, as you've noted, a type of virtual memory. To answer the question raised by @John Curtsy: it's covered separately from virtual memory in general because there are other types of virtual memory, although paging is now (by far) the most common.
Paged virtual memory is pretty simple: you split all of your physical memory up into blocks, mostly of equal size (though having a selection of two or three sizes is fairly common in practice). Making the blocks equal sized makes them interchangeable.
Then you have addressing. You start by breaking each address up into two pieces. One is an offset within a page. You normally use the least significant bits for that part. If you use (say) 4K pages, you need 12 bits for the offset. With (say) a 32-bit address space, that leaves 20 more bits.
From there, things are really a lot simpler than they initially seem. You basically build a small "descriptor" to describe each page of memory. This will have a linear address (the address used by the client application to address that memory), and a physical address for the memory, as well as a Present bit. There will (at least usually) be a few other things like permissions to indicate whether data in that page can be read, written, executed, etc.
Then, when client code uses an address, the CPU starts by breaking up the page offset from the rest of the address. It then takes the rest of the linear address, and looks through the page descriptors to find the physical address that goes with that linear address. Then, to address the physical memory, it uses the upper 20 bits of the physical address with the lower 12 bits of the linear address, and together they form the actual physical address that goes out on the processor pins and gets data from the memory chip.
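That split is easy to express in code. A minimal sketch, assuming 4K pages, a 32-bit address space, and a flat single-level page table (the names are illustrative; real hardware walks multi-level tables):

#include <stdint.h>

#define PAGE_SHIFT  12                    /* 4K pages -> 12 offset bits */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

/* illustrative flat table: virtual page number -> physical frame number */
extern uint32_t page_table[1u << (32 - PAGE_SHIFT)];

uint32_t translate(uint32_t linear)
{
    uint32_t vpn    = linear >> PAGE_SHIFT;   /* upper 20 bits: page number */
    uint32_t offset = linear & OFFSET_MASK;   /* lower 12 bits: offset */
    uint32_t frame  = page_table[vpn];        /* descriptor lookup */
    return (frame << PAGE_SHIFT) | offset;    /* physical address */
}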
Now, we get to the part where we get "true" virtual memory. When programs are using more memory than is actually available, the OS takes the data for some of those descriptors, and writes it out to the disk drive. It then clears the "Present" bit for that page of memory. The physical page of memory is now free for some other purpose.
When the client program tries to refer to that memory, the CPU checks that the Present bit is set. If it's not, the CPU raises an exception. When that happens, the CPU frees up a block of physical memory as above, reads the data for the current page back in from disk, and fills in the page descriptor with the address of the physical page where it's now located. When it's done all that, it returns from the exception, and the CPU restarts execution of the instruction that caused the exception to start with -- except now, the Present bit is set, so using the memory will work.
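The whole dance can be sketched like this (evict_some_frame and read_page_from_disk are made-up helper names for illustration, not a real OS API):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t frame;    /* physical frame number, meaningful only if present */
    bool     present;  /* set when the page is resident in RAM */
} pte_t;

extern uint32_t evict_some_frame(void);                 /* may page a victim out */
extern void read_page_from_disk(uint32_t vpn, uint32_t frame);

/* entered via the CPU's exception mechanism when present == false */
void handle_page_fault(pte_t *table, uint32_t vpn)
{
    uint32_t frame = evict_some_frame();
    read_page_from_disk(vpn, frame);
    table[vpn].frame   = frame;
    table[vpn].present = true;
    /* on return, the faulting instruction restarts and now succeeds */
}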
There is one more detail that you probably need to know: the page descriptors are normally arranged into page tables, and (the important part) you normally have a separate set of page tables for each process in the system (and another for the OS kernel itself). Having separate page tables for each process means that each process can use the same set of linear addresses, but those get mapped to different sets of physical addresses as needed. You can also map the same physical memory into more than one process by creating two separate page descriptors (one for each process) that contain the same physical address. Most OSes use this so that, for example, if you have two or three copies of the same program running, there will really only be one copy of the executable code for that program in memory -- but two or three sets of page descriptors pointing to that same code, so all of them can use it without making separate copies.
Of course, I'm simplifying a lot -- quite a few complete (and often fairly large) books have been written about virtual memory. There's also a fair amount of variation among machines, with various embellishments added, minor changes in parameters made (e.g., whether a page is 4K or 8K), and so on. Nonetheless, this is at least a general idea of the core of what happens (and it's still at a high enough level to apply about equally to an ARM, x86, MIPS, SPARC, etc.)
Simply put, it's a way of working with far more data than your physical memory would normally allow: each process gets its own full virtual address space, and only the pages actually in use need to be resident in RAM at any moment; the rest can live on disk.
Paging is a storage mechanism that allows the OS to retrieve processes from secondary storage into main memory in the form of pages. In the paging method, main memory is divided into small fixed-size blocks of physical memory called frames. The size of a frame is kept the same as that of a page to maximize utilization of main memory and to avoid external fragmentation.
