Does a hex memory address represent the position in the memory as a whole?
E.g., with 4 GB of RAM and a given memory address, does the address point to the position (in bytes) at which the data starts, e.g. at 2.1 GB?
How do memory addresses work for data on a hard disk, before it is loaded into memory?
Is there ever a case where parts of the data are fetched from memory and other parts from disk? How are the locations differentiated?
Thanks
A hex memory address (like what you would see if you printed out the value of a pointer) points to a location in virtual memory.
On a 32-bit system, each process has a full 4 GB of virtual address space. This virtual memory is managed by the CPU and the operating system. When you access a location in virtual memory, the CPU and operating system determine where in the system's actual physical memory that location is mapped, and the data is retrieved from there.
The operating system may also take things out of physical memory and swap them to disk, in order to make room in physical memory for other things. Then, if you try to access the virtual memory location of something that was swapped out, a "page fault" is generated, which causes the OS to reload that page from disk into physical memory.
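As a concrete illustration, here is a minimal Linux-specific sketch (an assumption-laden demo, not anything authoritative) that asks the kernel where one of the process's virtual pages currently sits in physical RAM, via /proc/self/pagemap. Since Linux 4.0 the frame number reads as zero without CAP_SYS_ADMIN, so it needs to run as root:

    /* Minimal sketch: translate one of this process's virtual addresses
     * to a physical frame via /proc/self/pagemap (Linux-specific). */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        int x = 42;                       /* something to look up */
        uintptr_t vaddr = (uintptr_t)&x;
        long page = sysconf(_SC_PAGESIZE);

        int fd = open("/proc/self/pagemap", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        uint64_t entry;                   /* one 8-byte entry per virtual page */
        if (pread(fd, &entry, sizeof entry, (vaddr / page) * 8) != sizeof entry) {
            perror("pread"); return 1;
        }
        close(fd);

        if (entry & (1ULL << 63)) {       /* bit 63: page present in RAM */
            uint64_t pfn = entry & ((1ULL << 55) - 1);  /* bits 0-54: frame # */
            printf("virtual %p -> physical 0x%llx\n", (void *)vaddr,
                   (unsigned long long)(pfn * page + (vaddr % page)));
        } else {
            printf("virtual %p is not in physical RAM right now\n", (void *)vaddr);
        }
        return 0;
    }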
Modern operating systems have virtual memory.
This means that the address your program uses to access some byte in memory is purely "virtual". The operating system maps it, via special hardware, to real memory locations which are completely different, and there may be no physical memory location at all for a given address. For example, you may mmap() a file into (virtual) memory, and accessing a byte at one of those virtual addresses actually accesses a byte of the file. Similarly, if some memory page hasn't been used for a long time, the OS may swap the page out of physical RAM to disk; in that case the virtual address doesn't correspond to any physical memory location either.
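For instance, a minimal sketch of the mmap() case just mentioned (the file name "example.txt" is only a placeholder for illustration):

    /* Minimal sketch of mmap(): map a file into virtual memory, so that
     * reading a byte at a virtual address reads a byte of the file. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void) {
        int fd = open("example.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        /* p is a virtual address; dereferencing it faults the file data in */
        printf("first byte of the file: %c\n", p[0]);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }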
In most cases, yes. But some processors use two values to compute the real address; the Intel 8086, for example, combines a segment and an offset.
A hard disk is only storage; it has its own system for storing information. Before the CPU can operate on it, the data has to be loaded into RAM.
I'm aware that memory allocated by kmalloc is physically contiguous and that the virtual address it returns differs from the physical address by just a fixed offset.
But if the CPU tries to access the virtual address it returned, will the MMU and page table still be used? I've heard that all addresses the CPU uses are virtual and have to be translated by the MMU. But since there is only a fixed offset between the physical and virtual addresses here, I'd think there is no need for the page table and MMU anymore.
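For reference, a minimal kernel-module sketch (illustrative only, not from the question's code) that prints a kmalloc'ed virtual address next to its physical address, making the fixed linear-map offset being asked about visible:

    /* Minimal kernel-module sketch: show that a kmalloc'ed address and its
     * physical address differ by the fixed linear-map offset. */
    #include <linux/module.h>
    #include <linux/slab.h>
    #include <asm/io.h>

    static void *buf;

    static int __init offset_demo_init(void)
    {
        buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
        if (!buf)
            return -ENOMEM;

        phys_addr_t phys = virt_to_phys(buf);
        pr_info("virtual  : %px\n", buf);
        pr_info("physical : %pa\n", &phys);
        /* on x86 lowmem, virtual = physical + PAGE_OFFSET for this mapping */
        return 0;
    }

    static void __exit offset_demo_exit(void)
    {
        kfree(buf);
    }

    module_init(offset_demo_init);
    module_exit(offset_demo_exit);
    MODULE_LICENSE("GPL");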
I observe substantial speedups in data transfer when I use pinned memory for CUDA data transfers. On Linux, the underlying system call for achieving this is mlock. The man page of mlock states that locking the page prevents it from being swapped out:
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully;
In my tests, I had a few gigs of free memory on my system, so there was never any risk that the memory pages could have been swapped out, yet I still observed the speedup. Can anyone explain what's really going on here? Any insight or info is much appreciated.
The CUDA driver checks whether the memory range is locked or not, and then uses a different code path. Locked memory is stored in physical memory (RAM), so the device can fetch it without help from the CPU (DMA, a.k.a. async copy; the device only needs the list of physical pages). Non-locked memory can generate a page fault on access, and it may be stored not only in memory (e.g., it can be in swap), so the driver needs to access every page of the non-locked memory, copy it into a pinned buffer, and pass that to DMA (a synchronous, page-by-page copy).
As described here http://forums.nvidia.com/index.php?showtopic=164661
host memory used by the asynchronous mem copy call needs to be page locked through cudaMallocHost or cudaHostAlloc.
I can also recommend checking the cudaMemcpyAsync and cudaHostAlloc manuals at developer.download.nvidia.com. The cudaHostAlloc documentation says that the CUDA driver can detect pinned memory:
The driver tracks the virtual memory ranges allocated with this (cudaHostAlloc) function and automatically accelerates calls to functions such as cudaMemcpy().
CUDA uses DMA to transfer pinned memory to the GPU. Pageable host memory cannot be used with DMA because its pages may reside on disk.
If the memory is not pinned (i.e. page-locked), it is first copied to a page-locked "staging" buffer and then copied to the GPU through DMA.
So by using pinned memory you save the time it takes to copy from pageable host memory to page-locked host memory.
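A minimal sketch of the difference, using only standard CUDA runtime calls (the buffer size is arbitrary and error checking is trimmed for brevity); on most systems the pinned copy reports the higher bandwidth:

    /* Compare pageable vs pinned host-to-device copies. Compile with nvcc. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <cuda_runtime.h>

    int main(void) {
        const size_t size = 64 << 20;            /* 64 MB */
        void *d_buf, *pageable, *pinned;

        cudaMalloc(&d_buf, size);
        pageable = malloc(size);                 /* ordinary, pageable memory */
        cudaHostAlloc(&pinned, size, cudaHostAllocDefault);  /* page-locked */
        memset(pageable, 'A', size);             /* fault both buffers in */
        memset(pinned, 'A', size);

        cudaEvent_t t0, t1;
        float ms;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);

        cudaEventRecord(t0);
        cudaMemcpy(d_buf, pageable, size, cudaMemcpyHostToDevice);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        cudaEventElapsedTime(&ms, t0, t1);
        printf("pageable: %.2f ms\n", ms);       /* staged through a pinned buffer */

        cudaEventRecord(t0);
        cudaMemcpy(d_buf, pinned, size, cudaMemcpyHostToDevice);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        cudaEventElapsedTime(&ms, t0, t1);
        printf("pinned  : %.2f ms\n", ms);       /* direct DMA */

        cudaFreeHost(pinned);
        free(pageable);
        cudaFree(d_buf);
        return 0;
    }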
If the memory pages had not been accessed yet, they were probably never swapped in to begin with. In particular, newly allocated pages will be virtual copies of the universal "zero page" and don't have a physical instantiation until they're written to. New maps of files on disk will likewise remain purely on disk until they're read or written.
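On Linux you can watch this happen with mincore(); a minimal sketch (purely illustrative) showing that a fresh anonymous mapping has no resident page until it is written:

    /* Show that freshly mapped anonymous pages have no physical
     * instantiation until they're written (Linux). */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);
        unsigned char vec;

        char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        mincore(p, page, &vec);
        printf("before write: resident=%d\n", vec & 1);   /* usually 0 */

        p[0] = 'A';                  /* first write faults in a real page */

        mincore(p, page, &vec);
        printf("after write : resident=%d\n", vec & 1);   /* now 1 */

        munmap(p, page);
        return 0;
    }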
A verbose note on copying non-locked pages to locked pages:
It can be extremely expensive if non-locked pages are swapped out by the OS on a busy system with limited CPU RAM, since a page fault will then be triggered to load the pages back into RAM through expensive disk I/O operations.
Pinning pages can also cause virtual memory thrashing on a system where CPU RAM is scarce. If thrashing happens, CPU throughput can be degraded a lot.
In I/O-mapped I/O (as opposed to memory-mapped I/O), a certain set of addresses is fixed for I/O devices. Are these addresses part of RAM, so that much of the physical address space becomes unusable? Does this correspond to the 'Hardware Reserved' memory in the attached picture?
If yes, how is it decided which bits of an address are used for addressing I/O devices? (The I/O address space would be much smaller than the actual memory; I have read this helps reduce the number of pins/bits used by the decoding circuit.)
What would happen if one tried to access, in assembly, an address that belongs to this address space?
I/O-mapped I/O doesn't use the same address space as memory-mapped I/O. The latter does use part of the address space normally used by RAM and therefore "steals" addresses that no longer belong to RAM.
The set of address ranges used by the different memory-mapped I/O devices is what you see as "Hardware reserved".
As for how it is decided how to address memory-mapped devices, this is largely handled by the PnP subsystem, either in the BIOS or in the OS. Memory-mapped devices, with few exceptions, are PnP devices, which means that for each of them the base address can be changed (for PCI devices, the base address of the memory-mapped registers, if any, is contained in a BAR -Base Address Register-, which is part of the PCI configuration space).
Saving pins for decoding devices (lazy decoding) is (or was) done on early 8-bit systems, to save decoders and reduce costs. It has nothing to do with memory-mapped vs. I/O-mapped devices; lazy decoding may be used in both situations. For example, a designer could decide that the 16-bit address range C000-FFFF is going to be reserved for memory-mapped devices. To decide whether to enable some memory chip or some device, it's then enough to look at the values of A15 and A14: if both address lines are high, the block addressed is C000-FFFF, and the memory chip enables will be deasserted. On the other hand, a designer could decide that the 8-bit I/O port 254 is going to be assigned to a device, and to decode this address, look only at the state of A0, needing no decoders to find out the port address (this is, for example, what the ZX Spectrum does for addressing the ULA).
If a program (written in whatever language allows you to access and write to arbitrary memory locations) tries to access a memory address reserved for a device, and assuming the paging and protection mechanisms allow the access, what happens depends solely on what the device does when that address is accessed. A well-known memory-mapped device in PCs is the frame buffer. If the graphics card is configured to display color text mode with its default base address, any 8-bit write performed to an even physical address between B8000 and B8F9F will cause the character whose ASCII code was written to show up on screen, at a location that depends on the address chosen.
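A minimal freestanding sketch of that frame-buffer access (illustrative only: it assumes an environment, such as a toy kernel with an identity mapping, where physical address B8000 is directly reachable; a normal process under a modern OS cannot touch it):

    /* Write characters into the VGA color text-mode frame buffer. */
    #include <stdint.h>

    #define VGA_TEXT_BASE 0xB8000
    #define VGA_COLS      80

    static void vga_putc(int row, int col, char ch, uint8_t attr) {
        volatile uint16_t *vga = (volatile uint16_t *)VGA_TEXT_BASE;
        /* even byte: ASCII code, odd byte: color attribute */
        vga[row * VGA_COLS + col] = (uint16_t)attr << 8 | (uint8_t)ch;
    }

    void demo(void) {
        const char *msg = "Hi";
        for (int i = 0; msg[i]; i++)
            vga_putc(0, i, msg[i], 0x07);   /* light grey on black */
    }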
I/O-mapped devices don't collide with memory, since they use a different address space, with different instructions to read and write values at its addresses (ports). These devices cannot be addressed using machine-code instructions that target memory.
Memory-mapped devices share the address space with RAM. Depending on the system configuration, memory-mapped registers may be present all the time, using up addresses and thus preventing the system from using them for RAM; or memory-mapped devices may "shadow" memory at times, letting the program change the I/O configuration to choose whether a certain memory region will be decoded as belonging to a device or used by regular RAM (this is, for example, what the Commodore 64 does to give the user 64 KB of RAM while still allowing access to device registers at times, by temporarily disabling access to the RAM that sits "behind" the device currently being accessed at that very same address).
At the hardware level, what happens is that there are two different signals: MREQ and IOREQ. The first one is asserted on every memory access, the second one on every I/O instruction. So in this code...
MOV BX,1234h
MOV AL,[BX]    ; reads memory address 1234h (memory address space)
MOV DX,1234h
IN  AL,DX      ; reads I/O port 1234h (I/O address space)
...both reads put the value 1234h on the CPU address bus, and both assert the RD pin to indicate a read, but the first one asserts MREQ to indicate that the address belongs to the memory address space, while the second one asserts IOREQ to indicate that it belongs to the I/O address space. The I/O device at port 1234h is connected to the system bus so that it is enabled only if the address is 1234h, RD is asserted, and IOREQ is asserted. This way it cannot collide with a RAM chip addressed at 1234h, because the latter will be enabled only if MREQ is asserted (the CPU ensures that IOREQ and MREQ cannot be asserted at the same time).
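On Linux/x86 you can issue those IN/OUT instructions from user space too; a minimal sketch (the choice of the CMOS RTC ports 70h/71h is just an example, and ioperm() requires root):

    /* Port-space I/O from user space via glibc's inb()/outb() wrappers. */
    #include <stdio.h>
    #include <sys/io.h>

    int main(void) {
        /* ask the kernel for access to ports 0x70-0x71 */
        if (ioperm(0x70, 2, 1) < 0) { perror("ioperm"); return 1; }

        outb(0x00, 0x70);                /* select CMOS register 0 (RTC seconds) */
        unsigned char sec = inb(0x71);   /* read it from the data port */
        printf("CMOS register 0 = 0x%02x\n", sec);

        ioperm(0x70, 2, 0);              /* drop access again */
        return 0;
    }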
These two address spaces don't exist on all CPUs. In fact, the majority of them don't have a separate I/O space, and therefore have to memory-map all of their devices.
Can you have virtual memory without secondary storage (a hard disk)?
In a pure sense, yes you can: Virtual Memory
What makes memory virtual is the fact that all memory accesses by the process are intercepted at the CPU level and a hardware Memory Management Unit is used to manage a mapping of the process address space onto the physical memory, no matter where that storage is presently really located.
You can have computing systems with virtual memory that have no backing storage (which is what people call it when you can move pages of memory out to disk for later retrieval).
In this case, the virtual memory system is used to allow the OS to intercept and prevent illegal memory references, but not in order to increase the working-set size of processes beyond the amount of installed physical memory.
First, I malloc a buffer in userspace and fill it with 'A's.
Then, I pass the pointer to the buffer to the kernel, using a netlink socket.
Finally, I can read and write the buffer from the kernel, using the raw pointer passed directly from userspace.
Why? Why is directly accessing userspace memory from the kernel allowed?
Linux Device Drivers, Third Edition, page 415, says that "The kernel cannot directly manipulate memory that is not mapped into the kernel's address space."
The point is that accessing user addresses directly in the kernel only sometimes works.
As long as you access the user address in the context of the same process that allocated it, the process has already faulted the page in, you are using a kernel with a 3:1 memory split (as opposed to the 4:4 split that is sometimes used), and the kernel has not swapped out the page the allocation sits in, the access will work.
The problem is that all these conditions are not always true, and they can change from one run of the program to another. Therefore, kernel driver writers must not count on being able to access user addresses directly.
The worst thing that can happen is for you to assume it works, have it always work in the lab, and have it crash at a customer site every so often. This is the reason for the book's statement.
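For reference, the usual safe pattern is to go through copy_from_user()/copy_to_user(), which handle faults and invalid addresses; a minimal sketch (handle_user_buf is a hypothetical helper, not from the question's driver):

    /* Copy a userspace buffer into kernel memory the safe way. */
    #include <linux/uaccess.h>
    #include <linux/slab.h>

    static int handle_user_buf(const char __user *ubuf, size_t len)
    {
        char *kbuf = kmalloc(len, GFP_KERNEL);
        if (!kbuf)
            return -ENOMEM;

        /* returns the number of bytes that could NOT be copied */
        if (copy_from_user(kbuf, ubuf, len)) {
            kfree(kbuf);
            return -EFAULT;
        }

        /* ... safely use kbuf here ... */

        kfree(kbuf);
        return 0;
    }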
In this book, the words 'The kernel cannot directly manipulate memory that is not mapped into the kernel's address space' are about physical memory. In other words, the kernel has only 800-900 MB (on 32-bit x86) that can be mapped onto physical memory at any one time; to reach the whole of physical memory, the kernel has to keep remapping that region.
Netlink doesn't deal with physical memory at all; it is designed for bidirectional communication between userspace<->userspace or userspace<->kernelspace.