When attempting to access the physical memory through a known address, e.g. memory mapped PCI configuration space, once the address is calculated, why is it not possible to use pread/pwrite to access that location as a simple offset from memory address zero? Why do we have to mmap it and use pointer arithmetic to access?
Similar access to /dev/cpu/*/msr is possible using pread/pwrite.
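To make the two access styles in the question concrete, here is a minimal sketch that reads the same 32-bit value once with pread and once through an mmap'ed pointer. It takes an arbitrary file descriptor, so it can be tried on a regular file. Whether a particular device node accepts pread at a given offset depends on the driver, and this sketch does not settle that. The helper names read_u32_pread and read_u32_mmap are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a 32-bit value at byte offset `off` with pread.
 * Returns 0 on success, -1 on failure or short read. */
int read_u32_pread(int fd, off_t off, uint32_t *out)
{
    assert(out != NULL && off >= 0);
    ssize_t n = pread(fd, out, sizeof *out, off);
    return n == (ssize_t)sizeof *out ? 0 : -1;
}

/* Hypothetical helper: read the same value by mapping the page that
 * contains it and using pointer arithmetic inside the mapping. */
int read_u32_mmap(int fd, off_t off, uint32_t *out)
{
    assert(out != NULL && off >= 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    off_t base = off - (off % page);     /* mmap offsets must be page aligned */
    size_t delta = (size_t)(off - base); /* position inside the mapping */
    size_t len = delta + sizeof *out;

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, base);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* memcpy is fine for a regular file; real device registers usually
     * need a volatile access of the exact register width instead. */
    memcpy(out, (const unsigned char *)map + delta, sizeof *out);
    munmap(map, len);
    return 0;
}
```

Called on the same descriptor and offset of a regular file, both helpers should produce the same value; the difference is only in how the offset is presented to the kernel.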
I have a basic understanding of Memory-Mapped I/O (MMIO). Below is copied from the Wikipedia page:
Memory-mapped I/O uses the same address space to address both memory and I/O devices. The memory and registers of the I/O devices are mapped to (associated with) address values. So a memory address may refer to either a portion of physical RAM, or instead to memory and registers of the I/O device. Thus, the CPU instructions used to access the memory can also be used for accessing devices. Each I/O device monitors the CPU's address bus and responds to any CPU access of an address assigned to that device, connecting the data bus to the desired device's hardware register. To accommodate the I/O devices, areas of the addresses used by the CPU must be reserved for I/O and must not be available for normal physical memory.
My question is, suppose the address of the MMIO area is addr, if we issue a write to addr, will it be written to addr in main memory as well? Or only written to the memory within the I/O device?
My thought is as follows:
Since it is stated that "a memory address may refer to either a portion of physical RAM, or instead to memory and registers of the I/O device", the data will not be written to RAM, i.e. the data never goes to RAM, instead, it would be snooped by the memory controller of the I/O device, and written to the device memory.
If we issue a read from address addr, such read instruction will be captured by memory controller of I/O device and the data will be transferred from the device memory to the destination register in CPU. If we want the data to be in memory, then we need to issue another write, to another address addr_new (no overlap with addr).
May I know if my understanding is correct?
In memory-mapped I/O, there is no address that maps to both RAM and I/O registers -- it's one or the other.
It's really about the processor instruction set.
x86 processors have special instructions for reading and writing IO registers.
Memory-mapped I/O is the alternative. You use the same instructions to use memory or I/O, and only the address you use determines which is which.
The simplest old-time implementation of memory-mapped I/O could just use one of the address lines to select either memory or I/O ports, requiring both of those to implement similar protocols. That's not really practical today, though, because RAM is now complicated.
In I/O-mapped I/O (as opposed to memory-mapped I/O), a certain set of addresses is fixed for I/O devices. Are these addresses part of the RAM, so that this much physical address space is unusable? Does it correspond to the 'Hardware Reserved' memory in the attached picture?
If yes, how is it decided which bits of an address are used for addressing I/O devices? (The I/O address space would be much smaller than the actual memory, and I have read that this helps to reduce the number of pins/bits used by the decoding circuit.)
What would happen if one tried to access, in assembly, any address that belongs to this address space?
I/O-mapped I/O doesn't use the same address space as memory-mapped I/O. The latter does use part of the address space normally used by RAM and therefore "steals" addresses that no longer belong to RAM.
The set of address ranges that are used by different memory mapped I/O is what you see as "Hardware reserved".
As for how it is decided where memory-mapped devices are addressed, this is largely handled by the PnP subsystem, either in the BIOS or in the OS. Memory-mapped devices, with few exceptions, are PnP devices, which means that the base address of each of them can be changed (for PCI devices, the base address of the memory-mapped registers, if any, is contained in a BAR, or Base Address Register, which is part of the PCI configuration space).
Saving pins when decoding device addresses (lazy decoding) is (was) done on early 8-bit systems, to save decoders and reduce costs. It has nothing to do with memory-mapped versus I/O-mapped devices; lazy decoding may be used in both situations. For example, a designer could decide that the 16-bit address range C000-FFFF is going to be reserved for memory-mapped devices. To decide whether to enable some memory chip or some device, it's enough to look at the value of A15 and A14. If both address lines are high, then the block addressed is C000-FFFF, and that means that the memory chip enables will be deasserted. On the other hand, a designer could decide that the 8-bit I/O port 254 is going to be assigned to a device, and to decode this address the hardware only looks at the state of A0, needing no decoders to find out the port address (this is, for example, what the ZX Spectrum does for addressing the ULA).
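The two lazy-decoding examples above can be sketched as plain bit tests. This is only an illustration of the decoding logic, not of any real board; the function names are hypothetical, and the assumption that the port-254 device is selected when A0 is low follows from 254 being an even number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory-mapped case: the device block is C000-FFFF, so only the
 * address lines A15 and A14 have to be examined. */
bool selects_device_block(uint16_t addr)
{
    return (addr & 0xC000u) == 0xC000u;  /* A15 and A14 both high */
}

/* I/O-mapped case: port 254 decoded by looking at A0 only.
 * Assumption: the device is selected whenever A0 is low. */
bool selects_port_254_device(uint16_t port)
{
    return (port & 0x0001u) == 0;
}

/* Usage: shows the decoding and the aliasing that lazy decoding causes. */
void lazy_decoding_demo(void)
{
    assert(selects_device_block(0xC000));
    assert(selects_device_block(0xFFFF));
    assert(!selects_device_block(0xBFFF));  /* A14 high, A15 low: RAM */

    assert(selects_port_254_device(254));
    assert(selects_port_254_device(0));     /* every even port aliases it */
    assert(!selects_port_254_device(255));
}
```

The price of saving the decoder is visible in the demo: with only A0 examined, every even port number reaches the same device.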
If a program (written in any language that allows you to access and write to arbitrary memory locations) tries to access a memory address reserved for a device, and assuming that the paging and protection mechanism allows such access, what happens will depend solely on what the device does when that address is accessed. A well-known memory-mapped device in PCs is the frame buffer. If the graphics card is configured to display color text mode with its default base address, any 8-bit write to an even physical address between B8000 and B8F9F will cause the character whose ASCII code is the value written to show on screen, in a location that depends on the address chosen.
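As a small illustration of how the screen location depends on the address, the following sketch computes the physical address of the character byte for a given cell. It assumes the standard 80x25 color text mode, in which each cell takes two bytes (character at the even address, attribute at the following odd address); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TEXT_BASE 0xB8000u  /* default base of the color text buffer */
#define TEXT_COLS 80u       /* assumption: 80x25 text mode */
#define TEXT_ROWS 25u

/* Physical address of the character byte for cell (row, col).
 * The attribute byte for the same cell lives at the returned address + 1. */
uint32_t text_cell_address(unsigned row, unsigned col)
{
    assert(row < TEXT_ROWS && col < TEXT_COLS);
    return TEXT_BASE + 2u * (row * TEXT_COLS + col);
}
```

The bottom-right cell (row 24, column 79) comes out at B8F9E, the last even address inside the B8000-B8F9F range quoted above.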
I/O mapped devices don't collide with memory, as they use a different address space, with different instructions to read and write values to addresses (ports). These devices cannot be addressed using machine code instructions that targets memory.
Memory-mapped devices share the address space with RAM. Depending on the system configuration, memory-mapped registers can be present all the time at some addresses, preventing the system from using those addresses for RAM, or memory-mapped devices may "shadow" memory at times, allowing the program to change the I/O configuration to choose whether a certain memory region is decoded as belonging to a device or used as regular RAM. This is, for example, what the Commodore 64 does to let the user have 64KB of RAM while still being able to access device registers at times, by temporarily disabling access to the RAM that is "behind" the device currently being accessed at that very same address.
At the hardware level, what is happening is that there are two different signals: MREQ and IOREQ. The first one is asserted on every memory instruction, the second one on every I/O instruction. So this code...
MOV BX,1234h
MOV AL,[BX] ;reads memory address 1234h (memory address space)
MOV DX,1234h
IN AL,DX ;reads I/O port 1234h (I/O address space)
Both reads put the value 1234h on the CPU address bus, and both assert the RD pin to indicate a read, but the first one will assert MREQ to indicate that the address belongs to the memory address space, and the second one will assert IOREQ to indicate that it belongs to the I/O address space. The I/O device at port 1234h is connected to the system bus so that it is enabled only if the address is 1234h, RD is asserted and IOREQ is asserted. This way, it cannot collide with a RAM chip addressed at 1234h, because the latter will be enabled only if MREQ is asserted (the CPU ensures that IOREQ and MREQ cannot be asserted at the same time).
These two address spaces don't exist in all CPUs. In fact, the majority of them don't have this, and therefore they have to memory-map all their devices.
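The effect of the two request lines can be modelled in a few lines of C. This is a toy simulation, not a description of any real bus: the enum stands in for "which of MREQ/IOREQ is asserted", and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Which request line the CPU asserts for this bus cycle. */
typedef enum { SPACE_MEMORY, SPACE_IO } bus_space;

static uint8_t sim_ram[0x10000];       /* answers only when MREQ is asserted */
static uint8_t sim_io_ports[0x10000];  /* answers only when IOREQ is asserted */

void bus_write(bus_space space, uint16_t addr, uint8_t value)
{
    if (space == SPACE_MEMORY)
        sim_ram[addr] = value;
    else
        sim_io_ports[addr] = value;
}

uint8_t bus_read(bus_space space, uint16_t addr)
{
    return space == SPACE_MEMORY ? sim_ram[addr] : sim_io_ports[addr];
}

/* Usage: the same numeric address names two unrelated locations. */
void bus_demo(void)
{
    bus_write(SPACE_MEMORY, 0x1234, 0xAA);  /* like a MOV to memory */
    bus_write(SPACE_IO, 0x1234, 0x55);      /* like an OUT to a port */

    assert(bus_read(SPACE_MEMORY, 0x1234) == 0xAA);
    assert(bus_read(SPACE_IO, 0x1234) == 0x55);
}
```

Because the space selector is part of every access, the write to port 1234h never disturbs the byte stored at memory address 1234h.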
In CUDA, a kernel can manage the data transfer from host memory to device shared memory itself, through a device-side pointer to host memory, like this:
int *a,*b,*c; // host pointers
int *dev_a, *dev_b, *dev_c; // device pointers to host memory
…
cudaHostGetDevicePointer(&dev_a, a, 0); // no copy to the device is needed now; device-side pointers are obtained instead
cudaHostGetDevicePointer(&dev_b, b, 0);
cudaHostGetDevicePointer(&dev_c, c, 0);
…
//kernel launch
add<<<B,T>>>(dev_a,dev_b,dev_c);
// dev_a, dev_b, dev_c are passed into kernel for kernel accessing host memory directly.
In the above example, kernel code can access host memory via dev_a, dev_b and dev_c. The kernel can use these pointers to move data from host memory to shared memory directly, without staging it in global memory.
But it seems that this is impossible in OpenCL? (Local memory in OpenCL is the counterpart of shared memory in CUDA.)
You can find an exactly identical API in OpenCL.
How it works on CUDA:
According to this presentation and the official documentation.
The money quote about cudaHostGetDevicePointer :
Passes back device pointer of mapped host memory allocated by
cudaHostAlloc or registered by cudaHostRegister.
CUDA's cudaHostAlloc with cudaHostGetDevicePointer works exactly like CL_MEM_ALLOC_HOST_PTR with MapBuffer in OpenCL. Basically, if it's a discrete GPU the results are cached on the device, and if the GPU shares memory with the host it will use that memory directly. So there is no actual 'zero copy' operation with a discrete GPU in CUDA.
The function cudaHostGetDevicePointer does not accept raw malloc'ed pointers, which is the same limitation as in OpenCL. From the API user's point of view, the two approaches are identical and allow the implementation to do pretty much the same optimizations.
With a discrete GPU, the pointer you get points to an area where the GPU can directly transfer data in via DMA. Otherwise the driver would take your pointer, copy the data to the DMA area, and then initiate the transfer.
However, in OpenCL 2.0 that is explicitly possible, depending on the capabilities of your devices. With the finest-granularity sharing you can use arbitrary malloc'ed host pointers and even use atomics with the host, so you could even dynamically control the kernel from the host while it is running.
http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
See page 162 for the shared virtual memory spec. Do note that when you write kernels even these are still just __global pointers from the kernel point of view.
First, I malloc a buffer in userspace and fill it with 'A' characters.
Then, I pass the pointer to the buffer to the kernel, using a netlink socket.
Finally, in the kernel I can read and write the buffer, using the raw pointer passed directly from userspace.
Why ?
Why is direct access to user-space memory from the kernel allowed?
Linux Device Drivers, Third Edition, page 415, says that the kernel cannot directly manipulate memory that is not mapped into the kernel’s address space.
The point is that accessing user addresses directly in the kernel only sometimes works.
The access will work as long as all of the following hold: you access the user address in the context of the same process that allocated it, the process has already faulted the page in, you are using a kernel with a 3:1 memory mapping (as opposed to the 4:4 mapping that is sometimes used), and the kernel has not swapped out the page the allocation is in.
The problem is that these conditions are not always true, and they can change even from one run of the program to another. Therefore kernel driver writers must not count on being able to access user addresses.
The worst thing that can happen is for you to assume it works, have it always work in the lab, and have it crash at a customer site every so often. This is the reason for the book statement.
In the book, the statement 'The kernel cannot directly manipulate memory that is not mapped into the kernel’s address space' is about physical memory. In other words, the kernel has only 800-900 MB (on x86) that can be mapped to physical memory at any one time. To access the whole of physical memory, the kernel needs to keep remapping this region.
Netlink does not deal with physical memory at all; it is designed for bidirectional communication between userspace<->userspace or userspace<->kernelspace.
Does a hex memory address represent the position in the memory as a whole?
E.g. with 4 GB of RAM and a given memory address, does the address point to the position (in bytes) at which the data starts, e.g. at 2.1 GB?
How do memory addresses work on the hard disk before the data is loaded into memory?
Is there ever a case where part of the data is fetched from memory and the rest is fetched from disk? How are the locations differentiated?
Thanks
A hex memory address (like what you would see if you printed out the value of a pointer) points to a location in virtual memory.
On a 32-bit system, each process has a full 4GB of virtual memory. This virtual memory is managed by the CPU and the operating system. When you access a location in virtual memory, the CPU and operating system determine where in the system's actual physical memory that location is mapped, and the data is retrieved from there.
The operating system may also take things out of physical memory and swap them to disk, in order to make room in physical memory for other things. Then, if you try to access the virtual memory location of something that was swapped from physical memory to disk, a "page fault" is generated which causes the OS to re-load the page from disk into physical memory.
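The lookup and the page fault described above can be sketched with a toy page table. This is a deliberately simplified model that assumes 4 KiB pages and a single flat table of 16 entries; real CPUs use multi-level tables walked in hardware, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                 /* toy address space of 64 KiB */

typedef struct {
    bool     present;  /* false: not in physical memory (e.g. swapped out) */
    uint32_t frame;    /* physical frame number when present */
} pte;

/* Translate a virtual address. Returns true and stores the physical
 * address on success; returning false models a page fault that the
 * OS would have to handle before the access can be retried. */
bool translate(const pte *table, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;       /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);  /* byte within the page */

    if (vpn >= NUM_PAGES || !table[vpn].present)
        return false;

    *paddr = (table[vpn].frame << PAGE_SHIFT) | offset;
    return true;
}

/* Usage: page 2 is mapped to frame 7, page 3 is not resident. */
void translate_demo(void)
{
    pte table[NUM_PAGES] = {0};
    table[2].present = true;
    table[2].frame = 7;

    uint32_t pa = 0;
    assert(translate(table, 0x2ABC, &pa) && pa == 0x7ABC);
    assert(!translate(table, 0x3000, &pa));  /* page fault */
}
```

The offset within the page is carried over unchanged; only the page number is replaced, which is why the same virtual address can end up anywhere in physical memory, or nowhere at all until the OS loads the page.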
Modern operating systems have virtual memory.
This means that the address your program uses to access some byte in memory is purely "virtual". The operating system, with the help of special hardware, maps it to a real memory location that is completely different, and there may be no physical memory location at all for a given address. For example, you may mmap() a file into (virtual) memory, and accessing a byte at one of those virtual addresses then means accessing a byte of the file. Similarly, if some memory page hasn't been used for a long time, the OS may swap the page out from physical RAM to disk. In this case, too, the virtual address doesn't point to a physical memory location.
In most cases, yes. But some processors use two values to calculate the real address, for example the Intel 8086.
A hard disk is only storage, with its own system for storing information. So before the CPU can operate on any data, that data has to be loaded into RAM.
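For the 8086 case, the two values are a 16-bit segment and a 16-bit offset, combined as segment * 16 + offset on a 20-bit address bus. A minimal sketch of that calculation (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real-mode address calculation: the segment is shifted left by
 * 4 bits and the offset is added. The 8086 has 20 address lines, so
 * the result wraps around at 1 MiB. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Usage: different segment:offset pairs can name the same byte. */
void real_mode_demo(void)
{
    assert(real_mode_address(0xB800, 0x0000) == 0xB8000u);
    assert(real_mode_address(0xB000, 0x8000) == 0xB8000u);
}
```

So a single hex number only identifies a location once you know which scheme produced it; on the 8086, many segment:offset pairs map to the same physical byte.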