Does it mean that buffers of I/O devices are assigned addresses in the total memory space just like the bytes of the main memory are assigned??
That's basically it. You have I/O devices which monitor the address lines (and data lines, and control lines) of your processor to "capture" certain addresses and act on them.
For example, you may have a memory mapped keyboard device (using address 0xff00) that basically collects the keystrokes from the physical keyboard and buffers them ready to be received by the processor.
So, when it see address 0xff00 on the address lines and a read signal (such as a memio line and the r/not-w line both going high (indicating a memory read is desired), it will inject the code for the keypress onto the data lines and signal the processor to read it.
If no keypresses are buffered, it may just give back a code of 0 (it depends entirely on the protocol).
Pretty much. Not that the actual peripheral hardware buffers must be mapped but the OS / Mapper will take care of it somehow.
Related
I have a basic understanding of Memory-Mapped I/O (MMIO). Below is copied from the Wikipedia page:
Memory-mapped I/O uses the same address space to address both memory and I/O devices. The memory and registers of the I/O devices are mapped to (associated with) address values. So a memory address may refer to either a portion of physical RAM, or instead to memory and registers of the I/O device. Thus, the CPU instructions used to access the memory can also be used for accessing devices. Each I/O device monitors the CPU's address bus and responds to any CPU access of an address assigned to that device, connecting the data bus to the desired device's hardware register. To accommodate the I/O devices, areas of the addresses used by the CPU must be reserved for I/O and must not be available for normal physical memory.
My question is, suppose the address of the MMIO area is addr, if we issue a write to addr, will it be written to addr in main memory as well? Or only written to the memory within the I/O device?
My thought is as follows:
Since it is stated that "a memory address may refer to either a portion of physical RAM, or instead to memory and registers of the I/O device", the data will not be written to RAM, i.e. the data never goes to RAM, instead, it would be snooped by the memory controller of the I/O device, and written to the device memory.
If we issue a read from address addr, such read instruction will be captured by memory controller of I/O device and the data will be transferred from the device memory to the destination register in CPU. If we want the data to be in memory, then we need to issue another write, to another address addr_new (no overlap with addr).
May I know if my understanding is correct?
In memory-mapped I/O, there is no address that maps to both RAM and I/O registers -- it's one or the other.
It's really about the processor instruction set.
x86 processors have special instructions for reading and writing IO registers.
Memory-mapped I/O is the alternative. You use the same instructions to use memory or I/O, and only the address you use determines which is which.
The simplest old-time implementation of memory-mapped I/O could just use one of the address lines to select either memory or I/O ports, requiring both of those to implement similar protocols. That's not really practical today, though, because RAM is now complicated.
Say I have an InfiniBand or similar PCIe device and a fast Intel Core CPU and I want to send e.g. 8 bytes of user data over the IB link. Say also that there is no device driver or other kernel: we're keeping this simple and just writing directly to the hardware. Finally, say that the IB hardware has previously been configured properly for the context, so it's just waiting for something to do.
Q: How many CPU cycles will it take the local CPU to tell the hardware where the data is and that it should start sending it?
More info: I want to get an estimate of the cost of using PCIe communication services compared to CPU-local services (e.g. using a coprocessor). What I am expecting is that there will be a number of writes to registers on the PCIe bus, for example setting up an address and length of a packet, and possibly some reads and writes of status and/or control registers. I expect each of these will take several hundred CPU cycles each, so I would expect the overall setup would take order of 1000 to 2000 CPU cycles. Would I be right?
I am just looking for a ballpark answer...
Your ballpark number is correct.
If you want to send an 8 byte payload using an RDMA write, first you will write the request descriptor to the NIC using Programmed IO, and then the NIC will fetch the payload using a PCIe DMA read. I'd expect both the PIO and the DMA read to take between 200-500 nanoseconds, although the PIO should be faster.
You can get rid of the DMA read and save some latency by putting the payload inside the request descriptor.
In I/O-mapped I/O (as opposed to memory-mapped I/O), a certain set of addresses are fixed for I/O devices. Are these addresses a part of the RAM, and thus that much physical address space is unusable ? Does it correspond to the 'Hardware Reserved' memory in the attached picture ?
If yes, how is it decided which bits of an address are to be used for addressing I/O devices (because the I/O address space would be much smaller than the actual memory. I have read this helps to reduce the number of pins/bits used by the decoding circuit) ?
What would happen if one tries to access, in assembly, any address that belongs to this address space ?
I/O mapped I/O doesn't use the same address space as memory mapped I/O. The later does use part of the address space normally used by RAM and therefore, "steals" addresses that no longer belong to RAM memory.
The set of address ranges that are used by different memory mapped I/O is what you see as "Hardware reserved".
About how is it decided how to address memory mapped devices, this is largely covered by the PnP subsystem, either in BIOS, or in the SO. Memory-mapped devices, with few exceptions, are PnP devices, so that means that for each of them, its base address can be changed (for PCI devices, the base address of the memory mapped registers, if any, is contained in a BAR -Base Address Register-, which is part of the PCI configuration space).
Saving pins for decoding devices (lazy decoding) is (was) done on early 8-bit systems, to save decoders and reduce costs. It haven't anything to do with memory mapped / IO mapped devices. Lazy decoding may be used in both situations. For example, a designer could decide that the 16-bit address range C000-FFFF is going to be reserved for memory mapped devices. To decide whether to enable some memory chip, or some device, it's enough to look at the value of A15 and A14. If both address lines are high, then the block addressed is C000-FFFF and that means that memory chip enables will be deasserted. On the other hand, a designer could decide that the 8 bit IO port 254 is going to be assigned to a device, and to decode this address, it only looks at the state of A0, needing no decoders to find out the port address (this is for example, what the ZX Spectrum does for addressing the ULA)
If a program (written in whatever language that allows you to access and write to arbitrary memory locations) tries to access a memory address reserved for a device, and assuming that the paging and protection mechanism allows such access, what happens will depend solely on what the device does when that address is accessed. A well known memory mapped device in PC's is the frame buffer. If the graphics card is configured to display color text mode with its default base address, any 8-bit write operation performed to even physical addresses between B8000 and B8F9F will cause the character whose ASCII code is the value written to show on screen, in a location that depends on the address chosen.
I/O mapped devices don't collide with memory, as they use a different address space, with different instructions to read and write values to addresses (ports). These devices cannot be addressed using machine code instructions that targets memory.
Memory mapped devices share the address space with RAM. Depending on the system configuration, memory mapped registers can be present all the time, using some addresses, and thus preventing the system to use them for RAM, or memory mapped devices may "shadow" memory at times, so allowing the program to change the I/O configuration to choose if a certain memory region will be decoded as in use by a device, or used by regular RAM (for example, what the Commodore 64 does to let the user have 64KB of RAM but allowing it to access device registers some times, by temporarily disabling access to the RAM that is "behind" the device that is currently being accessed at that very same address).
At the hardware level, what is happening is that there are two different signals: MREQ and IOREQ. The first one is asserted on every memory instruction, the second one, on every I/O insruction. So this code...
MOV DX,1234h
MOV AL,[DX] ;reads memory address 1234h (memory address space)
IN AL,DX ;reads I/O port 1234h (I/O address space)
Both put the value 1234h on the CPU address bus, and both assert the RD pin to indicate a read, but the first one will assert MREQ to indicate that the address belong to the memory address space, and the second one will assert IOREQ to indicate that it belongs to the I/O address space. The I/O device at port 1234h is connected to the system bus so that it is enabled only if the address is 1234h, RD is asserted and IOREQ is asserted. This way, it cannot collide with a RAM chip addressed at 1234h, because the later will be enabled only if MREQ is asserted (the CPU ensures that IOREQ and MREQ cannot be asserted at the same time).
These two address spaces don't exist in all CPU's. In fact, the majority of them don't have this, and therefore, they have to memory map all its devices.
I have a simple theoretical question. The DMAs I know usually have half full or full interrupts. If I want to use a DMA for data transfer from a peripheral, how can I ensure I got all the data since data may not be at the dma transfer boundary.
For example, serial port might send 5 bytes, I would get and interrupt for the first 4 combined together (assuming dma size is 4), but nothing for the 5th one. What is the method people usually use to solve such a problem.
My best approach is this:
Setup a DMA memory region. lets say it's address 0x2 to 0x1000
The serial device writes bytes in this region, as a circular buffer
Each time the serial device writes, it updates it's "write pointer" and saves in bytes 0x0 and 0x1
The PC Host can dma the write pointer, and compare with it's own read pointer. The read pointer can be kept on the pc host and not deal with DMA at all. Then the PC knows how much memory to read, and it also knows if there has been an underflow or overflow.
This should be a good starting point for what you want.
I've been going through many technical documents on packet capture/processing and host stacks trying to understand it all, there's a few areas where I'm troubled, hopefully someone can help.
Assuming you're running tcpdump:
After a packet gets copied from a NIC's ring buffer (physical NIC memory right?)
does it immediately get stored into an mbuf? and then BPF gets a copy of the packet from the mbuf , which is then stored in the BPF buffer, so there are two copies in memory at the same time? I'm trying to understand the exact process.
Or is it more like: the packet gets copied from the NIC to both the mbuf (for host stack processing) and to the BPF pseudo-simultaneously?
Once a packet goes through host stack processing by ip/tcp input functions taking the mbuf as the location(pointing to an mbuf) i.e. packets are stored in mbufs, if the packet is not addressed for the system, say received by monitoring traffic via hub or SPAN/Monitor port, the packet is discarded and never makes its way up the host stack.
I seem to have come across diagrams which show the NIC ring buffer(RX/TX) in a kernel "box"/separating it from userspace, which makes me second guess whether a ring buffer is actually allocated system memory different from the physical memory on a NIC.
Assuming that a ring buffer refers to the NIC's physical memory, is it correct that the device driver determines the size of the NIC ring buffer, setting physical limitations aside? e.g. can I shrink the buffer by modifying the driver?
Thanks!
ETHER_BPF_MTAP macro calls bpf_mtap(), which excepts packet in mbuf format, and bpf copies data from this mbuf to internal buffer.
But mbufs can use external storage, so there can be or not be copying from NIC ring buffer to mbuf. Mbufs can actually contain packet data or serve just as a header with reference to receiving buffer.
Also, current NICs use their little (128/96/... Kb) onboard memory for FIFO only and immediately transfer all data to ring buffers in main memory. So you really can adjust buffer size in device driver.