I need to know how the computer handles interrupts from the LAN I/O processor (the network card). I have been looking for a while but can't seem to find anything. I came across some RJ-45 port information, but not much of what I specifically need. If someone has some information on how the CPU interrupts a process to call the handler pointer (and therefore the driver), plus how this process works, it would be much appreciated.
Thanks
Typically, the driver for the LAN card configures the card to issue an interrupt when the receive buffer gets close to full or the send buffer gets close to empty. These buffers typically live in system memory, and the network hardware uses DMA to pull outgoing packets from system memory and to store received packets into it.
When the interrupt triggers, some process on some core is typically interrupted and the network code begins executing. If it's a send interrupt and there are more packets to send, more packets are attached to the send buffer. If it's a receive interrupt, more empty packet buffers are typically attached to the receive buffer. The driver typically arranges for a "bottom half" to be dispatched to handle whatever other work needs to be done (such as processing the received packets), and then the interrupt completes.
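Here is a minimal sketch of that flow. All of the names (the registers, refill_tx_ring(), schedule_bottom_half(), and so on) are hypothetical placeholders for whatever a real driver and OS would provide; the point is only the shape of the handler.

    /* Hypothetical NIC interrupt handler -- illustrative names only. */
    #include <stdint.h>

    void nic_interrupt_handler(struct nic *nic)
    {
        uint32_t status = read_reg(nic, NIC_REG_INT_STATUS);  /* find out why we were interrupted */

        if (status & NIC_INT_TX_DONE) {
            /* Send side: hardware drained descriptors; attach more packets if queued. */
            refill_tx_ring(nic);
        }
        if (status & NIC_INT_RX_READY) {
            /* Receive side: give the hardware fresh empty buffers so it can keep
             * DMAing incoming packets while the full ones wait to be processed. */
            refill_rx_ring(nic);
            /* Defer protocol processing to a "bottom half" so this handler stays short. */
            schedule_bottom_half(nic->rx_work);
        }

        write_reg(nic, NIC_REG_INT_ACK, status);  /* acknowledge and complete the interrupt */
    }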
There's a ton of possible variation based upon many factors, but this is the basic idea.
I have a simple configuration of two XBees: one coordinator and one end device/router. The coordinator continuously sends data to the end device at 9600 bps without expecting any sort of response from it (I cannot increase the bps because of standardisation issues in my application). I managed to make it send data, but it arrives at the end device after a random number of seconds, which I do not want - ideally it should be near-instantaneous. Which XBee parameters do I need to modify in order to make the transmission much faster?
If you're using the XBee module in transparent mode (ATAP=0), then you want to look at ATRO, the packetization timeout value. This is the amount of idle time on the serial interface that the XBee waits for before considering a packet complete and ready to send.
If this is a sleeping end device, you may experience delays if it's sleeping and the coordinator is waiting for it to wake up before sending. Try configuring it as a router and see if that helps with the delay.
Note that the serial port speeds (ATBD) of the coordinator and end device do not need to match. The XBee module buffers packets and always sends them over the air at 250 kbps. When possible, you should run the serial interface at 115,200 bps or faster to minimize the latency and maximize the throughput of the wireless interface.
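For example, in command mode you could set the packetization timeout to its minimum and (if your application allowed it) raise the serial rate. The values below are illustrative; check your module's documentation for the exact parameter ranges.

    +++        (enter command mode, wait for OK)
    ATRO 0     (packetization timeout: send as soon as the UART goes idle)
    ATBD 7     (serial rate 115,200 bps; not applicable if your 9600 bps rate is fixed)
    ATWR       (save the settings to non-volatile memory)
    ATCN       (exit command mode)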
Finally, how are you handling addressing of your packets? Using 64-bit or 16-bit addresses? If 16-bit addresses, there could be discovery overhead, but that should go away after the first packet gets through.
And if you're not using modules with chip antennas, do you have antennas attached?
The problem was that I had not configured them in transparent mode; they were working in another mode, which was laggy.
So I'm curious: is there any info on a router's config page or in its manual, or is there any way to measure its buffer? (I'm talking about the "memory" it has to keep packets in until the router can transmit them.)
You should be able to measure your outgoing buffer by sending lots of (UDP) data out, as quickly as possible, and see how much goes out before data starts getting dropped. You will need to send it really fast, and have something at the other end to capture it as it comes in; your send speed has to be a lot faster than your receive speed.
Your buffer will be a little smaller than that, since in the time it takes you to send the data, at least the first packet will have left the router.
Note that you will be measuring the smallest buffer between you and the remote end; if you have, for example, a firewall and a router in two separate devices, you don't really know which buffer you are testing.
Your incoming buffer is a lot more difficult to test, since you can't fill it from the internet side faster than the router can deliver the data to you.
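If it helps, here is a minimal sketch of the sending side of that outgoing-buffer test. The destination address and port are placeholders, and the receiver is assumed to simply record the highest sequence number that arrives without gaps.

    /* Sketch: blast sequence-numbered UDP datagrams as fast as possible.
     * Comparing what was sent with what arrived intact at the far end gives
     * a rough estimate of where the smallest buffer on the path overflows. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst = {0};
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9000);                          /* placeholder port */
        inet_pton(AF_INET, "203.0.113.10", &dst.sin_addr);   /* placeholder host */

        char payload[1400] = {0};
        for (uint32_t seq = 0; seq < 100000; seq++) {
            memcpy(payload, &seq, sizeof(seq));              /* sequence number goes first */
            sendto(s, payload, sizeof(payload), 0,
                   (struct sockaddr *)&dst, sizeof(dst));
        }
        close(s);
        return 0;
    }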
Say I have an InfiniBand or similar PCIe device and a fast Intel Core CPU and I want to send e.g. 8 bytes of user data over the IB link. Say also that there is no device driver or other kernel: we're keeping this simple and just writing directly to the hardware. Finally, say that the IB hardware has previously been configured properly for the context, so it's just waiting for something to do.
Q: How many CPU cycles will it take the local CPU to tell the hardware where the data is and that it should start sending it?
More info: I want to get an estimate of the cost of using PCIe communication services compared to CPU-local services (e.g. using a coprocessor). What I am expecting is that there will be a number of writes to registers on the PCIe bus, for example setting up the address and length of a packet, and possibly some reads and writes of status and/or control registers. I expect each of these to take several hundred CPU cycles, so I would expect the overall setup to take on the order of 1000 to 2000 CPU cycles. Would I be right?
I am just looking for a ballpark answer...
Your ballpark number is correct.
If you want to send an 8-byte payload using an RDMA write, first you will write the request descriptor to the NIC using programmed I/O (PIO), and then the NIC will fetch the payload using a PCIe DMA read. I'd expect both the PIO and the DMA read to take between 200 and 500 nanoseconds each, although the PIO should be faster.
You can get rid of the DMA read and save some latency by putting the payload inside the request descriptor.
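For reference, here is a sketch of posting such an 8-byte RDMA write with the payload inlined in the descriptor, using libibverbs. It assumes the queue pair was created with enough max_inline_data and that remote_addr and rkey came from earlier connection setup.

    /* Sketch: post an 8-byte RDMA WRITE with the payload placed inline in the
     * work request, so the NIC does not need a separate DMA read to fetch it. */
    #include <infiniband/verbs.h>
    #include <stddef.h>
    #include <stdint.h>

    int post_inline_write(struct ibv_qp *qp, uint64_t remote_addr, uint32_t rkey)
    {
        uint64_t payload = 0x1122334455667788ULL;   /* the 8 bytes of user data */

        struct ibv_sge sge = {
            .addr   = (uintptr_t)&payload,
            .length = sizeof(payload),
        };

        struct ibv_send_wr wr = {
            .opcode     = IBV_WR_RDMA_WRITE,
            .sg_list    = &sge,
            .num_sge    = 1,
            .send_flags = IBV_SEND_INLINE | IBV_SEND_SIGNALED,
        };
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = rkey;

        struct ibv_send_wr *bad_wr = NULL;
        /* Writes the descriptor (payload included) to the send queue and rings
         * the doorbell -- the programmed-I/O part of the cost discussed above. */
        return ibv_post_send(qp, &wr, &bad_wr);
    }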
I've been going through many technical documents on packet capture/processing and host stacks trying to understand it all; there are a few areas where I'm troubled. Hopefully someone can help.
Assuming you're running tcpdump:
After a packet gets copied from a NIC's ring buffer (that's physical NIC memory, right?), does it immediately get stored into an mbuf? And then BPF gets a copy of the packet from the mbuf, which is stored in the BPF buffer, so there are two copies in memory at the same time? I'm trying to understand the exact process.
Or is it more like this: the packet gets copied from the NIC to both the mbuf (for host-stack processing) and the BPF buffer pseudo-simultaneously?
Once a packet goes through host-stack processing, the ip/tcp input functions take a pointer to the mbuf as the packet's location (i.e. packets are stored in mbufs). If the packet is not addressed to this system, say it was received by monitoring traffic via a hub or a SPAN/monitor port, the packet is discarded and never makes its way up the host stack.
I seem to have come across diagrams which show the NIC ring buffer (RX/TX) inside a kernel "box", separating it from userspace, which makes me second-guess whether the ring buffer is actually allocated in system memory, distinct from the physical memory on the NIC.
Assuming that the ring buffer refers to the NIC's physical memory, is it correct that the device driver determines the size of the NIC ring buffer, physical limitations aside? E.g., can I shrink the buffer by modifying the driver?
Thanks!
The ETHER_BPF_MTAP macro calls bpf_mtap(), which expects the packet in mbuf format, and BPF copies the data from this mbuf into its internal buffer.
But mbufs can use external storage, so there may or may not be a copy from the NIC ring buffer into the mbuf. An mbuf can actually contain the packet data, or serve just as a header with a reference to the receive buffer.
Also, current NICs use their small (128/96/... KB) onboard memory only as a FIFO and immediately transfer all data to ring buffers in main memory. So yes, you really can adjust the ring buffer size in the device driver.
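To make the ordering concrete, here is a pseudocode-level sketch of a receive path (not actual kernel source; the helper names are made up): the frame lives in one mbuf, BPF copies what a listener needs into its own store buffer, and the same mbuf then continues up the host stack, where it is dropped if it is not addressed to this host.

    /* Conceptual sketch only -- helper names are placeholders. */
    struct mbuf *m = dequeue_from_rx_ring(nic);   /* may copy from the ring, or just
                                                     reference the DMA buffer as
                                                     external storage */
    if (bpf_listener_attached(ifp))
        bpf_tap_copy(ifp, m);                     /* second copy: into the BPF buffer */

    host_stack_input(ifp, m);                     /* ip/tcp input; dropped here if the
                                                     frame is not addressed to us */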
We have a simple architecture:
Main chip (ARM9 based)
PIC controller
The PIC communicates with the ARM via an interrupt-based I2C protocol for data transfer. Inside the interrupt we signal a task which reads the data from the I2C layer (bus).
When the data is small we usually have no problem reading it and sending it to the upper layer. When the data is very large, the interrupt will be tied up for a long time.
The first question is:
Am I right?
If I am right, how can we avoid this? Or is there a different solution?
Have some kind of 'worker thread', sometimes called a kernel thread, whose job it is to pull data out of the I2C interface and buffer it, hand it off to other parts of your system, etc. Use the interrupt routine only to unblock the kernel thread. That way, if there are other duties the system has to perform, it is not prevented from doing so by the interrupt handler, and you still get your data in from your device in a timely manner.
You shouldn't read a complete packet in one execution of the interrupt routine. Depending on the hardware support you should handle one sample/bit/byte, store data in a buffer and only signal the task when the packet is complete.
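A sketch of that structure, assuming a generic RTOS with a counting-semaphore primitive; names such as i2c_read_byte(), sem_post_from_isr() and packet_complete() are placeholders for whatever your platform actually provides.

    /* The ISR only moves one byte and signals; the worker task does the
     * heavy lifting outside interrupt context. */
    #include <stddef.h>
    #include <stdint.h>

    #define PKT_MAX 256

    static volatile uint8_t rx_buf[PKT_MAX];
    static volatile size_t  rx_len;
    static sem_t            pkt_ready;            /* placeholder semaphore type */

    void i2c_isr(void)                            /* keep this as short as possible */
    {
        uint8_t b = i2c_read_byte();              /* one byte per interrupt */

        if (rx_len < PKT_MAX)
            rx_buf[rx_len++] = b;

        if (packet_complete(rx_buf, rx_len))      /* e.g. expected length reached */
            sem_post_from_isr(&pkt_ready);        /* unblock the worker task */
    }

    void i2c_worker_task(void *arg)
    {
        (void)arg;
        for (;;) {
            sem_wait(&pkt_ready);                 /* sleep until a packet is ready */
            process_packet(rx_buf, rx_len);       /* hand off to the upper layer */
            rx_len = 0;
        }
    }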