Regarding interrupt-based communication

We have a simple architecture:
Main chip (ARM9-based)
PIC controller
The PIC communicates with the ARM via an interrupt-based I2C protocol for data transfer. Inside the interrupt we signal a task, which reads the data from the I2C layer (bus).
When the amount of data is small, we usually have no problem reading it and passing it to the upper layer. When the amount of data is very large, the interrupt will be tied up for a long time.
The first question is:
Am I right?
If I am right, how can we avoid this? Or can we use a different solution?

Have some kind of 'worker thread', sometimes called a kernel thread, whose job it is to pull data out of the I2C interface and buffer it, hand it off to other parts of your system, etc. Use the interrupt routine only to un-block the kernel thread. That way, if there are other duties the system has to perform, it is not prevented from doing so by the interrupt handler, and you still get your data in from your device in a timely manner.

You shouldn't read a complete packet in one execution of the interrupt routine. Depending on the hardware support you should handle one sample/bit/byte, store data in a buffer and only signal the task when the packet is complete.
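The byte-at-a-time approach could be sketched roughly as follows. This is only an illustration under stated assumptions: PACKET_LEN, i2c_rx_isr(), signal_task() and task_read_packet() are hypothetical names, the packet size is fixed, and the received byte is passed in as a parameter instead of being read from a real I2C data register. On a real RTOS, signal_task() would post a semaphore or event flag rather than set a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_LEN 8u   /* hypothetical fixed packet size in bytes */

/* Shared between the ISR (writer) and the task (reader). */
static volatile uint8_t rx_buf[PACKET_LEN];
static volatile size_t  rx_count;
static volatile bool    packet_ready;

/* Hypothetical hook: a real system would wake the reader task here
 * (semaphore, event flag, ...). This sketch only sets a flag. */
static void signal_task(void)
{
    packet_ready = true;
}

/* Called once per received byte. The work done per interrupt is small
 * and bounded: store one byte, and signal only on a complete packet. */
void i2c_rx_isr(uint8_t byte)
{
    if (packet_ready)
        return;                 /* previous packet not consumed yet: drop */

    rx_buf[rx_count++] = byte;
    if (rx_count == PACKET_LEN) {
        rx_count = 0;
        signal_task();
    }
}

/* Task side: copy the packet out and hand the buffer back to the ISR.
 * Returns true if a complete packet was available. */
bool task_read_packet(uint8_t *out)
{
    assert(out != NULL);
    if (!packet_ready)
        return false;

    for (size_t i = 0; i < PACKET_LEN; i++)
        out[i] = rx_buf[i];
    packet_ready = false;
    return true;
}
```

With a single buffer, bytes that arrive while the task is still copying are dropped; a second buffer or a ring buffer would avoid that at the cost of a little more bookkeeping.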

Related

Why does the RPS procedure use a spinlock with local_irq_disable?

These days, I'm studying the kernel's internal network code, especially the RPS code. There are a lot of functions involved, but I am focusing on the ones that handle SMP queue processing, such as enqueue_to_backlog and process_backlog.
I wonder about the synchronization between two cores (or on a single core) in these two functions, enqueue_to_backlog and process_backlog.
In these functions, one core (A) takes the spin_lock of another core (B) in order to queue packets into B's input_pkt_queue and schedule B's NAPI. Core B also takes that spin_lock to splice input_pkt_queue onto its own process_queue and to remove its own NAPI schedule. I understand that the spin_lock must be held to prevent the two cores from accessing the same queue at the same time.
But I can't understand why the spin_lock is taken with local_irq_disable (or local_irq_save). I think that interrupt context (top half, TH) never accesses the queues or the rps_lock of core B when an interrupt preempts the current context (softirq, bottom half, BH). (Of course, the napi struct can be accessed by the TH to schedule NAPI, but it keeps IRQs disabled until the packet is queued.) So I wonder why the spin_lock is taken with IRQs disabled.
I think it is impossible for another BH, such as a tasklet, to preempt the current context (NAPI, softirq). Is that true? I also want to know whether local_irq_disable disables IRQs on all cores or literally just on the current core. I have read a book about kernel development, but I don't think I understand preemption well enough.
Could you explain why the RPS procedure uses spin_lock with local_irq_disable?
Disabling interrupts affects the current core (only). When disabled, therefore, no other code on the same core will be able to interfere with an update to a data structure. The point of spinlocks is to extend the "lock-out" to other cores (although it's cooperative, not hardware-enforced).
It's dangerous/irresponsible to take a spin lock in the kernel without disabling interrupts because, when an interrupt then occurs, the current code will be suspended, and now you are preventing other cores from making progress while some unrelated interrupt handler is running (even if another user process or tasklet on the original core won't be able to preempt). Other cores might be in an interrupt or BH context themselves and now you're delaying the entire system. Spin locks are supposed to be held for very brief periods to do critical updates to shared data structures.
It's also a good way to generate deadlocks. Consider if the scenario above were replicated in another subsystem (or possibly another device in the same subsystem, but I'll describe the former).
Here, core A takes a spinlock in subsystem 1 without disabling interrupts. At the same time, core B takes a spinlock in subsystem 2, also without disabling interrupts. Now suppose an interrupt related to subsystem 2 happens on core A, and while executing the subsystem 2 interrupt handler, core A needs to update a structure protected by the spinlock held by core B. At about the same time, a subsystem 1 interrupt happens on core B, and its handler needs to update a data structure protected by the spinlock held by core A. Now both cores are busy-waiting for a spinlock held by the other core, and the entire system is frozen until you do a hard reset.

LAN Driver Interruptions

I need to know how the computer handles Local Area Network Input and Output Processor interruptions. I have been looking for a while but can't seem to find anything. Came across some RJ-45 port information but not much of what I specifically need. If someone has some information on how the CPU interrupts a process to call the pointer and therefore the driver, plus how this process works it would be much appreciated.
Thanks
Typically, the driver for the LAN card configures the card to issue an interrupt when the receive buffer gets close to full or the send buffer gets close to empty. Typically, these buffers live in system memory and the network hardware uses DMA to pull transmitted packets and store received packets in system memory.
When the interrupt triggers, some process on some core is typically interrupted and the network code begins executing. If it's a send interrupt and there are more packets to send, more packets are attached to the send buffer. If it's a receive interrupt, typically more packet buffers are attached to the receive buffer. The driver typically arranges for a "bottom half" to be dispatched to handle whatever other work needs to be done (such as processing the received packets) and the interrupt completes.
There's a ton of possible variation based upon many factors, but this is the basic idea.

Use DMA transfer with Cyclone V Avalon-MM for PCIe

Is it possible to do DMA transfers with the IP core «Cyclone V Avalon-MM for PCIe» provided by Altera in Qsys (Quartus 14.0)?
Altera provides an IP core named «Cyclone V Avalon-MM DMA for PCIe» to do DMA transfers, but this IP core does not support PCIe Gen1 with a x1 lane.
The demo design (ep_g1x1) for «Cyclone V Avalon-MM for PCIe» includes a DMA block that is connected to the Avalon-MM TX bus of the PCIe IP core.
So I'm wondering whether it's possible to write data from this DMA block to the root complex (host), because I can't find out how to do that.
From my brief skim of the material, it should be possible to issue DMA reads or writes from an RC to your Cyclone V (EP) using the IP core you're interested in.
I've done DMA reads and writes on a Stratix V, however it was in a non-Qsys design just using the PCIe core HIP block (custom TLP encoding and decoding logic). This block just seems to be a wrapper around their PCIe HIP block that also handles the transaction layer for you.
The first step will be to get your RC to issue PCIe DMA read or write requests. In the case of a read, you'll want to send a memory read (MRd) request with a length greater than 1 DWORD; the endpoint answers it with a completion with data (CplD). I would suggest dedicating an entire BAR to map the memory space you want to DMA from on the FPGA to keep your address targeting simple.
On the FPGA side, I would suggest using Signal Tap and probing the Rxm* interface signals on the core. This way you can see the exact timing of the DMA read request that comes out of the core. My guess is that the RXMRead_<n>_o signal will go high to indicate the start of the request, at which point you'll have to decode RxmAddress_<n>_o and RXMBurstCount_<n>_o and pass them to some glue logic that will fetch the requested data from the FPGA's memory. Once you're ready to send back the data, assert RXMReadDataValid_<n>_i for each valid word being sent.
I'm guessing that the «Cyclone V Avalon-MM DMA for PCIe» core that you referenced takes care of that 'glue' logic I mentioned for you, and allows you to connect straight to an SDRAM controller on your Qsys bus. Altera doesn't usually encrypt their megafunction code, so if your SystemVerilog is strong, it might be worth digging through their generated files and seeing if you can reuse that bit of code in some way.
As for core settings, the only thing that I saw that you need to look out for is making sure the Single DW Completer setting is turned OFF. Otherwise the core will abort any requests it receives with a length greater than 1 DWORD.
Hope that helped somewhat.
I finally managed to make DMA requests with the «Cyclone V Avalon-MM for PCIe» Altera IP core, so yes, it's possible.
On my system, the root complex (RC) is an i.MX6 running Linux, so most of the tricks are in fact on the Linux side.
In the Linux driver, a page must be allocated with a dma_alloc_coherent() call, and the address of this page must be written to the CRA registers named ADDR_MAP_LO0 and ADDR_MAP_HI0.
On my system, memory pages are 4k in size, so I had to configure the «address translation settings» of the PCIe hard IP with 4k pages to match.
Once that was done, I simply connected the DMA controller provided by Qsys to the TX Avalon-MM slave port of the PCIe IP.
Telling the DMA to write data to this port automatically generates TLPs from the FPGA that write to the i.MX6 RAM.
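As a rough illustration of the register-programming step, the helper below splits a 64-bit bus address into the two 32-bit values that would go into ADDR_MAP_LO0 and ADDR_MAP_HI0, and checks that it is aligned to the 4k translation page. It is a sketch under assumptions, not the driver code: XLATE_PAGE_SIZE, struct addr_map_entry, addr_is_page_aligned() and make_addr_map_entry() are hypothetical names, the address is assumed to be the DMA (bus) address returned through the handle of dma_alloc_coherent(), and any flag bits the core may expect in the low word are not modelled, so the register layout should be checked against the core's user guide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XLATE_PAGE_SIZE 4096u   /* assumed: the 4k translation page size */

/* The two 32-bit values intended for the translation table entry. */
struct addr_map_entry {
    uint32_t lo;   /* value for ADDR_MAP_LO0 */
    uint32_t hi;   /* value for ADDR_MAP_HI0 */
};

/* True if the bus address starts on a translation-page boundary. */
bool addr_is_page_aligned(uint64_t bus_addr)
{
    return (bus_addr & (XLATE_PAGE_SIZE - 1u)) == 0;
}

/* Split a page-aligned bus address into low and high 32-bit halves.
 * Assumption: the entry holds the plain address halves; flag bits
 * that the hardware may define in the low word are left at zero. */
struct addr_map_entry make_addr_map_entry(uint64_t bus_addr)
{
    struct addr_map_entry e;

    assert(addr_is_page_aligned(bus_addr));
    e.lo = (uint32_t)(bus_addr & 0xFFFFFFFFu);
    e.hi = (uint32_t)(bus_addr >> 32);
    return e;
}
```

In a driver, the two values would then be written to the CRA registers through the mapped BAR, for example with iowrite32(), at offsets taken from the core's documentation.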

How long does it take to set up an I/O controller on PCIe bus

Say I have an InfiniBand or similar PCIe device and a fast Intel Core CPU and I want to send e.g. 8 bytes of user data over the IB link. Say also that there is no device driver or other kernel: we're keeping this simple and just writing directly to the hardware. Finally, say that the IB hardware has previously been configured properly for the context, so it's just waiting for something to do.
Q: How many CPU cycles will it take the local CPU to tell the hardware where the data is and that it should start sending it?
More info: I want to get an estimate of the cost of using PCIe communication services compared to CPU-local services (e.g. using a coprocessor). What I am expecting is that there will be a number of writes to registers on the PCIe bus, for example setting up an address and length of a packet, and possibly some reads and writes of status and/or control registers. I expect each of these will take several hundred CPU cycles each, so I would expect the overall setup would take order of 1000 to 2000 CPU cycles. Would I be right?
I am just looking for a ballpark answer...
Your ballpark number is correct.
If you want to send an 8-byte payload using an RDMA write, first you will write the request descriptor to the NIC using programmed I/O (PIO), and then the NIC will fetch the payload using a PCIe DMA read. I'd expect both the PIO and the DMA read to take between 200 and 500 nanoseconds each, although the PIO should be faster.
You can get rid of the DMA read and save some latency by putting the payload inside the request descriptor.
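To relate the nanosecond figures to the cycle counts in the question, the conversion can be sketched as below. The 3 GHz clock used in the note after the code is an assumption for illustration only, and ns_to_cycles() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a latency in nanoseconds to CPU cycles for a clock in MHz.
 * cycles = ns * (MHz * 1e6 Hz) / 1e9 = ns * MHz / 1000 */
uint64_t ns_to_cycles(uint64_t ns, uint64_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return ns * cpu_mhz / 1000u;
}
```

At an assumed 3000 MHz, ns_to_cycles(200, 3000) gives 600 cycles and ns_to_cycles(500, 3000) gives 1500 cycles for a single operation. A PIO write followed by a DMA read, at 400 to 1000 ns in total, would then cost about 1200 to 3000 cycles, which is the same order of magnitude as the estimate in the question.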

Inter thread data transfer - Linux

My program has two threads created from the main thread. Each thread operates on a separate external communication device connected to it.
main thread
thread_1 thread_2
Thread_1 receives data packets from its external device. Each data packet is a structure of 20 bytes.
Now I want thread_2 to read the data received by thread_1 and transfer it to the device connected to thread_2.
How can we transfer data between the two threads?
What are the exact names of the Linux variable types to use in this case?
Your problem is a classic example of the Producer Consumer Problem.
There are a number of possible ways to implement this depending on the context - your post is tagged with both pthreads and linux-device-drivers. Is this kernel-space, user-space, or kernel-space -> user-space?
Kernel Space
A solution is likely to involve a ring buffer (if you anticipate that multiple messages between threads can be in flight at once) and a semaphore.
Chapter 5 of Linux Device Drivers 3rd Edition would be a good place to start.
User-space
If both threads are in user-space, the producer-consumer pattern in the same process is usually implemented with a pthread condition variable. A worked example of how to do it is here
Kernel-space -> User-space
The general approach used in Linux is for user-space thread thread_2 to block on a filing system object signalled by kernel-space thread_1. Typically the filing system object in question is in /dev or /sys. LDD3 has examples of both approaches.
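For the user-space case, a minimal sketch of a bounded queue protected by a pthread mutex and two condition variables might look like this. The names struct packet, struct packet_queue, queue_init(), queue_push() and queue_pop() are hypothetical, as is the queue depth; the 20-byte packet is modelled as a plain byte array because the real field layout is not given.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SLOTS 16u            /* hypothetical queue depth */

struct packet {                    /* 20-byte packet, layout assumed */
    uint8_t bytes[20];
};

struct packet_queue {
    struct packet   slots[QUEUE_SLOTS];
    size_t          head;          /* next slot to read */
    size_t          tail;          /* next slot to write */
    size_t          count;         /* packets currently queued */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
};

void queue_init(struct packet_queue *q)
{
    assert(q != NULL);
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Producer side (thread_1): blocks while the queue is full. */
void queue_push(struct packet_queue *q, const struct packet *p)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail] = *p;
    q->tail = (q->tail + 1) % QUEUE_SLOTS;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side (thread_2): blocks while the queue is empty. */
void queue_pop(struct packet_queue *q, struct packet *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}
```

In this arrangement thread_1 would call queue_push() from its receive loop and thread_2 would call queue_pop() before writing to its device, with both threads started by pthread_create() and the program built with -pthread. The while loops around pthread_cond_wait() guard against spurious wakeups.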

Resources