Interfacing peripheral drivers with an RTOS

For one of my projects, the STM32L1 series was selected as the controller. ST provides drivers for USB, I2C, SPI, etc. So when making a decision on an RTOS, does any consideration need to be given to the drivers? Or, put another way, after deciding on an RTOS, is there any standard way of interfacing the microcontroller's peripheral drivers with the RTOS?

No, microcontroller peripheral drivers and the RTOS are typically independent, so compatibility doesn't need to be a consideration. The peripheral drivers are basic drivers that don't rely on any RTOS services; in fact, the peripheral library can be used without any RTOS at all. Conversely, an RTOS typically does not rely on any microcontroller peripheral beyond a timer, and even the timer setup is not built into the RTOS: the timer is typically set up by user code before the RTOS is started.
If I haven't convinced you and you still want some assurance of compatibility, then explore CMSIS.

While ST's low-level drivers have no RTOS dependencies or requirements, you might build a higher-level driver architecture around them using RTOS mechanisms to support mutual exclusion and buffering, and to manage handler priority, for example.
You could, for example, manage multi-threaded access to a device either through a device-manager thread or via mutual exclusion.

There is no standard, defined way to interface peripheral drivers to an RTOS; it depends on the RTOS. However, a common approach is to take advantage of the blocking mutexes or semaphores provided by the RTOS. "Blocking" means that if the mutex is not available, the task waits until it is free, using no CPU time in the meantime.
Usually when running an RTOS, you want the peripheral driver to grab the input data as quickly as possible, using an interrupt, and then hand the data off to an RTOS task that can take its time processing it. This is a nice clean way of managing peripheral interrupts and RTOS multitasking.
The general scenario is then that you have a task waiting on the mutex; most of the time it consumes no CPU. When the peripheral driver is invoked by an interrupt, it grabs the data off the hardware and releases the mutex so that the waiting task wakes up. The actual data can be passed between the peripheral driver and the task using a global variable or another RTOS-provided mechanism such as a message queue. The same pattern can be built on a semaphore.
ST's peripheral drivers (whether the Standard Peripheral Library, HAL, or LL) can all operate in this model. So when making a decision on which RTOS to use, you should consider an RTOS whose API supports this model.

Related

When to write a Custom Kernel Module

Problem Statement:
I have a very high-bandwidth data link that is UDP based. The source of this data is not configurable and sends a stream of datagrams over UDP. We have code that uses the standard methods for receiving data on a UDP socket, and it works adequately. I wanted to know:
Does there exist an interface to extract multiple UDP datagrams at a time, to improve efficiency?
If one doesn't exist, does it make sense to create a kernel module to provide that capability?
I am a novice, and I wanted to understand the thought process behind deciding when writing your own kernel module is appropriate. I know that such a surgical procedure isn't meant to be done lightly, but there must be a set of criteria by which that action is prudent. Maybe not in my case, but in general.
HW / Kernel Module Perspective
A typical network adapter these days is capable of distributing received packets across multiple hardware Rx queues, thus letting the host run multiple software Rx queues bound to different CPU cores, reading out packets in parallel. From a single HW/SW queue perspective, the host may poll it for new packets (see Linux NAPI), with each poll ideally yielding a batch of packets; alternatively, the host may still use an interrupt-driven approach for Rx signalling, with interrupt coalescing turned on for improved efficiency.
Existing NIC drivers in Linux kernel strive to stick with the most performant techniques, and the kernel itself should be able to leverage all of that properly.
Userland / Application Perspective
There's the PACKET_MMAP interface provided by the Linux kernel for improved Rx/Tx efficiency on the application side. Long story short, an application can set up a memory buffer shared between kernel- and userspace and read incoming packets out of it, ideally in batches, or blocks, thus avoiding the costly kernel-to-userspace copies and context switches that are so customary with the regular methods.
For added efficiency, the application may have multiple sockets bound to the NIC in separate threads / processes and demand that packet reception be load balanced across these sockets (see AF_PACKET fanout mode description).
DPDK Perspective
DPDK is a kernel-bypass framework that allows an application to seize full control of a network adapter by means of a vendor-specific poll-mode driver, or PMD, effectively running in userspace as part of the application and, by its very nature, not needing any kernel-to-userspace copies, context switches or, most likely, locking. Multi-queue receive operation, load balancing (round robin, RSS, you name it) and more cutting-edge offloads are likely to be available, too (it's vendor specific).
Summary
The short of it is that, given that multiple network-acceleration techniques already exist, one need never write one's own kernel module to solve the problem in question. By the looks of it, your application, which, as you say, uses the standard methods, is not aware of the PACKET_MMAP technique, so I'd be tempted to suggest looking at that one closely. The DPDK approach might require re-implementing the application from scratch, so I would first go for the PACKET_MMAP approach as the low-hanging fruit.

STM32F407ZET6, Is it possible for Multiple streams of a DMA to run in Parallel?

Hi, I am using an STM32F407ZET6 microcontroller and I want to use multiple streams of DMA1. Is it possible to trigger two different streams of the same DMA controller to transfer data to two different peripherals simultaneously (i.e., in parallel)?
In the advanced AHB bus matrix I observe that each DMA controller has only two lines, one for memory and one for peripherals. That suggests to me that at most two streams can run in parallel, and then only if neither stream is doing a memory<->peripheral transfer. Is this assumption correct? In other words, from the look of the AHB matrix it seems that if only mem-to-mem and periph-to-periph transfers are done, two streams could perhaps run in parallel; but if either one does a memory<->peripheral transfer, the fact that a single transfer then occupies both the DMA's memory and peripheral interfaces would seem to make that impossible. Can you shed some light on this?
I would like to request some guidance on this particular topic, as I could not find satisfactory information on it. And if running streams in parallel depends on the bus bandwidth, how is the bandwidth divided among multiple channels of a single bus performing multiple transfers? If there is any such example, I would be thankful. As a reference I have put the AHB matrix below:
You can only select one channel per stream, but you can enable all 8 streams per DMA peripheral at once if you like, subject to the hardware defect listed in the errata sheet*.
Each of the masters take turns to access the buses. Once a master takes the bus it decides how long to use it for. For the DMA master, this is configured with the MBURST and PBURST bits of the DMA_SxCR register. If you require very low latency in the system and do not want the processor or another master (ethernet etc) to be stalled and have to wait for the DMA to get off the bus then set the burst configuration short (but even the longest burst you can configure will still only be a microsecond or so).
(*) there is a hardware defect in DMA2 which disallows concurrent use of AHB and APB peripherals, see the errata for details.

Using kqueue for simple async io

How does one actually use kqueue() for doing simple async r/w's?
Its inception seems to be as a replacement for epoll() and select(), and thus the problem it is trying to solve is scaling to listening on a large number of file descriptors for changes.
However, if I want to do something like "read data from descriptor X, and let me know when the data is ready", how does the API support that? Unless there is a complementary API for kicking off non-blocking r/w requests, I don't see a way other than managing a thread pool myself, which defeats the purpose.
Is this simply the wrong tool for the job? Stick with aio?
Aside: I'm not savvy about how modern BSD-based OS internals work, but is kqueue() built on aio, or vice versa? I would imagine it depends on whether the OS I/O subsystem is fundamentally interrupt-driven or polling.
None of the APIs you mention, aside from aio itself, has anything to do with asynchronous IO, as such.
None of select(), poll(), epoll(), or kqueue() are helpful for reading from file systems (or "vnodes"). File descriptors for file system items are always "ready", even if the file system is network-mounted and there is network latency such that a read would actually block for a significant time. Your only choice there to avoid blocking is aio or, on a platform with GCD, dispatch IO.
The use of kqueue() and the like is for other kinds of file descriptors such as sockets, pipes, etc. where the kernel maintains buffers and there's some "event" (like the arrival of a packet or a write to a pipe) that changes when data is available. Of course, kqueue() can also monitor a variety of other input sources, like Mach ports, processes, etc.
(You can use kqueue() for reads of vnodes, but then it only tells you when the file position is not at the end of the file. So, you might use it to be informed when a file has been extended or truncated. It doesn't mean that a read would not block.)
I don't think either kqueue() or aio is built on the other. Why would you think they were?
I used kqueues to adapt a Linux proxy server (based on epoll) to BSD. I set up separate GCD async queues, each using a kqueue to listen on a set of sockets. GCD manages the threads for you.

A query about network packet traversal in Linux

I was reading the book Understanding Linux Network Internals and the PDF Network packet capture in Linux kernelspace (networkkernel.pdf).
In Understanding Linux Network Internals, under section 9.2.2, it is stated that:
The code that takes care of an input frame is split into two parts: first the driver copies the frame into an input queue accessible by the kernel, and then the kernel processes it (usually passing it to a handler dedicated to the associated protocol such as IP). The first part is executed in interrupt context and can preempt the execution of the second part.
Now the query is: when is the second part scheduled? Who schedules it? Is the call made from the interrupt handler itself? And in Network packet capture in Linux kernelspace the packet input flow is described as:
• When working in the interrupt-driven model, the NIC registers an interrupt handler;
• this interrupt handler will be called when a frame is received;
• typically in the handler, we allocate an sk_buff by calling `dev_alloc_skb()`;
• the handler copies data from the NIC's buffer into this newly created struct;
• the NIC driver then calls the generic reception routine `netif_rx()`;
• `netif_rx()` puts the frame in a per-CPU queue;
• if the queue is full, the frame is dropped;
• `net_rx_action()` makes its decision based on skb->protocol;
• this function basically dequeues the frame and delivers a copy to every protocol handler;
• the handlers are registered on the `ptype_all` and `ptype_base` lists.
I want to know when `netif_rx()` and `net_rx_action()` are called. Who calls them, i.e., who schedules them?
Please guide.
This scheduling is done by NAPI. Packets are captured by the method described above. The softirq comes into the picture when there is a 'livelock' problem or a flood of packets; these cases are handled there.
Packet egress is significantly more complex than packet ingress, with queue management and QoS (perhaps even packet shaping) being implemented. "Queue disciplines" are used to implement user-specifiable QoS policies.
NAPI structure scheduling:
How the NAPI structure is defined and scheduled depends on the driver design and the hardware architecture.
Linux uses ksoftirqd as the general solution to schedule softirqs to run before the next interrupt, putting them under scheduler control. This also prevents consecutive softirqs from monopolizing the CPU. A side effect is that the priority of ksoftirqd needs to be considered when running very CPU-intensive applications alongside networking, to get the proper softirq/user balance. Raising ksoftirqd's priority to 0 (or eventually higher) is reported to cure problems with low network performance at high CPU load.
There is also a research paper on NAPI Scheduling.

Is contiki scheduler preemptive?

Is the Contiki scheduler preemptive? TinyOS is not; there's also nano-RK, though I'm not sure what state its development is in.
Contiki supports preemptive threads. Refer to:
https://github.com/contiki-os/contiki/wiki/Multithreading
Contiki-OS for the IoT supports preemptive multi-threading. In Contiki, multi-threading is implemented as a library on top of the event-driven kernel, allowing dynamic loading and replacement of individual services. The library can be linked with applications that require multi-threading. The Contiki multithreading library is divided into two parts: (i) a platform-independent part that interfaces to the event kernel, and (ii) a platform-specific part that implements stack switching and the preemption primitives.
Contiki uses protothreads for implementing its so-called multi-threading. Protothreads are designed for severely memory-constrained devices, because they are stack-less and lightweight. Their main features are: very small memory overhead (only two bytes per protothread), no extra stack per thread, and high portability (they are written entirely in C, with no architecture-specific assembly code).
Contiki does not allow interrupt handlers to post new events, and no process synchronization is provided. The interrupt handler (when required) and the reader functions must be synchronized to avoid race conditions. Please also have a look at the Ring Buffer library: https://github.com/contiki-os/contiki/wiki/Libraries
It might be worth noting that the port for the most widely used sensor node, the TelosB, does not support preemption.
