How long is a "tick" in FreeRTOS?

For the functions xTaskGetTickCount() and xTaskGetTickCountFromISR(), the FreeRTOS documentation doesn't give any indication of what a "tick" is, or how long it is, or any links to where to find out.
Returns:
The count of ticks since vTaskStartScheduler was called.
What is a "tick" in FreeRTOS? How long is it?

I first found the answer in an archived thread on the FreeRTOS forums:
The tick frequency is set by configTICK_RATE_HZ in FreeRTOSConfig.h. FreeRTOSConfig.h settings are described here:
http://www.freertos.org/a00110.html
If you set configTICK_RATE_HZ to 1000 (1 kHz), then a tick is 1 ms (one thousandth of a second). If you set configTICK_RATE_HZ to 100 (100 Hz), then a tick is 10 ms (one hundredth of a second). Etc.
And from the linked FreeRTOS doc:
configTICK_RATE_HZ
The frequency of the RTOS tick interrupt.
The tick interrupt is used to measure time. Therefore a higher tick frequency means time can be measured to a higher resolution. However, a high tick frequency also means that the RTOS kernel will use more CPU time so be less efficient. The RTOS demo applications all use a tick rate of 1000Hz. This is used to test the RTOS kernel and is higher than would normally be required.
More than one task can share the same priority. The RTOS scheduler will share processor time between tasks of the same priority by switching between the tasks during each RTOS tick. A high tick rate frequency will therefore also have the effect of reducing the 'time slice' given to each task.
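As a concrete illustration, here is a minimal sketch (not from the original thread), assuming configTICK_RATE_HZ is set to 1000 in FreeRTOSConfig.h so that one tick is 1 ms; the function name is made up for the example:

    #include "FreeRTOS.h"
    #include "task.h"

    void vUptimeExample( void )    /* must be called from a task context */
    {
        /* Ticks elapsed since vTaskStartScheduler() was called. */
        TickType_t xTicks = xTaskGetTickCount();

        /* portTICK_PERIOD_MS equals 1000 / configTICK_RATE_HZ, so with a
         * 1 kHz tick this is simply the uptime in milliseconds. */
        TickType_t xUptimeMs = xTicks * portTICK_PERIOD_MS;
        ( void ) xUptimeMs;

        /* Going the other way, pdMS_TO_TICKS() converts milliseconds to
         * ticks regardless of the configured tick rate. */
        vTaskDelay( pdMS_TO_TICKS( 500 ) );    /* block for roughly 500 ms */
    }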

Related

Is there an autonomous real-time clock with a monthly loss of less than 10 milliseconds?

Looking for a real-time clock for an IoT project. I need millisecond resolution for my application protocol, and losing time is critical. So I wonder if there is an autonomous real-time clock (with a battery) that will lose less than 10 ms per month and work for a year?
The drift parameters you're asking for here -- 10 ms / 30 days -- imply <4 ppb accuracy. This will be a very difficult target to hit. A typical quartz timing crystal of the type used by most RTCs will drift by 50 - 100+ ppm (50,000 - 100,000 ppb) just based on temperature fluctuations.
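To spell out the arithmetic behind that figure:

    10 ms / 30 days = 0.010 s / (30 × 86,400 s) = 0.010 / 2,592,000 ≈ 3.9 × 10⁻⁹ ≈ 4 ppb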
Most of the higher-quality timing options (TCXO, OCXO, etc) will not be usable within your power budget -- a typical OCXO may require as much as 1W of (continuous) power to run its heater. About the only viable option I can think of would be a GPS timing receiver, which can synchronize a low-quality local oscillator to the GPS time, which is highly accurate.
Ultimately, though, I suspect your best option will be to modify your protocol to loosen or remove these timing requirements.
Sync it with a precise clock source, GPS for example.
You can also use a tiny atomic clock (https://physicsworld.com/a/atomic-clock-is-smallest-on-the-market/),
or, in Europe, a DCF77 receiver.

FreeRTOS ISR at 50 kHz

I want to blink an LED (toggle a GPIO pin) synchronously with a hardware timer configured to raise an interrupt at 50 kHz on an ARM Cortex-M4.
In my current code, I toggle one GPIO pin in one specific ISR handler triggered by a 50 kHz external clock signal. The result is that the GPIO pin toggles very erratically, at random frequencies from 1 kHz to 96 kHz.
The operating system is not running anything else apart from the timer tick interrupt (100 Hz, with the lowest priority), the idle task and my specific ISR handler.
By contrast, this "toggling solution" works perfectly in a bare-metal implementation on the same MCU, so my problem seems to come from my lack of knowledge of the FreeRTOS environment.
Is it feasible to toggle an LED in a FreeRTOS ISR at 50 kHz?
Do I need to do it in a task waiting for the 50 kHz interrupt signal?
Should I create a timed toggling task at 50 kHz and synchronize it periodically with the external clock signal?
If you want the 50 kHz to be accurate (no jitter), the priority of the interrupt would have to be at or above configMAX_SYSCALL_INTERRUPT_PRIORITY: https://www.freertos.org/a00110.html#kernel_priority and https://www.freertos.org/RTOS-Cortex-M3-M4.html (assuming you are using a port that supports interrupt nesting).
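For illustration, a minimal sketch (not the poster's code) of that approach, assuming a CMSIS-based Cortex-M4 port whose FreeRTOSConfig.h defines configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY (the unshifted form of configMAX_SYSCALL_INTERRUPT_PRIORITY); TIMER_IRQn, TIMER_IRQHandler and LED_Toggle() are placeholders for the real timer IRQ number, vector name and a board-specific pin toggle:

    #include "FreeRTOS.h"
    /* The device's CMSIS header (included via the project) is assumed to
     * provide NVIC_SetPriority() and NVIC_EnableIRQ(). */

    void vStart50kHzToggleIrq( void )
    {
        /* On Cortex-M a numerically lower value is a logically higher priority.
         * Raising the timer interrupt above configMAX_SYSCALL_INTERRUPT_PRIORITY
         * means the kernel never masks it (no kernel-induced jitter), but the
         * ISR must then never call any FreeRTOS API function. */
        NVIC_SetPriority( TIMER_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY - 1 );
        NVIC_EnableIRQ( TIMER_IRQn );
    }

    void TIMER_IRQHandler( void )
    {
        LED_Toggle();    /* board-specific GPIO pin toggle */
        /* ...clear the timer's interrupt flag here (device specific)... */
    }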
I finally found out that the instruction cache was not enabled in the FreeRTOS implementation of my 50 kHz ISR, unlike in the bare-metal version. It is now almost working as expected (some jitter remains).
In reply to my own questions, I suggest the following:
Is it feasible to toggle an LED in a FreeRTOS ISR at 50 kHz?
Definitely yes. FreeRTOS can service ISRs at any frequency; feasibility depends only on the computational capabilities of the MCU together with its instruction and data access performance. FreeRTOS will certainly add some delay compared to a bare-metal implementation when processing an ISR (ISR overhead and port efficiency), but again, how significant that is depends on the MCU's performance and the frequency in question.
Do I need to do it in a task waiting for the 50 kHz interrupt signal?
Not necessarily. It is, however, a good alternative if irregular or fairly heavy processing is needed. The FreeRTOS deferred-processing mechanism costs time, though, and may leave too little to finish the work before the next interrupt: more resilient, but less efficient. In my case, at 50 kHz, it cost too much time relative to the average amount of processing per ISR.
Should I create a timed toggling task at 50 kHz and synchronize it periodically with the external clock signal?
Not feasible unless the system tick period is reduced significantly (i.e. the tick rate increased), which would cause a severe loss of performance.
You cannot get 50 kHz through FreeRTOS mechanisms.
Of course, you can try, but it is a bad idea: you would have to set the system tick period to no more than 10 µs (half of 1 / 50 kHz). In that case your task would have low latency (it could react "immediately"), but the context-switch ISR would run very often, which reduces performance.
The right way is to use a hardware timer to generate the interrupt (to toggle the GPIO), or to use a timer with a PWM output; in that case the frequency accuracy depends on the clock source.
In order to synchronize the timer with an external source, you should use an external interrupt plus an additional high-resolution timer (10 µs or better).

How does a gnuradio source block know how many samples to output?

I'm trying to understand how gnuradio source blocks work. I know how to make a simple one that outputs a constant and I understand what sample rate means, but I'm not sure how (or where) to combine the two.
Is the source block in charge of regulating the amount of data to output? Or does the amount it outputs depend on the other blocks in the flow graph and how much they consume? Some source blocks take sample_rate as an input, which makes me think it's the former. But other blocks don't, which makes me think it's the latter.
If a source block is in charge of its sample rate, how does it regulate it? Do they check the system clock and output samples based upon that?
Do they check the system clock and output samples based upon that?
Definitely not. All GNU Radio blocks operate at the maximum speed the processor can give.
However, GNU Radio relies on the fact that each flowgraph may have a source and/or sink device (e.g. a USRP, another SDR device, or a sound card) that produces/consumes samples at a constant rate. Consequently, the flowgraph is throttled at the rate of the hardware.
To avoid CPU saturation when none of these hardware devices is present, GNU Radio provides the Throttle block, which tries (not very accurately) to limit throughput to the given number of samples per second by sleeping for a suitable amount of time between the samples that pass through it.
As far as the sample_rate parameter is concerned, apart from the Throttle and device-specific blocks, it is used only for graphical representation or internal calculations.
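As a minimal sketch of that behaviour (assuming the GNU Radio 3.8+ C++ API; the same graph is more commonly built in Python or GRC), a flowgraph with no hardware block runs as fast as the CPU allows unless a Throttle block caps it:

    #include <gnuradio/top_block.h>
    #include <gnuradio/blocks/null_source.h>
    #include <gnuradio/blocks/throttle.h>
    #include <gnuradio/blocks/null_sink.h>

    int main()
    {
        const double samp_rate = 32000.0;    // only the Throttle block actually uses this

        auto tb  = gr::make_top_block("throttle_demo");
        auto src = gr::blocks::null_source::make(sizeof(float));          // produces items as fast as possible
        auto thr = gr::blocks::throttle::make(sizeof(float), samp_rate);  // sleeps to approximate samp_rate items/s
        auto snk = gr::blocks::null_sink::make(sizeof(float));

        tb->connect(src, 0, thr, 0);   // remove 'thr' and the graph saturates a CPU core
        tb->connect(thr, 0, snk, 0);
        tb->run();                     // runs until stopped (e.g. tb->stop() from another thread)
        return 0;
    }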

Xperfview: What's the difference between CPU sampling and CPU Usage?

This question pertains to xperf and xperfview, utilities that are part of the Windows Performance Toolkit (in turn part of Windows SDK 7.1).
Comparing two charts, "CPU sampling by thread" and "CPU Usage by thread", there are several differences I don't understand. I'll use audiodg.exe as an example.
In the Threads pulldown, there is only one thread for audiodg on the CPU Sampling chart; the CPU Usage chart shows several audiodg threads.
Both graphs have a Y-axis marked "% Usage", but the measurements differ. Typically the % usage for a given thread is lower on the CPU Sampling chart than on the CPU Usage chart.
The CPU Sampling summary table shows Weight and % weight for each module/process. If I load symbols, I can dig pretty deep into the audiodg process. The CPU Scheduling Aggregate Summary table (launched from the CPU Usage graph) shows CPU Usage and % CPU usage -- Weight is not available. (Conversely, CPU Usage is not available on the CPU Sampling summary table.) I cannot dig as deep into audiodg -- I only see the main thread and a few ntdll.dll threads.
The numbers for any process in the % CPU usage and % Weight columns are always different. Sometimes they differ by more than 75%.
So my questions ... what is the reliable measure of CPU usage here? Aren't the CPU Usage numbers derived from CPU Samples? Shouldn't the numbers relate somehow?
Xperf does make this a bit confusing. Here is my understanding of what's going on; there are two kinds of data involved:
CPU sample data, enabled with the PROFILE kernel flag. CPU sample data is collected at some regular interval, and records information about what the CPU was doing at that time (e.g. the process, thread Id, and callstack at the time of the sample.)
Context switch data, enabled with the CSWITCH kernel flag. This records data about every context switch that happens (e.g. who was switched in/out and the callstacks.)
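For reference, both kinds of data can be captured in one trace with something like the following (a sketch; the exact flag set depends on what you want to analyze):

    xperf -on PROC_THREAD+LOADER+PROFILE+CSWITCH -stackwalk Profile
    (run the scenario you want to measure)
    xperf -d trace.etl
    xperfview trace.etl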
CPU sampling by thread shows the number of profile events that were recorded for each thread, aggregated over some interval of time for the duration of the trace. For example, if audiodg was executing 10% of the time for 2 seconds, we would expect to see about 10 "% usage" over that time. However, because this is based on sampling, it's possible that at each sample event, threads from another process happened to be executing--in other words, the 10% was 'missed' by the sample events.
CPU Usage by thread is calculated using the context switch data. The 'usage' is the amount of time between being context switched in and then out later (and of course, this data is aggregated over some small interval).
There are benefits to each data:
CPU sampling will actually tell you what the thread is doing at the time of the sample because it collects call stacks during the execution of the thread. The context switch information will only tell you when a thread gets switched in or out, but nothing between.
Context switch information will tell you exactly how much time every thread got to execute. This data is correct. Sampling, of course, is only probabilistic.
So to answer your question, the CPU Usage chart is "more accurate" for understanding how much time each thread was executing. However, don't rule out the use of the sampling data because it can be much more helpful for understanding where your threads were actually spending their time! For the CPU sampling data, the summary table is more valuable because it will show you the stacks. For the CPU usage data, the chart is probably more helpful than the summary table.
Hope that helps!

Does a one cycle instruction take one cycle, even if RAM is slow?

I am using an embedded RISC processor. There is one basic thing I have a problem figuring out.
The CPU manual clearly states that the instruction ld r1, [p1] (in C: r1 = *p1) takes one cycle. The size of register r1 is 32 bits. However, the memory bus is only 16 bits wide. So how can it fetch all the data in one cycle?
The quoted clock times assume full-width, zero-wait-state memory. The time it takes the core to execute that instruction is one clock cycle.
There was a time when each instruction took a different number of clock cycles. Memory was relatively fast then too, usually zero wait state. There was a time before pipelines as well where you had to burn a clock cycle fetching, then a clock cycle decoding, then a clock cycle executing, plus extra clock cycles for variable length instructions and extra clock cycles if the instruction had a memory operation.
Today clock speeds are high and chip real estate is relatively cheap, so a one-clock-cycle add or multiply is the norm, as are pipelines and caches. Processor clock speed is no longer the determining factor for performance. Memory is relatively expensive and slow, so caches (their configuration, number and size), bus width, memory speed and peripheral speed determine the overall performance of a system. Normally, increasing the processor clock speed but not the memory or peripherals will show minimal if any performance gain; on some occasions it can make things slower.
Memory size and wait states are not part of the clock execution spec in the reference manual; it describes only what the core itself costs you, in clocks, for each instruction. If it is a Harvard architecture, where the instruction and data buses are separate, then one clock including the memory cycle is possible. The fetch of the instruction happens at least one clock cycle earlier, if not before that, so at the beginning of the clock cycle the instruction is ready; decode and execute (the memory read cycle) happen during that one clock, and at the end of it the result of the read is latched into the register. If the instruction and data buses are shared, then you could argue that the instruction still finishes in one clock cycle, but you do not get to fetch the next instruction, so there is a bit of a stall there; they might cheat and still call that one clock cycle.
My understanding is: when we say an instruction takes one cycle, it does not mean the instruction finishes in one cycle. We should take the instruction pipeline into account. Suppose your CPU has a five-stage pipeline; that instruction would take five cycles if it were executed sequentially.
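For example, with a classic five-stage pipeline (fetch, decode, execute, memory, write-back), each instruction needs five cycles to complete, but a new instruction can start every cycle, and that per-cycle issue rate is what the "one cycle" figure describes (assuming zero-wait-state memory, as the manual's timing figures do):

    cycle:        1    2    3    4    5    6
    ld r1,[p1]    IF   ID   EX   MEM  WB
    next instr         IF   ID   EX   MEM  WB

    throughput: one instruction per cycle; latency: five cycles per instruction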
