When using an RTOS (e.g. FreeRTOS), we have separate stack spaces for each thread. So what about ISRs (Interrupt Service Routines): do they have a separate stack in memory? Or is this configurable?
If they don't have their own stack, where are the local variables declared in an ISR stored?
I had the exact same question, and a lot of searching led me to this conclusion: the answer depends on your chip and on how the OS you use configures that chip.
So, looking at one of my favorite chips, the ARM Cortex-M3 (for which interrupts are a form of exception), the documentation at various spots reads:
Operating Modes
The Cortex-M3 supports Privileged and User (non-privileged) execution.
Code run as Privileged has full access rights whereas code executed as
User has limited access rights. The limitations include restrictions
on instruction use such as MSR fields, access to memory and
peripherals based on system design, and restrictions imposed by the
MPU configuration.
The processor supports two operation modes, Thread mode and Handler
mode. Thread mode is entered on reset and normally on return from an
exception. When in Thread mode, code can be executed as either
Privileged or Unprivileged.
Handler mode will be entered as a result of an exception. Code in
Handler mode is always executed as Privileged, therefore the core will
automatically switch to Privileged mode when exceptions occur. You can
change between Privileged Thread mode and User Thread mode when
returning from an exception by modifying the EXC_RETURN value in the
link register (R14). You can also change from Privileged Thread to
User Thread mode by clearing CONTROL[0] using an MSR instruction.
However, you cannot directly change to privileged mode from
unprivileged mode without going through an exception, for example an
SVC.
Main and Process Stacks
The Cortex-M3 supports two different stacks, a main stack and a
process stack. To support this the Cortex-M3 has two stack pointers
(R13). One of these is banked out depending on the stack in use. This
means that only one stack pointer at a time is visible as R13.
However, both stack pointers can be accessed using the MRS and MSR
instructions. The main stack is used at reset, and is always used in
Handler mode (when entering an exception handler). The process stack
pointer is only available as the current stack pointer when in Thread
mode. You can select which stack pointer (main or process) is used in
Thread mode in one of two ways, either by using the EXC_RETURN value
when exiting from Handler Mode or while in Thread Mode by writing to
CONTROL[1] using an MSR instruction.
And...
When the processor takes an exception, unless the exception is a
tail-chained or a late-arriving exception, the processor pushes
information onto the current stack. This operation is referred to as
stacking and the structure of eight data words is referred as the
stack frame. ...
Immediately after stacking, the stack pointer indicates the lowest
address in the stack frame
From the book "The Definitive Guide to the ARM Cortex-M3":
The MSP, also called SP_main in ARM documentation, is the default SP
after power-up; it is used by kernel code and exception handlers. The
PSP, or SP_process in ARM documentation, is typically used by thread
processes in system with embedded OS running.
Because exception handlers always use the Main Stack Pointer, the main
stack memory should contain enough space for the largest number of
nesting interrupts.
When an exception takes place, the registers R0–R3, R12, LR, PC,
and Program Status (PSR) are pushed to the stack. If the code that is
running uses the Process Stack Pointer (PSP), the process stack will
be used; if the code that is running uses the Main Stack Pointer
(MSP), the main stack will be used. Afterward, the main stack will
always be used during the handler, so all nested interrupts will use
the main stack.
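The eight-word stack frame described in the quotes above can be written down as a C struct, which is handy when inspecting a saved stack pointer in a fault handler or debugger. This is only a sketch: the names `cm3_exception_frame_t` and `frame_from_sp` are made up for illustration, and the layout assumes a Cortex-M3 basic frame with no alignment padding word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame pushed by the core on exception entry.
 * Fields are listed from the lowest address (where SP points right
 * after stacking) to the highest. Hypothetical type name. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* address execution returns to */
    uint32_t xpsr; /* program status register */
} cm3_exception_frame_t;

/* Eight data words, as stated in the ARM documentation quoted above. */
_Static_assert(sizeof(cm3_exception_frame_t) == 8 * sizeof(uint32_t),
               "exception frame must be eight words");

/* Interpret a saved stack pointer (MSP or PSP, whichever was in use
 * when the exception was taken) as a pointer to the frame.
 * Hypothetical helper. */
static const cm3_exception_frame_t *frame_from_sp(const uint32_t *sp)
{
    assert(sp != NULL);
    return (const cm3_exception_frame_t *)sp;
}
```

On a real target you would pass in the value of MSP or PSP read with an MRS instruction; which one applies depends on which stack the interrupted code was using.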
UPDATE 6/2017:
My previous answer was incorrect. I have analyzed FreeRTOS for Cortex processors and rewritten my answer:
The standard FreeRTOS version for the Cortex-M3 does in fact configure and use both the MSP and PSP. When the very first task runs, it resets the MSP to the value stored in the first entry of the vector table (at address 0x00000000), which tends to be the very top of SRAM. It then triggers a system call. In the system call exception handler it sets the PSP to the next task's stack location, then modifies the exception LR value to mean "return to thread mode and on return use the process stack".
This means that the interrupt service routine (AKA exception handler) stack grows down from the address specified in the vector table.
You can configure your linker and startup code to locate the exception handler stack wherever you like. Make sure your heap or other memory areas do not overlap the exception handler stack area, and make sure the area is large enough.
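To make the "exception LR value" part more concrete, here is a small sketch that decodes the three EXC_RETURN values a Cortex-M3 accepts. The macro and function names are mine, not from FreeRTOS or CMSIS, and the sketch assumes a core without the floating-point extension (which adds further values).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN values on a Cortex-M3 (hypothetical macro names). */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u /* return to Handler mode, main stack   */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u /* return to Thread mode, main stack    */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu /* return to Thread mode, process stack */

/* Bit 3 set: the exception returns to Thread mode. */
static bool exc_return_to_thread_mode(uint32_t exc_return)
{
    assert((exc_return & 0xFFFFFFF0u) == 0xFFFFFFF0u);
    return (exc_return & (1u << 3)) != 0;
}

/* Bit 2 set: the process stack pointer is used after the return. */
static bool exc_return_uses_psp(uint32_t exc_return)
{
    assert((exc_return & 0xFFFFFFF0u) == 0xFFFFFFF0u);
    return (exc_return & (1u << 2)) != 0;
}
```

With these helpers, "return to thread mode and on return use the process stack" is the value for which both functions return true, i.e. `EXC_RETURN_THREAD_PSP`.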
The answer for other chips and operating systems could be completely different!
To help ensure your application has appropriate space on the ISR stack (MSP),
here's some additional code to check actual ISR stack use. Use in addition to the checking you're already doing on FreeRTOS task stack use:
https://sourceforge.net/p/freertos/discussion/382005/thread/8418dd523e/
Update: I've posted my version of port.c that includes the ISR stack use check on github:
https://github.com/DRNadler/FreeRTOS_helpers
I am currently working on a project to develop an application on an STM32 microcontroller using an RTOS (Micrium).
Are there any tools to calculate the stack usage of a particular thread in an RTOS application?
No tools I know of. However, two simple methods to estimate stack usage have always worked for me.
1. Fill all RAM with a value like 0x55 or 0xAA. Let the program run long enough, exercising all of the device's options, to get the most code execution coverage. Stop (under a debugger) and examine RAM to see where the above values have been overwritten. That should give you a good approximation. This works with or without an OS.
2. Modify the OS just a bit so that on each task switch it records, in a global array with one entry per task, the lowest stack pointer seen so far for that task (by comparing against the previously recorded value). After running the app long enough as in [1], examine the recorded values. Although there is no guarantee that a task switch happens at the moment a task's stack usage is at its maximum, statistically, after a long enough time and assuming preemptive switching, you will have managed to record an accurate enough value.
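The fill-pattern approach from the first method can also be automated in code instead of eyeballing RAM in a debugger. A minimal sketch, assuming a descending stack (it starts at `base + size` and grows toward `base`); the function names and the choice of 0xAA are illustrative, not taken from any particular RTOS:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xAAu

/* Fill a not-yet-used stack region with a known pattern.
 * Must be called before the stack is put into use. */
static void stack_paint(uint8_t *base, size_t size)
{
    assert(base != NULL);
    memset(base, STACK_FILL_BYTE, size);
}

/* Return how many bytes at the top of the region have been overwritten.
 * Scans upward from the low end while the pattern is still intact. */
static size_t stack_bytes_used(const uint8_t *base, size_t size)
{
    size_t untouched = 0;

    assert(base != NULL);
    while (untouched < size && base[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return size - untouched;
}
```

The result is a lower bound: if the deepest bytes the program wrote happen to equal the fill value, they are counted as unused, which is one reason to keep some margin on top of the measured figure.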
If you are using GCC or clang, the -fstack-usage compiler switch generates a stack frame size for each function. You need to combine that information with call-graph information generated by the linker to find the deepest stack usage starting from a specific function. Starting at main(), a task entry point, or an ISR will then give you the worst-case usage for that thread.
Helpfully the work to create such a tool has been done for you as discussed here, using a Perl script from here.
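The combination step itself is simple enough to sketch: worst-case depth from a function is its own frame size plus the deepest of its callees. The table layout below is invented for illustration (a real tool would populate it from the per-function sizes and the call graph), and the sketch assumes the call graph has no recursion and no calls through function pointers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* One entry per function (hypothetical layout). */
typedef struct {
    const char *name;
    unsigned frame_bytes;     /* per-function stack frame size */
    int callees[MAX_CALLEES]; /* indices into the table, -1 ends the list */
} func_info_t;

/* Worst-case stack depth in bytes when execution starts at table[root].
 * Assumes an acyclic call graph; recursion would never terminate here. */
static unsigned worst_case_stack(const func_info_t *table, size_t count, int root)
{
    unsigned deepest_callee = 0;

    assert(table != NULL && root >= 0 && (size_t)root < count);
    for (int i = 0; i < MAX_CALLEES && table[root].callees[i] >= 0; i++) {
        unsigned depth = worst_case_stack(table, count, table[root].callees[i]);
        if (depth > deepest_callee) {
            deepest_callee = depth;
        }
    }
    return table[root].frame_bytes + deepest_callee;
}
```

Calling it once per entry point (main(), each task function, each ISR) gives the per-thread figures mentioned above.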
ARM's armcc compiler v5 and earlier (v6 is clang/llvm) has this functionality built-in and can include detailed stack analysis in the link map, including the worst-case call path and warnings of non-deterministic stack usage (due to recursion or call-backs through function pointers for example). You may be using armcc if you are using Keil ARM MDK for example. Again for multi-threaded systems (tasks/ISRs) you need to look at the stack usage for the thread entry point.
Note also that on ARM Cortex-M, the "system stack" is shared by the main() thread and all ISRs, and if you use the ISR preemption priorities, multiple interrupts may be active simultaneously. So in theory the worst-case stack usage is the sum of the stack usage of main() and of all ISRs that may occur concurrently. Whilst it is good practice to keep ISRs short and simple, beware of third-party code. ST's USB library, for example, runs the entire USB device stack in the ISR context!
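As a rough sketch of that sum: ISRs at the same preemption level cannot preempt each other, so only the largest one per level needs to be counted, plus the hardware-stacked frame for each nesting level. The struct, the function name, and the `frame_bytes` parameter (32 for the basic eight-word frame) are my own assumptions for illustration; real budgets should add margin on top.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one ISR. */
typedef struct {
    unsigned preempt_level; /* preemption priority level of the ISR */
    unsigned stack_bytes;   /* worst-case usage of the ISR's call tree */
} isr_info_t;

/* Worst-case system stack: main() plus, for every preemption level that
 * has at least one ISR, the exception frame and the largest ISR there. */
static unsigned system_stack_worst_case(unsigned main_bytes,
                                        const isr_info_t *isrs, size_t count,
                                        unsigned levels, unsigned frame_bytes)
{
    unsigned total = main_bytes;

    assert(isrs != NULL || count == 0);
    for (unsigned level = 0; level < levels; level++) {
        unsigned largest = 0;
        int level_in_use = 0;

        for (size_t i = 0; i < count; i++) {
            if (isrs[i].preempt_level == level) {
                level_in_use = 1;
                if (isrs[i].stack_bytes > largest) {
                    largest = isrs[i].stack_bytes;
                }
            }
        }
        if (level_in_use) {
            total += frame_bytes + largest;
        }
    }
    return total;
}
```

If every ISR has its own preemption level, this reduces to the plain sum described above (plus the stacked frames).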
Does anyone know of documentation on the memory consistency model guarantees for a memory region allocated with cudaHostAlloc(..., cudaHostAllocMapped)? For instance, it would be useful to know when writes from the device become visible to reads from the host (it could be after the kernel completes, at the earliest possible time during kernel execution, etc.).
Writes from the device are guaranteed to be visible on the host (or on peer devices) after the performing thread has executed a __threadfence_system() call (which is only available on compute capability 2.0 or higher).
They are also visible after the kernel has finished, i.e. after a cudaDeviceSynchronize() or after one of the other synchronization methods listed in the "Explicit Synchronization" section of the Programming Guide has been successfully completed.
Mapped memory should never be modified from the host while a kernel using it is or could be running, as CUDA currently does not provide any way of synchronization in that direction.
Does the interrupt handler use the stack of the task that's interrupted or a separate stack as its stack? (PowerPC, VxWorks)
This is architecture dependent. From the VxWorks Kernel Programmer's Guide (v6.8):
All ISRs use the same interrupt stack. [...]
CAUTION: Some architectures do not permit using a separate interrupt stack, and
ISRs use the stack of the interrupted task. [...] See
the VxWorks reference for your BSP to determine whether your architecture
supports a separate interrupt stack.
In your case, PowerPC does support a separate shared interrupt stack (per core).
In VxWorks, there is a specific stack for interrupts. All interrupt handlers share that same stack, which is located just above where the VxWorks image is loaded.
I believe the default stack size is 5K, but can easily be changed with the kernel configurator.
The ISR mechanism works roughly this way:
VxWorks typically installs an assembly code wrapper around your ISR code.
On Entry, it automatically saves the general purpose registers (on the ISR stack) so the executing context (another ISR or a task) state is preserved.
On Exit, the registers are restored, but in addition, the OS scheduler is called to see if the just finished ISR changed the state of a higher priority task. If this happened, then the higher priority task resumes. If no higher priority tasks are available, then the original task is restored.
xiaokaoy,
There is a pretty good description of how interrupts work in the VxWorks Programmer's Guide section 2.6. If you don't have a copy, it's available online from many sources.
I understand the basic concept of the stack and the heap, but it would be great if anyone could clear up the following confusions:
Is there a single stack for the entire application process, or is a new stack created for each thread started in a project?
Is there a single heap for the entire application process, or is a new heap created for each thread started in a project?
If a stack is created for each thread, then how does the process manage the sequential flow of threads (and hence stacks)?
There is a separate stack for every thread. This is true not only for CLR, and not only for Windows, but pretty much for every OS or platform out there.
There is a single heap for every Application Domain. A single process may run several app domains at once. A single app domain may run several threads.
To be more precise, there are usually two heaps per domain: one regular and one for really large objects (like, say, a 64K array).
I don't understand what you mean by "sequential flow of threads".
One stack for each thread, all threads share the same heaps.
There is no 'sequential flow' of threads. A thread is an operating system object that stores a copy of the processor state. The processor state includes the register values. One of them is ESP, the stack pointer. Another really important one is EIP, the instruction pointer. When the operating system switches between threads, it stores the processor state in the current thread object and reloads the state from the thread object for the thread that was selected to run next. The processor now simply continues executing where it left off previously.
Getting a thread started is perhaps now easy to understand as well. The operating system allocates a megabyte of memory (by default) for the stack, initializes the ESP register value to point to that memory, and sets the value of the EIP register to the address of the method where the thread should start executing: the value of the ThreadStart delegate in C#.
Each thread must have its own stack; that's where local variables and parameters are held, along with the return addresses of the previous functions.
I probably have a stack overflow in my application (of course, only in release mode...), and would like to add some protection/investigation code to it.
I am looking for a Windows API to tell me the current state of a thread's stack (i.e., the total size and used size).
anyone ?
thx
Noam
The total size of the stack will be the size of the stack you asked for when you created the thread ( or linked the program if it's the main thread ).
There are some preliminary references to getting the stack size for a thread pool in Windows 7 on MSDN ( QueryThreadpoolStackInformation ).
As an approximation, you can compare the address of a local variable with the address of another local variable further down the stack to get a measure of the amount used. I believe that how a program running on Windows lays out its local variables in the virtual memory that Windows allocates to a thread is up to the implementation of that language's runtime, rather than something that Windows really knows about; instead, you get an exception when you attempt to access an address just below the memory allocated for the stack.
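A minimal sketch of the address-comparison trick, in portable C rather than anything Windows-specific. The function names are hypothetical, and the result is only an estimate since the compiler is free to arrange locals within a frame as it likes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of a local near the start of the thread (hypothetical global;
 * a multi-threaded program would need one per thread). */
static uintptr_t stack_reference;

/* Call early, e.g. at the top of the thread function, passing the
 * address of one of its locals. */
static void stack_mark_reference(const volatile void *local_in_outer_frame)
{
    assert(local_in_outer_frame != NULL);
    stack_reference = (uintptr_t)local_in_outer_frame;
}

/* Call from deeper code with the address of a local there. Returns the
 * distance in bytes, whichever direction the stack grows. */
static size_t stack_distance_from_reference(const volatile void *local_here)
{
    uintptr_t here = (uintptr_t)local_here;

    return (size_t)(here > stack_reference ? here - stack_reference
                                           : stack_reference - here);
}
```

Logging the largest distance seen at a few suspect call sites, and comparing it with the stack size requested at thread creation, gives a rough idea of how close the thread gets to the limit.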
The alternative to complicating your code with checks for whether the stack has reached a limit is to add an exception handler for EXCEPTION_STACK_OVERFLOW, which the OS invokes when it detects that the stack has reached its limit. There's an example here.