why "A stack overflow in task iot_thread has been detected" is coming continuously? - stack

I'm working on an aws/amazon-freertos project, and I keep running into the error "A stack overflow in task iot_thread has been detected".
I have hit this error many times, and so far I have managed to make it go away by changing the code.
I just want to know what this error actually means.
As far as I know, it simply means that the iot_thread task stack size is not sufficient, so the stack overflows.
Is this the only reason this error occurs, or can there be another cause?
If so, where should I increase the stack size of the iot_thread task?
Full Log:
***ERROR*** A stack overflow in task iot_thread has been detected.
abort() was called at PC 0x4008cf94 on core 0
0x4008cf94: vApplicationStackOverflowHook at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/esp32/panic.c:122
ELF file SHA256: 5252c38b96a3325472de5cf0f1bbc2cae90ffda1a169e71823c5aebbaa4fce2d
Backtrace: 0x4008cdac:0x3ffd74c0 0x4008cf7d:0x3ffd74e0 0x4008cf94:0x3ffd7500 0x40093e35:0x3ffd7520 0x400953d4:0x3ffd7540 0x4009538a:0x3ffd7560 0x401390e5:0x3ffc81bc
0x4008cdac: invoke_abort at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/esp32/panic.c:136
0x4008cf7d: abort at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/esp32/panic.c:171
0x4008cf94: vApplicationStackOverflowHook at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/esp32/panic.c:122
0x40093e35: vTaskSwitchContext at /home/horsemann/Desktop/WorkSpace/TestingRepo/freertos_kernel/tasks.c:5068
0x400953d4: _frxt_dispatch at /home/horsemann/Desktop/WorkSpace/TestingRepo/freertos_kernel/portable/ThirdParty/GCC/Xtensa_ESP32/portasm.S:406
0x4009538a: _frxt_int_exit at /home/horsemann/Desktop/WorkSpace/TestingRepo/freertos_kernel/portable/ThirdParty/GCC/Xtensa_ESP32/portasm.S:206

It simply means that the iot_thread task stack size is not sufficient. [...] Is this the only reason this error occurs, or can there be another cause?
Either the stack is insufficient or your stack usage is excessive (due, for example, to runaway recursion or instantiation of large objects or arrays). Either way the result is the same. Whether it is due to an insufficient stack or to excessive stack usage is a matter of design and intent.
If yes then where should I increase the stack size of the iot_thread task?
The stack for a thread is assigned in the task creation function. For a dynamically allocated stack that is the usStackDepth parameter of the xTaskCreate() call:
BaseType_t xTaskCreate( TaskFunction_t pvTaskCode,
                        const char * const pcName,
                        configSTACK_DEPTH_TYPE usStackDepth,   // <<<<<<<<<<
                        void *pvParameters,
                        UBaseType_t uxPriority,
                        TaskHandle_t *pxCreatedTask );
and for a statically allocated stack, the ulStackDepth and puxStackBuffer parameters of xTaskCreateStatic():
TaskHandle_t xTaskCreateStatic( TaskFunction_t pxTaskCode,
                                const char * const pcName,
                                const uint32_t ulStackDepth,          // <<<<<<
                                void * const pvParameters,
                                UBaseType_t uxPriority,
                                StackType_t * const puxStackBuffer,   // <<<<<<<
                                StaticTask_t * const pxTaskBuffer );
In some cases you might also use xTaskCreateRestrictedStatic(), but that is more complicated - read the documentation if that is what you are doing.
Appropriate allocation of task stacks is fundamental to any RTOS application design. You need to read the documentation and understand the concepts.
The FreeRTOS API is extensively documented at https://www.freertos.org/a00106.html (with plenty of documentation and tutorial content at https://www.freertos.org/RTOS.html besides). There really is no substitute for reading the documentation.
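As a concrete (hypothetical) sketch: if you were creating the task yourself with xTaskCreate(), increasing the stack would look like the code below. In the amazon-freertos code base the iot_thread is created inside the library, so in practice you would locate that creation call (or the configuration value that feeds its stack-depth argument) and raise the value there; the names and numbers below are mine, not the project's.
#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical task entry function - stands in for whatever the real task runs. */
extern void prvIotThread( void * pvParameters );

void vStartIotThread( void )
{
    /* usStackDepth is measured in words (StackType_t), not bytes.
       If the task overflows, raise this value; the right size is best found
       by measuring worst-case usage with uxTaskGetStackHighWaterMark(). */
    const configSTACK_DEPTH_TYPE xStackDepthWords = 4096;

    BaseType_t xStatus = xTaskCreate( prvIotThread,
                                      "iot_thread",
                                      xStackDepthWords,      /* <<< the value to increase */
                                      NULL,
                                      tskIDLE_PRIORITY + 5,
                                      NULL );
    configASSERT( xStatus == pdPASS );
}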

Related

How to make FastMM log file more readable

I recently started using FastMM, and now I want to know whether it is possible to configure the FastMM log file to be a bit more readable, so that it can point me to where exactly the memory is being leaked, like specifying the actual unit and function/line number. In most cases I get something like...
A memory block has been leaked. The size is: 20
This block was allocated by thread 0xE7C, and the stack trace (return addresses) at the time was:
5005995A [System.pas][System][System.#GetMem][4316]
5005E99F [System.pas][System][System.TObject.NewInstance][15447]
5005F292 [System.pas][System][System.#ClassCreate][16757]
23A606 [FireDAC.Stan.Util.pas][FireDAC.Stan.Util][FireDAC.Stan.Util.TFDBuffer.Create][1516]
4C46D3 [FireDAC.Phys.IBWrapper][Phys.Ibwrapper.TIBDatabase.$bctr$qqrp29Firedac.Phys.Ibwrapper.TIBEnvp14System.TObject]
4C4730 [FireDAC.Phys.IBWrapper][Phys.Ibwrapper.TIBDatabase.$bctr$qqrp29Firedac.Phys.Ibwrapper.TIBEnvpvp14System.TObject]
4CEF79 [FireDAC.Phys.IBBase][Phys.Ibbase.TFDPhysIBConnectionBase.InternalConnect$qqrv]
E4A553 [FireDAC.Phys.pas][FireDAC.Phys][FireDAC.Phys.TFDPhysConnection.ConnectBase][3161]
E4A60E [FireDAC.Phys.pas][FireDAC.Phys][FireDAC.Phys.TFDPhysConnection.DoConnect][3187]
E4B11C [FireDAC.Phys.pas][FireDAC.Phys][FireDAC.Phys.TFDPhysConnection.Open][3361]
34A14C [FireDAC.Comp.Client.pas][FireDAC.Comp.Client][FireDAC.Comp.Client.TFDCustomConnection.DoInternalLogin][3642]
The block is currently used for an object of class: TFDBuffer
Which does not really give me what I need.

How do I debug a memory issue in Rust?

I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?
In general I would try to do the following approach:
Boilerplate reduction: Try to narrow down the problem of the OOM, so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into an extra binary, just for the investigation.
Problem size reduction: Lower the problem from an OOM to simply "too much memory", so that you can actually tell that some part wastes memory even though it does not lead to an OOM. If it is too hard to tell whether you are seeing the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
Information gathering: If your problem is small enough, you are ready to collect information with a lower noise level. There are multiple ways you can try. Just remember to compile your program with debug symbols. It can also be an advantage to turn off optimizations, since they usually lead to information loss. Both can be achieved by NOT using the --release flag during compilation.
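In Cargo terms, that simply means building and running the debug profile (the binary name and flags here are placeholders):
cargo build                        # debug profile: debug symbols on, optimizations off
./target/debug/your-binary --foo   # run the unoptimized binary under investigation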
Heap profiling: One way is to use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph which symbolizes which parts of your program eat which amount of memory. See official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, especially after you created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn a GDB with superpowers. The difference to a normal debug session is that you can not only continue but also reverse-continue. Basically your program is executed from a recording where you can jump back and forth as you want. This wiki page provides you some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You now should get a normal debugging session where you can examine what the current state of the program was. GDB can also be launched with coredumps. The general problem with that approach is that you cannot go back in time and you cannot continue with execution. So you only see the current state including all stack frames and variables. Here you could also use LLDB if you want.
(Potential) fix + repeat: After you have a clue what might be going wrong, you can try to change your code. Then try again. If it's still not working, go back to step 3 and try again.
Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
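For reference, a heap profile with Valgrind's Massif tool would look roughly like this (my own sketch; the binary path and flags are placeholders):
valgrind --tool=massif ./target/debug/your-binary --foo
ms_print massif.out.<pid>      # summarises the recorded heap snapshots and call trees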
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};

fn main() {
    let mut held_forever = Vec::new();
    loop {
        held_forever.push("x".repeat(1024 * 1024));
        println!("Allocated another");
        thread::sleep(Duration::from_secs(3));
    }
}
You can see memory growth over time, as well as the exact stack trace that allocated the memory, in Instruments.
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};

struct Leaked {
    data: String,
    me: RefCell<Option<Rc<Leaked>>>,
}

fn main() {
    let data = "x".repeat(5 * 1024 * 1024);

    let leaked = Rc::new(Leaked {
        data,
        me: RefCell::new(None),
    });

    let me = leaked.clone();
    *leaked.me.borrow_mut() = Some(me);
}
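As an aside (my addition, not part of the original answer), the usual way to avoid this particular leak is to hold the back-reference as a Weak pointer, which does not keep the allocation alive:
use std::{cell::RefCell, rc::{Rc, Weak}};

struct NotLeaked {
    data: String,
    me: RefCell<Option<Weak<NotLeaked>>>, // Weak does not own the value
}

fn main() {
    let not_leaked = Rc::new(NotLeaked {
        data: "x".repeat(5 * 1024 * 1024),
        me: RefCell::new(None),
    });
    // The strong count stays at 1, so the value is freed when
    // `not_leaked` goes out of scope; upgrade() would then return None.
    *not_leaked.me.borrow_mut() = Some(Rc::downgrade(&not_leaked));
}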
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle
In general, to debug, you can use either a log-based approach (either by inserting the logs yourself, or by having a tool such as ltrace, ptrace, ... generate the logs for you) or you can use a debugger.
Note that ltrace, ptrace or debugger-based approaches require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to create the reproducer scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and, if logging turns out to be insufficient, to fire up a debugger for a more fine-grained inspection. In this case I would recommend going straight for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and memory state at the moment where things go awry.
Launch your program with the debugger, set a break point on the panic hook, run the program, profit.
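A minimal sketch of such a session (my own illustration; rust_panic is the symbol the standard library deliberately keeps un-inlined so debuggers can break on it, but confirm the name on your toolchain):
gdb --args ./path/to/exe --foo
(gdb) break rust_panic       # stop when a panic is raised, before the process aborts
(gdb) run
(gdb) backtrace              # once stopped: inspect the call stack...
(gdb) frame 4                # ...pick a frame in your own code...
(gdb) info locals            # ...and examine the state that led to the panic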

printing the stack of a task in FreeRTOS

I am working on an STM32F4-discovery board. I installed FreeRTOS on the board and was able to run two tasks created by the main function.
Now I want task 1 to access the local variables of task 2 without any passing of variables by reference or by value.
I thought it would be good to print the stack contents of task 2, then locate the local variables and use them in task 1.
Can somebody guide me with this?
I tried to print the address of each variable and use it in task 1, but my program did not compile and returned -1.
It is not clear to me what you are trying to achieve, or what you mean by the program returning -1 because it didn't compile, but it would not be normal for one task to access a stack variable of another task.
Each task has its own stack, and the stack is private to the task. There are lots of ways tasks can communicate with each other though, without the need to access each other's stack. The simplest way is just to make the variable global, although global variables are rarely a good thing. Other than that you could send the value from one task to another on a queue (http://www.freertos.org/Inter-Task-Communication.html), or even using a task notification as a mailbox (http://www.freertos.org/RTOS_Task_Notification_As_Mailbox.html).
First, a cautionary note: what you're trying to do -- access local variables of one task from another -- can be error-prone. Particularly if the local is declared in Task A and goes out of scope before Task B accesses it. Since stack memory can be reused for different variables in different functions, Task B might be accessing some other variable.
But I've actually used this pattern in practice -- specifically, to allow one task to stack-allocate a buffer for communication serviced by another task.
So I'll just assume that you know what you're doing. :-)
It's difficult to compute the address of a local variable ahead of time. If you derive it today, it's likely to change if you change the code or compiler version. What you probably want to do is to capture its address at runtime and somehow make it available to the other task. This can be tricky, however: it's possible for the other task to try to use the local before your task starts up, and nothing prevents other tasks from getting to it.
A slightly cleaner approach would be to provide the address to the other task through a queue shared by the two tasks.
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

void do_stuff(void);   /* placeholder for whatever work task_a normally does */

QueueHandle_t shared_queue;

void common_startup() {
    // This code might go in main, or wherever you initialize things.
    shared_queue = xQueueCreate(1, sizeof(unsigned *));
}

void task_a(void *pvParameters) {
    // This task wants to share the address of 'local' with task_b.
    unsigned local;

    // Stuff the address of our local in the queue.  We use a
    // non-blocking send here because the queue will be empty (nobody
    // else puts things in it).  In real code you'd probably do some
    // error checking!
    unsigned * ptr = &local;
    xQueueSend(shared_queue, &ptr, 0);

    while (1) {
        do_stuff();
    }
}

void task_b(void *pvParameters) {
    // This task wants to use task_a's 'local'.
    // Real code would do error checking!
    unsigned * ptr;
    xQueueReceive(shared_queue, &ptr, portMAX_DELAY);

    while (1) {
        // Just increment the shared variable
        (*ptr)++;
    }
}
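The snippet above does not show how the tasks are started; a hypothetical wiring (task names, priorities and stack sizes are mine, not the answer's) could be:
int main( void )
{
    common_startup();
    xTaskCreate( task_a, "task_a", configMINIMAL_STACK_SIZE + 128, NULL, 2, NULL );
    xTaskCreate( task_b, "task_b", configMINIMAL_STACK_SIZE + 128, NULL, 1, NULL );
    vTaskStartScheduler();   /* does not return if the scheduler starts successfully */
    for( ;; );
}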

Passing Data through the Stack

I wanted to see if you could pass a struct through the stack, and I managed to get a local variable from one void function in another void function.
Do you guys think there is any use for that, and is there any chance you can get corrupted data between the two function calls?
Here's the code in C (I know it's dirty):
#include <stdio.h>

typedef struct pouet
{
    int a,b,c;
    char d;
    char * e;
} Pouet;

void test1()
{
    Pouet p1;
    p1.a = 1;
    p1.b = 2;
    p1.c = 3;
    p1.d = 'a';
    p1.e = "1234567890";
    printf("Declared struct : %d %d %d %c \'%s\'\n", p1.a, p1.b, p1.c, p1.d, p1.e);
}

void test2()
{
    Pouet p2;
    printf("Element of struct undeclared : %d %d %d %c \'%s\'\n", p2.a, p2.b, p2.c, p2.d, p2.e);
    p2.a++;
}

int main()
{
    test1();
    test2();
    test2();
    return 0;
}
Output is :
Declared struct : 1 2 3 a '1234567890'
Element of struct undeclared : 1 2 3 a '1234567890'
Element of struct undeclared : 2 2 3 a '1234567890'
Contrary to the opinion of the majority, I think it can work out in most of the cases (not that you should rely on it, though).
Let's check it out. First you call test1, and it gets a new stack frame: the stack pointer which signifies the top of the stack goes up. On that stack frame, besides other things, memory for your struct (exactly the size of sizeof(struct pouet)) is reserved and then initialized. What happens when test1 returns? Does its stack frame, along with your memory, get destroyed?
Quite the opposite. It stays on the stack. However, the stack pointer drops below it, back into the calling function. You see, this is quite a simple operation; it's just a matter of changing the stack pointer's value. I doubt there is any implementation that clears a stack frame when it is disposed. It's just too costly a thing to do!
What happens then? Well, you call test2. All it stores on the stack is just another instance of struct pouet, which means that its stack frame will most probably be exactly the same size as that of test1. This also means that test2 will reserve the memory that previously contained your initialized struct pouet for its own variable Pouet p2, since both variables should most probably have the same positions relative to the beginning of the stack frame. Which in turn means that it will be initialized to the same value.
However, this setup is not something to be relied upon. Even setting concerns about non-standardized behaviour aside, it's bound to be broken by something as simple as a call to a different function between the calls to test1 and test2, or by test1 and test2 having stack frames of different sizes.
Also, you should take compiler optimizations into account, which could break things too. However, the more similar your functions are, the less chances there are that they will receive different optimization treatment.
Of course there's a chance you can get corrupted data; you're using undefined behavior.
What you have is undefined behavior.
printf("Element of struct undeclared : %d %d %d %c \'%s\'\n", p2.a, p2.b, p2.c, p2.d, p2.e);
The scope of the variable p2 is local to the function test2(), and as soon as you exit the function the variable is no longer valid.
You are accessing uninitialized variables which will lead to undefined behavior.
The output you see is not guaranteed at all times or on all platforms, so you need to get rid of the undefined behavior in your code.
The data may or may not appear in test2. It depends on exactly how the program was compiled. It's more likely to work in a toy example like yours than in a real program, and it's more likely to work if you turn off compiler optimizations.
The language definition says that the local variable ceases to exist at the end of the function. Attempting to read the address where you think it was stored may or may not produce a result; it could even crash the program, or make it execute some completely unexpected code. It's undefined behavior.
For example, the compiler might decide to put a variable in registers in one function but not in the other, breaking the alignment of variables on the stack. It can even do that with a big struct, splitting it into several registers and some stack — as long as you don't take the address of the struct it doesn't need to exist as an addressable chunk of memory. The compiler might write a stack canary on top of one of the variables. These are just possibilities at the top of my head.
C lets you see a lot behind the scenes. A lot of what you see behind the scenes can completely change from one production compilation or run to the next.
Understanding what's going on here is useful as a debugging skill, to understand where values that you see in a debugger might be coming from. As a programming technique, this is useless since you aren't making the computer accomplish any particular result.
Just because this works for one compiler doesn't mean that it will for all. How uninitialized variables are handled is undefined, and one compiler could very well init pointers to null etc. without breaking any rules.
So don't do this or rely on it. I have actually seen code that depended on functionality in MySQL that was a bug. When that was fixed in later versions the program stopped working. My thoughts about the designer of that system I'll keep to myself.
In short, never rely on behaviour that is not defined. If you knowingly use it for a specific purpose, you are prepared for a compiler update etc. to break it, and you keep an eye out for this at all times, it might be something you can explain and live with. But most of the time it is far from a good idea.
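If the actual goal is to get the data from test1 into test2, the defined way is to pass the struct explicitly. A small sketch of the two usual options (my addition, reusing the Pouet typedef and <stdio.h> from the question):
/* Option 1: return the struct by value - the caller receives a copy. */
Pouet make_pouet( void )
{
    Pouet p = { 1, 2, 3, 'a', "1234567890" };
    return p;   /* copied out; no dangling stack access */
}

/* Option 2: the caller owns the storage and passes a pointer. */
void fill_pouet( Pouet *p )
{
    p->a = 1; p->b = 2; p->c = 3;
    p->d = 'a';
    p->e = "1234567890";
}

void use_pouet( const Pouet *p )
{
    printf( "Passed struct : %d %d %d %c \'%s\'\n", p->a, p->b, p->c, p->d, p->e );
}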

"EXC_BAD_ACCESS" vs "Segmentation fault". Are both same practically?

In my first few dummy apps (for practice while learning) I came across a lot of EXC_BAD_ACCESS errors, which somehow taught me that bad access means: you are touching/accessing an object that you shouldn't, because either it is not allocated yet, it has been deallocated, or you are simply not authorized to access it.
Look at this sample code that has a bad-access issue because I am trying to modify a const:
- (void)myStartMethod {
    NSString *str = @"testing";
    const char *charStr = [str UTF8String];
    charStr[4] = '\0'; // bad access on this line.
    NSLog(@"%s", charStr);
}
Meanwhile, a segmentation fault is described as: "a specific kind of error caused by accessing memory that 'does not belong to you.' It's a helper mechanism that keeps you from corrupting the memory and introducing hard-to-debug memory bugs. Whenever you get a segfault you know you are doing something wrong with memory" (more description here).
I wanna know two things.
One, am I right about Objective-C's EXC_BAD_ACCESS? Do I understand it correctly?
Second, are EXC_BAD_ACCESS and segmentation fault practically the same thing, with Apple just having given it its own name?
No, EXC_BAD_ACCESS is not the same as SIGSEGV.
EXC_BAD_ACCESS is a Mach exception (A combination of Mach and xnu compose the Mac OS X kernel), while SIGSEGV is a POSIX signal. When crashes occur with cause given as EXC_BAD_ACCESS, often the signal is reported in parentheses immediately after: For instance, EXC_BAD_ACCESS(SIGSEGV). However, there is one other POSIX signal that can be seen in conjunction with EXC_BAD_ACCESS: It is SIGBUS, reported as EXC_BAD_ACCESS(SIGBUS).
SIGSEGV is most often seen when reading from/writing to an address that is not at all mapped in the memory map, like the NULL pointer, or attempting to write to a read-only memory location (as in your example above). SIGBUS on the other hand can be seen even for addresses the process has legitimate access to. For instance, SIGBUS can smite a process that dares to load/store from/to an unaligned memory address with instructions that assume an aligned address, or a process that attempts to write to a page for which it has not the privilege level to do so.
Thus EXC_BAD_ACCESS can best be understood as the set of both SIGSEGV and SIGBUS, and refers to all ways of incorrectly accessing memory (whether because said memory does not exist, or does exist but is misaligned, privileged or whatnot), hence its name: exception – bad access.
To feast your eyes, here is the code, within the xnu-1504.15.3 (Mac OS X 10.6.8 build 10K459) kernel source code, file bsd/uxkern/ux_exception.c beginning at line 429, that translates EXC_BAD_ACCESS to either SIGSEGV or SIGBUS.
/*
 *  ux_exception translates a mach exception, code and subcode to
 *  a signal and u.u_code.  Calls machine_exception (machine dependent)
 *  to attempt translation first.
 */
static
void ux_exception(
            int                      exception,
            mach_exception_code_t    code,
            mach_exception_subcode_t subcode,
            int                      *ux_signal,
            mach_exception_code_t    *ux_code)
{
    /*
     *  Try machine-dependent translation first.
     */
    if (machine_exception(exception, code, subcode, ux_signal, ux_code))
        return;

    switch(exception) {

    case EXC_BAD_ACCESS:
        if (code == KERN_INVALID_ADDRESS)
            *ux_signal = SIGSEGV;
        else
            *ux_signal = SIGBUS;
        break;

    case EXC_BAD_INSTRUCTION:
        *ux_signal = SIGILL;
        break;
    ...
Edit in relation to another of your questions
Please note that exception here does not refer to an exception at the level of the language, of the type one may catch with syntactical sugar like try{} catch{} blocks. Exception here refers to the actions of a CPU on encountering certain types of mistakes in your program (they may or may not be fatal), like a null-pointer dereference, that require outside intervention.
When this happens, the CPU is said to raise what is commonly called either an exception or an interrupt. This means that the CPU saves what it was doing (the context) and deals with the exceptional situation.
To deal with such an exceptional situation, the CPU does not start executing any "exception-handling" code (catch-blocks or suchlike) in your application. It first gives the OS control, by starting to execute a kernel-provided piece of code called an interrupt service routine. This is a piece of code that figures out what happened to which process, and what to do about it. The OS thus has an opportunity to judge the situation, and take the action it wants.
The action it does for an invalid memory access (such as a null pointer dereference) is to signal the guilty process with EXC_BAD_ACCESS(SIGSEGV). The action it does for a misaligned memory access is to signal the guilty process with EXC_BAD_ACCESS(SIGBUS). There are many other exceptional situations and corresponding actions, not all of which involve signals.
We're now back in the context of your program. If your program receives the SIGSEGV or SIGBUS signals, it will invoke the signal handler that was installed for that signal, or the default one if none was. It is rare for people to install custom handlers for SIGSEGV and SIGBUS and the default handlers shut down your program, so what you usually get is your program being shut down.
This sort of exception is therefore completely unlike the sort one throws in try{}-blocks and catch{}es. Those exceptions are handled purely within the application, without involving the OS at all. Here what happens is that a throw statement is simply a glorified jump to the inner-most catch block that handles that exception. As the exception bubbles through the stack, it unwinds the stack behind it, running destructors and suchlike as needed.
Basically yes: an EXC_BAD_ACCESS is usually paired with a SIGSEGV, which is a signal that warns about the segmentation failure.
A segmentation failure is raised whenever you are working with a pointer that points to invalid data (maybe memory not belonging to the process, maybe read-only memory, maybe an invalid address in general).
Don't think about the segmentation fault in terms of "accessing an object"; you are accessing a memory location, i.e. an address. That address must be considered coherent by the OS memory protection system.
Not all errors related to accessing invalid data can be caught by the memory protection system. Think of a pointer to a stack-allocated variable: it is still considered a valid address even though its contents are no longer meaningful once the stack frame has been torn down, as the sketch below illustrates.
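A tiny sketch of that last point (my illustration, not from the answer): the dangling pointer below still refers to mapped stack memory, so dereferencing it typically does not fault; it just yields garbage, which is exactly the kind of invalid access the memory protection system cannot catch.
#include <stdio.h>

static int *dangling;

static void set_up(void)
{
    int local = 42;
    dangling = &local;   /* the address of a stack variable escapes the function */
}

static void clobber(void)
{
    int other = 7;       /* likely reuses the same stack slot */
    (void)other;
}

int main(void)
{
    set_up();
    clobber();
    /* Undefined behaviour: the address is still "valid" to the MMU,
       so usually no EXC_BAD_ACCESS / SIGSEGV, but the value read is meaningless. */
    printf("%d\n", *dangling);
    return 0;
}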
