helgrind says "Bug in libpthread" - pthreads

I am running helgrind to check for data races in my program. Helgrind reports 222 errors, all of them are:
Thread #21: Bug in libpthread: pthread_cond_wait succeeded without prior pthread_cond_post
I could not find anything about this error message on google. Within the valgrind source code, this seems to originate here:
if (!timeout && !libhb_so_everSent(cvi->so)) {
/* Hmm. How can a wait on 'cond' succeed if nobody signalled
it? If this happened it would surely be a bug in the threads
library. Or one of those fabled "spurious wakeups". */
HG_(record_error_Misc)( thr, "Bug in libpthread: pthread_cond_wait "
"succeeded"
" without prior pthread_cond_post");
}
However, I can not believe that I got 222 spurious wake-ups in about a second.
What could be the cause of this?
There are two condition variables, both in shared memory. The error always seems to happen with one of them, not with the other.

Related

How to break out of an assert in iOS / swift

I've hit an assertion in code and wondering if there's a way to create a wrapper around the assert that would enable breaking out and continuing execution or some other function that would enable a way to suppress the assert through the lldb debugger.
assert(condition(), makeCriticalEvent().event.description, file: fileID, line: UInt(line))
This is the standard assertion in apples libraries here. When I hit this assertion I tried continuing execution but it stays stuck at the assertion. I'd like to silence the assertion (likely through the lldb debugger by typing some command). Anyone have an idea how to do this?
You have to do two things. First, you have to tell lldb to suppress the SIGABRT signal that the assert delivers. Do this by running:
(lldb) process handle SIGABRT -p 0
in lldb. Normally SIGABRT is not maskable, so I was a little surprised this worked. Maybe because this is a SIGABRT the process sends itself? I don't think there's any guarantee suppressing SIGABRT's has to work in the debugger so YMMV, but it seems to currently. Anyway, do this when you've hit the assert.
Then you need to forcibly unwind the assert part of the stack. You can do that using thread return, which returns to the thread above the currently selected one w/o executing the code in that frame or any of the others below it. So just select the frame that caused the assert, go down one frame on the stacks and do thread return.
Now when you continue you won't hit the abort and you'll be back in your code.

How do I debug a memory issue in Rust?

I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?
In general I would try to do the following approach:
Boilerplate reduction: Try to narrow down the problem of the OOM, so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into an extra binary, just for the investigation.
Problem size reduction: Lower the problem from OOM to a simple "too much memory" so that you can actually tell the some part wastes something but that it does not lead to an OOM. If it is too hard to tell wether you see the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
Information gathering: If you problem is small enough, you are ready to collect information which has a lower noise level. There are multiple ways which you can try. Just remember to compile your program with debug symbols. Also it might be an advantage to turn off optimization since this usually leads to information loss. Both can be archived by NOT using the --release flag during compilation.
Heap profiling: One way is too use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph which symbolizes which parts of your program eat which amount of memory. See official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, especially after you created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn a GDB with superpowers. The difference to a normal debug session is that you can not only continue but also reverse-continue. Basically your program is executed from a recording where you can jump back and forth as you want. This wiki page provides you some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You now should get a normal debugging session where you can examine what the current state of the program was. GDB can also be launched with coredumps. The general problem with that approach is that you cannot go back in time and you cannot continue with execution. So you only see the current state including all stack frames and variables. Here you could also use LLDB if you want.
(Potential) fix + repeat: After you have a glue what might go wrong you can try to change your code. Then try again. If it's still not working, go back to step 3 and try again.
Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};
fn main() {
let mut held_forever = Vec::new();
loop {
held_forever.push("x".repeat(1024 * 1024));
println!("Allocated another");
thread::sleep(Duration::from_secs(3));
}
}
You can see memory growth over time, as well as the exact stack trace that allocated the memory:
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};
struct Leaked {
data: String,
me: RefCell<Option<Rc<Leaked>>>,
}
fn main() {
let data = "x".repeat(5 * 1024 * 1024);
let leaked = Rc::new(Leaked {
data,
me: RefCell::new(None),
});
let me = leaked.clone();
*leaked.me.borrow_mut() = Some(me);
}
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle
In general, to debug, you can use either a log-based approach (either by inserting the logs yourself, or having a tool such a ltrace, ptrace, ... to generate the logs for you) or you can use a debugger.
Note that ltrace, ptrace or debugger-based approaches require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to create the reproducer scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and if logging is insufficient to fire up a debugger for a more fine-combed inspection. In this case I would recommend going straight away for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and memory state at the moment where things go awry.
Launch your program with the debugger, set a break point on the panic hook, run the program, profit.

iOS: convert stacktrace entry to method name with line number

In a production app with the debug information stripped out, how do you convert the output of:
NSLog(#"Stack Trace: %#", [exception callStackSymbols]);
To an legible class and method name? A line number would be a blessing.
Here's the output I'm getting:
0 CoreFoundation 0x23d82f23 <redacted> + 154
1 libobjc.A.dylib 0x23519ce7 objc_exception_throw + 38
2 CoreFoundation 0x23cb92f1 <redacted> + 176
3 MyApp 0x23234815 MyApp + 440341
The final line is the bread and butter line, but when I use dwarf to find the address, nothing exists.
dwarfdump --arch armv7 MyApp.dSYM --lookup 0x00234815 | grep 'Line table'
I've read here that you need to convert the stack address to something else for dwarf or atos:
https://stackoverflow.com/a/12464678/2317728
How would I find the load address or slide address to perform the calculation? Is there not a way to calculate all this prior to sending the stacktrace to the log from within the app? If not, how would I determine and calculate these values after receiving the stack trace? Better yet, is there an easier solution I'm missing?
Note I cannot just wait for crash reports since the app is small and they will never come. I'm planning on sending the stack traces to our server to be fixed as soon as they appear.
EDITORIAL
The crash reporting tools in iOS are very rough, particularly when compared to Android. In android, the buggy lines are sent to Google analytics, you use the map to debug the line--simple (comparatively). For iOS, you are confronted with: a) waiting on user bug reports (not reasonable for a small app), b) sending stack traces to a server where there are scant tools or information on how to symbolicate the stack traces, c) relying on large quasi-commercial 3rd party libraries. This definitely makes it harder to build and scale up--hoping Apple will eventually take notice. Even more hopeful someone has spotted an easier solution I might have missed ;)
Thanks for your help!
A suggestion, you can easily get the method name, exception reason and line number using:
NSLog(#"%# Exception in %s on %d due to %#",[exception name],__PRETTY_FUNCTION__,__LINE__,[exception reason]);

Assertion failed: xdrPtr && xdrPtr == *xdrLPP, file xx.cpp, line 2349

Have a system build using C++ Builder 2010 that after running for about 20 hours it starts firing of assertion failures.
Assertion failed: xdrPtr && xdrPtr == *xdrLPP, file xx.cpp, line 2349
Tried google on it like crazy but not much info. Some people seem to refer a bunch of different assertions in xx.cpp to shortcomings in the exception handling in C++ Builder. But I haven't found anything referencing this particular line in the file.
We have integrated madExcept and it seems like somewhere along the way this catches an out of memory exception, but not sure if it's connected. No matter what an assertion triggering doesn't seem correct.
Edit:
I found an instance of a if-statement that as part of it's statement used a function that could throw an exception. I wonder if this could be the culprit somehow messing up the flow of the exception handling or something?
Consider
if(foo() == 0) {
...
}
wrapped in a try catch block.
If an exception is thrown from within foo() so that no int is returned here how will the if statement react? I'm thinking it still might try to finish executing that line and this performing the if check on the return of the function which will barf since no int was returned. Is this well defined or is this undefined behaviour?
Wouldn't
int fooStatus = foo();
if(fooStatus == 0) {
...
}
be better (or should I say safer)?
Edit 2:
I just managed to get the assertion on my dev machine (the application just standing idle) without any exception about memory popping up and the app only consuming around 100 mb. So they were probably not connected.
Will try to see if I can catch it again and see around where it barfs.
Edit 3:
Managed to catch it. First comes an assertion failure notice like explained. Then the debugger shows me this exception notification.
If I break it takes me here in the code
It actually highlights the first code line after
pConnection->Open();
But it seems I can change this to anything and that line is still highlighted. So my guess is that the error is in the code above it somehow. I have seen more reports about people getting this type of assertion failure when working with databases in RAD Studio... hmmmm.
Update:
I found a thread that recursively called it's own Execute function if it wasn't able to reach the DB server. I think this is at least part of the issue. This will just keep on trying and as more and more worker threads spawn and also keep trying it can only end in disaster.
If madExcept is hinting that you have an out of memory condition, the assert could fail if the pointers are NULL (i.e. the allocation failed). What are the values of xdrPtr and xdrLPP when the assert occurs? Can you trace back to where they are allocated?
I would start looking for memory leaks.

pthread_create and EAGAIN

I got an EAGAIN when trying to spawn a thread using pthread_create. However, from what I've checked, the threads seem to have been terminated properly.
What determines the OS to give EAGAIN when trying to create a thread using pthread_create? Would it be possible that unclosed sockets/file handles play a part in causing this EAGAIN (i.e they share the same resource space)?
And lastly, is there any tool to check resource usage, or any functions that can be used to see how many pthread objects are active at the time?
Okay, found the answer. Even if pthread_exit or pthread_cancel is called, the parent process still need to call pthread_join to release the pthread ID, which will then become recyclable.
Putting a pthread_join(tid, NULL) in the end did the trick.
edit (was not waitpid, but rather pthread_join)
As a practical matter EAGAIN is almost always related to running out of memory for the process. Often this has to do with the stack size allocated for the thread which you can adjust with pthread_attr_setstacksize(). But there are process limits to how many threads you can run. You can query the hard and soft limits with getrlimit() using RLIMIT_NPROC as the first parameter.
There are quite a few questions here dedicated to keeping track of threads, their number, whether they are dead or alive, etc. Simply put, the easiest way to keep track of them is to do it yourself through some mechanism you code, which can be as simple as incrementing and decrementing a global counter (protected by a mutex) or something more elaborate.
Open sockets or other file descriptors shouldn't cause pthread_create() to fail. If you reached the maximum for descriptors you would have already failed before creating the new thread and the new thread would have already have had to be successfully created to open more of them and thus could not have failed with EAGAIN.
As per my observation if one of the parent process calls pthread_join(), and chilled processes are trying to release the thread by calling pthread_exit() or pthread_cancel() then system is not able to release that thread properly. In that case, if pthread_detach() is call immediately after successful call of pthread_create() then this problem has been solved. A snapshot is here -
err = pthread_create(&(receiveThread), NULL, &receiver, temp);
if (err != 0)
{
MyPrintf("\nCan't create thread Reason : %s\n ",(err==EAGAIN)?"EAGAUIN":(err==EINVAL)?"EINVAL":(err==EPERM)?"EPERM":"UNKNOWN");
free(temp);
}
else
{
threadnumber++;
MyPrintf("Count: %d Thread ID: %u\n",threadnumber,receiveThread);
pthread_detach(receiveThread);
}
Another potential cause: I was getting this problem (EAGAIN on pthread_create) because I had forgotten to call pthread_attr_init on the pthread_attr_t I was trying to initialize my thread with.

Resources