Drake -- Coredump in Simulation When There is a Contact

I have a Kuka arm and some objects set up in my simulation (very similar to the manipulation station example), and I have been running into the coredump error below whenever there is contact between the robot and the objects.
"abort: Failure at multibody/plant/multibody_plant.cc:1640 in CalcImplicitStribeckResults(): condition 'info == ImplicitStribeckSolverResult::kSuccess' failed.
Aborted (core dumped)"
Decreasing the integration step size for the simulator did not help, so I ended up tracing back the error and commenting out the condition that causes it ("DRAKE_DEMAND(info == ImplicitStribeckSolverResult::kSuccess);"), after which it core-dumps a lot less often.
However, I am guessing that condition is there for a reason, so would commenting the line out cause any other issues in the simulation? What is the proper way to fix the coredump problem?

In Drake PR #12503, the ImplicitStribeck code was refactored to reflect the notation in the TAMSI arXiv paper, and in #12361 it was changed to provide a more helpful exception with troubleshooting tips:
https://github.com/RobotLocomotion/drake/blob/v0.15.0/multibody/plant/multibody_plant.cc#L1866-L1878
Can you try out a later version (e.g. 0.15.0) and follow the troubleshooting instructions there? (You've already tried changing the step size in the simulator, but you may want to check the stiffness of your overall system, etc.)

Related

AKTubularBells() and AKRhodesPiano() together cause error on the second

I'm using AKRhodesPiano() and AKTubularBells(). Both work alone. When I try to initialize both, I get the following error.
AKRhodesPiano.swift:init(frequency:amplitude:):88:Parameter Tree Failed
Notably, if I change the order of initialization, the error occurs for whichever of the two is instantiated last.
Adding the following line to the AKTubularBells playground right under the initialization of AKTubularBells is enough to trigger the error.
let tubularBells = AKTubularBells()
let temp = AKRhodesPiano() /// <- Add this line.
I saw in another post AKRhodesPiano error (crush) on AudioKit v4.2 that there was a recent error in the STK Physical models, so perhaps this is part of that. Any insight appreciated as always.
Thanks for noticing this; it only occurred when using those two nodes simultaneously, and it was basically just a cut-and-paste job gone bad. I fixed it on develop, so if you can rebuild the framework you'll be fine, or else wait for the next release, which should be soon.
Here's the fix:
https://github.com/AudioKit/AudioKit/commit/05651ff97a7ea7815a27de6a53eee0b5f7998920

How do I debug a memory issue in Rust?

I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?
In general, I would take the following approach:
Boilerplate reduction: Try to narrow down the OOM so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into an extra binary, just for the investigation.
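As a sketch of that idea (everything below is hypothetical; the infinite iterator merely stands in for whatever code you suspect), such a reproducer can live in its own binary inside the same crate:

// src/bin/repro.rs -- a minimal, deliberately crashing reproducer.
// The infinite iterator stands in for the suspect code; collect()
// keeps every element alive, so the OOM shows up within seconds.
fn main() {
    let data: Vec<String> = (0..).map(|i| format!("item {}", i)).collect();
    println!("{}", data.len()); // never reached
}

Running it with cargo run --bin repro crashes in seconds instead of hours, which makes every later step cheaper.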
Problem size reduction: Lower the problem from an OOM to a simple "too much memory", so that you can actually tell that some part wastes memory even when it does not lead to an OOM. If it is too hard to tell whether you see the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
Information gathering: If your problem is small enough, you are ready to collect information with a lower noise level. There are multiple things you can try. Just remember to compile your program with debug symbols. It might also be an advantage to turn off optimizations, since they usually destroy information. Both can be achieved by NOT using the --release flag during compilation.
Heap profiling: One way is to use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph of which parts of your program consume which amount of memory. See the official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, even after you've created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn a GDB with superpowers. The difference from a normal debug session is that you can not only continue but also reverse-continue. Basically, your program is executed from a recording in which you can jump back and forth as you like. This wiki page provides some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You should now get a normal debugging session where you can examine the current state of the program. GDB can also be launched with coredumps. The general problem with this approach is that you cannot go back in time and you cannot continue execution, so you only see the current state, including all stack frames and variables. Here you could also use LLDB if you want.
(Potential) fix + repeat: After you have a clue about what might be going wrong, you can try to change your code. Then try again. If it's still not working, go back to step 3 and try again.
Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
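For those older toolchains, a minimal sketch of the switch (via the #[global_allocator] attribute, stable since Rust 1.28) looks like this:

// Route all allocations through the system allocator (malloc/free)
// instead of jemalloc, so Valgrind and friends can track them.
use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

fn main() {
    // This allocation is now visible to Valgrind's bookkeeping.
    let v = vec![0u8; 1024 * 1024];
    println!("allocated {} bytes", v.len());
}

On 1.32 and later the system allocator is the default, so the attribute is simply unnecessary.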
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};

fn main() {
    let mut held_forever = Vec::new();
    loop {
        held_forever.push("x".repeat(1024 * 1024));
        println!("Allocated another");
        thread::sleep(Duration::from_secs(3));
    }
}
In Instruments, you can see memory growth over time, as well as the exact stack trace that allocated the memory.
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};

struct Leaked {
    data: String,
    me: RefCell<Option<Rc<Leaked>>>,
}

fn main() {
    let data = "x".repeat(5 * 1024 * 1024);
    let leaked = Rc::new(Leaked {
        data,
        me: RefCell::new(None),
    });
    let me = leaked.clone();
    *leaked.me.borrow_mut() = Some(me);
}
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle
In general, to debug, you can use either a log-based approach (either by inserting the logs yourself, or by having a tool such as ltrace, ptrace, ... generate the logs for you) or you can use a debugger.
Note that ltrace, ptrace or debugger-based approaches require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to create the reproducer scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and if logging is insufficient, to fire up a debugger for a more fine-combed inspection. In this case, I would recommend going straight for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and memory state at the moment where things go awry.
Launch your program with the debugger, set a break point on the panic hook, run the program, profit.
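A minimal sketch of that flow (the out-of-bounds index below is just a hypothetical stand-in for the failing next() call): install a hook that aborts, so the debugger, or the resulting core dump, stops exactly at the failure.

use std::panic;
use std::process;

fn main() {
    // Print the panic payload, then abort so a debugger or core dump
    // captures the stack at the exact moment things go awry.
    panic::set_hook(Box::new(|info| {
        eprintln!("panic: {}", info);
        process::abort();
    }));

    let v: Vec<u32> = Vec::new();
    let _ = v[0]; // out-of-bounds index -> panic -> hook -> abort
}

Without touching the code at all, breaking on the rust_panic symbol (break rust_panic in GDB), which the standard library deliberately leaves unmangled for this purpose, gives you the same stop point.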

Assertion failed: xdrPtr && xdrPtr == *xdrLPP, file xx.cpp, line 2349

We have a system built using C++ Builder 2010 that, after running for about 20 hours, starts firing off assertion failures.
Assertion failed: xdrPtr && xdrPtr == *xdrLPP, file xx.cpp, line 2349
I've googled it like crazy, but there's not much info. Some people seem to attribute a bunch of different assertions in xx.cpp to shortcomings in the exception handling in C++ Builder, but I haven't found anything referencing this particular line in the file.
We have integrated madExcept, and it seems that somewhere along the way it catches an out-of-memory exception, but I'm not sure if that's connected. In any case, an assertion triggering doesn't seem correct.
Edit:
I found an instance of an if statement whose condition calls a function that could throw an exception. I wonder if this could be the culprit, somehow messing up the flow of the exception handling?
Consider
if (foo() == 0) {
    ...
}
wrapped in a try/catch block.
If an exception is thrown from within foo(), so that no int is returned, how will the if statement react? I'm thinking it might still try to finish executing that line, performing the if check on the return value of the function, which will barf since no int was returned. Is this well defined, or is it undefined behaviour?
Wouldn't
int fooStatus = foo();
if (fooStatus == 0) {
    ...
}
be better (or should I say safer)?
Edit 2:
I just managed to get the assertion on my dev machine (with the application just standing idle) without any memory exception popping up, and with the app only consuming around 100 MB. So they were probably not connected.
Will try to see if I can catch it again and see around where it barfs.
Edit 3:
Managed to catch it. First comes an assertion failure notice as described above; then the debugger shows me an exception notification. If I break, it takes me into the code, where it highlights the first code line after
pConnection->Open();
But it seems I can change this to anything and that line is still highlighted, so my guess is that the error is somewhere in the code above it. I have seen more reports of people getting this type of assertion failure when working with databases in RAD Studio... hmmm.
Update:
I found a thread that recursively called its own Execute function if it wasn't able to reach the DB server. I think this is at least part of the issue: it will just keep trying, and as more and more worker threads spawn and also keep trying, it can only end in disaster.
If madExcept is hinting that you have an out of memory condition, the assert could fail if the pointers are NULL (i.e. the allocation failed). What are the values of xdrPtr and xdrLPP when the assert occurs? Can you trace back to where they are allocated?
I would start looking for memory leaks.

Random bad_object_header mnesia/dets error

I am having a very weird error with mnesia. I have about 10 tables that mnesia is recording, and usually it works fine. However, in a certain place in my code, whenever I try to read from one particular table (reading from other tables is fine), I get a DETS error.
I reduced my code to
{atomic, ok} = mnesia:transaction(fun() ->
    [Entry] = mnesia:read(table_name, Key),
    ok
end)
I have a try/catch block around the transaction, and the error I get is this:
error:{badmatch,
    {aborted,
        {{badmatch, {error, {bad_object_header, "/path/to/table_name.DAT"}}},
         [{callback, '-handle/2-fun-0-', 1, [{file, "src/src.erl"}, {line, 234}]},
          {mnesia_tm, apply_fun, 3, [{file, "mnesia_tm.erl"}, {line, 830}]},
          {mnesia_tm, execute_transaction, 5, [{file, "mnesia_tm.erl"}, {line, 810}]},
         ]}}}
Unfortunately I cannot reproduce the error with a short example. Even if I call the function from the REPL, it doesn't error. It only errors when it happens in my actual code. But it does happen reliably every time.
If I take out the mnesia:read line, everything works fine. I have tried remaking the schema and the tables, and that didn't help. It's really weird because my code goes on later to use the table successfully. It is only if it's used from this one place that it fails.
What could be going wrong?
Update
I experimented some more, and it seems the error only happens when two of these transactions run (in different processes) nearly simultaneously. Isn't mnesia meant to be used in this way?
Update 2
It turned out that the problem was fixed by downgrading my Erlang installation on Arch Linux from R16B-6 to R16B-3. Hopefully this bug will be ironed out soon.
These symptoms mean that either the file operation reading a specific part of the file runs out of memory, or you are trying to read from a non-existent position in the file. So if you are not running out of memory (which you would have noticed), it is likely that there is a race condition in DETS's handling of that file.
I've gotten the same error from time to time. It has occurred since I did an upgrade on my Debian server, maybe one or two months ago.
Here is my error:
Error in process <0.84.0> on node 'yaws@overnux' with exit value: {{case_clause,{error,{bad_object_header,"/var/www/d-lan/db/d_lan_downloads_count.dets"}}},[{d_lan_db,loop,0,[]},{string,strip,1,[]}]}
I think it's an Erlang regression, because I hadn't changed the code in a long time and it was working fine before the upgrade.
I'm only using DETS, not Mnesia. I have no concurrent access to the file.
Here is my code, it's very simple: https://github.com/Ummon/D-LAN/blob/website/modules/erl/d_lan_db.erl#L103

How to track down "incorrect checksum for freed object"

I have spent quite some time trying to trace this problem and have read multiple suggestions from others with the same problem. I deal with a large code base, so finding the problem without some hints is like looking for a needle in a haystack.
One of the suggestions I read is to add a breakpoint on *malloc_error_break*, but how do I do that? I understand that I have to add a symbolic breakpoint, but I'm not sure what exactly to enter in the two text fields, Symbol and Module.
I tried enabling Malloc Scribble and Malloc Guard Edges, but neither results in any breakpoint or crash.
If I enable Zombie Objects, the program stops crashing, but there is nothing in the output log showing any problems.
Finally, I tried to enable Guard Malloc. I understand that it only works with the simulator, so I tried that, but the problem is that the program crashes in the startup phase, before any line of my program is executed:
0x958e0cd4 <+0000> mov 0x4(%esp),%eax
0x958e0cd8 <+0004> mov %gs:0x0(,%eax,4),%eax < Crash
0x958e0ce0 <+0012> ret
and the call stack looks like this:
pthread_getspecific
__dyld__dyld_start
I'm not sure what I'm doing wrong here.
To add a breakpoint on malloc_error_break, simply stop in the debugger and type b malloc_error_break at the "gdb" or "lldb" prompt. (In Xcode's symbolic breakpoint editor, the equivalent is to enter malloc_error_break in the Symbol field and leave the Module field blank.)
