How do I debug a memory issue in Rust? - memory

I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?

In general I would try to do the following approach:
Boilerplate reduction: Try to narrow down the problem of the OOM, so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into an extra binary, just for the investigation.
Problem size reduction: Lower the problem from OOM to a simple "too much memory" so that you can actually tell the some part wastes something but that it does not lead to an OOM. If it is too hard to tell wether you see the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
Information gathering: If you problem is small enough, you are ready to collect information which has a lower noise level. There are multiple ways which you can try. Just remember to compile your program with debug symbols. Also it might be an advantage to turn off optimization since this usually leads to information loss. Both can be archived by NOT using the --release flag during compilation.
Heap profiling: One way is too use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph which symbolizes which parts of your program eat which amount of memory. See official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, especially after you created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn a GDB with superpowers. The difference to a normal debug session is that you can not only continue but also reverse-continue. Basically your program is executed from a recording where you can jump back and forth as you want. This wiki page provides you some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You now should get a normal debugging session where you can examine what the current state of the program was. GDB can also be launched with coredumps. The general problem with that approach is that you cannot go back in time and you cannot continue with execution. So you only see the current state including all stack frames and variables. Here you could also use LLDB if you want.
(Potential) fix + repeat: After you have a glue what might go wrong you can try to change your code. Then try again. If it's still not working, go back to step 3 and try again.

Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};
fn main() {
let mut held_forever = Vec::new();
loop {
held_forever.push("x".repeat(1024 * 1024));
println!("Allocated another");
thread::sleep(Duration::from_secs(3));
}
}
You can see memory growth over time, as well as the exact stack trace that allocated the memory:
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};
struct Leaked {
data: String,
me: RefCell<Option<Rc<Leaked>>>,
}
fn main() {
let data = "x".repeat(5 * 1024 * 1024);
let leaked = Rc::new(Leaked {
data,
me: RefCell::new(None),
});
let me = leaked.clone();
*leaked.me.borrow_mut() = Some(me);
}
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle

In general, to debug, you can use either a log-based approach (either by inserting the logs yourself, or having a tool such a ltrace, ptrace, ... to generate the logs for you) or you can use a debugger.
Note that ltrace, ptrace or debugger-based approaches require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to create the reproducer scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and if logging is insufficient to fire up a debugger for a more fine-combed inspection. In this case I would recommend going straight away for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and memory state at the moment where things go awry.
Launch your program with the debugger, set a break point on the panic hook, run the program, profit.

Related

Drake -- Coredump in Simulation When There is a Contact

I have a Kuka arm and some objects set up in my simulation(very similar to manipulation station example), and I have been running into coredump error below whenever there is a contact between the robot and the objects.
"abort: Failure at multibody/plant/multibody_plant.cc:1640 in CalcImplicitStribeckResults(): condition 'info == ImplicitStribeckSolverResult::kSuccess' failed.
Aborted (core dumped)"
Decreasing the integration step size for the simulator did not help, so I ended up tracing back the error and commented out the condition that is causing the error( "DRAKE_DEMAND(info == ImplicitStribeckSolverResult::kSuccess);" ), which seems to coredump a lot less often.
However, I am guessing that condition is there for a reason, so would commenting the line out cause any other issues in the simulation? What is the proper way to fix the coredump problem?
In Drake PR #12503, the ImplicitStribeck code was refactored to reflect the notation in the TAMSI arXiv paper, and in #12361 it was changed to provide a more helpful exception with troubleshooting tips:
https://github.com/RobotLocomotion/drake/blob/v0.15.0/multibody/plant/multibody_plant.cc#L1866-L1878
Can you try out a later version (e.g. 0.15.0), and then try out the troubleshooting instructions there? (You've already tried changing step size in the simulator, but you may want to check on the stiffness of your overall system, etc.)

ASAN doesn't report memory leak for glib's GPtrArray related functions

I find ASAN doesn't report memory leak for glib's GPtrArray related functions. For example:
$ cat test_asan.c
#include <glib.h>
int main()
{
GPtrArray *gparray = g_ptr_array_new_with_free_func(g_free);
g_ptr_array_add(gparray, g_strdup("--"));
}
Build and run this file:
$ clang -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fsanitize=address -g test_asan.c -o test_asan -lglib-2.0
$ ./test_asan
$
Nothing is reported. But in fact, the above program forget to call g_ptr_array_free (gparray, TRUE); at the end of main function.
Anyone can give some explanation of this behaviour? Or I missed something?
LeakSanitizer is using a simple algorithm which scans stack frames, registers, global and thread-local variables and reachable heap allocations for data which looks like memory addresses. This happens without compiler feedback so unrelated, random or stale data may cause lost memory to be considered reachable. The algorithm is thus imprecise and is known to generate false negatives (see e.g. here or here) which in practice depend on compiler/library version and/or flags.
Usually issues like the one you describe happen when stale contents of main's frame (gparray in your case) happens to not be overwritten when LSan scan starts.
As this a design issue, there isn't much you can do about it.

Erlang: No Crash Dump

I'm running ejabberd, and every so often it crashes. To figure out why it crashed, I know to look in the erl_crash.dump. The problem is, there doesn't seem to be any erl_crash.dump file. There is a core dump file though. Loading it into gdb and running "bt full," here are the top two frames:
(gdb) bt full
#0 0x000000000054df83 in prepare_crash_dump (secs=<optimized out>) at sys/unix/sys.c:735
max = <optimized out>
env = "\005", '\000' <repeats 15 times>"\200, \373!ڴ"
heart_port = 0x7fb46f31eab0
hp = 0x7fb4d6efb938
heart_fd = {865035, -1}
has_heart = 0
i = <optimized out>
envsz = <optimized out>
heap = {4460060, 140412855877120, 1}
list = 18446744073709551611
#1 erts_sys_prepare_crash_dump (secs=<optimized out>) at sys/unix/sys.c:780
So, it appears that it crashed while it was trying to write the crash dump, but didn't get all the way. I did some research, and it sounds a lot like a problem that had been posted earlier (https://groups.google.com/forum/#!msg/erlang-programming/XH2Uly6hsLY/aeR2Yx2UkZMJ). Heart was not enabled on the command line, which means this shouldn't be the problem, but... in the core dump, heart_port is set to something non-null. This should mean that heart is lurking somewhere, shouldn't it? If so, is there a way to tell heart to really not run?
This is the erlang VM crashing, not an erlang process crashing, so there is no erl_crash.dump generated. From my experience, I suspect it did not core in prepare_crash_dump, but that you have the wrong binaries loading into gdb. If you are not debugging on the system that crashed, you should copy the erlang binaries down and point GDB to them.
In erts 8.0 you have: Make sure to create a crash dump when running out of memory. This was accidentally removed in the erts-7.3 release.
So in case your VM is affected by this bug and it's crashing for this reason it won't generate.

iOS Patch program instruction at runtime

How would one go about modifying individual assembly instructions in an application while it is running?
I have a Mobile Substrate tweak that I am writing for an existing application. In the tweak's constructor (MSInitialize), I need to be able to rewrite individual instruction(s) in the app's code. What I mean by this is that there may be multiple places in the application's address space that I wish to modify, but in each instance, only a single instruction needs to be modified. I have already disabled ASLR for the application and know the exact memory address of the instruction to be patched, and I have the hex bytes (as a char[], but this is uninportant and can be changed if necessary) of the new instruction. I just need to figure out how to perform the change.
I know that iOS uses Data Execution Prevention (DEP) to specify that executable memory pages cannot also be writeable and vice versa, but I know that it is possible to bypass this on a jailbroken device. I also know that the ARM processor used by iDevices has an instruction cache that needs to be updated to reflect the change. However, I do not even know where to begin to do this.
So, to answer the question that would surely otherwise be asked, I have not tried anything. This is not because I am lazy; rather, it is because I have absolutely no clue how this could be accomplished. Any help at all would be greatly appreciated.
Edit:
If it helps at all, my ultimate goal is to use this in a Mobile Substrate tweak that hooks an App Store application. Previously, in order to mod this application, one would have to first crack it to decrypt the app so the binary could be patched. I want to make it so people wouldn't have to crack the app, since that can lead to piracy which I am strongly against. I can't use Mobile Substrate normally because all of the work is done in C++, not Objective-C, and the application is stripped, leaving no symbols to use MSHookFunction on.
Completely forgot I asked this question, so I'll show what I ended up with now. The comments should explain how and why it works.
#include <stdio.h>
#include <stdbool.h>
#include <mach/mach.h>
#include <libkern/OSCacheControl.h>
#define kerncall(x) ({ \
kern_return_t _kr = (x); \
if(_kr != KERN_SUCCESS) \
fprintf(stderr, "%s failed with error code: 0x%x\n", #x, _kr); \
_kr; \
})
bool patch32(void* dst, uint32_t data) {
mach_port_t task;
vm_region_basic_info_data_t info;
mach_msg_type_number_t info_count = VM_REGION_BASIC_INFO_COUNT;
vm_region_flavor_t flavor = VM_REGION_BASIC_INFO;
vm_address_t region = (vm_address_t)dst;
vm_size_t region_size = 0;
/* Get region boundaries */
if(kerncall(vm_region(mach_task_self(), &region, &region_size, flavor, (vm_region_info_t)&info, (mach_msg_type_number_t*)&info_count, (mach_port_t*)&task))) return false;
/* Change memory protections to rw- */
if(kerncall(vm_protect(mach_task_self(), region, region_size, false, VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY))) return false;
/* Actually perform the write */
*(uint32_t*)dst = data;
/* Flush CPU data cache to save write to RAM */
sys_dcache_flush(dst, sizeof(data));
/* Invalidate instruction cache to make the CPU read patched instructions from RAM */
sys_icache_invalidate(dst, sizeof(data));
/* Change memory protections back to r-x */
kerncall(vm_protect(mach_task_self(), region, region_size, false, VM_PROT_EXECUTE | VM_PROT_READ));
return true;
}
vm_protect to w^x, assuming you're jailbroken with a decent jailbreak (e.g. if mobilesubstrate works)
Writing to instruction memory from processor registers is, as others say above, a bit tricky. Especially with iPhones, since Apple tries to keep the processor details secret.
Permissions on memory access are the first problem. Executable memory is not normally writable. However, if this is overcome, then there is a little dance to go through to get data out of the processor registers and into the instruction pipeline. In general, there are synchronisation instructions, which force a specific order on the memory accesses before and after them, and cache commands, which force dirty write data out to memory and flush out clean and possibly stale read data. Both of these are highly dependent on the detailed implementation of the processor.
Arm Has nice manuals on the web that explain these in detail for specific processors. However, whether the processors inside iPhones do what the public Arm manuals say, I have no idea.
Here's a place to start understanding the Arm memory synchronisation model for one processor:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0092b/ch04s03s04.html
and that goes on to tell how to flush the instruction cache by a write to a control register. It certainly is possible to write self-modifying code for Arm processors because somewhere in that manual I found a statement that said that it is sometimes unavoidable and the has to be supported.
(I'm not claiming this is an answer. But it wouldn't fit in a comment.)

Debugging Erlang heart timeouts

I use the heart program to restart an Erlang node when it becomes unresponsive. However, I am finding it hard to understand why the node freezes. SASL logs don't show any errors, and my own logs don't seem to show anything remarkable happening at those times. Can anybody give advice on debugging this sort of thing?
By default the heart program issues a SIGKILL to kill off the unresponsive VM so it can quickly start a new one. This makes getting any useful information about the VM pretty much impossible. Something I've tried in the past is to patch the heart program to avoid the hard kill and instead get the VM to create a crash dump and a coredump. I used a patch like this (this one is for Erlang/OTP R14B02):
--- erts/etc/common/heart.c.orig 2011-04-17 12:11:24.000000000 -0400
+++ erts/etc/common/heart.c 2011-04-17 12:12:36.000000000 -0400
## -559,10 +559,11 ##
int res;
if(heart_beat_kill_pid != 0){
pid = (pid_t) heart_beat_kill_pid;
- res = kill(pid,SIGKILL);
+ res = kill(pid,SIGUSR1);
+ sleep(4);
for(i=0; i < 5 && res == 0; ++i){
sleep(1);
- res = kill(pid,SIGKILL);
+ res = kill(pid,i < 2 ? SIGQUIT : SIGKILL);
}
if(errno != ESRCH){
print_error("Unable to kill old process, "
As you can see, with this patch heart will first issue a SIGUSR1 to try to get the VM to create a crash dump. Since this can take awhile, heart then sleeps for 4 seconds. You might have to increase this sleep time if you're not getting full crash dumps. After that, heart then tries twice to issue a SIGQUIT with the hope of getting a coredump, and if that fails, issues a SIGKILL.
Note that this patch will slow down heart's VM restart due to the time required to wait for the crash dumps and coredumps. If you use it in production, be aware of this limitation.
You could try to call erlang:halt/1 from your HEART_COMMAND thus creating a crash dump from the unresponsive node.
You can try using the erl_call tool with e.g. -a erlang halt 123.
If the erlang node can't respond to this is also interesting information.
Did you try increasing `HEART_BEAT_TIMEOUT? Maybe the node is just bogged down a bit an misses the timeout but doesn't freeze.
If you have any idea of why it is freezing you could try to trace the module using dbg.
http://www.erlang.org/doc/man/dbg.html
In short try
dbg:tracer(), dbg:p(all,c), dbg:tpl(Module, Function, x).
If you want to stop this tracing issue
dbg:ctpl()
See documentation for more info.
Note: Change Module and Function to whatever you want to trace, leave x as it is. You can also skip Function and only give Module, x.
Warning: Running this on a live system can be dangerous as the amount of information that is going to be printed to the shell can be enormous.

Resources