libfuzzer fuzzing harness crash not reproducible - clang

I want to fuzz the existing stbi_read_fuzzer.c harness from the stb tests with one small change: free(img) becomes if (img) free(img);.
I compile with clang -fsanitize=fuzzer,address -ggdb -O0 stbi_read_fuzzer.c -o fuzzer and run with ./fuzzer corpus -fork=1 -ignore_crashes=1 -dict=jpeg.dict -seed=123.
After a few hours it produces some crashes (global-buffer-overflow, heap-use-after-free, buffer-overflow), but when I rerun the crash files, none of them crash:
aldo@vps:~/stb/tests$ ./fuzzer crash-edab9036233c269e258fe93c2a46d46d5d6e7112
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2279336272
INFO: Loaded 1 modules (2132 inline 8-bit counters): 2132 [0x61b510, 0x61bd64),
INFO: Loaded 1 PC tables (2132 PCs): 2132 [0x5d0258,0x5d8798),
./fuzzer: Running 1 inputs 1 time(s) each.
Running: crash-edab9036233c269e258fe93c2a46d46d5d6e7112
Executed crash-edab9036233c269e258fe93c2a46d46d5d6e7112 in 3 ms
***
*** NOTE: fuzzing was not performed, you have only
*** executed the target code on a fixed set of inputs.
***
Why didn't it crash?
I'm using Ubuntu 20.04 with LLVM 12 from apt.llvm.org.

Old question but I had a similar issue.
In my case I fuzzed a stateful API and forgot to reset the API at the beginning of LLVMFuzzerTestOneInput. A previous invocation put the API into an invalid state, but it didn't crash right away; only on a later invocation did the API crash.
So my guess would be that, similarly, some internal state or global variable in your harness was changed by a previous invocation. Try to reset everything.
This is because libFuzzer runs in-process and just calls the LLVMFuzzerTestOneInput function as often as possible. The program won't get re-initialized on its own. This is documented:
The fuzzing engine will execute the fuzz target many times with different inputs in the same process.
Ideally, it should not modify any global state (although that’s not strict).
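As an illustration (this is not the stbi harness; the global variable and the reset helper are made-up names, just for the example), a target with hidden state needs an explicit reset at the top of LLVMFuzzerTestOneInput:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical global state -- stands in for whatever the harness or the
   library under test keeps between runs (caches, static buffers, ...). */
static int g_initialized;
static char g_scratch[256];

static void reset_state(void) {
    /* Put every global back into a known-good state before each input. */
    g_initialized = 0;
    memset(g_scratch, 0, sizeof(g_scratch));
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    reset_state();  /* libFuzzer reuses the process, so reset first */
    /* ... call the code under test with (data, size) ... */
    (void)data;
    (void)size;
    return 0;
}

If a crash only shows up after a particular sequence of inputs, that is a strong hint that leftover state, not the single saved crash file, is what actually triggers it.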

Related

Uploading Program to OpenMPI gives initialization error, on IntelMPI memory leak

I am a graduate student (master's) and use an in-house code for running my simulations; it uses MPI. Earlier I used OpenMPI on a supercomputer we had access to, and since it shut down I've been trying to switch to another supercomputer that has Intel MPI installed. The problem is that the same code that was working perfectly fine earlier now gives memory leaks after a set number of iterations (time steps). Since the code is relatively large and my knowledge of MPI is very basic, it is proving very difficult to debug.
So I installed OpenMPI onto this new supercomputer I am using, but it gives the following error message upon execution and then terminates:
Invalid number of PE
Please check partitioning pattern or number of PE
NOTE: The error message is repeated as many times as the number of nodes I used to run the case (here, 8). Compiled using mpif90 with -fopenmp for thread parallelisation.
There is in fact no guarantee that running it with OpenMPI won't give the memory leak, but I feel it is worth a shot, as it was running perfectly fine earlier.
PS: On Intel MPI, this is the error I got (compiled with mpiifort with -qopenmp):
Abort(941211497) on node 16 (rank 16 in comm 0): Fatal error in PMPI_Isend: Unknown error class, error stack:
PMPI_Isend(152)...........: MPI_Isend(buf=0x2aba1cbc8060, count=4900, dtype=0x4c000829, dest=20, tag=0, MPI_COMM_WORLD, request=0x7ffec8586e5c) failed
MPID_Isend(662)...........:
MPID_isend_unsafe(282)....:
MPIDI_OFI_send_normal(305): failure occurred while allocating memory for a request object
Abort(203013993) on node 17 (rank 17 in comm 0): Fatal error in PMPI_Isend: Unknown error class, error stack:
PMPI_Isend(152)...........: MPI_Isend(buf=0x2b38c479c060, count=4900, dtype=0x4c000829, dest=21, tag=0, MPI_COMM_WORLD, request=0x7fffc20097dc) failed
MPID_Isend(662)...........:
MPID_isend_unsafe(282)....:
MPIDI_OFI_send_normal(305): failure occurred while allocating memory for a request object
[mpiexec@cx0321.obcx] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:357): write error (Bad file descriptor)
[mpiexec@cx0321.obcx] cmd_bcast_root (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:164): error sending cmd 15 to proxy
[mpiexec@cx0321.obcx] send_abort_rank_downstream (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:557): unable to send response downstream
[mpiexec@cx0321.obcx] control_cb (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1576): unable to send abort rank to downstreams
[mpiexec@cx0321.obcx] HYDI_dmx_poll_wait_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:79): callback returned error status
[mpiexec@cx0321.obcx] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1962): error waiting for event
I will be happy to provide the code in case somebody is willing to take a look at it. It is written using Fortran with some of the functions written in C. My research progress has been completely halted due to this problem and nobody at my lab has enough experience with MPI to resolve this.

Drake -- Coredump in Simulation When There is a Contact

I have a Kuka arm and some objects set up in my simulation (very similar to the manipulation station example), and I have been running into the coredump error below whenever there is contact between the robot and the objects.
"abort: Failure at multibody/plant/multibody_plant.cc:1640 in CalcImplicitStribeckResults(): condition 'info == ImplicitStribeckSolverResult::kSuccess' failed.
Aborted (core dumped)"
Decreasing the integration step size for the simulator did not help, so I ended up tracing back the error and commenting out the condition that causes it (DRAKE_DEMAND(info == ImplicitStribeckSolverResult::kSuccess);), which seems to make it coredump a lot less often.
However, I am guessing that condition is there for a reason, so would commenting the line out cause any other issues in the simulation? What is the proper way to fix the coredump problem?
In Drake PR #12503, the ImplicitStribeck code was refactored to reflect the notation in the TAMSI arXiv paper, and in #12361 it was changed to provide a more helpful exception with troubleshooting tips:
https://github.com/RobotLocomotion/drake/blob/v0.15.0/multibody/plant/multibody_plant.cc#L1866-L1878
Can you try out a later version (e.g. 0.15.0), and then try out the troubleshooting instructions there? (You've already tried changing step size in the simulator, but you may want to check on the stiffness of your overall system, etc.)

How do I debug a memory issue in Rust?

I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?
In general I would take the following approach:
Boilerplate reduction: Try to narrow down the problem of the OOM, so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into an extra binary, just for the investigation.
Problem size reduction: Lower the problem from an OOM to a simple "too much memory" case, so that you can tell that some part wastes memory even though it does not yet lead to an OOM. If it is too hard to tell whether you see the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
Information gathering: If your problem is small enough, you are ready to collect information with a lower noise level. There are multiple approaches you can try. Just remember to compile your program with debug symbols. It can also be an advantage to turn off optimizations, since they usually lead to information loss. Both can be achieved by NOT using the --release flag during compilation.
Heap profiling: One way is to use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph which symbolizes which parts of your program eat which amount of memory. See official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, especially after you created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn GDB with superpowers. The difference from a normal debug session is that you can not only continue but also reverse-continue. Basically, your program is executed from a recording where you can jump back and forth as you like. This wiki page provides some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You should now get a normal debugging session where you can examine the current state of the program. GDB can also be launched with coredumps. The general problem with that approach is that you cannot go back in time and you cannot continue the execution, so you only see the current state including all stack frames and variables. Here you could also use LLDB if you want.
(Potential) fix + repeat: Once you have a clue about what might be going wrong, you can try to change your code. Then try again. If it's still not working, go back to step 3 and try again.
Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};

fn main() {
    let mut held_forever = Vec::new();
    loop {
        held_forever.push("x".repeat(1024 * 1024));
        println!("Allocated another");
        thread::sleep(Duration::from_secs(3));
    }
}
You can see memory growth over time, as well as the exact stack trace that allocated the memory.
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};

struct Leaked {
    data: String,
    me: RefCell<Option<Rc<Leaked>>>,
}

fn main() {
    let data = "x".repeat(5 * 1024 * 1024);
    let leaked = Rc::new(Leaked {
        data,
        me: RefCell::new(None),
    });
    let me = leaked.clone();
    *leaked.me.borrow_mut() = Some(me);
}
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle
In general, to debug, you can use either a log-based approach (either by inserting the logs yourself, or by having a tool such as ltrace, ptrace, ... generate the logs for you) or you can use a debugger.
Note that ltrace, ptrace or debugger-based approaches require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to create the reproducer scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and to fire up a debugger for a finer-grained inspection if logging is insufficient. In this case I would recommend going straight for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and memory state at the moment where things go awry.
Launch your program with the debugger, set a break point on the panic hook, run the program, profit.
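With gdb that looks roughly like this (the placeholder path follows the earlier examples; rust_panic is a symbol the standard library keeps un-inlined precisely so debuggers can break on it):
gdb --args ./path/to/exe --foo
(gdb) break rust_panic
(gdb) run
(gdb) backtrace
LLDB users can do the same with b rust_panic.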

destroying an orphaned process-shared condition variable

Is the behavior of pthread_cond_destroy on an orphaned, process-shared condition variable specified, unspecified, implementation-defined, or undefined? Also, is the behavior I'm seeing on Linux (spelled out below) a bug?
By an "orphaned" cv here I mean one that was in a pthread_cond_wait call at the time of its waiter's death.
Adapting a scenario from this question, I find that if I do this on Linux:
Time  Process A                 Process B             Comments
----  ---------                 ---------             --------
  1   mmap MAP_ANONYMOUS                              // or shm_open()
  2   init pshared cv
  3   init pshared mutex
  4   fork ------------------>  lock(mutex)           // can also re-shm_open()
  5   wait...                   alarm(a_timeout)
  6   wait...                   cond_wait(cv, mutex)
  7   wait <------------------  <<ALRM>>
  8   cond_signal(cv)                                 // (without this, EBUSY for #9)
  9   cond_destroy(cv)                                // blocks on linux
On Linux, the destroy() (#9) blocks forever. If I omit the signal (#8) to the orphaned cv, then the Linux destroy() returns EBUSY. On OS X, by contrast, the destroy() always returns EBUSY, regardless of whether I signal or not.
For what it's worth, I do not see this behavior on Linux with process-shared mutexes and cvs in a single multi-threaded process (with the waiting thread cancel()d).
Again, what's spec and what's bug?
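For reference, the process-shared objects in steps 2-3 are created by initializing the mutex and condition variable with the PTHREAD_PROCESS_SHARED attribute inside the shared mapping. A minimal sketch (error handling omitted, names made up for illustration):

#include <stddef.h>
#include <pthread.h>
#include <sys/mman.h>

struct shared {
    pthread_mutex_t mutex;
    pthread_cond_t  cv;
};

static struct shared *init_shared(void) {
    /* Anonymous shared mapping: visible to the child after fork().
       shm_open() + mmap(MAP_SHARED) works the same way for unrelated processes. */
    struct shared *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&shm->mutex, &ma);

    pthread_condattr_t ca;
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&shm->cv, &ca);

    return shm;
}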
According to the spec for pthread_cond_destroy
"It shall be safe to destroy an initialized condition variable upon
which no threads are currently blocked"
As this is exactly the case here, i.e. there are no other threads whatsoever that reference or are blocked on the condition variable, the destroy shall be successful.
IMHO, we have bugs in both operating systems, in that the condition variable object is left in an inconsistent state.

Erlang: No Crash Dump

I'm running ejabberd, and every so often it crashes. To figure out why it crashed, I know to look in the erl_crash.dump. The problem is, there doesn't seem to be any erl_crash.dump file. There is a core dump file though. Loading it into gdb and running "bt full," here are the top two frames:
(gdb) bt full
#0 0x000000000054df83 in prepare_crash_dump (secs=<optimized out>) at sys/unix/sys.c:735
max = <optimized out>
env = "\005", '\000' <repeats 15 times>"\200, \373!ڴ"
heart_port = 0x7fb46f31eab0
hp = 0x7fb4d6efb938
heart_fd = {865035, -1}
has_heart = 0
i = <optimized out>
envsz = <optimized out>
heap = {4460060, 140412855877120, 1}
list = 18446744073709551611
#1 erts_sys_prepare_crash_dump (secs=<optimized out>) at sys/unix/sys.c:780
So, it appears that it crashed while it was trying to write the crash dump, but didn't get all the way. I did some research, and it sounds a lot like a problem that had been posted earlier (https://groups.google.com/forum/#!msg/erlang-programming/XH2Uly6hsLY/aeR2Yx2UkZMJ). Heart was not enabled on the command line, which means this shouldn't be the problem, but... in the core dump, heart_port is set to something non-null. This should mean that heart is lurking somewhere, shouldn't it? If so, is there a way to tell heart to really not run?
This is the Erlang VM crashing, not an Erlang process crashing, so no erl_crash.dump is generated. From my experience, I suspect it did not actually core in prepare_crash_dump, but that you have the wrong binaries loaded into gdb. If you are not debugging on the system that crashed, you should copy the Erlang binaries down and point GDB at them.
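For example, after copying the beam.smp binary from the crashed host, something like this (paths are placeholders) loads it together with the core file; if the shared libraries also differ from your local ones, you may additionally need gdb's set solib-search-path:
gdb /path/to/copied/beam.smp /path/to/core
(gdb) bt full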
In erts 8.0 there is this fix: "Make sure to create a crash dump when running out of memory. This was accidentally removed in the erts-7.3 release."
So if your VM is affected by this bug and it's crashing for that reason, no crash dump will be generated.
