I'll try to summarize, but it's going to be complicated.
I'm taking an operating systems course at my university, and I have a lab assignment to do.
I'm working in Rust (the lab is said to be doable in any compiled language, but it was originally designed for C).
So I have a tracer program and a traced program.
The goal of this step of the lab is to attach from the tracer to the traced process with ptrace, then inject "trap-call-trap" instructions at the right place to replace an existing useless function with a call to posix_memalign (from libc) via an indirect register call (through rax). The point is to allocate memory in the traced program so that cache code from another file can later be called there.
The problem is this: I can successfully call posix_memalign in the tracer, so I know the function works, but when I call it in the traced process (via the tracer) and look in register rax for the function's return value, I always get 12, which corresponds to ENOMEM ("not enough space / cannot allocate memory").
I have two separate Cargo projects so that I can launch each program independently with cargo.
Everything is in this Git repository: https://github.com/Carrybooo/TP_SEL
Sorry that all the prints and outputs are in French (some comments too; I'm in a French course); I didn't think I would have to share it with anyone. There is also a lot of leftover code from the previous steps of the lab, which I kept just in case, so my code is not really clean.
This is the part where I attach and modify the registers to call the function (I shortened the code and left out the auxiliary function declarations because it would be too long):
ptrace::attach(pid_ptrace).unwrap(); // attach to the traced process
wait(); // wait for the stop caused by the attach
inject(pid_ptrace, offset_fct_to_replace, false); // inject trap-call-trap
ptrace::cont(pid_ptrace, Signal::SIGCONT).unwrap();
wait(); // wait for the 1st trap

let mut regs = ptrace::getregs(pid_ptrace).unwrap();
let ptr_to_ptr: *mut *mut c_void = ptr::null_mut();
// rax holds the target of the indirect call: the address of posix_memalign in the traced process
regs.rax = get_libc_address(pid_ptrace).unwrap() + get_libc_offset("posix_memalign").unwrap();
regs.rsp -= size_of::<*mut *mut c_void>() as u64; // reserve stack space
regs.rdi = ptr_to_ptr as u64; // 1st arg: memptr (here: a null pointer)
regs.rsi = size_of::<usize>() as u64; // 2nd arg: alignment
regs.rdx = size_of::<usize>() as u64; // 3rd arg: size
ptrace::setregs(pid_ptrace, regs).unwrap(); // write the modified registers back

ptrace::cont(pid_ptrace, Signal::SIGCONT).unwrap();
wait(); // wait for the 2nd trap
let regs = ptrace::getregs(pid_ptrace).unwrap(); // rax now holds posix_memalign's return value
ptrace::detach(pid_ptrace, Signal::SIGCONT).unwrap(); // detach
And running the program in the terminal gives something like this:
before modification of regs:
rax = 6
rip = 55e6c932ddc1
rsp = 7ffcee0b7fb8
before function execution:
rax = 7f59b935ded0
rdi = 0
rip = 55e6c932ddc1
rsp = 7ffcee0b7fb0
after function execution:
rax = 12 <------//RESULT OF THE CALL IS HERE
rdi = 55e6cac6aba0
rip = 7f59b935df20
rsp = 7ffcee0b7f90
//end of program
So I don't know why I keep getting an error on this call. I presume it's because it violates Rust memory safety, since the compiler never knew the traced program would have to allocate memory, but I'm not sure of that, nor of how to get around it.
I hope I've been clear enough; let me know if you need any more detail. I thank in advance anyone who can help; every piece of advice is welcome.
So I went with libc's memalign instead of posix_memalign. It's less restricted, and it works well when called through the registers in the traced program.
Still no idea why posix_memalign doesn't work when called from the registers in a Rust program.
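For reference, here is a minimal sketch of the register setup that works with memalign, reusing the same helpers as the code above (an illustration, not the exact code from the repo). Unlike posix_memalign(memptr, alignment, size), memalign(alignment, size) returns the pointer directly in rax, so no out-parameter has to point into writable memory of the traced process:

let mut regs = ptrace::getregs(pid_ptrace).unwrap();
regs.rax = get_libc_address(pid_ptrace).unwrap() + get_libc_offset("memalign").unwrap();
regs.rdi = size_of::<usize>() as u64; // 1st arg: alignment
regs.rsi = size_of::<usize>() as u64; // 2nd arg: size
ptrace::setregs(pid_ptrace, regs).unwrap();
ptrace::cont(pid_ptrace, Signal::SIGCONT).unwrap();
wait(); // at the next trap, rax holds the allocated address (or 0 on failure)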
To elaborate, I am currently writing a program that requires a function provided by the professor. When I run the program, I get a segmentation fault, and the debugger I use (gdb) says that the segmentation fault occurred in the definition of the function that, as I said, was provided by the professor.
So my question is: is the definition itself causing the fault, or is the fault caused somewhere else in the program that calls the function?
I tried to find a spot in the program that might be leading to it, such as places that might pass incorrect parameters. I have not changed the function itself, as it is not supposed to be modified (per the instructions). This is my first time posting a question, so if any other information is needed, please let me know.
The error thrown is as follows:
Program received signal SIGSEGV, Segmentation fault.
0x00401450 in Parser::GetNextToken (in=..., line=@0x63fef0: 1) at PA2.cpp:20
20        return GetNextToken(in, line);
The code where this is happening is this:
static LexItem GetNextToken(istream& in, int& line) {
    if (pushed_back) {
        pushed_back = false;
        return pushed_token;
    }
    return GetNextToken(in, line);
}
Making many assumptions here, but maybe the lesson is to understand how the stack is affected by a function call and its parameters. Create a main() function that calls the professor's provided function, and trace the code using gdb, looking at the stack.
On my system (Windows, 8 GB RAM, 64-bit i7), Octave is having problems handling medium-sized arrays. I have Task Manager open, and the memory never goes beyond 200 MB before the graphing section; it will often crash around 150 MB. The interesting thing is that if I put breakpoints into my code to find where the problem lies, the problem goes away, and I am actually able to get through everything and move on to the graphing portion. That also crashes unless I add breakpoints after every other graph.
With breakpoints, I am able to load everything up to the full script load, which should be around 1 GB. I'm not crazy, right? This is supposed to be easy stuff that Matlab would breeze through in a second.
Below is a snippet of code that will crash unless I use breakpoints at every blank line.
n2040id = fopen(n_20to40ft_file);
n2040data = dlmread(n2040id,',', [8 1 70849 98]);
fclose(n2040id);
n20test = n2040data(3410:22066,:);
n30test = n2040data(26730:45748,:);
n40test = n2040data(49874:68706,:);
clear n2040data;
%% 20Ft Test Processing
n20spo2 = n20test(:,88);
n20spo2(n20spo2 == 0) = [];
n20co = n20test(:,89);
n20co(n20co == 0) = [];
clear n20test;
%% 30 Ft Test Processing
n30spo2 = n30test(:,88);
n30spo2(n30spo2 == 0) = [];
n30co = n30test(:,89);
n30co(n30co == 0) = [];
clear n30test;
%% 40 Ft Test Processing
n40spo2 = n40test(:,88);
n40spo2(n40spo2 == 0) = [];
n40co = n40test(:,89);
n40co(n40co == 0) = [];
clear n40test;
This snippet uses about an extra 60-90 MB of memory compared to the point just before it, and everything is cleared before every break once I am done with it. The first array is a double of size 70841x98, while the others end up around 450x1 to 900x1. These are not difficult arrays to deal with by a long shot. Yet it will crash unless I put in those breakpoints; then I can just press continue and it's fine.
I've also tried using clear -v but that crashed too unless I used breakpoints.
Now, I debugged with Visual Studio and got this error:
No symbol file loaded for liboctgui-3.dll, as well as an error that it was trying to access 0xFFFFFFFFFFFFFFFF and got "permission denied". Why on earth would it be trying to access the last memory block?
This actually doesn't happen if I don't clear any variables; it will happily take up the extra 1-1.4 GB. Is this a known issue? Releasing memory shouldn't cause a program to attempt to access the very last possible memory block.
I hope this question isn't too open-ended. I ran into a memory issue with Rust, where I got an "out of memory" from calling next on an Iterator trait object. I'm unsure how to debug it. Prints have only brought me to the point where the failure occurs. I'm not very familiar with other tools such as ltrace, so although I could create a trace (231MiB, pff), I didn't really know what to do with it. Is a trace like that useful? Would I do better to grab gdb/lldb? Or Valgrind?
In general, I would try the following approach:
Boilerplate reduction: Try to narrow down the problem of the OOM, so that you don't have too much additional code around. In other words: the quicker your program crashes, the better. Sometimes it is also possible to rip out a specific piece of code and put it into an extra binary, just for the investigation.
Problem size reduction: Reduce the problem from an OOM to plain "too much memory" usage, so that you can actually tell that some part is wasting memory even when it does not lead to an OOM. If it is too hard to tell whether you are seeing the issue or not, you can lower the memory limit. On Linux, this can be done using ulimit:
ulimit -Sv 500000 # that's 500MB
./path/to/exe --foo
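To sanity-check that the limit is in effect, a toy allocation loop (my own example, not taken from the question) should abort with an allocation failure once it hits the cap instead of exhausting the whole machine:

fn main() {
    // Allocate 10 MiB chunks forever; with `ulimit -Sv 500000` set in the
    // shell, the process aborts around the 500 MB mark instead of eating
    // all system memory.
    let mut held = Vec::new();
    for i in 1.. {
        held.push(vec![0u8; 10 * 1024 * 1024]);
        println!("allocated {} MiB", i * 10);
    }
}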
Information gathering: If your problem is small enough, you are ready to collect information with a lower noise level. There are multiple ways you can try. Just remember to compile your program with debug symbols. It may also be an advantage to turn off optimizations, since they usually lead to information loss. Both can be achieved by NOT using the --release flag during compilation.
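With Cargo that simply means building and running the default debug profile (the binary path is a placeholder):

cargo build                # debug profile: symbols on, optimizations off
./target/debug/exe --foo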
Heap profiling: One way is to use gperftools:
LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ./path/to/exe --foo
pprof --gv ./path/to/exe /tmp/profile/profile.0100.heap
This shows you a graph of which parts of your program consume how much memory. See the official docs for more details.
rr: Sometimes it's very hard to figure out what is actually happening, especially after you created a profile. Assuming you did a good job in step 2, you can use rr:
rr record ./path/to/exe --foo
rr replay
This will spawn a GDB with superpowers. The difference from a normal debug session is that you can not only continue but also reverse-continue. Basically, your program is executed from a recording in which you can jump back and forth as you want. This wiki page provides some additional examples. One thing to point out is that rr only seems to work with GDB.
Good old debugging: Sometimes you get traces and recordings that are still way too large. In that case you can (in combination with the ulimit trick) just use GDB and wait until the program crashes:
gdb --args ./path/to/exe --foo
You should now get a normal debugging session in which you can examine the current state of the program. GDB can also be launched with coredumps. The general problem with this approach is that you cannot go back in time and you cannot continue execution; you only see the state at the crash, including all stack frames and variables. You could also use LLDB here if you want.
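For example (assuming the system's core pattern drops a file named core in the working directory):

ulimit -c unlimited      # allow the kernel to write core dumps
./path/to/exe --foo      # crashes and leaves a core file behind
gdb ./path/to/exe core   # post-mortem: inspect stack frames and variables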
(Potential) fix + repeat: Once you have a clue about what might be going wrong, you can try to change your code. Then try again. If it's still not working, go back to step 3 and repeat.
Valgrind and other tools work fine, and should work out of the box as of Rust 1.32. Earlier versions of Rust require changing the global allocator from jemalloc to the system's allocator so that Valgrind and friends know how to monitor memory allocations.
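On those older versions, the switch is only a few lines; here is a minimal sketch using the #[global_allocator] attribute (stable since Rust 1.28):

use std::alloc::System;

// Route every allocation through the system allocator (malloc/free)
// so that Valgrind and friends can intercept it.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let v: Vec<u8> = Vec::with_capacity(1024);
    std::mem::forget(v); // deliberately leak so Valgrind has something to report
}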
In this answer, I use the macOS developer tool Instruments, as I'm on macOS, but Valgrind / Massif / Cachegrind work similarly.
Example: An infinite loop
Here's a program that "leaks" memory by pushing 1MiB Strings into a Vec and never freeing it:
use std::{thread, time::Duration};
fn main() {
let mut held_forever = Vec::new();
loop {
held_forever.push("x".repeat(1024 * 1024));
println!("Allocated another");
thread::sleep(Duration::from_secs(3));
}
}
You can see memory growth over time, as well as the exact stack trace that allocated the memory.
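If you're not on macOS, a roughly comparable view comes from Valgrind's Massif (the binary path is a placeholder):

valgrind --tool=massif ./target/debug/exe
ms_print massif.out.<pid>   # Valgrind fills in the actual PID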
Example: Cycles in reference counts
Here's an example of leaking memory by creating an infinite reference cycle:
use std::{cell::RefCell, rc::Rc};
struct Leaked {
data: String,
me: RefCell<Option<Rc<Leaked>>>,
}
fn main() {
let data = "x".repeat(5 * 1024 * 1024);
let leaked = Rc::new(Leaked {
data,
me: RefCell::new(None),
});
let me = leaked.clone();
*leaked.me.borrow_mut() = Some(me);
}
See also:
Why does Valgrind not detect a memory leak in a Rust program using nightly 1.29.0?
Handling memory leak in cyclic graphs using RefCell and Rc
Minimal `Rc` Dependency Cycle
In general, to debug you can use either a log-based approach (either by inserting the logs yourself, or by having a tool such as ltrace or strace generate the logs for you) or a debugger.
Note that ltrace, strace, and debugger-based approaches all require that you be able to reproduce the problem; I tend to favor manual logs because I work in an industry where bug reports are generally too imprecise to allow immediate reproduction (and thus we use logs to build the reproduction scenario).
Rust supports both approaches, and the standard toolset that one uses for C or C++ programs works well for it.
My personal approach is to have some logging in place to quickly narrow down where the issue occurs, and if logging is insufficient, to fire up a debugger for a more fine-combed inspection. In this case, I would recommend going straight for the debugger.
A panic is generated, which means that by breaking on the call to the panic hook, you get to see both the call stack and memory state at the moment where things go awry.
Launch your program with the debugger, set a breakpoint on the panic hook, run the program, profit.
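Concretely, with GDB that could look like the session below (a sketch; rust_panic is an unmangled function that the Rust runtime routes every panic through, kept around precisely so debuggers can break on it):

gdb --args ./path/to/exe --foo
(gdb) break rust_panic
(gdb) run
(gdb) backtrace    # the stack and memory state at the moment of the panic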
I wanted to see if you could pass a struct through the stack, and I managed to read a local variable of one void function from inside another void function.
Do you think there is any use for this, and is there any chance of getting corrupted data between the two function calls?
Here's the code in C (I know it's dirty):
#include <stdio.h>

typedef struct pouet
{
    int a, b, c;
    char d;
    char *e;
} Pouet;

void test1()
{
    Pouet p1;
    p1.a = 1;
    p1.b = 2;
    p1.c = 3;
    p1.d = 'a';
    p1.e = "1234567890";
    printf("Declared struct : %d %d %d %c \'%s\'\n", p1.a, p1.b, p1.c, p1.d, p1.e);
}

void test2()
{
    Pouet p2;
    printf("Element of struct undeclared : %d %d %d %c \'%s\'\n", p2.a, p2.b, p2.c, p2.d, p2.e);
    p2.a++;
}

int main()
{
    test1();
    test2();
    test2();
    return 0;
}
The output is:
Declared struct : 1 2 3 a '1234567890'
Element of struct undeclared : 1 2 3 a '1234567890'
Element of struct undeclared : 2 2 3 a '1234567890'
Contrary to the majority opinion, I think it can work out in most cases (not that you should rely on it, though).
Let's check it out. First you call test1, and it gets a new stack frame: the stack pointer, which marks the top of the stack, moves. On that stack frame, besides other things, memory for your struct (exactly sizeof(struct pouet) bytes) is reserved and then initialized. What happens when test1 returns? Does its stack frame, along with your memory, get destroyed?
Quite the opposite: it stays on the stack. However, the stack pointer moves back below it, into the calling function. You see, this is quite a simple operation; it's just a matter of changing the stack pointer's value. I doubt there is any technology that clears a stack frame when it is disposed. It's just too costly a thing to do!
What happens then? Well, you call test2. All it stores on the stack is just another instance of struct pouet, which means its stack frame will most probably be exactly the same size as that of test1. This also means that test2 will reserve, for its own variable Pouet p2, the memory that previously contained your initialized struct pouet, since both variables will most probably have the same position relative to the beginning of the stack frame. Which in turn means that p2 will appear to be initialized to the same values.
However, this setup is not something to be relied upon. Even leaving concerns about non-standardized behaviour aside, it's bound to be broken by something as simple as a call to a different function between the calls to test1 and test2, or by test1 and test2 having stack frames of different sizes.
Also, you should take compiler optimizations into account, which could break things too. However, the more similar your functions are, the less chances there are that they will receive different optimization treatment.
Of course there's a chance you can get corrupted data; you're using undefined behavior.
What you have is undefined behavior.
printf("Element of struct undeclared : %d %d %d %c \'%s\'\n", p2.a, p2.b, p2.c, p2.d, p2.e);
The scope of the variable p2 is local to the function test2(), and as soon as you exit the function the variable is no longer valid.
You are accessing uninitialized variables, which leads to undefined behavior.
The output you see is not guaranteed at all times or on all platforms, so you need to get rid of the undefined behavior in your code.
The data may or may not appear in test2. It depends on exactly how the program was compiled. It's more likely to work in a toy example like yours than in a real program, and it's more likely to work if you turn off compiler optimizations.
The language definition says that the local variable ceases to exist at the end of the function. Attempting to read the address where you think it was stored may or may not produce a result; it could even crash the program, or make it execute some completely unexpected code. It's undefined behavior.
For example, the compiler might decide to put a variable in registers in one function but not in the other, breaking the alignment of variables on the stack. It can even do that with a big struct, splitting it into several registers and some stack; as long as you don't take the address of the struct, it doesn't need to exist as an addressable chunk of memory. The compiler might write a stack canary on top of one of the variables. These are just possibilities off the top of my head.
C lets you see a lot behind the scenes. A lot of what you see behind the scenes can completely change from one production compilation or run to the next.
Understanding what's going on here is useful as a debugging skill, to understand where values that you see in a debugger might be coming from. As a programming technique, this is useless since you aren't making the computer accomplish any particular result.
Just because this works with one compiler doesn't mean it will with all of them. How uninitialized variables are handled is undefined, and one compiler could very well initialize pointers to null, etc., without breaking any rules.
So don't do this or rely on it. I have actually seen code that depended on behavior in MySQL that was a bug; when that was fixed in later versions, the program stopped working. My thoughts about the designer of that system I'll keep to myself.
In short, never rely on functionality that is not defined. If you knowingly use it in a specific case, are prepared for a compiler update to break it, and keep an eye out for this at all times, it might be something you can explain and live with. But most of the time it is far from a good idea.
I profiled a portion of my application using the Delphi Sampling Profiler. Like most people, I see a majority of the time spent inside ntdll.dll.
Note: I turned on the options to ignore Application.Idle time and calls from System.pas, so the time isn't inside ntdll because the application is idle.
After multiple runs, the majority of the time still seems to be spent inside ntdll.dll, but the odd thing is who the caller is:
The caller is this line from the Virtual Treeview:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
Note: the application is not sitting idle inside ntdll.dll, because the caller isn't Application.Idle.
What confuses me is that this line itself (i.e. not something inside PrepareCell) is the caller into ntdll. Even more confusing:
not only is it not something inside PrepareCell()
it's not even the setup of PrepareCell (e.g. popping stack variables, setting up implicit exception frames, etc.) that is the caller. Those things would show up in the profiler as a hotspot on the begin inside PrepareCell.
VirtualTrees.pas:
procedure TBaseVirtualTree.PrepareCell(var PaintInfo: TVTPaintInfo; WindowOrgX, MaxWidth: Integer);
begin
...
end;
So I'm trying to figure out how this line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
is calling ntdll.dll.
The only other ways in are the three parameters:
PaintInfo
Window.Left
NodeBitmap.Width
Maybe one of those is a function, or a property getter, that calls into ntdll. So I put a breakpoint on the line and looked at the CPU window at runtime:
There is a line in there that might be the culprit:
call dword ptr [edx+$2c]
But when I follow that call, it doesn't end up in ntdll.dll, but in TBitmap.GetWidth:
Which, as you can see, doesn't call anywhere, and certainly not into ntdll.dll.
So how is the line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
calling into ntdll.dll?
Note: I know full well it isn't really calling into ntdll.dll. So any valid answer will have to include the words "Sampling Profiler is misleading; that line is not calling into ntdll.dll." The answer will also have to either say that the majority of the time is not spent in ntdll.dll, or that the highlighted line is not the caller. Finally, any answer will have to explain why Sampling Profiler is wrong, and how it can be fixed.
Update 2
What is ntdll.dll? Ntdll is Windows NT's native API set. The Win32 API is a wrapper around ntdll.dll that looks like the Windows API that existed in Windows 1/2/3/9x. In order to actually get into ntdll you have to call a function that uses ntdll directly or indirectly.
For example, when my Delphi application goes idle, it waits for a message by calling the user32.dll function:
WaitMessage;
What you actually see when you look at it is:
USER32.WaitMessage
mov eax,$00001226
mov edx,$7ffe0300
call dword ptr [edx]
ret
Calling the function specified at $7ffe0300 is the way Windows transitions into Ring0, calling the FunctionID specified in EAX. In this case, the System Function being called is 0x1226. On my operating system, Windows Vista, 0x1226 corresponds to the system function NtUserWaitMessage.
This is how you get into ntdll.dll: you call it.
I was desperately trying to avoid a hand-waving non-answer when I worded the original question. By being very specific and carefully pointing out the reality of what I'm seeing, I was trying to prevent people from ignoring the facts and falling back on a hand-waving argument.
Update 3
I converted the two parameters:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
into stack variables:
_profiler_WindowLeft := Window.Left;
_profiler_NodeBitmapWidth := NodeBitmap.Width;
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
This was to confirm that the bottleneck is not the call to Window.Left or NodeBitmap.Width.
The profiler still indicates that the line
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
itself is the bottleneck, not anything inside PrepareCell. This must mean it's something in the setup of the call to PrepareCell, or at the start of PrepareCell:
VirtualTrees.pas.15746: PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
mov eax,[ebp-$54]
push eax
mov edx,esi
mov ecx,[ebp-$50]
mov eax,[ebp-$04]
call TBasevirtualTree.PrepareCell
Nothing in that calls into ntdll. Now, the preamble of PrepareCell itself:
VirtualTrees.pas.15746: begin
push ebp
mov ebp,esp
add esp,-$44
push ebx
push esi
push edi
mov [ebp-$14],ecx
mov [ebp-$18],edx
mov [ebp-$1c],eax
lea esi,[ebp-$1c]
mov edi,[ebp-$18]
Nothing in there calls into ntdll.dll.
The questions still remain:
why is pushing one variable onto the stack and moving two others into registers the bottleneck?
why isn't anything inside PrepareCell itself the bottleneck?
Well, this problem was actually my main reason for making my own sampling profiler:
http://code.google.com/p/asmprofiler/wiki/AsmProfilerSamplingMode
It may not be perfect, but you could give it a try. Let me know what you think of it.
By the way, I think it has to do with the fact that almost all calls end up as calls into the kernel (memory requests, paint events, etc.); only pure calculations do not need to call the kernel.
Most calls end up waiting for kernel results:
ntdll.dll!KiFastSystemCallRet
You can see this in Process Explorer with the thread stack view, or in Delphi, or using the StackWalk64 API in the "Live view" of AsmProfiler:
http://code.google.com/p/asmprofiler/wiki/ProcessStackViewer
There are probably two things happening there.
The first is that SamplingProfiler identifies the caller by walking up the stack, until it encounters what looks like a valid call point into Delphi from Delphi code.
The thing is, some procedures may reserve a large amount of stack at once without reinitializing it, so stale values from earlier calls are still sitting there. This can result in a false positive. The only clue then would be that the code behind the false positive was recently invoked.
The second thing is the ntdll location; that part is known for certain. However, ntdll is your wait point in user space, and as user197220 said, ntdll is where you'll end up most of the time you're calling system stuff and waiting for the result.
In your case, unless you reduced the sampling rate, you're looking at 247 ms of CPU work time, which could probably pass as idle if those 247 samples were collected over many seconds of real time. Since the false positive points to the Virtual Treeview paint preparations, my bet is that the ntdll time is actually paint time (driver or OS software).
You can try commenting out the code that actually does the painting to be sure.