I am currently learning buffer overflow attacks, in order to pass the OSCP exam.
My current understanding of the stack is that ESP and EIP are not located on the stack itself. I always thought that the current value of EIP is just held in the CPU register "EIP".
The course continuously uses terminology such as "the EIP is being overwritten", so the EIP value must physically be on the stack.
I understand why EBP is recorded within the stack.
My current theory is that the values of EIP and ESP are pushed onto the stack when one function calls another. Is that correct?
Below is a diagram from the course.
The WebAssembly spec states here that it has an implicit operand and call stack.
What exactly does that mean in terms of WebAssembly, and how would an explicit stack differ from an implicit one?
The implicit stack is managed by the VM and is not directly accessible. It is implicitly pushed to, and popped from, by various instructions.
An example of an explicit stack might be a region of linear memory that you have direct access to via load and store instructions. Indeed, this is exactly what LLVM does for address-taken stack variables: it allocates a specific region of linear memory for them.
The control flow stack (e.g. the return addresses of each of the functions on the stack) is also part of the implicit stack and cannot be explicitly read from or written to.
The LD_PRELOAD technique allows us to supply our own custom standard library functions to an existing binary, overriding the standard ones or manipulating their behaviour, giving a fun way to experiment with a binary and understand its behaviour.
I've read that LD_PRELOAD can be used to "checkpoint" a program --- that is, to produce a record of the full memory state, call stack and instruction pointer at any given time --- allowing us to "reset" the program back to that previous state at will.
It's clear to me how we can record the state of the heap. Since we can provide our own version of malloc and related functions, our preloaded library can obviously gain perfect knowledge of the memory state.
What I can't work out is how our preloaded functions can determine the call stack and instruction pointer; and then reset them at a later time to the previously recorded value. Clearly this is necessary for checkpointing. Are there standard library functions that can do this? Or is a different technique required?
I've read that LD_PRELOAD can be used to "checkpoint" a program ... allowing us to "reset" the program back to that previous state at will.
That is a gross simplification. This "checkpoint" mechanism cannot possibly restore any open file descriptors, or any mutexes, since the state of these is partially inside the kernel.
It's clear to me how we can record the memory state. ...
What I can't work out is how our preloaded functions can determine the call stack and instruction pointer;
The instruction pointer is inside the preloaded function, and is trivially available as e.g. register void *rip __asm__("rip") on x86_64. But you (likely) don't care about that address -- you probably care about the caller of your function. That is also trivially available as __builtin_return_address(0) (at least when using GCC).
And the rest of the call stack is saved in memory (in the stack region to be more precise), so if you know the contents of memory, you know the call stack.
Indeed, when you use e.g. GDB where command with a core dump, that's exactly what GDB does -- it reads contents of memory from the core and recovers the call stack from it.
Update:
I wrote in my original post that I know how to inspect the memory, but in fact I only know how to inspect the heap. How can I view the full contents of all stack frames?
Inspecting memory works the same regardless of whether that memory "belongs" to heap, stack, or code. You simply dereference a pointer and voilà -- you get the contents of memory at that location.
What you probably mean is:
how to find location of stack and
how to decode it
The answer to the first question is OS-specific, and you didn't tag your question with any OS.
Assuming you are on Linux, one way to locate the stack is to parse entries in /proc/self/maps looking for an entry (contiguous address range) which "covers" the current stack (i.e. "covers" the address of any local variable).
For the second question, the answer is:
it's complicated¹ and
you don't actually need to decode it in order to save/restore its state.
¹ To figure out how to decode the stack, you could look at the sources of debuggers (such as GDB and LLDB).
This is also very OS and processor specific.
You would need to know calling conventions. On x86_64 you would need to know about unwind descriptors. To find local variables, you would need to know about DWARF debugging format.
Did I mention it's complicated?
I have a few assembly projects to complete, and I am confused about precisely when to add space on the stack and how much I should add.
I am using NASM version 2.13.03 on a unix system (macos) intel x86_64.
I have been reading a lot of documentation and have done a lot of research, but none of it answers my question in enough detail.
I understand the red zone, and that leaf functions do not need to reserve extra stack space.
I understand that growing the stack with sub rsp should be done before a function call, and that add rsp should be used after the call to release the space.
I know that on 32-bit architectures you use push and pop to grow the stack as you go, but on this 64-bit architecture you can also use sub rsp and add rsp together with mov instructions to place registers on the stack.
If anyone has any tips or explanations regarding the use of the stack on this architecture, in particular when to grow the stack and by how much, thanks a lot!
Some x86-64 stack principles:
The stack needs to be 16-byte aligned before function calls, according to both major calling conventions, including the x86-64 System V ABI used on macOS. If it is not, you risk a segmentation fault when calling external functions. (They are allowed to assume alignment and use movaps for 16-byte copies to/from stack memory, for example.)
Fun fact: on macOS, system calls do work correctly even when the stack is not 16-byte aligned.
With push rax, the value of rax is pushed on top of the stack.
With sub rsp,8, the top of the stack remains unaltered (so whatever was sitting there in memory will stay there).
The change to the rsp value is exactly the same for both instructions.
So for example you could do either:
sub rsp,16
or
push rax
push rax
And the stack pointer rsp would point to exactly the same place.
For moving the stack pointer by only 8, a dummy push or pop can be as efficient or more efficient than add/sub. Beyond that, usually not.
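Putting these principles together, here is a minimal sketch in NASM syntax (macOS-style leading-underscore symbol names assumed; _puts stands in for any external C library function):

```nasm
; At entry to _main, rsp % 16 == 8: the CALL that got here pushed an
; 8-byte return address onto a previously 16-byte-aligned stack.
extern _puts

section .data
msg:    db  "hello", 0

section .text
global _main
_main:
    push rbp                ; 8 more bytes: rsp % 16 == 0 again
    mov  rbp, rsp
    lea  rdi, [rel msg]
    call _puts              ; safe: rsp is 16-byte aligned at the CALL
    xor  eax, eax
    pop  rbp
    ret
```

If you had pushed an odd number of quadwords before the call, you would need one extra sub rsp,8 (or a dummy push) to restore alignment, and the matching add rsp,8 (or pop) afterwards.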
I'm trying to understand what is stored in the stack in optix.
As I understand it, we set the stack size per context, and one stack is attached to each thread in the ray generation program.
When a ray is launched, the thread carries with it the stack, which stores the ray's payload.
I thought that, when we do a recursive ray-tracer for example, a stack overflow would occur because there would be too many payloads to keep in memory. But right now, I have a program with a radiance ray that has a payload of one float + 3 uints, and a shadow ray with only a float, and there is only one bounce. However, my stack needs to be bigger than 1024 to avoid a stack overflow. Surely, this is way more than just my two payloads.
So I wonder, what else is in the stack?
(I mean in general, not in my particular case. What is stored in the stack except the ray(s) payload(s) (if they are)? For example, do we also store information about the hits? about the scene tree? Do we keep track of which program called the current ray?)
Thanks for your help!
Answered on the NVIDIA board here
Detlef Roettger wrote
"The stack is also used to save and restore live variables around function calls (e.g. rtTrace or callable programs). That's the background for one of the performance advice in the OptiX Programming Guide which starts with Try to minimize live state across calls to rtTrace in programs."
More info on this at §3.1.3 - Global State in the OptiX Programming guide.
Remember that OptiX programs are full blown CUDA kernels combined together. Stack memory is therefore also used for ordinary execution needs (the amount is likely to vary even between CUDA versions).
I have a probable stack overflow in my application (of course, only in release mode...) and would like to add some protection/investigation code to it.
I am looking for a Windows API that will tell me the current state of a thread's stack (i.e., the total size and the used size).
Thanks, Noam
The total size of the stack will be the size you asked for when you created the thread (or the size set at link time, if it's the main thread).
There are some preliminary references to getting the stack size for a thread pool in Windows 7 on MSDN ( QueryThreadpoolStackInformation ).
As an approximation, you can compare the address of a local variable with the address of another local variable further down the stack to get a measure of the amount used. I believe that how a program running on Windows lays out its local variables within the virtual memory space Windows allocates to a thread is up to the implementation of that language's runtime, rather than something Windows itself knows about; instead, you get an exception when you attempt to access an address just below the memory allocated for the stack.
The other alternative to complicating your code with a check on whether the stack has reached a limit is to add an exception handler for EXCEPTION_STACK_OVERFLOW, which the OS will call when it detects that the stack has reached its limit. There's an example here.