I've got a question regarding the exploit_notesearch program.
This program is only used to create a command string we finally call with the system() function to exploit the notesearch program that contains a buffer overflow vulnerability.
The commandstr looks like this:
./notesearch Nop-block|shellcode|repeated ret(will jump in nop block).
Now the actual question:
The ret address is calculated in the exploit_notesearch program by the line:
ret = (unsigned int) &i-offset;
So why can we use the address of the variable i, which sits near the bottom of main's stack frame in the exploit_notesearch program, to calculate the ret address? That address is written into an overflowing buffer in the notesearch program itself, so in a completely different stack frame, and it has to point into the NOP block (which is in the same buffer).
that will be saved in an overflowing buffer in the notesearch program itself, so in a completely different stack frame
As long as the system uses virtual memory, system() will create another process for the vulnerable program, and assuming that there is no stack randomization, both processes will have almost identical values of esp (as well as offset) when their main() functions start, given that the exploit was compiled on the attacked machine (i.e. with the vulnerable notesearch).
The address of variable i was chosen just to give an idea about where the frame base is. We could use this instead:
unsigned long sp(void)            // This is just a little function
{ __asm__("movl %esp, %eax"); }   // used to return the stack pointer
int main() {
    esp = sp();
    ret = esp - offset;
    // the rest of main()
}
Because the variable i is located at a relatively constant distance from esp, we can use &i instead of esp; it doesn't matter much.
It would be much more difficult to get an approximate value for ret if the system did not use virtual memory.
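A minimal sketch of why an approximate address is good enough, assuming no stack randomization: the guessed ret only has to fall somewhere inside the NOP block. The helper names guess_ret and lands_in_sled and their integer parameters are hypothetical; they only model the arithmetic and are not part of exploit_notesearch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors ret = &i - offset from the exploit, using plain integers
   (hypothetical helper, for illustration only). */
uintptr_t guess_ret(uintptr_t reference_addr, uintptr_t offset)
{
    return reference_addr - offset;
}

/* True when ret falls inside the NOP block that starts at buffer_addr
   and is sled_len bytes long (hypothetical helper). */
bool lands_in_sled(uintptr_t buffer_addr, uintptr_t sled_len, uintptr_t ret)
{
    return ret >= buffer_addr && ret - buffer_addr < sled_len;
}
```

For example, with a reference address of 0x200 and an offset of 0x80, guess_ret gives 0x180, and that counts as a hit for any NOP block that covers 0x180, not just one that starts exactly there.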
The stack is allocated in a first-in, last-out manner. The variable i is located somewhere near the top; let's assume that it is at 0x200 and that the return address is located at a lower address, 0x180. To determine where to put the return address and still leave some space for the shellcode, the attacker must get the difference, which is 0x200 - 0x180 = 0x80 (128). He then breaks that down as follows, ++, the return address is 4 bytes, so we have only 48 bytes left before reaching the segmentation. That is how it is calculated, and the location of i gives an approximate reference point.
I am working on dynamic memory analysis using the stack painting/footprint analysis method.
dynamic-stack-depth-determination-using-footprint-analysis
Basically, the idea is to fill the entire amount of memory allocated to the stack area with a dedicated fill value, for example 0xABABABAB, before the application starts executing. Whenever the execution stops, the stack memory can be searched upwards from the end of the stack until a value that is not 0xABABABAB is found, which is assumed to be how far the stack has been used. If the dedicated value cannot be found, the stack has consumed all stack space and most likely has overflowed.
I want C code to fill the stack from top to bottom with a pattern.
void FillSystemStack()
{
extern char __stack_start,_Stack_bottom;
}
NOTE
I am using an STM32F407VG board emulated with QEMU in Eclipse.
The stack grows from higher addresses to lower addresses.
The start of the stack is 0x20020000.
The bottom of the stack is 0x2001fc00.
You shouldn't completely fill the stack after main() begins, because the stack is in use once main() begins. Completely filling the stack would overwrite the bit of stack that has already been used and could lead to undefined behavior. I suppose you could fill a portion of the stack soon after main() begins as long as you're careful not to overwrite the portion that has been used already.
But a better plan is to fill the stack with a pattern before main() is called. Review the startup code for your tool chain. The startup code initializes variable values and sets the stack pointer before calling main(). The startup code may be in assembly depending on your tool chain. The code that initializes variables is probably a simple loop that copies bytes or words from the appropriate ROM to RAM sections. You can probably use this code as an example to write a new loop that will fill the stack memory range with a pattern.
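The fill loop described above could look roughly like the sketch below. It is written against plain pointers so it can be tried on a host array. On the target, low and high would come from the stack symbols in the linker script (an assumption, the names differ between toolchains), and the loop would have to run from the startup code before the stack is in use. paint_region and unused_words are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xABABABABu /* fill value mentioned in the question */

/* Fill every word in [low, high) with the pattern. */
void paint_region(uint32_t *low, uint32_t *high)
{
    for (uint32_t *p = low; p < high; ++p)
        *p = STACK_FILL;
}

/* Count the words, starting at the low end, that still hold the pattern.
   With a stack that grows downwards these are the words never used. */
size_t unused_words(const uint32_t *low, const uint32_t *high)
{
    size_t n = 0;
    while (low + n < high && low[n] == STACK_FILL)
        ++n;
    return n;
}
```

The count stops at the first word that no longer holds the pattern, so a stack that happens to store the fill value itself would be under-reported; that is a known limit of the painting technique.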
This is a Cortex M so it gets the stack pointer set out of reset. Meaning it's pretty much instantly ready to go for C code. If your reset vector is written in C and performs stacking/C function calls, it will be too late to fill the stack at a very early stage. Meaning you shouldn't do it from application C code.
The normal way to do the trick you describe is through an in-circuit debugger. Download the program, hit reset, fill the stack with the help of the debugger. There will be some convenient debugger command available to do that. Execute the program, try to use as much of it as possible, observe the stack in the memory map of your debugger.
With the insights from @kkrambo's answer, I tried to paint the stack just after the start of main, taking care not to overwrite the portion that has been used already. My stack paint and stack count functions are given below:
StackPaint
uint32_t sp;
#define STACK_CANARY 0xc5
#define WaterMark 0xc9
void StackPaint(void)
{
    char *p = &__StackLimit; // __StackLimit symbol defined in linker script
    sp = __get_MSP();        // current main stack pointer
    PRINTF("stack pointer %08x \r\n\n", sp);
    while ((uint32_t)p < sp) // paint only the part below the stack pointer
    {
        if (p == &__StackTop) { // __StackTop symbol defined in linker script
            *p = WaterMark;
            p++;
        }
        *p = STACK_CANARY;
        p++;
    }
}
StackCount
uint16_t StackCount(void)
{
    PRINTF("In the check address function in main file \r\n\n");
    const char *p = &__StackLimit;
    uint16_t c = 0;
    while (*p == WaterMark || (*p == STACK_CANARY && p <= &__StackTop))
    {
        p++;
        c++;
    }
    PRINTF("stack used:%d bytes \n", 1024 - c);
    PRINTF("remaining stack :%d bytes\n", c);
    return c;
}
void deal_msg(unsigned char * buf, int len)
{
    unsigned char msg[1024];
    strcpy(msg, buf);
    //memcpy(msg, buf, len);
    puts(msg);
}
void main()
{
    // network operation
    sock = create_server(port);
    len = receive_data(sock, buf);
    deal_msg(buf, len);
}
As the pseudocode above shows, the compile environment is VC6 and the running environment is Windows XP SP3 (English). No other protection mechanisms are applied; that is, the stack is executable and there is no ASLR.
The send data is 'A' * 1024 + addr_of_jmp_esp + shellcode.
My question is:
If strcpy is used, the shellcode is generated with msfvenom -p windows/exec cmd=calc.exe -a x86 -b "\x00" -f python,
msfvenom attempts to encode payload with 1 iterations of x86/shikata_ga_nai
After the data is sent, no calc pops up; the exploit doesn't work.
But if memcpy is used, the shellcode generated by msfvenom -p windows/exec cmd=calc.exe -a x86 -f python without encoding works.
How can I avoid the original program's crash after calc pops up? How do I keep the stack balanced to avoid the crash?
Hard to say. I'd use a custom payload (just copy the windows/exec cmd=calc.exe one), put a 0xcc at the start and debug it (or use something else that is easily recognizable under the debugger, like a ud2 or \xeb\xfe). If your payload is executed, you'll see it. Bypass the added instruction (just NOP it) and try to see what can possibly go wrong with the remainder of the payload.
You'll need a custom payload; since you're on XP SP3 you don't need to do crazy things.
Don't try to do the overflow and smash the whole stack (given your overflow it seems to be perfect, just enough overflow to control EIP).
See how the target function (deal_msg in your example) behaves under normal conditions. Note the stack address when the ret is executed (and whether registers need to have certain values; this depends on the caller).
Try to replicate that in your shellcode: you'll most probably need to adjust the stack pointer a bit at the end of your shellcode.
Make sure the caller (main) stack hasn't been affected when executing the payload. This might happen, in this case reserve enough room on the stack (going to lower addresses), so the caller stack is far from the stack space needed by the payload and it doesn't get affected by the payload execution.
Finally, return to the ret of the target or directly after the call of the deal_msg function (or anywhere you see fit, e.g. returning directly to ExitProcess(), but it might be more interesting to return close to the previous "normal" execution path).
All in all, returning somewhere after the payload execution is easy, just push <addr> and ret but you'll need to ensure that the stack is in good shape to continue execution and most of the registers are correctly set.
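On the strcpy versus memcpy difference in the question: strcpy stops copying at the first 0x00 byte, so anything after a NUL in the sent data never reaches the stack buffer, while memcpy copies len bytes regardless. That is presumably why the unencoded payload only works with memcpy. A small sketch for checking data before sending it; first_nul and bytes_copied_by_strcpy are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Index of the first 0x00 byte in data[0..len), or -1 if there is none. */
long first_nul(const unsigned char *data, size_t len)
{
    const unsigned char *hit = memchr(data, 0, len);
    return hit ? (long)(hit - data) : -1;
}

/* Number of bytes from data[0..len) that strcpy would copy before it
   stops at a NUL (the terminator itself is not counted). */
size_t bytes_copied_by_strcpy(const unsigned char *data, size_t len)
{
    long idx = first_nul(data, len);
    return idx < 0 ? len : (size_t)idx;
}
```

The same check applies to the address placed after the 1024 filler bytes: if the chosen jmp esp address contains a 0x00 byte, strcpy cuts the data off there as well.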
I'm trying to understand the way Rust deals with memory, and I have a little program that prints some memory addresses:
fn main() {
    let a = &&&5;
    let x = 1;
    println!(" {:p}", &x);
    println!(" {:p} \n {:p} \n {:p} \n {:p}", &&&a, &&a, &a, a);
}
This prints the following (varies for different runs):
0x235d0ff61c
0x235d0ff710
0x235d0ff728
0x235d0ff610
0x7ff793f4c310
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix? Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8 bytes (since std::mem::size_of_val(&a) gives 8)? I'm running Windows 10 on an AMD x64 processor (Phenom II X4) with 24GB RAM.
All the addresses do have the same size; Rust is just not printing leading 0-digits.
The actual memory layout is an implementation detail of your OS, but the reason that a prints a location in a different memory area than all the other variables is that the value a points to actually lives in your loaded binary, because it is a value that can already be calculated by the compiler. All the other variables are calculated at runtime and live on the stack.
See the compilation result on https://godbolt.org/z/kzSrDr:
.L__unnamed_4 contains the value 5; .L__unnamed_5, .L__unnamed_6 and .L__unnamed_1 are &5 &&5 and &&&5.
So .L__unnamed_1 is what on your system is at 0x7ff793f4c310. While 0x235d0ff??? is on your stack and calculated in the red and blue areas of the code.
This is actually a mix of both 40-bit and 48-bit addresses. Why this mix?
It's not really a mix, Rust just doesn't display leading zeroes. It's really about where the OS maps the various components of the program (data, bss, heap and stack) in the address space.
Also, can somebody please tell me why the addresses (2, 3, 4) do not fall in locations separated by 8-bytes (since std::mem::size_of_val(&a) gives 8)?
Because println! is a macro which expands to a bunch of stuff in the stack frame, so your values are not defined next to one another in the final code (https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5b812bf11e51461285f51f95dd79236b). Though even if they were, there'd be no guarantee the compiler wouldn't e.g. be reusing now-dead memory to save on frame size.
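The leading-zero point can be checked outside Rust as well. A small C sketch (hex_digits and format_padded are hypothetical helpers) that counts the significant hex digits of an address and formats it padded to 16 digits:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Number of hex digits needed to print v without leading zeros. */
unsigned hex_digits(uint64_t v)
{
    unsigned n = 1;
    while (v >>= 4)
        ++n;
    return n;
}

/* Writes v as "0x" followed by 16 zero-padded hex digits.
   out must have room for 19 bytes including the terminator. */
void format_padded(uint64_t v, char out[19])
{
    snprintf(out, 19, "0x%016" PRIx64, v);
}
```

hex_digits gives 10 for 0x235d0ff61c and 12 for 0x7ff793f4c310. Both are ordinary 64-bit pointers, and padding them to the same width makes that visible.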
In most managed languages (that is, the ones with a GC), local variables that go out of scope are inaccessible and have a higher GC-priority (hence, they'll be freed first).
Now, C is not a managed language, what happens to variables that go out of scope here?
I created a small test-case in C:
#include <stdio.h>
int main(void){
    int *ptr;
    {
        // New scope
        int tmp = 17;
        ptr = &tmp; // Just to see if the memory is cleared
    }
    //printf("tmp = %d", tmp); // Compile-time error (as expected)
    printf("ptr = %d\n", *ptr);
    return 0;
}
I'm using GCC 4.7.3 to compile and the program above prints 17, why? And when/under what circumstances will the local variables be freed?
The actual behavior of your code sample is determined by two primary factors: 1) the behavior is undefined by the language, 2) an optimizing compiler will generate machine code that does not physically match your C code.
For example, despite the fact that the behavior is undefined, GCC can (and will) easily optimize your code to a mere
printf("ptr = %d\n", 17);
which means that the output you see has very little to do with what happens to any variables in your code.
If you want the behavior of your code to better reflect what happens physically, you should declare your pointers volatile. The behavior will still be undefined, but at least it will restrict some optimizations.
Now, as to what happens to local variables when they go out of scope. Nothing physical happens. A typical implementation will allocate enough space in the program stack to store all variables at the deepest level of block nesting in the current function. This space is typically allocated in the stack in one shot at the function startup and released back at the function exit.
That means that the memory formerly occupied by tmp continues to remain reserved in the stack until the function exits. That also means that the same stack space can (and will) be reused by different variables having approximately the same level of "locality depth" in sibling blocks. The space will hold the value of the last variable until some other variable declared in a sibling block overwrites it. In your example nothing overwrites the space formerly occupied by tmp, so you will typically see the value 17 survive intact in that memory.
However, if you do this
int main(void) {
    volatile int *ptr;
    volatile int *ptrd;
    { // Block
        int tmp = 17;
        ptr = &tmp; // Just to see if the memory is cleared
    }
    { // Sibling block
        int d = 5;
        ptrd = &d;
    }
    printf("ptr = %d %d\n", *ptr, *ptrd);
    printf("%p %p\n", ptr, ptrd);
}
you will see that the space formerly occupied by tmp has been reused for d and its former value has been overwritten. The second printf will typically output the same pointer value for both pointers.
The lifetime of an automatic object ends at the end of the block where it is declared.
Accessing an object outside of its lifetime is undefined behavior in C.
(C99, 6.2.4p2) "If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime."
Local variables are allocated on the stack. They are not "freed" in the sense you think about GC languages, or memory allocated on the heap. They simply go out of scope, and for built-in types the code won't do anything (in C++, an object's destructor would be called at that point; C has no destructors).
Accessing them beyond their scope is Undefined Behaviour. You were just lucky, as no other code has overwritten that memory area...yet.
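If the value is needed after the block ends, the well-defined approach is to copy the value (not the address) into a variable that is still alive. A minimal sketch; value_after_block is a hypothetical name.

```c
/* Returns the value that tmp held, without touching tmp after its
   lifetime has ended. */
int value_after_block(void)
{
    int saved;
    {
        int tmp = 17;
        saved = tmp; /* copy the value while tmp is still alive */
    }
    return saved;    /* saved lives until the function returns */
}
```

Unlike the pointer version, this does not depend on whether the stack slot of tmp gets reused.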
Below is the C code
#include <stdio.h>
#include <unistd.h> // for read()
void read_input()
{
    char input[512];
    int c = 0;
    while (read(0, input + c++, 1) == 1);
}
int main ()
{
    read_input();
    printf("Done !\n");
    return 0;
}
In the above code, there should be a buffer overflow of the array 'input'. The file we give it has over 600 characters in it, all 2's (e.g. 2222222...) (btw, the ASCII code of '2' is 0x32). However, when executing the code with the file, no segmentation fault is thrown, meaning the program counter register was unchanged. Below is the screenshot of the memory of the input array in gdb; highlighted is the address of the ebp (program counter) register, and it's clear that it was skipped when writing:
[Screenshot: gdb memory dump of the input array]
The writing of the characters continues after the program counter, which is maybe why segmentation fault is not shown. Please explain why this is happening, and how to cause the program counter to overflow.
This is tricky! Both input[] and c are on the stack, with c following the 512 bytes of input[]. Before you read the 513th byte, c=0x00000201 (513). But since input[] has ended, you are reading 0x32 (50) into c, which after the read is c=0x00000232 (562): this is little endian, so the least significant byte comes first in memory (on a big-endian architecture it would be c=0x32000201, and it would almost certainly segfault).
So you are actually jumping 562 - 513 = 49 bytes ahead. Then there is the ++, which makes it 50. In fact you have exactly 50 bytes not overwritten with 0x32 (again, 0x3232ab64 is little endian; if you display memory as bytes instead of dwords you will see 0x64 0xab 0x32 0x32).
So you are writing into an unassigned stack area. It doesn't segfault because it's within the process's legal address space (up to the imposed limit), and it is not overwriting any vital information.
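The jump can be reproduced without undefined behavior by simulating the frame in a plain byte array. This is only a sketch under the layout assumed in this answer (c stored little endian directly after input[]); the real layout depends on the compiler. simulate_read_input, load_c and store_c are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { INPUT_LEN = 512, FRAME_LEN = 1024 };

/* c is modelled as 4 little-endian bytes stored right after input[]. */
static uint32_t load_c(const uint8_t *frame)
{
    return (uint32_t)frame[INPUT_LEN]
         | (uint32_t)frame[INPUT_LEN + 1] << 8
         | (uint32_t)frame[INPUT_LEN + 2] << 16
         | (uint32_t)frame[INPUT_LEN + 3] << 24;
}

static void store_c(uint8_t *frame, uint32_t c)
{
    for (int i = 0; i < 4; ++i)
        frame[INPUT_LEN + i] = (uint8_t)(c >> (8 * i));
}

/* Replays `count` iterations of read(0, input + c++, 1), each delivering
   the byte ch. frame must be FRAME_LEN bytes. Returns the final c. */
uint32_t simulate_read_input(uint8_t *frame, size_t count, uint8_t ch)
{
    memset(frame, 0, FRAME_LEN);     /* zeroes input[] and c */
    for (size_t n = 0; n < count; ++n) {
        uint32_t idx = load_c(frame);
        store_c(frame, idx + 1);     /* c++ completes before read() runs */
        if (idx >= FRAME_LEN)
            break;                   /* stay inside the simulated frame */
        frame[idx] = ch;             /* read() stores the byte */
    }
    return load_c(frame);
}
```

With 600 input bytes of 0x32, the simulated c ends at 649, bytes 516 to 561 stay zero, and writing resumes at index 562, which matches the 50 bytes (512 to 561) that do not hold 0x32.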
Nice example of how things can go horribly wrong without exploding! Is this a real life example or an assignment?
Ah yes... for the second question, try declaring c before input[], or c as static... in order not to overwrite it.