The CPU and Memory (value, register) - memory

When a value is copied from one register to another, what happens to the value
in the source register? What happens to the value in the destination register.

I'll show how it works in simple processors, like DLX or RISC, which are used to study CPU-architecture.
When (AT&T syntax, or copy $R1 to $R2)
mov $R1, $R2
or even (for RISC-styled architecture)
add $R1, 0, $R2
instruction works, CPU will read source operands: R1 from register file and zero from... may be immediate operand or zero-generator; pass both inputs into Arithmetic Logic Unit (ALU). ALU will do an operation which will just pass first source operand to destination (because A+0 = A) and after ALU, destination will be written back to register file (but to R2 slot).
So, Data in source register is only readed and not changed in this operation; data in destination register will be overwritten with copy of source register data. (old state of destination register will be lost with generating of heat.)
At physical level, any register in register file is set of SRAM cells, each of them is the two inverters (bi-stable flip-flop, based on M1,M2,M3,M4) and additional gates for writing and reading:
When we want to overwrite value stored in SRAM cell, we will set BL and -BL according to our data (To store bit 0 - set BL and unset -BL; to store bit 1 - set -BL and unset BL); then the write is enabled for current set (line) of cells (WL is on; it will open M5 and M6). After opening of M5 and M6, BL and -BL will change state of bistable flip-flop (like in SR-latch). So, new value is written and old value is discarded (by leaking charge into BL and -BL).

Related

How to avoid crash during stack buffer overflow exploit?

void deal_msg(unsigned char * buf, int len)
{
unsigned char msg[1024];
strcpy(msg,buf);
//memcpy(msg, buf, len);
puts(msg);
}
void main()
{
// network operation
sock = create_server(port);
len = receive_data(sock, buf);
deal_msg(buf, len);
}
As the pseudocode shows above, the compile environment is vc6 and running environment is windows xp sp3 en. No other protection mechanisms are applied, that is stack can be executed, no ASLR.
The send data is 'A' * 1024 + addr_of_jmp_esp + shellcode.
My question is:
if strcpy is used, the shellcode is generated by msfvenom, msfvenom -p windows/exec cmd=calc.exe -a x86 -b "\x00" -f python,
msfvenom attempts to encode payload with 1 iterations of x86/shikata_ga_nai
after data is sent, no calc pops up, the exploit won't work.
But if memcpy is used, shellcode generated by msfvenom -p windows/exec cmd=calc.exe -a x86 -f python without encoding works.
How to avoid the original program's crash after calc pops up, how to keep stack balance to avoid crash?
Hard to say. I'd use a custom payload (just copy the windows/exec cmd=calc.exe) and put a 0xcc at the start and debug it (or something that will be easily recognizable under the debugger like a ud2 or \0xeb\0xfe). If your payload is executed, you'll see it. Bypass the added instruction (just NOP it) and try to see what can possibly go wrong with the remainder of the payload.
You'll need a custom payload ; Since you're on XP SP3 you don't need to do crazy things.
Don't try to do the overflow and smash the whole stack (given your overflow it seems to be perfect, just enough overflow to control rIP).
See how the target function (deal_msg in your example) behave under normal conditions. Note the stack address when the ret is executed (and if register need to have certain values, this depend on the caller).
Try to replicate that in your shellcode: you'll most probably to adjust the stack pointer a bit at the end of your shellcode.
Make sure the caller (main) stack hasn't been affected when executing the payload. This might happen, in this case reserve enough room on the stack (going to lower addresses), so the caller stack is far from the stack space needed by the payload and it doesn't get affected by the payload execution.
Finally return to the ret of the target or directly after the call of the deal_msg function (or anywhere you see fit, e.g. returning directly to ExitProcess(), but this might be more interesting to return close to the previous "normal" execution path).
All in all, returning somewhere after the payload execution is easy, just push <addr> and ret but you'll need to ensure that the stack is in good shape to continue execution and most of the registers are correctly set.

SIGSEGV on simple move register to memory in NASM

I must be missing something very basic here. Searched SO but could not find the answer to this particular question. Here's my NASM code:
%include "io64.inc"
section .text
myvar db "This is not working", 0
global CMAIN
CMAIN:
mov rbp, rsp; for correct debugging
;write your code here
xor rax, rax
mov [myvar], rax
ret
It crashes on the move [myvar], rax line with SIGSEGV. I am simply trying to store some zeroes at that address.
Thanks!
PS: Using SASM to build / run / debug with 64 bit option ticked (default settings otherwise), on Windows 10 64 bit.
section .text
myvar db "This is not working", 0
Section .text is an executable section without write permissions. This is done to prevent some kinds of vulnerabilities. You should either place your myvar into a writable section, e.g. .data (if the variable should live for the whole duration of program execution), have the variable on the stack (if it's not supposed to outlive the function where it's created), or change .text to be writable (not recommended for security reasons, but possible).

Memory address wont load

I am trying to move the value 0 into the address stored in ax (assume that this is writable for now).
mov ax, 0EC7 ; assume writable
mov BYTE [ax], 0
But, nasm is giving me this error:
error: invalid effective address
Any ideas?
16-bit addressing modes are quite limited. You can use an (optional) offset (a plain number), plus an (optional) base register (bx or bp), plus an (optional) index register (si or di). That's it.
In 32-bit addressing modes, any register can be a base register and any register but esp can be an index register. 32-bit addressing also introduces an (optional) scale (1, 2, 4, or 8) to be multiplied by the index register.
[eax] will work - even in 16-bit code. The assembler generates an "address size override prefix" byte (0x67). If the value in eax exceeds the segment limit (usually 64k), an exception is generated (not handled in real DOS - it just hangs), so be careful with it.

The art of exploitation - exploit_notesearch.c

i've got a question regarding the exploit_notesearch program.
This program is only used to create a command string we finally call with the system() function to exploit the notesearch program that contains a buffer overflow vulnerability.
The commandstr looks like this:
./notesearch Nop-block|shellcode|repeated ret(will jump in nop block).
Now the actual question:
The ret-adress is calculated in the exploit_notesearch program by the line:
ret = (unsigned int) &i-offset;
So why can we use the address of the i-variable that is quite at the bottom of the main-stackframe of the exploit_notesearch program to calculate the ret address that will be saved in an overflowing buffer in the notesearch program itself ,so in an completely different stackframe, and has to contain an address in the nop block(which is in the same buffer).
that will be saved in an overflowing buffer in the notesearch program itself ,so in an completely different stackframe
As long as the system uses virtual memory, another process will be created by system() for the vulnerable program, and assuming that there is no stack randomization,
both processes will have almost identical values of esp (as well as offset) when their main() functions will start, given that the exploit was compiled on the attacked machine (i.e. with vulnerable notesearch).
The address of variable i was chosen just to give an idea about where the frame base is. We could use this instead:
unsigned long sp(void) // This is just a little function
{ __asm__("movl %esp, %eax");} // used to return the stack pointer
int main(){
esp = sp();
ret = esp - offset;
//the rest part of main()
}
Because the variable i will be located on relatively constant distance from esp, we can use &i instead of esp, it doesn't matter much.
It would be much more difficult to get an approximate value for ret if the system did not use virtual memory.
the stack is allocated in a way as first in last out approach. The location of i variable is somewhere on the top and lets assume that it is 0x200, and the return address is located in a lower address 0x180 so in order to determine the where about to put the return address and yet to leave some space for the shellcode, the attacker must get the difference, which is: 0x200 - 0x180 = 0x80 (128), so he will break that down as follows, ++, the return address is 4 bytes so, we have only 48 bytes we left before reaching the segmentation. that is how it is calculated and the location i give approximate reference point.

Is there a safe way to clean up stack-based code when jumping out of a block?

I've been working on Issue 14 on the PascalScript scripting engine, in which using a Goto command to jump out of a Case block produces a compiler error, even though this is perfectly valid (if ugly) Object Pascal code.
Turns out the ProcessCase routine in the compiler calls HasInvalidJumps, which scans for any Gotos that lead outside of the Case block, and gives a compiler error if it finds one. If I comment that check out, it compiles just fine, but ends up crashing at runtime. A disassembly of the bytecode shows why. I've annotated it with the original script code:
[TYPES]
<SNIPPED>
[VARS]
Var [0]: 27 Class TFORM
Var [1]: 28 Class TAPPLICATION
Var [2]: 11 S32 //i: integer
[PROCS]
Proc [0] Export: !MAIN -1
{begin}
[0] ASSIGN GlobalVar[2], [1]
{ i := 1;}
[15] PUSHTYPE 11(S32) // 1
[20] ASSIGN Base[1], GlobalVar[2]
{ case i of}
[31] PUSHTYPE 25(U8) // 2
{ 0:}
[36] COMPARE into Base[2]: [0] = Base[1]
[57] COND_NOT_GOTO currpos + 5 Base[2] [72]
{ end;}
[67] GOTO currpos + 41 [113]
{ 1:}
[72] COMPARE into Base[2]: [1] = Base[1]
[93] COND_NOT_GOTO currpos + 10 Base[2] [113]
{ goto L1;}
[103] GOTO currpos + 8 [116]
{ end;}
[108] GOTO currpos + 0 [113]
{ end; //<-- case}
[113] POP // 1
[114] POP // 0
{ Exit;}
[115] RET
{L1:
Writeln('Label L1');}
[116] PUSHTYPE 17(WideString) // 1
[121] ASSIGN Base[1], ['????????']
[144] CALL 1
{end.}
[149] POP // 0
[150] RET
Proc [1]: External Decl: \00\00 WRITELN
The "goto L1;" statement at 103 skips the cleanup pops at 113 and 114, which leaves the stack in an invalid state.
Delphi doesn't have any trouble with this, because it doesn't use a calculation stack. PascalScript, though, is not as fortunate. I need some way to make this work, as this pattern is very common in some legacy scripts from a much simpler system with little in the way of control structures that I've translated to PascalScript and need to be able to support.
Anyone have any ideas how to patch the codegen so it'll clean up the stack properly?
IIRC the goto rules in classic pascals were:
jumps are only allowed out of a block (iow from a higher to a lower nesting level on the "same" branch of the tree)
from local procedures to their parents.
The later was afaik never supported by Borland derived Pascals, but the first still holds.
So you need to generate exiting code like Martin says, but possibly it can be for multiple block levels, so you can't have a could codegeneration for each goto, but must generate code (to exit the precise number of needed blocks).
A typical test pattern is to exit from multiple nested ifs (possibly within a loop) using a goto, since that was a classic microoptimization that was faster at least up to D7.
Keep in mind that the if evaluation(s) and the begin..end blocks of their branches might have generated temps that need cleanup.
---------- added later
I think the codegenerator needs a way to walk the scopes between the goto and its endpoint, generating the relevant exit code for blocks along the way. That way a fix works for the general case and not just this example.
Since you can only jump out of scopes, and not into it that might not that be that hard.
IOW generate something that is equivalent to (for a hypothetical double case block)
Lgoto1gluecode:
// exit code first block
pop x
pop y
// exit code first block
pop A
pop B
goto real_goto_destination
Additional analysis can be done. E.g. if there is only one scope, and it has already a cleanup exit label, you can jump directly. If you know for certain that the above pop's are only discarded values (and not saves of registers) you can do them at once with add $16,%esp (4*4 byte values) etc.
The straightforward solution would be:
When generating a GOTO for goto statement, prefix the GOTO with the same cleanup code that comes before RET.
It looks to me like the calculation of how far to jump forward is the problem. I would have to spend some time looking at the implementation of the parser to help further, but my guess would be that additional handling must be performed when using a goto and there are values on the stack AND the goto would be placed after those values would be removed from the stack. Of course to determine this you would need to save the current location being parsed (the goto) and the forward parse to the target location watching for stack changes, and if so then to either adjust the goto location backwards, or inject the code as Martin suggested.

Resources