SIGSEGV on simple move register to memory in NASM - memory

I must be missing something very basic here. Searched SO but could not find the answer to this particular question. Here's my NASM code:
%include "io64.inc"
section .text
myvar db "This is not working", 0
global CMAIN
CMAIN:
mov rbp, rsp; for correct debugging
;write your code here
xor rax, rax
mov [myvar], rax
ret
It crashes on the mov [myvar], rax line with SIGSEGV. I am simply trying to store some zeroes at that address.
Thanks!
PS: Using SASM to build / run / debug with 64 bit option ticked (default settings otherwise), on Windows 10 64 bit.

section .text
myvar db "This is not working", 0
Section .text is an executable section without write permissions. This is done to prevent some kinds of vulnerabilities. You should either place your myvar into a writable section, e.g. .data (if the variable should live for the whole duration of program execution), have the variable on the stack (if it's not supposed to outlive the function where it's created), or change .text to be writable (not recommended for security reasons, but possible).
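The difference between a read-only and a writable region can be illustrated outside of assembly. The following is only an analogy, not the NASM fix itself: it uses two anonymous Python mmap regions as stand-ins for .text and .data, and the helper name store_qword_zero is made up for this sketch. Python rejects the store in software with a TypeError, whereas a native program receives SIGSEGV from the OS.

```python
import mmap

PAGE = mmap.PAGESIZE

# Stand-in for .text: an anonymous mapping opened read-only (assumption: analogy only).
read_only = mmap.mmap(-1, PAGE, access=mmap.ACCESS_READ)
# Stand-in for .data: an anonymous mapping opened read/write.
writable = mmap.mmap(-1, PAGE, access=mmap.ACCESS_WRITE)

def store_qword_zero(region):
    """Mimic `mov [myvar], rax` with rax = 0: write 8 zero bytes at offset 0."""
    try:
        region[0:8] = bytes(8)
        return True
    except TypeError:
        # Python refuses the store in software; a native program would fault instead.
        return False

print(store_qword_zero(read_only))  # False
print(store_qword_zero(writable))   # True
```

In the NASM program, the equivalent of the second case is declaring myvar under section .data instead of section .text.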

Related

How to avoid crash during stack buffer overflow exploit?

void deal_msg(unsigned char * buf, int len)
{
unsigned char msg[1024];
strcpy(msg,buf);
//memcpy(msg, buf, len);
puts(msg);
}
void main()
{
// network operation
sock = create_server(port);
len = receive_data(sock, buf);
deal_msg(buf, len);
}
As the pseudocode above shows, the program is compiled with VC6 and runs on Windows XP SP3 (English). No other protection mechanisms are applied: the stack is executable and there is no ASLR.
The data sent is 'A' * 1024 + addr_of_jmp_esp + shellcode.
My question is:
If strcpy is used, the shellcode is generated with msfvenom -p windows/exec cmd=calc.exe -a x86 -b "\x00" -f python,
and msfvenom encodes the payload with 1 iteration of x86/shikata_ga_nai.
After the data is sent, no calc pops up; the exploit doesn't work.
But if memcpy is used, the shellcode generated with msfvenom -p windows/exec cmd=calc.exe -a x86 -f python (no encoding) works.
How can I keep the original program from crashing after calc pops up? How do I keep the stack balanced?
Hard to say. I'd use a custom payload (just copy the windows/exec cmd=calc.exe one), put a 0xcc (int3) at the start, and debug it (or use something else that is easy to recognize under the debugger, like a ud2 or \xeb\xfe, a jump to itself). If your payload is executed, you'll see it. Bypass the added instruction (just NOP it) and see what goes wrong with the remainder of the payload.
You'll need a custom payload. Since you're on XP SP3, you don't need to do anything crazy:
Don't try to overflow and smash the whole stack (your overflow looks about right: just enough to control rIP).
See how the target function (deal_msg in your example) behaves under normal conditions. Note the stack address when the ret is executed (and whether registers need to hold certain values; this depends on the caller).
Try to replicate that in your shellcode: you'll most probably need to adjust the stack pointer a bit at the end of your shellcode.
Make sure the caller's (main) stack hasn't been affected by executing the payload. If it has, reserve enough room on the stack (going to lower addresses) so that the stack space the payload needs is far from the caller's stack.
Finally, return to the ret of the target or to the instruction directly after the call to deal_msg (or anywhere you see fit, e.g. directly to ExitProcess(), though it might be more interesting to return close to the previous "normal" execution path).
All in all, returning somewhere after the payload execution is easy, just push <addr> and ret but you'll need to ensure that the stack is in good shape to continue execution and most of the registers are correctly set.

asm usage of memory location operands

I am having trouble with the definition of 'memory location'. According to the 'Intel 64 and IA-32 Software Developer's Manual', many instructions can use a memory location as an operand.
For example MOVBE (move data after swapping bytes):
Instruction: MOVBE m32, r32
The question now is how a memory location is defined.
I tried to use variables defined in the .bss section:
section .bss
memory: resb 4 ;reserve 4 byte
memorylen: equ $-memory
section .text
global _start
_start:
MOV R9D, 0x6162630A
MOV [memory], R9D
SHR [memory], 1
MOVBE [memory], R9D
EDIT:->
MOV EAX, 0x01
MOV EBX, 0x00
int 0x80
<-EDIT
If SHR is commented out, yasm (yasm -f elf64 .asm) assembles the file without problems, but running the program prints: Illegal Instruction
If MOVBE is commented out instead, the following error occurs when assembling: error: invalid size for operand 1
How do I have to allocate memory to use the 'm' operand form shown in the instruction set reference?
[CPU=x64, Compiler=yasm]
If that is all your code, execution falls off the end into an uninitialized region, so you will get a fault. That has nothing to do with allocating memory, which you did right. You need to add code to terminate your program using an exit system call, or at least put in an endless loop so you avoid the fault (kill your program using ctrl+c or equivalent).
Update: While the above is true, the illegal instruction here is more likely caused by the fact that your CPU simply does not support the MOVBE instruction, because not all do. If you look in the reference, you can see it says #UD If CPUID.01H:ECX.MOVBE[bit 22] = 0. That tells you that a particular flag bit in the ECX register returned by leaf 01 of the CPUID instruction indicates support for this instruction. If you are on Linux, you can conveniently check in /proc/cpuinfo whether you have the movbe flag or not.
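The /proc/cpuinfo check can also be scripted. The following is a minimal sketch, assuming the usual x86 Linux layout in which a line starting with "flags" lists the feature names separated by spaces. The function names and the shortened sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def supports_movbe(cpuinfo_text):
    """True if the movbe feature flag is listed."""
    return "movbe" in cpu_flags(cpuinfo_text)

# Hypothetical, shortened sample; a real flags line is much longer.
sample = "processor\t: 0\nflags\t\t: fpu sse2 movbe popcnt\n"
print(supports_movbe(sample))  # True

# On a Linux machine the real file can be checked the same way:
# with open("/proc/cpuinfo") as f:
#     print(supports_movbe(f.read()))
```

If the flag is missing, executing MOVBE raises #UD, which the shell reports as Illegal Instruction.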
As for the invalid operand size: you should generally specify the operand size when it can not be deduced from the instruction. That said, SHR accepts all sizes (byte, word, dword, qword) so you should really not get that error at all, but you might get an operation of unexpected default size. You should use SHR dword [memory], 1 in this case, and that also makes yasm happy.
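To see why the operand size matters, here is a small simulation of the shift on the bytes that MOV [memory], R9D leaves in memory. It is a sketch only: shr_mem and load_dword are hypothetical helpers, memory is modeled as a little-endian bytearray, and the masking of the shift count that a real x86 CPU performs is ignored.

```python
def shr_mem(memory, addr, size, count):
    """Logical right shift of a `size`-byte little-endian value stored at memory[addr]."""
    value = int.from_bytes(memory[addr:addr + size], "little")
    value >>= count
    memory[addr:addr + size] = value.to_bytes(size, "little")

def load_dword(memory, addr):
    """Read 4 bytes at memory[addr] as a little-endian integer."""
    return int.from_bytes(memory[addr:addr + 4], "little")

# MOV [memory], R9D with R9D = 0x6162630A (x86 stores little-endian).
as_byte = bytearray((0x6162630A).to_bytes(4, "little"))
as_dword = bytearray(as_byte)

shr_mem(as_byte, 0, 1, 1)    # like SHR byte [memory], 1
shr_mem(as_dword, 0, 4, 1)   # like SHR dword [memory], 1

print(hex(load_dword(as_byte, 0)))   # 0x61626305
print(hex(load_dword(as_dword, 0)))  # 0x30b13185
```

The byte-sized shift only touches the lowest byte (0x0A becomes 0x05), while the dword-sized shift moves bits across all four bytes, which is why the assembler needs the size spelled out.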
Oh, and +1 for reading the intel manual ;)

How to get this sqrt inline assembly working for iOS

I am trying to follow another SO post and implement sqrt14 within my iOS app:
double inline __declspec (naked) __fastcall sqrt14(double n)
{
_asm fld qword ptr [esp+4]
_asm fsqrt
_asm ret 8
}
I have modified this to the following in my code:
double inline __declspec (naked) sqrt14(double n)
{
__asm__("fld qword ptr [esp+4]");
__asm__("fsqrt");
__asm__("ret 8");
}
Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:
Unexpected token in argument list
Invalid instruction
Invalid instruction
I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.
Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.
If it matters: I'm building the app to ideally be compatible with any current iOS device that can run iOS 7.1. Again, many thanks for any help.
The compiler is perfectly capable of generating an fsqrt instruction; you don't need inline asm for that. You might get some extra speed if you use -ffast-math.
For completeness' sake, here is the inline asm version:
__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));
The fsqrt instruction has no explicit operands; it uses the top of the stack implicitly. The =t constraint tells the compiler to expect the output on the top of the FPU stack, and the 0 constraint instructs the compiler to place the input in the same place as output #0 (i.e. the top of the FPU stack again).
Note that fsqrt is of course x86-only, meaning it won't work on ARM CPUs, which is what iOS devices use.

Memory address wont load

I am trying to move the value 0 into the address stored in ax (assume that this is writable for now).
mov ax, 0EC7 ; assume writable
mov BYTE [ax], 0
But, nasm is giving me this error:
error: invalid effective address
Any ideas?
16-bit addressing modes are quite limited. You can use an (optional) offset (a plain number), plus an (optional) base register (bx or bp), plus an (optional) index register (si or di). That's it.
In 32-bit addressing modes, any register can be a base register and any register but esp can be an index register. 32-bit addressing also introduces an (optional) scale (1, 2, 4, or 8) to be multiplied by the index register.
[eax] will work - even in 16-bit code. The assembler generates an "address size override prefix" byte (0x67). If the value in eax exceeds the segment limit (usually 64k), an exception is generated (not handled in real DOS - it just hangs), so be careful with it.
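The 16-bit rule above is small enough to write down as a check. This is a minimal sketch of the rule as described in this answer, not of NASM's actual parser. The function name and its parameters are made up for illustration.

```python
BASE_REGS = {"bx", "bp"}
INDEX_REGS = {"si", "di"}

def valid_16bit_address(registers, has_offset=False):
    """Check a register combination against the 16-bit rule quoted above:
    at most one base (bx/bp) plus at most one index (si/di), plus an optional offset."""
    regs = [r.lower() for r in registers]
    bases = [r for r in regs if r in BASE_REGS]
    indexes = [r for r in regs if r in INDEX_REGS]
    if len(bases) + len(indexes) != len(regs):
        return False  # some register is neither a base nor an index
    if len(bases) > 1 or len(indexes) > 1:
        return False
    return bool(regs) or has_offset

print(valid_16bit_address(["ax"]))                        # False, like [ax]
print(valid_16bit_address(["bx"]))                        # True, like [bx]
print(valid_16bit_address(["bx", "si"], has_offset=True)) # True, like [bx+si+4]
```

So the code in the question works in 16-bit mode once the address is loaded into bx (or si, di, bp) instead of ax.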

The CPU and Memory (value, register)

When a value is copied from one register to another, what happens to the value
in the source register? What happens to the value in the destination register?
I'll show how it works in simple processors, such as DLX or other RISC designs, which are used to teach CPU architecture.
When an instruction such as (AT&T operand order: copy $R1 to $R2)
mov $R1, $R2
or even (on a RISC-style architecture)
add $R1, 0, $R2
executes, the CPU reads the source operands: R1 from the register file and zero from an immediate operand or a zero generator. Both inputs go into the Arithmetic Logic Unit (ALU). The ALU performs an operation that simply passes the first source operand through (because A + 0 = A), and the result is then written back to the register file, into the R2 slot.
So the data in the source register is only read and not changed by this operation; the data in the destination register is overwritten with a copy of the source register's data. (The old state of the destination register is lost, dissipated as heat.)
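The same behaviour can be shown with a toy register file. This is a minimal sketch under stated assumptions: the RegisterFile class, the 32-register count and the 32-bit width are invented for illustration and do not model any particular CPU.

```python
class RegisterFile:
    """Toy register file: a list of integer slots."""

    def __init__(self, count=32):
        self.regs = [0] * count

    def add_imm(self, src, imm, dst, width=32):
        """dst = src + imm, the 'add $R1, 0, $R2' style of copy."""
        # Read the source operand, run it through the "ALU", truncate to the register width.
        result = (self.regs[src] + imm) & ((1 << width) - 1)
        # Write-back overwrites only the destination slot.
        self.regs[dst] = result

rf = RegisterFile()
rf.regs[1] = 0xDEADBEEF   # source register R1
rf.regs[2] = 0x12345678   # old value of destination register R2
rf.add_imm(1, 0, 2)
print(hex(rf.regs[1]), hex(rf.regs[2]))  # 0xdeadbeef 0xdeadbeef
```

R1 still holds its value after the copy, and the old contents of R2 are gone.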
At the physical level, every register in the register file is a set of SRAM cells. Each cell consists of two cross-coupled inverters (a bistable flip-flop built from transistors M1, M2, M3 and M4) plus additional access transistors (M5 and M6) for writing and reading.
When we want to overwrite the value stored in an SRAM cell, we set BL and -BL according to our data (to store bit 0, set BL and unset -BL; to store bit 1, set -BL and unset BL); then the write is enabled for the current set (line) of cells (WL is on, which opens M5 and M6). Once M5 and M6 are open, BL and -BL change the state of the bistable flip-flop (as in an SR latch). So the new value is written and the old value is discarded (its charge leaks into BL and -BL).

Resources