How to force LLVM to generate a single 'ret'?

Consider the following LLVM IR:
@yyy = external dso_local global i32
@zzz = external dso_local global i64

define void @exec_xxx() {
entry:
  %0 = load i32, i32* @yyy, align 4
  %1 = icmp eq i32 %0, 0
  br i1 %1, label %bb_true, label %bb_false

bb_true:
  store i64 0, i64* @zzz, align 8
  br label %bb_false

bb_false:
  ret void
}
The IR above has a single ret. However, the generated code has two:
exec_xxx:                               # @exec_xxx
        cmp     dword ptr [rip + yyy], 0
        je      .LBB0_1
        ret
.LBB0_1:                                # %bb_true
        mov     qword ptr [rip + zzz], 0
        ret
For some reason, I need a single ret in the generated code.
Question: how can I force LLVM to generate a single ret?

The LLVM-ish answer to that is to write a pass. Passes are how LLVM modifies code.
In this case that'll be really simple: you'll need a class declaration that inherits PassInfoMixin<> and implements run(Function &, FunctionAnalysisManager &). Your implementation should be 15-20 lines of code, I think.
I'll outline it in case you haven't written a pass before. It needs to iterate over the basic blocks, check whether the terminator of each one isa<ReturnInst>(), and add the ones that are to a list. After iterating, return early (with PreservedAnalyses::all()) if there are fewer than two returns in the list.
Otherwise, make a new basic block. If the return type isn't void, you need to create a phi node in it, populate it with the incoming values from the old returns, and then create a return that returns the phi node. If it's void, just create a new void return. Finally, replace all the old returns with branches to your new block and return PreservedAnalyses::none(). Done.

Related

Loading function pointer value into a register

I want to load a function pointer address into a register so that I can branch to it later, since I cannot branch to an address directly (according to "ARM assembly branch to address inside register or memory").
The function:
void foo(){
printf("Hi");
}
The function pointer in my case:
void* p = &foo; // ex: 0x104ffa7b4
Question: how can I store the value 0x104ffa7b4 in a register (e.g. x4) and then branch to it?
ex:
LDR x4, 0x104ffa7b4
BX x4
Note: the instructions should be in the form of hex (ex: ret -> C0035FD6) in order to be written into memory directly.
Edit: 16th Jan 2022 - Solved
I'm sorry for not presenting the issue clearly. I ended up using the Keystone assembler to assemble a formatted string containing the instruction below.
char inst[32]; // room for "b 0x" plus up to 16 hex digits and the terminating NUL
snprintf(inst, sizeof inst, "b %p", (void *)&foo);
// inst -> "b 0x104ffa7b4"
Then I wrote the encoded bytes from Keystone to memory, which achieved the jump to the foo function.

iOS ARM64 Syscalls

I am learning more about shellcode and making syscalls in arm64 on iOS devices. The device I am testing on is iPhone 6S.
I got the list of syscalls from this link (https://github.com/radare/radare2/blob/master/libr/include/sflib/darwin-arm-64/ios-syscalls.txt).
I learnt that x8 is used for putting the syscall number for arm64 from here (http://arm.ninja/2016/03/07/decoding-syscalls-in-arm64/).
I figured the registers used to pass parameters on arm64 should be the same as on arm, so I referred to this link (https://w3challs.com/syscalls/?arch=arm_strong), taken from https://azeria-labs.com/writing-arm-shellcode/.
I wrote inline assembly in Xcode; here are some snippets:
//exit syscall
__asm__ volatile("mov x8, #1");
__asm__ volatile("mov x0, #0");
__asm__ volatile("svc 0x80");
However, the application does not terminate when I step over this code.
char write_buffer[]="console_text";
int write_buffer_size = sizeof(write_buffer);
__asm__ volatile("mov x8,#4;" //arm64 uses x8 for syscall number
"mov x0,#1;" //1 for stdout file descriptor
"mov x1,%0;" //the buffer to display
"mov x2,%1;" //buffer size
"svc 0x80;"
:
:"r"(write_buffer),"r"(write_buffer_size)
:"x0","x1","x2","x8"
);
If this syscall works, it should print out some text in Xcode's console output screen. However, nothing gets printed.
There are many online articles on ARM assembly; some use svc 0x80, some use svc 0, and so on, so there are a few variations. I tried various methods but could not get either of the two snippets to work.
Can someone provide some guidance?
EDIT:
This is what Xcode shows in its Assembly view when I call the C function syscall instead, as in int return_value=syscall(1,0);
mov x1, sp
mov x30, #0
str x30, [x1]
orr w8, wzr, #0x1
stur x0, [x29, #-32] ; 8-byte Folded Spill
mov x0, x8
bl _syscall
I am not sure why this code was emitted.
The registers used for syscalls are completely arbitrary, and the resources you've picked are certainly wrong for XNU.
As far as I'm aware, the XNU syscall ABI for arm64 is entirely private and subject to change without notice so there's no published standard that it follows, but you can scrape together how it works by getting a copy of the XNU source (as tarballs, or viewing it online if you prefer that), grep for the handle_svc function, and just following the code.
I'm not gonna go into detail on where exactly you find which bits, but the end result is:
The immediate passed to svc is ignored, but the standard library uses svc 0x80.
x16 holds the syscall number
x0 through x8 hold up to 9 arguments*
There are no arguments on the stack
x0 and x1 hold up to 2 return values (e.g. in the case of fork)
The carry bit is used to report an error, in which case x0 holds the error code
* This is used only in the case of an indirect syscall (x16 = 0) with 8 arguments.
* Comments in the XNU source also mention x9, but it seems the engineer who wrote that should brush up on off-by-one errors.
And then it comes to the actual syscall numbers available:
The canonical source for "UNIX syscalls" is the file bsd/kern/syscalls.master in the XNU source tree. Those take syscall numbers from 0 up to about 540 in the latest iOS 13 beta.
The canonical source for "Mach syscalls" is the file osfmk/kern/syscall_sw.c in the XNU source tree. Those syscalls are invoked with negative numbers between -10 and -100 (e.g. -28 would be task_self_trap).
Unrelated to the last point, two syscalls mach_absolute_time and mach_continuous_time can be invoked with syscall numbers -3 and -4 respectively.
A few low-level operations are available through platform_syscall with the syscall number 0x80000000.
This should get you going. As @Siguza mentioned, you must use x16, not x8, for the syscall number.
#import <UIKit/UIKit.h>
#import <TargetConditionals.h> //TARGET_CPU_ARM64
#import <string.h>             //strlen
#import <sys/syscall.h>        //SYS_write
#import "AppDelegate.h"
char testStringGlobal[] = "helloWorld from global variable\n";
int main(int argc, char * argv[]) {
    char testStringOnStack[] = "helloWorld from stack variable\n";
#if TARGET_CPU_ARM64
    //VARIANT 1 suggested by @PeterCordes
    //as an input it's a file descriptor set to STDOUT (1) so the syscall write output appears in Xcode debug output
    //as an output this will be used for returning the syscall return value
    register long x0 asm("x0") = 1;
    //as an input: the string to write
    //as an output this will be used for returning the syscall return value's higher half (in this particular case 0)
    register char *x1 asm("x1") = testStringOnStack;
    //string length
    register long x2 asm("x2") = strlen(testStringOnStack);
    //syscall write is 4
    register long x16 asm("x16") = SYS_write; //syscall write definition - see my footnote below
    //full variant using stack local variables for register x0,x1,x2,x16 input
    //syscall result collected in x0 & x1 using "semi" intrinsic assembler
    asm volatile(//all args prepared, make the syscall
                 "svc #0x80"
                 :"+r"(x0),"+r"(x1) //x0 & x1 are both syscall inputs and outputs
                 :"r"(x2), "r"(x16) //mark the remaining inputs
                 ://inform the compiler we read the memory
                  "memory",
                  //inform the compiler we clobber carry flag (during the syscall itself)
                  "cc");
    //VARIANT 2
    //syscall write for a global variable using "semi" intrinsic assembler
    //args hardcoded
    //output of syscall is ignored
    asm volatile(//copy the string address from the compiler-chosen register %0 into x1
                 "mov x1, %0 \t\n"
                 //set file descriptor to STDOUT (1) so it appears in Xcode debug output
                 "mov x0, #1 \t\n"
                 //hardcoded length (32 bytes, including the trailing \n)
                 "mov x2, #32 \t\n"
                 //syscall write is 4
                 "mov x16, #0x4 \t\n"
                 //all args prepared, make the syscall
                 "svc #0x80"
                 ::"r"(testStringGlobal):
                 //clobbered registers list
                 "x1","x0","x2","x16",
                 //inform the compiler we read the memory
                 "memory",
                 //inform the compiler we clobber carry flag (during the syscall itself)
                 "cc");
    //VARIANT 3 - only applicable to global variables using "page" address
    //which is PC-relative addressing to load addresses at a fixed offset from the current location (PIC code).
    //syscall write for global variable using "semi" intrinsic assembler
    asm volatile(//set x1 on proper PAGE
                 "adrp x1,_testStringGlobal@PAGE \t\n" //notice the underscore preceding variable name by convention
                 //add the offset of the testStringGlobal variable
                 "add x1,x1,_testStringGlobal@PAGEOFF \t\n"
                 //set file descriptor to STDOUT (1) so it appears in Xcode debug output
                 "mov x0, #1 \t\n"
                 //hardcoded length
                 "mov x2, #32 \t\n"
                 //syscall write is 4
                 "mov x16, #0x4 \t\n"
                 //all args prepared, make the syscall
                 "svc #0x80"
                 :::
                 //clobbered registers list
                 "x1","x0","x2","x16",
                 //inform the compiler we read the memory
                 "memory",
                 //inform the compiler we clobber carry flag (during the syscall itself)
                 "cc");
#endif
    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
    }
}
EDIT
Regarding @PeterCordes' excellent comment: yes, there is a header with the syscall number definitions, <sys/syscall.h>, which I included in Variant 1 of the snippet above. But it's important to mention that Apple defines them inside it like this:
#ifdef __APPLE_API_PRIVATE
#define SYS_syscall 0
#define SYS_exit 1
#define SYS_fork 2
#define SYS_read 3
#define SYS_write 4
I haven't yet heard of an iOS app being rejected from the App Store for making system calls directly through svc 0x80; nonetheless, it is definitely not public API.
As for the "=@ccc" output constraint suggested by @PeterCordes (i.e. the carry flag, set by the syscall upon error, as an output operand), it is not supported as of the latest Xcode 11 beta / LLVM 8.0.0, even for x86, and definitely not for ARM.

How can I store the value 2^128-1 in memory (16 bytes)?

According to the question "What are the sizes of tword, oword and yword operands?", we can store a number using this convention:
16 bytes (128 bit): oword, DO, RESO, DDQ, RESDQ
I tried the following:
section .data
number do 2538
Unfortunately, the assembler reports the following error:
Integer supplied to a DT, DO or DY instruction
I don't understand why it doesn't work.
If your assembler does not support 128-bit integer constants with do, you can achieve the same thing with dq by splitting the constant into two 64-bit halves, low half first (x86 is little-endian), e.g.
section .data
number do 0x000102030405060708090a0b0c0d0e0f
could be implemented as
section .data
number dq 0x08090a0b0c0d0e0f,0x0001020304050607
Unless some other code needs it in memory, it's cheaper to generate a vector with all 128 bits set to 1 (0xFF... repeating, i.e. 2^128-1) on the fly:
pcmpeqw xmm0, xmm0 ; xmm0 = 0xFF... repeating
;You can store to memory if you want, e.g. to set a bitmap to all-ones.
movups [rdx], xmm0
See also What are the best instruction sequences to generate vector constants on the fly?
For the use-case you described in comments, there's no reason to mess with static data in .data or .rodata, or static storage in .bss. Just make space on the stack and pass pointers to that.
call_something_by_ref:
    sub     rsp, 24
    pcmpeqw xmm0, xmm0      ; xmm0 = 0xFF... repeating
    mov     rdi, rsp
    movaps  [rdi], xmm0     ; one byte shorter than movaps [rsp], xmm0
    lea     rsi, [rdi+8]
    call    some_function
    add     rsp, 24
    ret
Notice that this code has no immediate constants larger than 8 bits (for data or addresses), and it only touches memory that's already hot in cache (the bottom of the stack). And yes, store-forwarding does work from wide vector stores to integer loads when some_function dereferences RDI and RSI separately.

return floats to objective-c from arm assembly function

I've written an assembly function that runs fine on an iPhone 4 (32-bit code) as well as on an iPhone 6s (64-bit code). I pass in four floating point numbers from a calling function in objective-c.
Here is the structure I use for the 4 floating point numbers and below that is the prototype for the function - as found at the top of my objective-c code.
struct myValues{ // This is a structure. It is used to conveniently group multiple data items logically.
float A; // I am using it here because i want to return multiple float values from my ASM code
float B; // They get passed in via S0, S1, S2 etc. and they come back out that way too
float number;
float divisor; //gonna be 2.0
}myValues;
struct myValues my_asm(float e, float f, float g, float h); // Prototype for the ASM function
Down in my objective-c code I call my assembly function like this:
myValues = my_asm(myValues.A, myValues.B, myValues.number,myValues.divisor); // ASM function
When running against the iPhone 6S, the code runs like a champ (64-bit code). The 4 floating point values are passed from the Objective-C code to the assembly code via the ARM single-precision float registers S0-S3. The results are also returned via S0-S3.
When running against the iPhone 4, the code runs fine as well (32-bit code). The 4 floating point values are passed from the Obj-C code to the assembly code via the ARM single-precision float registers S0, S2, S4, and S6 (not sure why it skips the odd registers). The code runs fine, but the values that get returned to my Obj-C structure are garbage.
Where/how do I pass floating point values from the ARM 32-bit code so they arrive back in the Obj-C structure?
thanks,
relayman357
p.s. Below is my assembly code from my Xcode S file.
.ios_version_min 9, 0
.globl _my_asm
.align 2
#ifdef __arm__
.thumb_func _my_asm
.syntax unified
.code 16
_my_asm: // 32 bit code
// S0 = A, S2 = B, S4 = Number, S6 = 2.0 - parameters passed in when called by the function
vadd.f32 s0, s0, s2
vdiv.f32 s0, s0, s6
vdiv.f32 s1, s4, s0
vcvt.u32.f32 r0,s0
bx lr
//ret
#else
_my_asm: // 64 bit code
//add W0, W0, W1
; S0 = A, S1 = B, S2 = Number, S3 = 2.0 parameters passed in when called by the function
fadd s0, s0, s1
fdiv s0, s0, s3
fdiv s1, s2, s0
ret
#endif
Neither of your functions is correctly returning a structure. You need to understand the ARM ABI. You can start by reading Apple's iOS ABI Function Call Guide. If after studying the ABI you still do not understand, ask a new question showing what you've tried.
HTH

Forth and processor flags

Why doesn't Forth use processor flags for conditional execution?
Instead the result of a comparison is placed on the parameter stack. Is it because the inner interpreter loop may alter flags when going to the next instruction? Or is it simply to abstract conditional logic?
E.g. on x86 the flags register holds the result of a comparison, and most processors (though not all; MIPS and RISC-V, for example, have none) have a flags register.
Because Forth is a stack-based language, its operations have to be defined in terms of things that exist inside the language, so a comparison must leave its result somewhere the language can see. The flags register isn't part of the language. Of course, an optimizing compiler may use any approach that gives the same final result.
It depends on the Forth, and the level of optimization.
: tt 0 if ." true" else ." false" then ;
In SwiftForth (x86_64 GNU/Linux):
see tt
808376F 4 # EBP SUB 83ED04
8083772 EBX 0 [EBP] MOV 895D00
8083775 0 # EBX MOV BB00000000
808377A EBX EBX OR 09DB
808377C 0 [EBP] EBX MOV 8B5D00
808377F 4 [EBP] EBP LEA 8D6D04
8083782 808379D JZ 0F8415000000
8083788 804D06F ( (S") ) CALL E8E298FCFF
808378D "true"
8083793 804C5BF ( TYPE ) CALL E8278EFCFF
8083798 80837AE JMP E911000000
808379D 804D06F ( (S") ) CALL E8CD98FCFF
80837A2 "false"
80837A9 804C5BF ( TYPE ) CALL E8118EFCFF
80837AE RET C3 ok
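Here SwiftForth compiles IF as OR EBX, EBX (setting the zero flag from the top of the stack) followed by JZ, so the processor flag is used directly; the MOV and LEA that drop the stack item in between do not affect the flags.
In Gforth: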
see tt
: tt
0
IF .\" true"
ELSE .\" false"
THEN ; ok
