return floats to objective-c from arm assembly function - ios

I've written an assembly function that runs fine on an iPhone 4 (32-bit code) as well as on an iPhone 6s (64-bit code). I pass in four floating point numbers from a calling function in Objective-C.
Here is the structure I use for the 4 floating point numbers, and below that is the prototype for the function, as found at the top of my Objective-C code.
struct myValues{ // This is a structure. It is used to conveniently group multiple data items logically.
float A; // I am using it here because i want to return multiple float values from my ASM code
float B; // They get passed in via S0, S1, S2 etc. and they come back out that way too
float number;
float divisor; //gonna be 2.0
}myValues;
struct myValues my_asm(float e, float f, float g, float h); // Prototype for the ASM function
Down in my Objective-C code I call my assembly function like this:
myValues = my_asm(myValues.A, myValues.B, myValues.number,myValues.divisor); // ASM function
When running on the iPhone 6S the code runs like a champ (64-bit code). The 4 floating point values are passed from the Objective-C code to the assembly code via the ARM single-precision float registers S0-S3. The results are also returned via S0-S3.
When running on the iPhone 4 the code runs fine as well (32-bit code). The 4 floating point values are passed from the Obj-C code to the assembly code via the ARM single-precision float registers S0, S2, S4, and S6 (not sure why it skips the odd registers). The code runs fine, but the values that get returned to my Obj-C structure are garbage.
Where/how do I pass floating point values from the ARM 32-bit code so they arrive back in the Obj-C structure?
thanks,
relayman357
P.S. Below is my assembly code from my Xcode .S file.
.ios_version_min 9, 0
.globl _my_asm
.align 2
#ifdef __arm__
.thumb_func _my_asm
.syntax unified
.code 16
_my_asm: // 32 bit code
// S0 = A, S2 = B, S4 = Number, S6 = 2.0 - parameters passed in when called by the function
vadd.f32 s0, s0, s2
vdiv.f32 s0, s0, s6
vdiv.f32 s1, s4, s0
vcvt.u32.f32 r0,s0
bx lr
//ret
#else
_my_asm: // 64 bit code
//add W0, W0, W1
; S0 = A, S1 = B, S2 = Number, S3 = 2.0 parameters passed in when called by the function
fadd s0, s0, s1
fdiv s0, s0, s3
fdiv s1, s2, s0
ret
#endif

Neither of your functions is correctly returning a structure. You need to understand the ARM ABI. You can start by reading Apple's iOS ABI Function Call Guide. If after studying the ABI you still do not understand, ask a new question showing what you've tried.
HTH
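One practical way to learn what the ABI expects is to write the same computation in C and let the compiler show you. A minimal sketch: my_c_reference is a hypothetical name, the arithmetic mirrors the 64-bit assembly in the question, and the struct is the question's struct myValues declared without the global variable. Compiling it to assembly for each architecture (for example clang -S with -arch armv7 and with -arch arm64) shows where each field of the returned struct has to end up on each target.

```c
#include <assert.h>

struct myValues {
    float A;
    float B;
    float number;
    float divisor;
};

/* Hypothetical C reference for my_asm: same arithmetic as the 64-bit version. */
struct myValues my_c_reference(float e, float f, float g, float h)
{
    struct myValues r;
    assert(h != 0.0f);      /* divisor must be non-zero */
    r.A = (e + f) / h;      /* s0 = (A + B) / divisor */
    r.B = g / r.A;          /* s1 = number / s0 */
    r.number = g;           /* the asm leaves s2 and s3 unchanged */
    r.divisor = h;
    return r;
}
```

For example, my_c_reference(3.0f, 5.0f, 8.0f, 2.0f) gives A = 4, B = 2, number = 8, divisor = 2; comparing the compiler's armv7 output for the return path against the hand-written 32-bit version is the quickest way to spot the mismatch.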

Related

Loading function pointer value into a register

I want to load a function pointer address into a register so that I can branch to it later, since according to "ARM assembly branch to address inside register or memory" I cannot branch to an address directly.
The function:
void foo(){
printf("Hi");
}
The function pointer in my case:
void* p = &foo; // ex: 0x104ffa7b4
Question: how can I store the value 0x104ffa7b4 in a register (e.g. x4) and then branch to it?
ex:
LDR x4, 0x104ffa7b4
BX x4
Note: the instructions should be in the form of hex (ex: ret -> C0035FD6) in order to be written into memory directly.
Edit: 16th Jan 2022 - Solved
I'm sorry for not presenting the issue clearly. I ended up using the Keystone assembler to assemble a formatted char string of the instruction below:
char inst[32];
snprintf(inst, sizeof inst, "b %p", (void *)&foo);
// inst -> "b 0x104ffa7b4"
Then I wrote the encoded bytes from Keystone to memory, which achieved the jump to foo.
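Note that `b <address>` assembles to a PC-relative branch, so the bytes are only correct for the address they are written to, and the target must be within ±128 MiB of it. A minimal sketch of the same encoding by hand (the helper name encode_b is hypothetical): the AArch64 B instruction is 0x14000000 with a signed 26-bit word offset in the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode "B target" for an instruction located at `pc`.
   Returns the 32-bit instruction word; store it little-endian. */
static uint32_t encode_b(uint64_t pc, uint64_t target)
{
    int64_t offset = (int64_t)(target - pc);
    assert(offset % 4 == 0);                          /* instructions are 4-byte aligned */
    assert(offset >= -(INT64_C(1) << 27) &&
           offset <   (INT64_C(1) << 27));            /* +/-128 MiB reach */
    /* imm26 = offset / 4, truncated to 26 bits (two's complement) */
    return 0x14000000u | ((uint32_t)(offset / 4) & 0x03FFFFFFu);
}
```

For example, encode_b(0x104ffa000, 0x104ffa7b4) yields 0x140001ED, which goes into memory as the bytes ED 01 00 14 (little-endian, just like C0 03 5F D6 for ret). For targets out of range, the usual alternative is to build the full address in a register (for example with a MOVZ/MOVK sequence) and branch with BR.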

iOS ARM64 Syscalls

I am learning more about shellcode and making syscalls in arm64 on iOS devices. The device I am testing on is iPhone 6S.
I got the list of syscalls from this link (https://github.com/radare/radare2/blob/master/libr/include/sflib/darwin-arm-64/ios-syscalls.txt).
I learnt that x8 is used for putting the syscall number for arm64 from here (http://arm.ninja/2016/03/07/decoding-syscalls-in-arm64/).
I figured the various registers used to pass in parameters for arm64 should be the same as arm so I referred to this link (https://w3challs.com/syscalls/?arch=arm_strong), taken from https://azeria-labs.com/writing-arm-shellcode/.
I wrote inline assembly in Xcode; here are some snippets:
//exit syscall
__asm__ volatile("mov x8, #1");
__asm__ volatile("mov x0, #0");
__asm__ volatile("svc 0x80");
However, the application does not terminate when I step over this code.
char write_buffer[]="console_text";
int write_buffer_size = sizeof(write_buffer);
__asm__ volatile("mov x8,#4;" //arm64 uses x8 for syscall number
"mov x0,#1;" //1 for stdout file descriptor
"mov x1,%0;" //the buffer to display
"mov x2,%1;" //buffer size
"svc 0x80;"
:
:"r"(write_buffer),"r"(write_buffer_size)
:"x0","x1","x2","x8"
);
If this syscall works, it should print out some text in Xcode's console output screen. However, nothing gets printed.
There are many online articles on ARM assembly; some use svc 0x80, some use svc 0, etc., so there are a few variations. I tried various methods but could not get the two code snippets to work.
Can someone provide some guidance?
EDIT:
This is what Xcode shows in its Assembly view when I call the C library function syscall as int return_value=syscall(1,0);
mov x1, sp
mov x30, #0
str x30, [x1]
orr w8, wzr, #0x1
stur x0, [x29, #-32] ; 8-byte Folded Spill
mov x0, x8
bl _syscall
I am not sure why this code was emitted.
The registers used for syscalls are completely arbitrary, and the resources you've picked are certainly wrong for XNU.
As far as I'm aware, the XNU syscall ABI for arm64 is entirely private and subject to change without notice so there's no published standard that it follows, but you can scrape together how it works by getting a copy of the XNU source (as tarballs, or viewing it online if you prefer that), grep for the handle_svc function, and just following the code.
I'm not gonna go into detail on where exactly you find which bits, but the end result is:
The immediate passed to svc is ignored, but the standard library uses svc 0x80.
x16 holds the syscall number
x0 through x8 hold up to 9 arguments*
There are no arguments on the stack
x0 and x1 hold up to 2 return values (e.g. in the case of fork)
The carry bit is used to report an error, in which case x0 holds the error code
* This is used only in the case of an indirect syscall (x16 = 0) with 8 arguments.
* Comments in the XNU source also mention x9, but it seems the engineer who wrote that should brush up on off-by-one errors.
And then it comes to the actual syscall numbers available:
The canonical source for "UNIX syscalls" is the file bsd/kern/syscalls.master in the XNU source tree. Those take syscall numbers from 0 up to about 540 in the latest iOS 13 beta.
The canonical source for "Mach syscalls" is the file osfmk/kern/syscall_sw.c in the XNU source tree. Those syscalls are invoked with negative numbers between -10 and -100 (e.g. -28 would be task_self_trap).
Unrelated to the last point, two syscalls mach_absolute_time and mach_continuous_time can be invoked with syscall numbers -3 and -4 respectively.
A few low-level operations are available through platform_syscall with the syscall number 0x80000000.
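A minimal sketch of a raw write(2) following the conventions listed above: x16 holds the syscall number, x0-x2 the arguments, and the carry flag signals an error with the error code in x0. It assumes SYS_write = 4 as in <sys/syscall.h>. The function name raw_write is hypothetical, and because the syscall ABI is private, the Darwin/arm64 path should be treated as an experiment; on any other target it falls back to libc's write() so the file still builds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: returns bytes written, or -1 with errno set. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#if defined(__APPLE__) && defined(__aarch64__)
    register long x0 __asm__("x0") = fd;          /* arg 0, also return value */
    register const void *x1 __asm__("x1") = buf;  /* arg 1, also 2nd return value */
    register size_t x2 __asm__("x2") = len;       /* arg 2 */
    register long x16 __asm__("x16") = 4;         /* SYS_write (assumed value) */
    long carry;
    __asm__ volatile(
        "svc #0x80\n\t"
        "cset %2, cs"                             /* carry set -> error */
        : "+r"(x0), "+r"(x1), "=r"(carry)
        : "r"(x2), "r"(x16)
        : "memory", "cc");
    if (carry) {
        errno = (int)x0;                          /* x0 holds the error code */
        return -1;
    }
    return x0;
#else
    return (long)write(fd, buf, len);             /* portable fallback */
#endif
}
```

Both x0 and x1 are declared read-write ("+r") because the kernel may return values in both, and the "memory" clobber tells the compiler the buffer is read.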
This should get you going. As @Siguza mentioned, you must use x16, not x8, for the syscall number.
#import <UIKit/UIKit.h>
#import <string.h>
#import <sys/syscall.h>
#import "AppDelegate.h"
char testStringGlobal[] = "helloWorld from global variable\n";
int main(int argc, char * argv[]) {
char testStringOnStack[] = "helloWorld from stack variable\n";
#if TARGET_CPU_ARM64
//VARIANT 1 suggested by @PeterCordes
//as an input it's a file descriptor set to STD_OUT 1 so the syscall write output appears in Xcode debug output
//as an output this will be used for returning syscall return value;
register long x0 asm("x0") = 1;
//as an input string to write
//as an output this will be used for returning syscall return value higher half (in this particular case 0)
register char *x1 asm("x1") = testStringOnStack;
//string length
register long x2 asm("x2") = strlen(testStringOnStack);
//syscall write is 4
register long x16 asm("x16") = SYS_write; //syscall write definition - see my footnote below
//full variant using stack local variables for register x0,x1,x2,x16 input
//syscall result collected in x0 & x1 using "semi" intrinsic assembler
asm volatile(//all args prepared, make the syscall
"svc #0x80"
:"=r"(x0),"=r"(x1) //mark x0 & x1 as syscall outputs
:"r"(x0), "r"(x1), "r"(x2), "r"(x16): //mark the inputs
//inform the compiler we read the memory
"memory",
//inform the compiler we clobber carry flag (during the syscall itself)
"cc");
//VARIANT 2
//syscall write for globals variable using "semi" intrinsic assembler
//args hardcoded
//output of syscall is ignored
asm volatile(//copy the string address from the compiler-chosen register %0 into x1
"mov x1, %0 \t\n"
//set file descriptor to STD_OUT 1 so it appears in Xcode debug output
"mov x0, #1 \t\n"
//hardcoded length
"mov x2, #32 \t\n"
//syscall write is 4
"mov x16, #0x4 \t\n"
//all args prepared, make the syscall
"svc #0x80"
::"r"(testStringGlobal):
//clobbered registers list
"x1","x0","x2","x16",
//inform the compiler we read the memory
"memory",
//inform the compiler we clobber carry flag (during the syscall itself)
"cc");
//VARIANT 3 - only applicable to global variables using "page" address
//which is PC-relative addressing to load addresses at a fixed offset from the current location (PIC code).
//syscall write for global variable using "semi" intrinsic assembler
asm volatile(//set x1 on proper PAGE
"adrp x1,_testStringGlobal@PAGE \t\n" //notice the underscore preceding variable name by convention
//add the offset of the testStringGlobal variable
"add x1,x1,_testStringGlobal@PAGEOFF \t\n"
//set file descriptor to STD_OUT 1 so it appears in Xcode debug output
"mov x0, #1 \t\n"
//hardcoded length
"mov x2, #32 \t\n"
//syscall write is 4
"mov x16, #0x4 \t\n"
//all args prepared, make the syscall
"svc #0x80"
:::
//clobbered registers list
"x1","x0","x2","x16",
//inform the compiler we read the memory
"memory",
//inform the compiler we clobber carry flag (during the syscall itself)
"cc");
#endif
@autoreleasepool {
return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
}
}
EDIT
To @PeterCordes's excellent comment: yes, there is a syscall numbers definition header <sys/syscall.h>, which I included in the above snippet^ in Variant 1. But it's important to mention that inside it Apple defines them like this:
#ifdef __APPLE_API_PRIVATE
#define SYS_syscall 0
#define SYS_exit 1
#define SYS_fork 2
#define SYS_read 3
#define SYS_write 4
I haven't yet heard of an iOS app being rejected from the App Store for using a system call directly through svc 0x80; nonetheless, it's definitely not public API.
As for the suggested "=@ccc" by @PeterCordes, i.e. the carry flag (set by the syscall on error) as an output constraint, that's not supported as of the latest Xcode 11 beta / LLVM 8.0.0, even for x86, and definitely not for ARM.

AVX2: How to move a single element from vector to vector

The problem is:
a1[7] = b[6];
a1[15] = b[14];
a2[7] = b[0];
a2[15] = b[8];
All three vectors are uint8x16_t
On aarch64 NEON, it would be rather trivial:
mov a1.b[7], b.b[6]
mov a1.b[15], b.b[14]
mov a2.b[7], b.b[0]
mov a2.b[15], b.b[8]
How can I do this on AVX2?
I already loaded the vectors accordingly into __m256i a, b; where b contains the same 128bit vector twice, and then:
const __m256i shuffle=_mm256_set_epi64x(0x0808080808080808, 0x0000000000000000, \
0x0e0e0e0e0e0e0e0e, 0x0606060606060606);
const __m256i mask=_mm256_set1_epi64x(0x8000000000000000);
.
.
.
b = _mm256_shuffle_epi8(b, shuffle);
a = _mm256_blendv_epi8(a, b, mask);
Yes, it works the way I want, but I can't shake the feeling that it's anything but optimal, sacrificing two registers for such a trivial operation.
Am I missing something? Are there more efficient ways dealing with this problem?
Shall I modify this to a 64-bit shift then blend? That would need the same number of registers and instructions. Any suggestions?
Please note that I cannot overwrite other lanes in a.
Thanks in advance.
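To see why the two constants in the question work, here is a scalar model of the two AVX2 steps (a sketch for reasoning and unit-testing, not a faster replacement). It assumes the layout described: a holds a1 in the low 128-bit lane and a2 in the high lane, and b holds the same 16 bytes in both lanes. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of vpshufb: each 128-bit lane is shuffled independently; an index byte
   with the high bit set gives 0, otherwise its low 4 bits pick a byte in the same lane. */
static void shuffle_epi8_model(uint8_t out[32], const uint8_t src[32], const uint8_t idx[32])
{
    for (int i = 0; i < 32; i++) {
        int lane = i & 16;                       /* 0 for low lane, 16 for high lane */
        out[i] = (idx[i] & 0x80) ? 0 : src[lane + (idx[i] & 0x0f)];
    }
}

/* Model of vpblendvb: take b where the mask byte's high bit is set, else keep a. */
static void blendv_epi8_model(uint8_t a[32], const uint8_t b[32], const uint8_t mask[32])
{
    for (int i = 0; i < 32; i++)
        if (mask[i] & 0x80)
            a[i] = b[i];
}

/* Same constants as the question: each 64-bit element of the shuffle control
   repeats one index (6, 14, 0, 8 from low to high), and the mask selects only
   the top byte of each 64-bit element (bytes 7, 15, 23, 31). */
static void move_bytes_model(uint8_t a[32], const uint8_t b[32])
{
    static const uint8_t qword_index[4] = {6, 14, 0, 8};
    uint8_t idx[32], mask[32], shuffled[32];
    for (int i = 0; i < 32; i++) {
        idx[i] = qword_index[i / 8];
        mask[i] = (i % 8 == 7) ? 0x80 : 0;
    }
    shuffle_epi8_model(shuffled, b, idx);
    blendv_epi8_model(a, shuffled, mask);
}
```

The model makes explicit that vpshufb only indexes within its own 128-bit lane, which is why b has to be duplicated into both lanes before the shuffle.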

RGBA to ABGR: Inline arm neon asm for iOS/Xcode

This code (very similar code; I haven't tried exactly this code) compiles using the Android NDK, but not with Xcode/armv7+arm64/iOS.
Errors in comments:
uint32_t *src;
uint32_t *dst;
#ifdef __ARM_NEON
__asm__ volatile(
"vld1.32 {d0, d1}, [%[src]] \n" // error: Vector register expected
"vrev32.8 q0, q0 \n" // error: Unrecognized instruction mnemonic
"vst1.32 {d0, d1}, [%[dst]] \n" // error: Vector register expected
:
: [src]"r"(src), [dst]"r"(dst)
: "d0", "d1"
);
#endif
What's wrong with this code?
EDIT1:
I rewrote the code using intrinsics:
uint8x16_t x = vreinterpretq_u8_u32(vld1q_u32(src));
uint8x16_t y = vrev32q_u8(x);
vst1q_u32(dst, vreinterpretq_u32_u8(y));
After disassembling, I get the following, which is a variation I have already tried:
vld1.32 {d16, d17}, [r0]!
vrev32.8 q8, q8
vst1.32 {d16, d17}, [r1]!
So my code looks like this now, but gives the exact same errors:
__asm__ volatile("vld1.32 {d0, d1}, [%0]! \n"
"vrev32.8 q0, q0 \n"
"vst1.32 {d0, d1}, [%1]! \n"
:
: "r"(src), "r"(dst)
: "d0", "d1"
);
EDIT2:
Reading through the disassembly, I actually found a second version of the function. It turns out that arm64 uses a slightly different instruction set. For example, the arm64 assembly uses rev32.16b v0, v0 instead. The whole function listing(which I can't make heads or tails of) is below:
_My_Function:
cmp w2, #0
add w9, w2, #3
csel w8, w9, w2, lt
cmp w9, #7
b.lo 0x3f4
asr w9, w8, #2
ldr x8, [x0]
mov w9, w9
lsl x9, x9, #2
ldr q0, [x8], #16
rev32.16b v0, v0
str q0, [x1], #16
sub x9, x9, #16
cbnz x9, 0x3e0
ret
I have successfully published several iOS apps which make use of ARM assembly language and inline code is the most frustrating way to do it. Apple still requires apps to support both ARM32 and ARM64 devices. Since the code will be built as both ARM32 and ARM64 by default (unless you changed the compile options), you need to design code which will successfully compile in both modes. As you noticed, ARM64 is a completely different mnemonic format and register model. There are 2 simple ways around this:
1) Write your code using NEON intrinsics. ARM specified that the original ARM32 intrinsics would remain mostly unchanged for ARMv8 targets and therefore can be compiled to both ARM32 and ARM64 code. This is the safest/easiest option.
2) Write inline code or a separate '.S' module for your assembly language code. To deal with the 2 compile modes, use "#ifdef __arm64__" and "#ifdef __arm__" to distinguish between the two instruction sets.
Intrinsics are apparently the only way to use the same code for NEON between ARM (32-bit) and AArch64.
There are many reasons not to use inline-assembly: https://gcc.gnu.org/wiki/DontUseInlineAsm
Unfortunately, current compilers often do a very poor job with ARM / AArch64 intrinsics, which is surprising because they do an excellent job optimizing x86 SSE/AVX intrinsics and PowerPC Altivec. They often do ok in simple cases, but can easily introduce extra store/reloads.
In theory with intrinsics, you should get good asm output, and it lets the compiler schedule instructions between the vector load and store, which will help most on an in-order core. (Or you could write a whole loop in inline asm that you schedule by hand.)
ARM's official documentation:
Although it is technically possible to optimize NEON assembly by hand, this can be very difficult because the pipeline and memory access timings have complex inter-dependencies. Instead of hand assembly, ARM strongly recommends the use of intrinsics
If you do use inline asm anyway, avoid future pain by getting it right.
It's easy to write inline asm that happens to work, but isn't safe wrt. future source changes (and sometimes to future compiler optimizations), because the constraints don't accurately describe what the asm does. The symptoms will be weird, and this kind of context-sensitive bug could even lead to unit tests passing but wrong code in the main program. (or vice versa).
A latent bug that doesn't cause any defects in the current build is still a bug, and is a really Bad Thing in a Stack Overflow answer that can be copied as an example into other contexts. @bitwise's code in the question and self-answer both have bugs like this.
The inline asm in the question isn't safe, because it modifies memory without telling the compiler about it. This probably only manifests in a loop that reads from dst in C both before and after the inline asm. However, it's easy to fix, and doing so lets us drop the volatile (and the "memory" clobber, which it's missing) so the compiler can optimize better (but still with significant limitations compared to intrinsics).
volatile should prevent reordering relative to memory accesses, so it may not happen outside of fairly contrived circumstances. But that's hard to prove.
The following compiles for ARM and AArch64 (it might fail if compiling for ILP32 on AArch64, though, I forgot about that possibility). Using -funroll-loops leads to gcc choosing different addressing modes, and not forcing the dst++; src++; to happen between every inline asm statement. (This maybe wouldn't be possible with asm volatile).
I used memory operands so the compiler knows that memory is an input and an output, and to give the compiler the option to use auto-increment / decrement addressing modes. This is better than anything you can do with a pointer in a register as an input operand, because it allows loop unrolling to work.
This still doesn't let the compiler schedule the store many instructions after the corresponding load to software pipeline the loop for in-order cores, so it's probably only going to perform decently on out-of-order ARM chips.
void bytereverse32(uint32_t *dst32, const uint32_t *src32, size_t len)
{
typedef struct { uint64_t low, high; } vec128_t;
const vec128_t *src = (const vec128_t*) src32;
vec128_t *dst = (vec128_t*) dst32;
// with old gcc, this gets gcc to use a pointer compare as the loop condition
// instead of incrementing a loop counter
const vec128_t *src_endp = src + len/(sizeof(vec128_t)/sizeof(uint32_t));
// len is in units of 4-byte chunks
while (src < src_endp) {
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#if __LP64__ // FIXME: doesn't account for ILP32 in 64-bit mode
// aarch64 registers: s0 and d0 are subsets of q0 (128bit), synonym for v0
asm ("ldr q0, %[src] \n\t"
"rev32.16b v0, v0 \n\t"
"str q0, %[dst] \n\t"
: [dst] "=<>m"(*dst) // auto-increment/decrement or "normal" memory operand
: [src] "<>m" (*src)
: "q0", "v0"
);
#else
// arm32 registers: 128bit q0 is made of d0:d1, or s0:s3
asm ("vld1.32 {d0, d1}, %[src] \n\t"
"vrev32.8 q0, q0 \n\t" // reverse 8 bit elements inside 32bit words
"vst1.32 {d0, d1}, %[dst] \n"
: [dst] "=<>m"(*dst)
: [src] "<>m"(*src)
: "d0", "d1"
);
#endif
#else
#error "no NEON"
#endif
// increment pointers by 16 bytes
src++; // The inline asm doesn't modify the pointers.
dst++; // of course, these increments may compile to a post-increment addressing mode
// this way has the advantage of letting the compiler unroll or whatever
}
}
This compiles (on the Godbolt compiler explorer with gcc 4.8), but I don't know if it assembles, let alone works correctly. Still, I'm confident these operand constraints are correct. Constraints are basically the same across all architectures, and I understand them much better than I know NEON.
Anyway, the inner loop on ARM (32bit) with gcc 4.8 -O3, without -funroll-loops is:
.L4:
vld1.32 {d0, d1}, [r1], #16 @ MEM[(const struct vec128_t *)src32_17]
vrev32.8 q0, q0
vst1.32 {d0, d1}, [r0], #16 @ MEM[(struct vec128_t *)dst32_18]
cmp r3, r1 @ src_endp, src32
bhi .L4 @,
The register constraint bug
The code in the OP's self-answer has another bug: the input pointer operands use separate "r" constraints. This leads to breakage if the compiler wants to keep the old value around, and chooses an input register for src that isn't the same as the output register.
If you want to take pointer inputs in registers and choose your own addressing modes, you can use "0" matching-constraints, or you can use "+r" read-write output operands.
You will also need a "memory" clobber or dummy memory input/output operands (i.e. that tell the compiler which bytes of memory are read and written, even if you don't use that operand number in the inline asm).
See Looping over arrays with inline assembly for a discussion of the advantages and disadvantages of using r constraints for looping over an array on x86. ARM has auto-increment addressing modes, which appear to produce better code than anything you could get with manual choice of addressing modes. It lets gcc use different addressing modes in different copies of the block when loop-unrolling. "r" (pointer) constraints appear to have no advantage, so I won't go into detail about how to use a dummy input / output constraint to avoid needing a "memory" clobber.
Test case that generates wrong code with @bitwise's asm statement:
// return a value as a way to tell the compiler it's needed after
uint32_t* unsafe_asm(uint32_t *dst, const uint32_t *src)
{
uint32_t *orig_dst = dst;
uint32_t initial_dst0val = orig_dst[0];
#ifdef __ARM_NEON
#if __LP64__
asm volatile("ldr q0, [%0], #16 # unused src input was %2\n\t"
"rev32.16b v0, v0 \n\t"
"str q0, [%1], #16 # unused dst input was %3\n"
: "=r"(src), "=r"(dst)
: "r"(src), "r"(dst)
: "d0", "d1" // ,"memory"
// clobbers don't include v0?
);
#else
asm volatile("vld1.32 {d0, d1}, [%0]! # unused src input was %2\n\t"
"vrev32.8 q0, q0 \n\t"
"vst1.32 {d0, d1}, [%1]! # unused dst input was %3\n"
: "=r"(src), "=r"(dst)
: "r"(src), "r"(dst)
: "d0", "d1" // ,"memory"
);
#endif
#else
#error "No NEON/AdvSIMD"
#endif
uint32_t final_dst0val = orig_dst[0];
// gcc assumes the asm doesn't change orig_dst[0], so it only does one load (after the asm)
// and uses it for final and initial
// uncomment the memory clobber, or use a dummy output operand, to avoid this.
// pointer + initial+final compiles to LSL 3 to multiply by 8 = 2 * sizeof(uint32_t)
// using orig_dst after the inline asm makes the compiler choose different registers for the
// "=r"(dst) output operand and the "r"(dst) input operand, since the asm constraints
// advertise this non-destructive capability.
return orig_dst + final_dst0val + initial_dst0val;
}
This compiles to (AArch64 gcc4.8 -O3):
ldr q0, [x1], #16 # unused src input was x1 // src, src
rev32.16b v0, v0
str q0, [x2], #16 # unused dst input was x0 // dst, dst
ldr w1, [x0] // D.2576, *dst_1(D)
add x0, x0, x1, lsl 3 //, dst, D.2576,
ret
The store uses x2 (an uninitialized register, since this function only takes 2 args). The "=r"(dst) output (%1) picked x2, while the "r"(dst) input (%3 which is used only in a comment) picked x0.
final_dst0val + initial_dst0val compiles to 2x final_dst0val, because we lied to the compiler and told it that memory wasn't modified. So instead of reading the same memory before and after the inline asm statement, it just reads after and left-shifts by one extra position when adding to the pointer. (The return value exists only to use the values so they're not optimized away).
We can fix both problems by correcting the constraints: using "+r" for the pointers and adding a "memory" clobber. (A dummy output would also work, and might hurt optimization less.) I didn't bother since this appears to have no advantage over the memory-operand version above.
With those changes, we get:
safe_register_pointer_asm:
ldr w3, [x0] //, *dst_1(D)
mov x2, x0 // dst, dst ### These 2 insns are new
ldr q0, [x1], #16 // src
rev32.16b v0, v0
str q0, [x2], #16 // dst
ldr w1, [x0] // D.2597, *dst_1(D)
add x3, x1, x3, uxtw // D.2597, D.2597, initial_dst0val ## And this is new, to add the before and after loads
add x0, x0, x3, lsl 2 //, dst, D.2597,
ret
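A sketch of the register-pointer form with the constraints corrected as described: "+r" for both pointers (the asm advances them) plus a "memory" clobber (the asm reads and writes memory). The helper name bswap_block_regptr is hypothetical; it handles one 16-byte block and hands back the advanced pointers so a caller can loop. It uses the generic rev32 v0.16b, v0.16b spelling, which should also assemble with Apple's clang, and non-ARM builds fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte-reverse four 32-bit words from *srcp into dst.
   Returns the advanced dst; *srcp is advanced by 16 bytes too. */
static uint32_t *bswap_block_regptr(uint32_t *dst, const uint32_t **srcp)
{
    assert(dst && srcp && *srcp);
    const uint32_t *src = *srcp;
#if defined(__aarch64__)
    __asm__("ldr q0, [%0], #16 \n\t"
            "rev32 v0.16b, v0.16b \n\t"
            "str q0, [%1], #16 \n"
            : "+r"(src), "+r"(dst)       /* both pointers are post-incremented */
            :
            : "v0", "memory");
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    __asm__("vld1.32 {d0, d1}, [%0]! \n\t"
            "vrev32.8 q0, q0 \n\t"
            "vst1.32 {d0, d1}, [%1]! \n"
            : "+r"(src), "+r"(dst)
            :
            : "d0", "d1", "memory");
#else
    for (int i = 0; i < 4; i++)
        dst[i] = __builtin_bswap32(src[i]);  /* GCC/Clang builtin */
    src += 4;
    dst += 4;
#endif
    *srcp = src;
    return dst;
}
```

Because the pointer updates are now real outputs, the compiler no longer assumes the old register values survive, and the asm no longer needs volatile to stay in place.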
As stated in the edits to the original question, it turned out that I needed a different assembly implementation for arm64 and armv7.
#ifdef __ARM_NEON
#if __LP64__
asm volatile("ldr q0, [%0], #16 \n"
"rev32.16b v0, v0 \n"
"str q0, [%1], #16 \n"
: "=r"(src), "=r"(dst)
: "r"(src), "r"(dst)
: "d0", "d1"
);
#else
asm volatile("vld1.32 {d0, d1}, [%0]! \n"
"vrev32.8 q0, q0 \n"
"vst1.32 {d0, d1}, [%1]! \n"
: "=r"(src), "=r"(dst)
: "r"(src), "r"(dst)
: "d0", "d1"
);
#endif
#else
The intrinsics code that I posted in the original post generated surprisingly good assembly though, and also generated the arm64 version for me, so it may be a better idea to use intrinsics instead in the future.
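A minimal sketch of the intrinsics version as a complete loop, using the same three intrinsics from the question's EDIT1, 4 pixels per iteration. It should compile unchanged for both armv7 and arm64; the tail and non-NEON builds use plain C. The function name rgba_to_abgr is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: RGBA -> ABGR, i.e. reverse the bytes of each 32-bit pixel. */
void rgba_to_abgr(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert((dst && src) || count == 0);
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= count; i += 4) {
        uint8x16_t x = vreinterpretq_u8_u32(vld1q_u32(src + i));
        uint8x16_t y = vrev32q_u8(x);           /* reverse bytes within each 32-bit lane */
        vst1q_u32(dst + i, vreinterpretq_u32_u8(y));
    }
#endif
    for (; i < count; i++)                      /* leftover pixels, or non-NEON builds */
        dst[i] = __builtin_bswap32(src[i]);     /* GCC/Clang builtin */
}
```

Leaving the loop to the compiler lets it pick post-increment addressing and unroll, which is the main thing the hand-written versions were trying to achieve.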

Run-time Stack of C code

I don't know if the title of my question is right, but I'm studying run-time stacks and I have the following C code:
int func(int x, int y, int z);

int main() {
int a, b, c , x;
a = 4;
b = 5;
c = 6;
x = func(a, b, c);
return 0;
}
int func(int x, int y, int z) {
int p, q, r;
p = x*x;
q = y/z;
r = p + q;
return r;
}
This is compiled and loaded in location x3000.
I'm dealing with a simulated computer called LC-3. I need to find out how the run-time stack would look when this code is executed. My understanding of the topic is too limited to actually solve this, but here is how I think it should look:
x0000
(I don't know how the return should look either)
(Assignments that I don't know how to interpret)
r
q
p
main's frame pointer
Return address to main
Return value to main
x a
y b
z c
(I don't know how the assignments should look in the run-time stack)
x
c
b
a
xEFFF
I hope someone can offer me some clarity in this subject. Thank you in advance.
OK, this all depends on the ABI you are using. If it is anything like the System V i386 ABI (the one used by 32-bit Linux), it should look like what you have described. (I have modified my answer to match what Wikipedia describes for LC-3.)
First of all, you reach main(), which has 4 local variables, each an int. LC-3 is word-addressable and an int is one 16-bit word, so each variable takes exactly one address. With the first item at 0xEFFF (the bottom of your diagram) and the stack growing toward lower addresses, they will be stored at:
0xEFFF: a
0xEFFE: b
0xEFFD: c
0xEFFC: x
Then you call a function, namely func(). The LC-3 calling convention says that the parameters must be pushed on the stack from right to left:
0xEFFB: z --> c
0xEFFA: y --> b
0xEFF9: x --> a
Then func reserves space for the return value, pushes the return address (R7), and saves the caller's frame pointer (R5):
0xEFF8: Return value
0xEFF7: Return address to main
0xEFF6: Saved R5 (main's frame pointer)
Local variables again:
0xEFF5: p
0xEFF4: q
0xEFF3: r
On modern systems, the return value is often passed in a register (like EAX on x86); where it goes depends on your ABI, and some ABIs return it on the stack, as in the layout above.
Another thing is that each function can create a stack frame by pushing the caller's frame pointer onto the stack and using the new stack top as the base of its own frame (this is what saving R5 does in LC-3).
You should probably have a document (the calling convention for your course or simulator) where all of these things are defined.
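The addresses in a layout like this follow from just two things: the push order and the size of one stack slot. A small C sketch (the helper name lc3_frame_layout and the Slot type are hypothetical) that reproduces them for a stack whose first item goes just below a given address. For LC-3, which is word-addressed with 16-bit ints, the slot size is 1; a slot size of 4 models a byte-addressed machine with 4-byte ints.

```c
#include <assert.h>

/* One entry per stack slot: what is stored there and at which address. */
typedef struct { const char *name; unsigned addr; } Slot;

/* Fills `out` (room for at least 13 entries) in push order; the first slot is at
   top - slot and each following one is `slot` addresses lower. Returns the count. */
static int lc3_frame_layout(Slot out[], unsigned top, unsigned slot)
{
    static const char *const names[] = {
        "a", "b", "c", "x",                    /* main's locals */
        "z (= c)", "y (= b)", "x (= a)",       /* func's arguments, pushed right to left */
        "return value", "return address to main",
        "saved R5 (main's frame pointer)",
        "p", "q", "r"                          /* func's locals */
    };
    int n = (int)(sizeof names / sizeof names[0]);
    assert(slot > 0);
    for (int i = 0; i < n; i++) {
        out[i].name = names[i];
        out[i].addr = top - (unsigned)(i + 1) * slot;
    }
    return n;
}
```

With top = 0xF000, a slot size of 1 puts a at 0xEFFF and r at 0xEFF3, while a slot size of 4 gives 0xEFFC down to 0xEFCC.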
