32-bit ADD on Aarch64 assembly - arm64

This is my first post here and I'm fairly new to arm64 assembly. I'm trying to do some arithmetic, but when I add two values, for example, the result appears to be computed in 32 bits.
Here's my code:
.data
msg: .asciz "Value 1: "
msg2: .asciz "Value 2: "
result: .asciz "Result: %d\n"
fmt: .asciz "%d"
.balign 8
value1: .quad 0
.balign 8
value2: .quad 0
.balign 16
lr_value: .quad 0
.text
.global main
main:
adr x0, lr_value
str x30, [x0]
//Display message
adr x0,msg
bl printf
//Input first value
adr x0,fmt
adr x1,value1
bl scanf
//Display second message
adr x0,msg2
bl printf
//Input second value
adr x0,fmt
adr x1,value2
bl scanf
//Load first and second value
adr x1,value1
ldr x1,[x1]
adr x2,value2
ldr x2,[x2]
//Add both values on x1
add x1,x1,x2
//Show result
adr x0,result
bl printf
adr x0,lr_value
ldr x30,[x0]
mov w0,#0
ret
And here's the output:
Value 1: 2147483647
Value 2: 1
Result: -2147483648
What am I doing wrong? I've also tried multiplication and subtraction.
Edit: Solved it, turns out I had to use %ld instead of %d, thank you Nate Eldredge!
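To make the fix concrete, here is a minimal C sketch (not part of the original post) of what the two conversion specifiers do with the same 64-bit sum. The helper names add64, as_percent_d and check_example are made up for illustration, and the narrowing cast assumes a two's-complement target such as AArch64 with GCC or Clang. The same mismatch applies to scanf: %d stores only 4 bytes into each .quad, so %ld is the matching specifier there as well.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit addition, like `add x1, x1, x2` (wraps modulo 2^64). */
int64_t add64(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

/* What "%d" ends up showing: only the low 32 bits (w1), read as signed. */
int32_t as_percent_d(int64_t v) {
    return (int32_t)(uint32_t)v;
}

/* Reproduces the numbers from the question. */
void check_example(void) {
    int64_t sum = add64(2147483647, 1);
    assert(sum == 2147483648LL);            /* the register holds the right value */
    assert(as_percent_d(sum) == INT32_MIN); /* "%d" shows -2147483648 */
}
```

So the add itself was always 64-bit; only the formatting truncated the value.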

Related

What would be the best approach patch-finding the pointer of a certain function on the XNU Kernel?

I am currently working on an iOS Jailbreak for iOS 13.7.
As part of the jailbreak, I need to do a series of patches to the XNU Kernel live in the memory.
Of course, the kernel is protected by kASLR, KPP / KTRR, and other memory watchdogs that would trigger a Kernel Panic if something is modified.
As luck would have it, KTRR (Kernel Text Read-only Region) can only protect, well, static data that is not supposed to change (i.e. the TEXT section and constants). The variables can still be altered.
I am building a PatchFinder which is supposed to locate a function or a variable in the XNU memory based on tell-tale symbols and I am wondering what would be the most effective approach for this.
I am currently adapting on top of the PatchFinder made publicly available back in the iOS 8 era by in7egal which looks like this:
uint32_t find_cs_enforcement_disable_amfi(uint32_t region, uint8_t* kdata, size_t ksize)
{
// Find a function referencing cs_enforcement_disable_amfi
const uint8_t search_function[] = {0x20, 0x68, 0x40, 0xF4, 0x40, 0x70, 0x20, 0x60, 0x00, 0x20, 0x90, 0xBD};
uint8_t* ptr = memmem(kdata, ksize, search_function, sizeof(search_function));
if(!ptr)
return 0;
// Only LDRB in there should try to dereference cs_enforcement_disable_amfi
uint16_t* ldrb = find_last_insn_matching(region, kdata, ksize, (uint16_t*) ptr, insn_is_ldrb_imm);
if(!ldrb)
return 0;
// Weird, not the right one.
if(insn_ldrb_imm_imm(ldrb) != 0 || insn_ldrb_imm_rt(ldrb) > 12)
return 0;
// See what address that LDRB is dereferencing
return find_pc_rel_value(region, kdata, ksize, ldrb, insn_ldrb_imm_rn(ldrb));
}
I wonder if there is any faster way or a more reliable way to locate the cs_enforcement_disable_amfi.
Once found by the PatchFinder in the XNU Kernel memory, it's used like this:
uint32_t cs_enforcement_disable_amfi = find_cs_enforcement_disable_amfi(kernel_base, kdata, ksize);
printf("cs_enforcement_disable_amfi is at=0x%08x\n",cs_enforcement_disable_amfi);
if (cs_enforcement_disable_amfi){
char patch[] ="\x00\xbf\x00\xbf\x00\xbf\x00\xbf\x00\xbf";
kern_return_t kernret = vm_write(proccessTask, cs_enforcement_disable_amfi+kernel_base, patch, sizeof(patch)-1);
if (kernret == KERN_SUCCESS){
printf("Successfully patched cs_enforcement_disable_amfi\n");
}
}
So the PatchFinder has to be able to reliably return the pointer to cs_enforcement_disable_amfi otherwise I am blindly writing to an invalid (or valid but different) address which almost certainly will trigger memory corruption.
The current code does return a valid pointer to cs_enforcement_disable_amfi most of the time, but randomly panics the kernel about 10-15% of the time which means the address it returns 10-15% of the time is invalid. Not sure how to make it more reliable.
The variable you're looking for doesn't exist anymore.
The bytes in your first snippet make up Thumb instructions, which find this function in AMFI in a 32bit kernelcache:
0x8074ad04 90b5 push {r4, r7, lr}
0x8074ad06 01af add r7, sp, 4
0x8074ad08 0d48 ldr r0, [0x8074ad40]
0x8074ad0a 7844 add r0, pc
0x8074ad0c 0078 ldrb r0, [r0]
0x8074ad0e 0128 cmp r0, 1
0x8074ad10 03d1 bne 0x8074ad1a
0x8074ad12 0020 movs r0, 0
0x8074ad14 00f04efa bl 0x8074b1b4
0x8074ad18 30b9 cbnz r0, 0x8074ad28
0x8074ad1a 7c69 ldr r4, [r7, 0x14]
0x8074ad1c 002c cmp r4, 0
0x8074ad1e 05d0 beq 0x8074ad2c
0x8074ad20 2068 ldr r0, [r4]
0x8074ad22 40f44070 orr r0, r0, 0x300
0x8074ad26 2060 str r0, [r4]
0x8074ad28 0020 movs r0, 0
0x8074ad2a 90bd pop {r4, r7, pc}
Given the magic constant 0x300 and the fact that AMFI's __TEXT_EXEC segment is quite small, we can easily find this in other kernels, including 64bit ones.
This is what it looks like on an iPhone 5s on 8.4:
0xffffff800268d2e4 f44fbea9 stp x20, x19, [sp, -0x20]!
0xffffff800268d2e8 fd7b01a9 stp x29, x30, [sp, 0x10]
0xffffff800268d2ec fd430091 add x29, sp, 0x10
0xffffff800268d2f0 f30307aa mov x19, x7
0xffffff800268d2f4 e8fc1110 adr x8, section.com.apple.driver.AppleMobileFileIntegrity.10.__DATA.__bss
0xffffff800268d2f8 1f2003d5 nop
0xffffff800268d2fc 08054039 ldrb w8, [x8, 1]
0xffffff800268d300 a8000037 tbnz w8, 0, 0xffffff800268d314
0xffffff800268d304 130100b4 cbz x19, 0xffffff800268d324
0xffffff800268d308 680240b9 ldr w8, [x19]
0xffffff800268d30c 08051832 orr w8, w8, 0x300
0xffffff800268d310 680200b9 str w8, [x19]
0xffffff800268d314 00008052 mov w0, 0
0xffffff800268d318 fd7b41a9 ldp x29, x30, [sp, 0x10]
0xffffff800268d31c f44fc2a8 ldp x20, x19, [sp], 0x20
0xffffff800268d320 c0035fd6 ret
But by the time of iOS 11, the variable is gone:
0xfffffff006245d84 f44fbea9 stp x20, x19, [sp, -0x20]!
0xfffffff006245d88 fd7b01a9 stp x29, x30, [sp, 0x10]
0xfffffff006245d8c fd430091 add x29, sp, 0x10
0xfffffff006245d90 f30307aa mov x19, x7
0xfffffff006245d94 130100b4 cbz x19, 0xfffffff006245db4
0xfffffff006245d98 680240b9 ldr w8, [x19]
0xfffffff006245d9c 08051832 orr w8, w8, 0x300
0xfffffff006245da0 680200b9 str w8, [x19]
0xfffffff006245da4 00008052 mov w0, 0
0xfffffff006245da8 fd7b41a9 ldp x29, x30, [sp, 0x10]
0xfffffff006245dac f44fc2a8 ldp x20, x19, [sp], 0x20
0xfffffff006245db0 c0035fd6 ret
Looking at iOS 12.0b1, we can learn the signature of that function:
_vnode_check_exec(ucred*, vnode*, vnode*, label*, label*, label*, componentname*, unsigned int*, void*, unsigned long)
So yeah, finding this function is really easy:
Find AMFI's __TEXT_EXEC segment.
Find an orr wN, wN, 0x300 in it.
But that won't help you unless you defeat kernel integrity.
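As an illustration of the second step only, here is a minimal C sketch of recognising that instruction by its encoding. It is derived from the bytes in the listings above (08051832, i.e. the little-endian word 0x32180508 for orr w8, w8, 0x300); the function names are hypothetical, and the code array is assumed to already hold the instruction words of the segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True for `orr wN, wN, #0x300`, with the same register N as source and destination. */
int is_orr_w_0x300(uint32_t insn) {
    uint32_t rd = insn & 0x1f;
    uint32_t rn = (insn >> 5) & 0x1f;
    /* Bits 31..10: sf=0, opc=01 (ORR), logical immediate, N=0, immr=24, imms=1. */
    return (insn & 0xfffffc00u) == 0x32180400u && rd == rn;
}

/* Index of the first matching instruction word, or -1 if there is none. */
long find_orr_w_0x300(const uint32_t *code, size_t count) {
    assert(code != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (is_orr_w_0x300(code[i]))
            return (long)i;
    }
    return -1;
}
```

Masking out the register fields and comparing them separately keeps the match independent of which wN the compiler happened to pick.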

What do these 2 lines of assembly code do?

I am in the middle of phase 2 for bomb lab and I can't seem to figure out how these two lines of assembly affect the code overall and how they play a role in the loop going on.
Here is the 2 lines of code:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
and here is the entire code:
Dump of assembler code for function phase_2:
0x08048ba4 <+0>: push %ebp
0x08048ba5 <+1>: mov %esp,%ebp
0x08048ba7 <+3>: push %ebx
0x08048ba8 <+4>: sub $0x34,%esp
0x08048bab <+7>: lea -0x20(%ebp),%eax
0x08048bae <+10>: mov %eax,0x4(%esp)
0x08048bb2 <+14>: mov 0x8(%ebp),%eax
0x08048bb5 <+17>: mov %eax,(%esp)
0x08048bb8 <+20>: call 0x804922f <read_six_numbers>
0x08048bbd <+25>: cmpl $0x0,-0x20(%ebp)
0x08048bc1 <+29>: jns 0x8048be3 <phase_2+63>
0x08048bc3 <+31>: call 0x80491ed <explode_bomb>
0x08048bc8 <+36>: jmp 0x8048be3 <phase_2+63>
0x08048bca <+38>: mov %ebx,%eax
0x08048bcc <+40>: add -0x24(%ebp,%ebx,4),%eax
0x08048bd0 <+44>: cmp %eax,-0x20(%ebp,%ebx,4)
0x08048bd4 <+48>: je 0x8048bdb <phase_2+55>
0x08048bd6 <+50>: call 0x80491ed <explode_bomb>
0x08048bdb <+55>: inc %ebx
0x08048bdc <+56>: cmp $0x6,%ebx
0x08048bdf <+59>: jne 0x8048bca <phase_2+38>
0x08048be1 <+61>: jmp 0x8048bea <phase_2+70>
0x08048be3 <+63>: mov $0x1,%ebx
0x08048be8 <+68>: jmp 0x8048bca <phase_2+38>
0x08048bea <+70>: add $0x34,%esp
0x08048bed <+73>: pop %ebx
0x08048bee <+74>: pop %ebp
0x08048bef <+75>: ret
I noticed the inc command that increments %ebx by 1 and using that as %eax in the loop. But the add and cmp trip me up every time. If I had %eax as 1 going into the add and cmp, what %eax comes out? Thanks! I also know that once %ebx gets to 5 then the loop is over and it ends the entire code.
You have a list of 6 numbers. This means you can compare at most 5 pairs of numbers. So the loop that uses %ebx does 5 iterations.
In each iteration the value at the lower address is added to the current loop count, and then compared with the value at the next higher address. As long as they match the bomb won't explode!
This loops 5 times:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
These numbers are used:
with %ebx=1 numbers are at -0x20(%ebp) and -0x1C(%ebp)
with %ebx=2 numbers are at -0x1C(%ebp) and -0x18(%ebp)
with %ebx=3 numbers are at -0x18(%ebp) and -0x14(%ebp)
with %ebx=4 numbers are at -0x14(%ebp) and -0x10(%ebp)
with %ebx=5 numbers are at -0x10(%ebp) and -0x0C(%ebp)
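The loop described above can be sketched in C as follows. This is a reading of the disassembly, not the lab's source: the function name phase2_ok is hypothetical, nums stands for the six values starting at -0x20(%ebp), and explode_bomb is assumed not to return, so it is modelled as returning 0.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the six numbers pass, 0 where explode_bomb would be called. */
int phase2_ok(const int nums[6]) {
    assert(nums != NULL);
    if (nums[0] < 0)                 /* cmpl $0x0,-0x20(%ebp); jns */
        return 0;
    for (int ebx = 1; ebx != 6; ebx++) {
        int eax = ebx;               /* mov %ebx,%eax */
        eax += nums[ebx - 1];        /* add -0x24(%ebp,%ebx,4),%eax */
        if (nums[ebx] != eax)        /* cmp %eax,-0x20(%ebp,%ebx,4) */
            return 0;
    }
    return 1;
}
```

In other words, each number must equal the previous number plus its own index, and the first number must not be negative. With %eax set to 1 from %ebx, the add leaves 1 plus the first number in %eax.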
Those two instructions are dealing with memory at two adjacent locations, indexed by ebp and ebx. The mov %ebx,%eax just before them resets eax to the loop counter on every iteration, so the add instruction computes the loop counter plus the previous number, and the comparison instruction checks whether that is equal to the next number. So something like:
for (i = 1; i < 6; i++) {
    int expected = i + array[i - 1];
    if (expected != array[i])
        explode_bomb();
}

Why do crashes in iOS relating to dyld_stub_binder occur?

It's widely known that dynamic link libraries aren't allowed in iOS apps; they may only link to dynamic system libraries. But I do run into some pretty confusing crashes with the 3rd frame from the top of the stack being dyld_stub_binder.
It's tough to find some solid information, but I'm guessing that dyld_stub_binder actually performs late linking of a dynamic system library.
I tend to run into crashes where the exception is EXC_BREAKPOINT UNKNOWN and the crash always seems to occur in the context of dyld_stub_binder.
The implementation of dyld_stub_binder is on the apple open source website. I don't quite understand the assembly, but perhaps someone who does could interpret why this error happens or whether or not it's something that is out of the application's direct control. The assembly code may not be useful though, as I'm talking about the iOS (arm) implementation and this code is i386 and x86_64.
EDIT: An interesting piece of information is that I think I started seeing this crash during efforts for porting to arm64. Is it possible that a runtime exception like this is due to some kind of misalignment?
As you've stated, the asm for the ARM case is not available, but it's fairly straightforward to figure out since you can decompile fairly easily. What dyld_stub_binder does (on all architectures) is to handle the lazy symbols in a binary. For example, consider the following:
$ cat a.c
void main(int argc, char **argv)
{
printf("%s", argv[1]);
}
$ gcc-iphone a.c -o a
$ jtool -d a
Disassembling from file offset 0x7f44, Address 0x100007f44
_main:
100007f44 STP X29, X30, [X31,#-16]!
100007f48 ADD x29, x31, #0x0 ; ..R29 = R31 (0x0) + 0x0 = 0x1f
100007f4c SUB X31, X31, #32
100007f50 STUR X0, X29, #-4 ; *((1) + 0x0) = ???
100007f54 STR X1, [ X31, #2] ; *((2) + 0x0) = ???
100007f58 LDR X1, [X31, #0x10] ; R1 = *(10) = 0x100000cfeedfacf
100007f5c LDR X1, [X1, #0x8] ; R1 = *(100000cfeedfad7) = 0x100000cfeedfacf
100007f60 ADD x8, x31, #0x0 ; ..R8 = R31 (0x0) + 0x0 = 0x1f
100007f64 STR X1, [ X8, #0] ; *(0x0) = 0xfeedfacf
100007f68 ADRP x0, 0 ; ->R0 = 0x100007000
100007f6c ADD x0, x0, #0xfb4 ; ..R0 = R0 (0x100007000) + 0xfb4 = 0x100007fb4 "%s"
100007f70 BL _printf ; 0x100007f84
; _printf("%s",arg..);
100007f74 STR X0, [ X31, #3] ; *((254) + 0x0) = ???
100007f78 ADD x31, x29, #0x0 ; ..R31 = R29 (0x1f) + 0x0 = 0x1d
100007f7c LDP X29, X30, [X31],#16
100007f80 RET
See that printf up there, at 0x100007f84? Let's see what that is (the built-in otool can't disassemble that part, but jtool can):
_printf:
100007f84 NOP
100007f88 LDR X16, #34 ; R16 = *(100008010) = 0x100007fa8
100007f8c BR X16
So you jump to 0x100007fa8. Once again applying jtool:
$ jtool -d 0x100007fa8 a
Disassembling from file offset 0x7fa8, Address 0x100007fa8
100007fa8 LDR X16, #2
100007fac B 0x100007f90
And now we have 0x100007f90, which is ...
100007f90 ADR x17, 120 ; ->R17 = 0x100008008
100007f94 NOP
100007f98 STP X16, X17, [X31,#-16]!
100007f9c NOP
100007fa0 LDR X16, #24 ; R16 = *(100008000) dyld_stub_binder
100007fa4 BR X16
Now, go back to that 0x...8010 which gets loaded - that will be the address of printf(), but it is only bound after the first "hit" or access. You can verify that with dyldinfo, or jtool -lazy_bind:
$ jtool -lazy_bind a
bind information:
segment section address type addend dylib symbol
__DATA __la_symbol_ptr 0x100008010 ... 0 libSystem.B.dylib _printf
Meaning, on first access, the stub binder finds the address of printf in libSystem and stores it in that slot.
If the symbol cannot be bound, you get an exception, though that can happen for many reasons. You might want to add the crash log here. If it's a breakpoint, that's a voluntary crash by dyld, which usually occurs when a symbol was not found. If a debugger (lldb) is attached, it will break there and then. Otherwise, with no debugger, it crashes.
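The lazy-binding chain traced above can be mimicked in plain C with a function pointer that starts out pointing at a helper and gets overwritten on first use. This is only a toy model under stated assumptions: all names are hypothetical, real_function stands in for printf, and the lookup that dyld performs is replaced by a direct assignment.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_t)(int);

/* Stands in for the real library function (printf in the answer). */
static int real_function(int x) { return 2 * x; }

static int binder_calls = 0;          /* how often the slow path ran */
static int stub_helper(int x);

/* The lazy pointer slot (like the __la_symbol_ptr entry at 0x100008010).
 * It initially points at the helper, not at the real function. */
static fn_t lazy_ptr = stub_helper;

/* Slow path: plays the role of the stub helper plus dyld_stub_binder. */
static int stub_helper(int x) {
    binder_calls++;
    lazy_ptr = real_function;         /* bind: overwrite the slot */
    return lazy_ptr(x);               /* then continue into the real function */
}

/* The stub itself (like _printf at 0x100007f84): load the slot and jump. */
int call_through_stub(int x) {
    assert(lazy_ptr != NULL);
    return lazy_ptr(x);
}

int binder_call_count(void) { return binder_calls; }
```

Calling call_through_stub twice runs the slow path only once; the second call goes straight to real_function. In the real thing, the point where this sketch assigns lazy_ptr is where a failed symbol lookup makes dyld abort.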

Bare metal assembly - data not initialized

I wrote some very simple code, aimed to work on bare metal RaspberryPi. My code consists of gpio.s (with function "flash", which turns LED on and off) and main.s, shown below.
.section .init
.globl _start
_start:
mov sp, $0x8000
b main
.section .text
.globl main
main:
ldr r5, =variable
ldr r4, [r5]
cmp r4, $100
bleq flash
loop:
b loop
.section .data
.align 4
.globl variable
variable:
.word 100
So r4 should be filled with 100 => condition flag should be eq => LED should flash! But it does not. Why?
Apart from that example, function "flash" works, as well as in the case of adding these lines after "ldr r5, =variable":
mov r1, $100
str r1, [r5]
So it seems like memory is accessible, but doesn't get initialized. I would be grateful for your explanations.
Disassembly:
./build/output.elf: file format elf32-littlearm
Disassembly of section .init:
00000000 <_start>:
0: e3a0d902 mov sp, #32768 ; 0x8000
4: ea00205c b 817c <main>
Disassembly of section .text:
00008000 <getGpioAddr>:
8000: e59f0170 ldr r0, [pc, #368] ; 8178 <flash2+0x14>
8004: e1a0f00e mov pc, lr
00008008 <setGpioFunct>:
8008: e3500035 cmp r0, #53 ; 0x35
800c: 93510007 cmpls r1, #7 ; 0x7
8010: 83a00001 movhi r0, #1 ; 0x1
8014: 81a0f00e movhi pc, lr
8018: e92d0030 push {r4, r5}
801c: e1a02001 mov r2, r1
8020: e1a01000 mov r1, r0
8024: e92d4000 push {lr}
8028: ebfffff4 bl 8000 <getGpioAddr>
802c: e8bd4000 pop {lr}
8030: e3a04000 mov r4, #0 ; 0x0
00008034 <subTen>:
8034: e351000a cmp r1, #10 ; 0xa
8038: 2241100a subcs r1, r1, #10 ; 0xa
803c: 22844001 addcs r4, r4, #1 ; 0x1
8040: 2afffffb bcs 8034 <subTen>
8044: e3a05004 mov r5, #4 ; 0x4
8048: e0030594 mul r3, r4, r5
804c: e0800003 add r0, r0, r3
8050: e3a05003 mov r5, #3 ; 0x3
8054: e0030591 mul r3, r1, r5
8058: e1a02312 lsl r2, r2, r3
805c: e3e0430e mvn r4, #939524096 ; 0x38000000
8060: e3a05009 mov r5, #9 ; 0x9
8064: e0451001 sub r1, r5, r1
8068: e3a05003 mov r5, #3 ; 0x3
806c: e0030591 mul r3, r1, r5
8070: e1a04374 ror r4, r4, r3
8074: e5905000 ldr r5, [r0]
8078: e0055004 and r5, r5, r4
807c: e1855002 orr r5, r5, r2
8080: e5805000 str r5, [r0]
8084: e8bd0030 pop {r4, r5}
8088: e3a00000 mov r0, #0 ; 0x0
808c: e1a0f00e mov pc, lr
00008090 <setPin>:
8090: e3500035 cmp r0, #53 ; 0x35
8094: 83a00001 movhi r0, #1 ; 0x1
8098: 81a0f00e movhi pc, lr
809c: e92d0020 push {r5}
80a0: e3500020 cmp r0, #32 ; 0x20
80a4: 22401020 subcs r1, r0, #32 ; 0x20
80a8: 31a01000 movcc r1, r0
80ac: 23a02020 movcs r2, #32 ; 0x20
80b0: 33a0201c movcc r2, #28 ; 0x1c
80b4: e92d4000 push {lr}
80b8: ebffffd0 bl 8000 <getGpioAddr>
80bc: e8bd4000 pop {lr}
80c0: e3a05001 mov r5, #1 ; 0x1
80c4: e1a05115 lsl r5, r5, r1
80c8: e7805002 str r5, [r0, r2]
80cc: e3a00000 mov r0, #0 ; 0x0
80d0: e8bd0020 pop {r5}
80d4: e1a0f00e mov pc, lr
000080d8 <clearPin>:
80d8: e3500035 cmp r0, #53 ; 0x35
80dc: 83a00001 movhi r0, #1 ; 0x1
80e0: 81a0f00e movhi pc, lr
80e4: e92d0020 push {r5}
80e8: e3500020 cmp r0, #32 ; 0x20
80ec: 22401020 subcs r1, r0, #32 ; 0x20
80f0: 31a01000 movcc r1, r0
80f4: 23a0202c movcs r2, #44 ; 0x2c
80f8: 33a02028 movcc r2, #40 ; 0x28
80fc: e92d4000 push {lr}
8100: ebffffbe bl 8000 <getGpioAddr>
8104: e8bd4000 pop {lr}
8108: e3a05001 mov r5, #1 ; 0x1
810c: e1a05115 lsl r5, r5, r1
8110: e7805002 str r5, [r0, r2]
8114: e3a00000 mov r0, #0 ; 0x0
8118: e8bd0020 pop {r5}
811c: e1a0f00e mov pc, lr
00008120 <flash>:
8120: e92d4013 push {r0, r1, r4, lr}
8124: e3a00010 mov r0, #16 ; 0x10
8128: e3a01001 mov r1, #1 ; 0x1
812c: ebffffb5 bl 8008 <setGpioFunct>
8130: e3a00010 mov r0, #16 ; 0x10
8134: ebffffe7 bl 80d8 <clearPin>
8138: eb000004 bl 8150 <wait>
813c: e3a00010 mov r0, #16 ; 0x10
8140: ebffffd2 bl 8090 <setPin>
8144: eb000001 bl 8150 <wait>
8148: e8bd4013 pop {r0, r1, r4, lr}
814c: e1a0f00e mov pc, lr
00008150 <wait>:
8150: e3a0583f mov r5, #4128768 ; 0x3f0000
00008154 <loop>:
8154: e2455001 sub r5, r5, #1 ; 0x1
8158: e3550000 cmp r5, #0 ; 0x0
815c: 1afffffc bne 8154 <loop>
8160: e1a0f00e mov pc, lr
00008164 <flash2>:
8164: e92d4000 push {lr}
8168: ebffffec bl 8120 <flash>
816c: ebffffeb bl 8120 <flash>
8170: e8bd4000 pop {lr}
8174: e1a0f00e mov pc, lr
8178: 20200000 .word 0x20200000
0000817c <main>:
817c: e59f500c ldr r5, [pc, #12] ; 8190 <loop+0x4>
8180: e5954000 ldr r4, [r5]
8184: e3540064 cmp r4, #100 ; 0x64
8188: 0bffffe4 bleq 8120 <flash>
0000818c <loop>:
818c: eafffffe b 818c <loop>
8190: 00008194 .word 0x00008194
Disassembly of section .data:
00008194 <variable>:
8194: 00000064 .word 0x00000064
Linker scripts, makefile etc. taken from: http://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/ok01.html
from your link (you should not ask questions here using links, put the code in the question)
0000817c <main>:
817c: e59f500c ldr r5, [pc, #12] ; 8190 <loop+0x4>
8180: e3a01064 mov r1, #100 ; 0x64
8184: e3540064 cmp r4, #100 ; 0x64
8188: 0bffffe4 bleq 8120 <flash>
0000818c <loop>:
818c: eafffffe b 818c <loop>
8190: 000081a0 .word 0x000081a0
Disassembly of section .data:
000081a0 <variable>:
81a0: 00000064 .word 0x00000064
...
You are moving 100 into r1 but comparing r4, which is never initialized, at least in this code, so what happens is unpredictable. If you replace that mov with ldr r4,[r5] it should work as desired: r5 gets the address of the word that contains 100, and the ldr then reads from that address into r4.
I assume you have verified that if you simply bl flash it works (not a conditional but always go there) as desired?
In this bare metal mode you definitely have access to read/write memory, no worries there.
David
Memory is normally initialized as part of the C runtime code. If you are writing bare-metal assembly without including the functionality of the C runtime then your variables in RAM will not be initialized. You need to explicitly initialize the value of variable in your own code.
Finally found out! Really subtle, and it's not my fault indeed. I had taken the makefile and linker script from Alex Chadwick's tutorial, and the linker script looked like this:
SECTIONS {
/*
* First and formost we need the .init section, containing the IVT.
*/
.init 0x0000 : {
*(.init)
}
/*
* We allow room for the ATAGs and the stack and then start our code at
* 0x8000.
*/
.text 0x8000 : {
*(.text)
}
/*
* Next we put the data.
*/
.data : {
*(.data)
}
/*
* Finally comes everything else. A fun trick here is to put all other
* sections into this section, which will be discarded by default.
*/
/DISCARD/ : {
*(*)
}
}
The .init section was based at 0x0000, and .text started at 0x8000. But the Pi actually loads kernel.img at address 0x8000, so the real address of .init was 0x8000 and the whole .text section (as well as the following sections) was shifted. Because of that, the absolute addresses of labels assumed at assemble and link time were wrong. Only PC-relative addressing could work, since the PC was set correctly. The solution is to start the image at 0x8000:
SECTIONS {
/*
* First and formost we need the .init section, containing the IVT.
*/
.init 0x8000 : {
*(.init)
}
.text : {
*(.text)
}
/*
* Next we put the data.
*/
.data : {
*(.data)
}
/*
* Finally comes everything else. A fun trick here is to put all other
* sections into this section, which will be discarded by default.
*/
/DISCARD/ : {
*(*)
}
}
I've just checked the template on his website and it's corrected now, so there is no point contacting him. I must have downloaded the template before this correction. Thank you guys for your attempts.
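The address shift can be checked with a little arithmetic. The sketch below assumes that kernel.img is a flat binary whose first byte corresponds to the lowest linked address, and that the Pi copies it to 0x8000. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where a byte linked at link_addr really ends up once the flat image is loaded.
 * image_base is the lowest linked address, load_base is where the image is copied. */
uint32_t runtime_addr(uint32_t link_addr, uint32_t image_base, uint32_t load_base) {
    assert(link_addr >= image_base);
    return load_base + (link_addr - image_base);
}
```

With the old script, runtime_addr(0x8194, 0x0000, 0x8000) gives 0x10194 for variable, while the literal loaded by ldr r5, =variable still holds 0x8194, so the ldr reads some other memory. With the corrected script the image base equals the load address, and every linked address is also the runtime address.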

Use of stack pointer (sp) in arm assembly

I'm slightly confused by the following bit of disassembly:
_GSEventLockDevice:
000047d8 b5f0 push {r4, r5, r6, r7, lr}
000047da af03 add r7, sp, #12
000047dc b08d sub sp, #52
000047de f7ffffb3 bl _GSGetPurpleSystemEventPort
000047e2 466d mov r5, sp
000047e4 2234 movs r2, #52
000047e6 2100 movs r1, #0
000047e8 4604 mov r4, r0
000047ea 4628 mov r0, r5
000047ec f005e8b0 blx 0x9950 # symbol stub for: _memset
000047f0 2600 movs r6, #0
000047f2 f24030f6 movw r0, 0x3f6
000047f6 4621 mov r1, r4
000047f8 e88d0041 stmia.w sp, {r0, r6}
000047fc 4628 mov r0, r5
000047fe f7fffaf7 bl _GSSendEvent
00004802 b00d add sp, #52
00004804 bdf0 pop {r4, r5, r6, r7, pc}
00004806 bf00 nop
I don't get how this would go in C. The only bit I get is:
memset(whateverTheStackPointerIs, 0, 52);
But how do I know what sp is and how would it look in C?
The
sub sp, #52
reserves 52 bytes of space for local variables on the stack; afterwards sp will point to the first of those 52 bytes. They are all then zeroed with the memset call. After the memset, stmia stores particular values in the first two words. So the C equivalent would be something like
GSEventLockDevice() {
    int tmp = GSGetPurpleSystemEventPort();
    int localdata[13] = {0};    /* 13 * 4 = 52 bytes, zeroed by the memset */
    localdata[0] = 0x3f6;       /* movw r0, 0x3f6 is an immediate, not a load */
    localdata[1] = 0;
    return GSSendEvent(localdata, tmp);
}

Resources