Incorrect symbol resolution when compiling inline assembly on arm - ios

Part of the source code is
id (*old_objc_msgSend)(id, SEL, ...);
__attribute__((naked))
id new_objc_msgSend(id self, SEL op, ...) {
__asm__ __volatile__ (
".thumb\n"
"ldmia.w sp, {r2, r3}\n"
"b _old_objc_msgSend\n"
);
}
But the generated assembly is
Dump of assembler code for function _Z16new_objc_msgSendP11objc_objectP13objc_selectorz:
0x01a7ae9c <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+0>: stmia.w sp, {r2, r3}
0x01a7aea0 <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+4>: ldmia.w sp, {r2, r3}
0x01a7aea4 <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+8>: b.w 0x1a7af68 <_Z27new_initWithContentwithSizeP11objc_objectP13objc_selectorS0_6CGSize+188>
0x01a7aea8 <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+12>: bx lr
0x01a7aeaa <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+14>: nop
End of assembler dump.
It branches to a different address.

Related

Getting Beaglebone PRUs to work using PASM

I have been trying to get the PRU to work in a way that makes sense to me, and at this point I am completely clueless. I can get the examples to work, but any time I make a change or try to write things from scratch I just beat my head against the wall. As a start, I just want to access any of the USR LEDs and turn them on and off at some speed or, as a first pass, turn on an LED and leave it on. Here is some PASM code I got off the internet (I will post the link when I find it):
.origin 0
.entrypoint START
#define PRU0_ARM_INTERRUPT 19
#define AM33XX
#define GPIO1 0x4804c000 //Trying to access the GPIO1
#define GPIO_CLEARDATAOUT 0x190 //writing 1 to the bit you want cleared in GPIO_DATAOUT register (what does that mean?)
#define GPIO_SETDATAOUT 0x194 //set a value for GPIO output pins, which pins am I even writing to? GPIO1?
#define GPIO_OE 0x134 //enable the pins output capabilities
START:
//clear that bit
lbco r0, c4, 4, 4 //This creates a constant offset and stores in c4, but why do you need that?
CLR r0, r0, 4 //if you copied the data why do you need to clear it?
SBCO r0, C4, 4, 4 //What is this for?
//MOV r1, 10
MOV r2, 0x00000000 //store address 0x00 into r2, why?
MOV r3, GPIO1 //Store GPIO1 address in r3
MOV r4, GPIO_OE //place address of GPIO_OE into r4
MOV r5, GPIO_SETDATAOUT //store address of GPIO_SETDATAOUT in r5
MOV r6, GPIO_CLEARDATAOUT //store address of GPIO_CLEARDATAOUT in r6
SBBO r2, r3, r4,4 //What is this even doing? Copying 4 bytes from r2 into r3+r4, but why do you want to copy that way and if not why not?
MOV r1, 10
MOV r2, 0xFFFFFFFF //Supposedly this turns the GPIO1 ON and OFF?
SBBO r2, r3, r6, 4 //and again the storage stuff?
HALT
I am also attaching the C code that I am using:
#include <stdio.h>
#include <pruss/prussdrv.h>
#include <pruss/pruss_intc_mapping.h>
#define PRU_NUM 0 //defining which PRU to use
int main() {
int ret;
tpruss_intc_initdata intc = PRUSS_INTC_INITDATA;
//initialize the PRU by using init command from prussdrv.h
ret = prussdrv_init();
if(ret != 0) {
printf("Error returned: %d\n",ret);
printf("PRU unable to be initialized");
return -1;
}
ret = prussdrv_open(PRU_EVTOUT_0);
if(ret != 0) {
printf("Error returned for prussdrv_open(): %d\n",ret);
printf("PRU can't open PRU_EVTOUT_0");
return -1;
}
//Map PRUS's INTC
ret = prussdrv_pruintc_init(&intc);
if (ret != 0) {
printf("Error returned for prussdrv_pruintc_int\n");
printf("PRU doesn't work");
return -1;
}
//load and execute binary on PRU
prussdrv_exec_program(PRU_NUM, "./ashwini_test.bin");
prussdrv_pru_wait_event(PRU_EVTOUT_0);
prussdrv_pru_clear_event(PRU_EVTOUT_0,PRU0_ARM_INTERRUPT);
/*Disable PRU and close memory mappings*/
prussdrv_pru_disable(PRU_NUM);
prussdrv_exit();
//prussdrv_pru_wait_event(PRU_EVTOUT_0);
return 0;
}
I have gone through the TRM, https://groups.google.com/forum/#!topic/beaglebone/98eF1wQE_QA, elinux and derekmolloy. I just feel like I am missing something very basic about how the addressing scheme works or how to think about these things. Thanks again for your help!
When you say that's your PASM code... do you mean it's some code you got from somewhere else that you're trying to use? Because the comments on most lines asking what they do makes it seem unlikely that it's actually your code...
Anyways, can't really answer unless you have a specific question, but there's plenty of info out there about how to use the GPIO subsystem on the BeagleBone's AM335x processor. I talked about it some in a post a while back here: https://graycat.io/tutorials/beaglebone-io-using-python-mmap/
I've also got a few documented PRU assembly examples here: https://github.com/alexanderhiam/PRU-stuffs

Why do crashes in iOS relating to dyld_stub_binder occur?

It's widely known that dynamic link libraries aren't allowed in iOS apps, they may only link to dynamic system libraries. But I do run into some pretty confusing crashes with the 3rd frame from the top of the stack being dyld_stub_binder.
It's tough to find some solid information, but I'm guessing that dyld_stub_binder actually performs late linking of a dynamic system library.
I tend to run into crashes where the exception is EXC_BREAKPOINT UNKNOWN and the crash always seems to occur in the context of dyld_stub_binder.
The implementation of dyld_stub_binder is on the apple open source website. I don't quite understand the assembly, but perhaps someone who does could interpret why this error happens or whether or not it's something that is out of the application's direct control. The assembly code may not be useful though, as I'm talking about the iOS (arm) implementation and this code is i386 and x86_64.
EDIT: An interesting piece of information is that I think I started seeing this crash during efforts for porting to arm64. Is it possible that a runtime exception like this is due to some kind of misalignment?
As you've stated, the asm for the ARM case is not available, but it's fairly straightforward to figure out since you can decompile fairly easily. What dyld_stub_binder does (on all architectures) is to handle the lazy symbols in a binary. For example, consider the following:
$ cat a.c
void main(int argc, char **argv)
{
printf("%s", argv[1]);
}
$ gcc-iphone a.c -o a
$ jtool -d a
Disassembling from file offset 0x7f44, Address 0x100007f44
_main:
100007f44 STP X29, X30, [X31,#-16]!
100007f48 ADD x29, x31, #0x0 ; ..R29 = R31 (0x0) + 0x0 = 0x1f
100007f4c SUB X31, X31, #32
100007f50 STUR X0, X29, #-4 ; *((1) + 0x0) = ???
100007f54 STR X1, [ X31, #2] ; *((2) + 0x0) = ???
100007f58 LDR X1, [X31, #0x10] ; R1 = *(10) = 0x100000cfeedfacf
100007f5c LDR X1, [X1, #0x8] ; R1 = *(100000cfeedfad7) = 0x100000cfeedfacf
100007f60 ADD x8, x31, #0x0 ; ..R8 = R31 (0x0) + 0x0 = 0x1f
100007f64 STR X1, [ X8, #0] ; *(0x0) = 0xfeedfacf
100007f68 ADRP x0, 0 ; ->R0 = 0x100007000
100007f6c ADD x0, x0, #0xfb4 ; ..R0 = R0 (0x100007000) + 0xfb4 = 0x100007fb4 "%s"
100007f70 BL _printf ; 0x100007f84
; _printf("%s",arg..);
100007f74 STR X0, [ X31, #3] ; *((254) + 0x0) = ???
100007f78 ADD x31, x29, #0x0 ; ..R31 = R29 (0x1f) + 0x0 = 0x1d
100007f7c LDP X29, X30, [X31],#16
100007f80 RET
See that printf up there, at 0x100007f84? Let's see what that is (the built-in otool can't decompile that part, but jtool can):
_printf:
100007f84 NOP
100007f88 LDR X16, #34 ; R16 = *(100008010) = 0x100007fa8
100007f8c BR X16
So you just jump to 0x100007fa8. Once again applying jtool:
$ jtool -d 0x100007fa8 a
Disassembling from file offset 0x7fa8, Address 0x100007fa8
100007fa8 LDR X16, #2
100007fac B 0x100007f90
And now we have 0x100007f90, which is ...
100007f90 ADR x17, 120 ; ->R17 = 0x100008008
100007f94 NOP
100007f98 STP X16, X17, [X31,#-16]!
100007f9c NOP
100007fa0 LDR X16, #24 ; R16 = *(100008000) dyld_stub_binder
100007fa4 BR X16
Now, go back to that 0x...8010 which gets loaded - that will be the address of printf(), but it is only bound after the first "hit" or access. You can verify that with dyldinfo, or jtool -lazy_bind:
$ jtool -lazy_bind a
bind information:
segment section address type addend dylib symbol
__DATA __la_symbol_ptr 0x100008010 ... 0 libSystem.B.dylib _printf
Meaning, on first access, the stub binder finds the address of printf in libSystem and embeds it there.
If the symbol cannot be bound, you get an exception, though that can happen for oh-so-many reasons. You might want to add the crash log here. If it's a breakpoint, that's a voluntary crash by dyld, which usually occurs when a symbol was not found. If a debugger (lldb) is attached, it will break there and then. Otherwise, with no debugger, it crashes.

Calling a function crashes when the stack pointer is changed with inline assembly

I have written some code that changes the current stack used by modifying the stack pointer in inline assembly. Although I can call functions and create local variables, calls to println! and some functions from std::rt result in the application terminating abnormally with signal 4 (illegal instruction) in the playpen. How should I improve the code to prevent crashes?
#![feature(asm, box_syntax)]
#[allow(unused_assignments)]
#[inline(always)]
unsafe fn get_sp() -> usize {
let mut result = 0usize;
asm!("
movq %rsp, $0
"
:"=r"(result):::"volatile"
);
result
}
#[inline(always)]
unsafe fn set_sp(value: usize) {
asm!("
movq $0, %rsp
"
::"r"(value)::"volatile"
);
}
#[inline(never)]
unsafe fn foo() {
println!("Hello World!");
}
fn main() {
unsafe {
let mut stack = box [0usize; 500];
let len = stack.len();
stack[len-1] = get_sp();
set_sp(std::mem::transmute(stack.as_ptr().offset((len as isize)-1)));
foo();
asm!("
movq (%rsp), %rsp
"
::::"volatile"
);
}
}
Debugging the program with rust-lldb on x86_64 on OS X yields 300K stack traces, repeating these lines over and over:
frame #299995: 0x00000001000063c4 a`rt::util::report_overflow::he556d9d2b8eebb88VbI + 36
frame #299996: 0x0000000100006395 a`rust_stack_exhausted + 37
frame #299997: 0x000000010000157f a`__morestack + 13
__morestack is written in assembly for each platform, such as i386 and x86_64. The i386 variant has a longer description that I think you will want to read carefully. This piece stuck out to me:
Each Rust function contains an LLVM-generated prologue that compares the stack space required for the current function to the space remaining in the current stack segment, maintained in a platform-specific TLS slot.
Here are the first instructions of the foo function:
a`foo::h5f80496ac1ee3d43zaa:
0x1000013e0: cmpq %gs:0x330, %rsp
0x1000013e9: ja 0x100001405 ; foo::h5f80496ac1ee3d43zaa + 37
0x1000013eb: movabsq $0x48, %r10
0x1000013f5: movabsq $0x0, %r11
-> 0x1000013ff: callq 0x100001572 ; __morestack
As you can see, I am about to call into __morestack, so the comparison check failed.
I believe that this indicates that you cannot manipulate the stack pointer and attempt to call any Rust functions.
As a side note, let's look at your get_sp assembly:
movq %rsp, $0
Doing a quick check of the semantics of movq:
Copies a quadword from the source operand (second operand) to the destination operand (first operand).
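That would seem to indicate that your assembly is backwards, but note that this description follows Intel operand order. asm! defaults to AT&T syntax, where the source operand comes first, so movq %rsp, $0 does copy the stack pointer into the output operand.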

Bare metal assembly - data not initialized

I wrote some very simple code, aimed to work on bare metal RaspberryPi. My code consists of gpio.s (with function "flash", which turns LED on and off) and main.s, shown below.
.section .init
.globl _start
_start:
mov sp, $0x8000
b main
.section .text
.globl main
main:
ldr r5, =variable
ldr r4, [r5]
cmp r4, $100
bleq flash
loop:
b loop
.section .data
.align 4
.globl variable
variable:
.word 100
So r4 should be filled with 100 => condition flag should be eq => LED should flash! But it does not. Why?
Apart from this example, the function "flash" works on its own, and the LED also flashes if I add these lines after "ldr r5, =variable":
mov r1, $100
str r1, [r5]
So it seems like memory is accessible, but doesn't get initialized. I would be grateful for your explanations.
Disassembly:
./build/output.elf: file format elf32-littlearm
Disassembly of section .init:
00000000 <_start>:
0: e3a0d902 mov sp, #32768 ; 0x8000
4: ea00205c b 817c <main>
Disassembly of section .text:
00008000 <getGpioAddr>:
8000: e59f0170 ldr r0, [pc, #368] ; 8178 <flash2+0x14>
8004: e1a0f00e mov pc, lr
00008008 <setGpioFunct>:
8008: e3500035 cmp r0, #53 ; 0x35
800c: 93510007 cmpls r1, #7 ; 0x7
8010: 83a00001 movhi r0, #1 ; 0x1
8014: 81a0f00e movhi pc, lr
8018: e92d0030 push {r4, r5}
801c: e1a02001 mov r2, r1
8020: e1a01000 mov r1, r0
8024: e92d4000 push {lr}
8028: ebfffff4 bl 8000 <getGpioAddr>
802c: e8bd4000 pop {lr}
8030: e3a04000 mov r4, #0 ; 0x0
00008034 <subTen>:
8034: e351000a cmp r1, #10 ; 0xa
8038: 2241100a subcs r1, r1, #10 ; 0xa
803c: 22844001 addcs r4, r4, #1 ; 0x1
8040: 2afffffb bcs 8034 <subTen>
8044: e3a05004 mov r5, #4 ; 0x4
8048: e0030594 mul r3, r4, r5
804c: e0800003 add r0, r0, r3
8050: e3a05003 mov r5, #3 ; 0x3
8054: e0030591 mul r3, r1, r5
8058: e1a02312 lsl r2, r2, r3
805c: e3e0430e mvn r4, #939524096 ; 0x38000000
8060: e3a05009 mov r5, #9 ; 0x9
8064: e0451001 sub r1, r5, r1
8068: e3a05003 mov r5, #3 ; 0x3
806c: e0030591 mul r3, r1, r5
8070: e1a04374 ror r4, r4, r3
8074: e5905000 ldr r5, [r0]
8078: e0055004 and r5, r5, r4
807c: e1855002 orr r5, r5, r2
8080: e5805000 str r5, [r0]
8084: e8bd0030 pop {r4, r5}
8088: e3a00000 mov r0, #0 ; 0x0
808c: e1a0f00e mov pc, lr
00008090 <setPin>:
8090: e3500035 cmp r0, #53 ; 0x35
8094: 83a00001 movhi r0, #1 ; 0x1
8098: 81a0f00e movhi pc, lr
809c: e92d0020 push {r5}
80a0: e3500020 cmp r0, #32 ; 0x20
80a4: 22401020 subcs r1, r0, #32 ; 0x20
80a8: 31a01000 movcc r1, r0
80ac: 23a02020 movcs r2, #32 ; 0x20
80b0: 33a0201c movcc r2, #28 ; 0x1c
80b4: e92d4000 push {lr}
80b8: ebffffd0 bl 8000 <getGpioAddr>
80bc: e8bd4000 pop {lr}
80c0: e3a05001 mov r5, #1 ; 0x1
80c4: e1a05115 lsl r5, r5, r1
80c8: e7805002 str r5, [r0, r2]
80cc: e3a00000 mov r0, #0 ; 0x0
80d0: e8bd0020 pop {r5}
80d4: e1a0f00e mov pc, lr
000080d8 <clearPin>:
80d8: e3500035 cmp r0, #53 ; 0x35
80dc: 83a00001 movhi r0, #1 ; 0x1
80e0: 81a0f00e movhi pc, lr
80e4: e92d0020 push {r5}
80e8: e3500020 cmp r0, #32 ; 0x20
80ec: 22401020 subcs r1, r0, #32 ; 0x20
80f0: 31a01000 movcc r1, r0
80f4: 23a0202c movcs r2, #44 ; 0x2c
80f8: 33a02028 movcc r2, #40 ; 0x28
80fc: e92d4000 push {lr}
8100: ebffffbe bl 8000 <getGpioAddr>
8104: e8bd4000 pop {lr}
8108: e3a05001 mov r5, #1 ; 0x1
810c: e1a05115 lsl r5, r5, r1
8110: e7805002 str r5, [r0, r2]
8114: e3a00000 mov r0, #0 ; 0x0
8118: e8bd0020 pop {r5}
811c: e1a0f00e mov pc, lr
00008120 <flash>:
8120: e92d4013 push {r0, r1, r4, lr}
8124: e3a00010 mov r0, #16 ; 0x10
8128: e3a01001 mov r1, #1 ; 0x1
812c: ebffffb5 bl 8008 <setGpioFunct>
8130: e3a00010 mov r0, #16 ; 0x10
8134: ebffffe7 bl 80d8 <clearPin>
8138: eb000004 bl 8150 <wait>
813c: e3a00010 mov r0, #16 ; 0x10
8140: ebffffd2 bl 8090 <setPin>
8144: eb000001 bl 8150 <wait>
8148: e8bd4013 pop {r0, r1, r4, lr}
814c: e1a0f00e mov pc, lr
00008150 <wait>:
8150: e3a0583f mov r5, #4128768 ; 0x3f0000
00008154 <loop>:
8154: e2455001 sub r5, r5, #1 ; 0x1
8158: e3550000 cmp r5, #0 ; 0x0
815c: 1afffffc bne 8154 <loop>
8160: e1a0f00e mov pc, lr
00008164 <flash2>:
8164: e92d4000 push {lr}
8168: ebffffec bl 8120 <flash>
816c: ebffffeb bl 8120 <flash>
8170: e8bd4000 pop {lr}
8174: e1a0f00e mov pc, lr
8178: 20200000 .word 0x20200000
0000817c <main>:
817c: e59f500c ldr r5, [pc, #12] ; 8190 <loop+0x4>
8180: e5954000 ldr r4, [r5]
8184: e3540064 cmp r4, #100 ; 0x64
8188: 0bffffe4 bleq 8120 <flash>
0000818c <loop>:
818c: eafffffe b 818c <loop>
8190: 00008194 .word 0x00008194
Disassembly of section .data:
00008194 <variable>:
8194: 00000064 .word 0x00000064
Linker scripts, makefile etc. taken from: http://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/ok01.html
from your link (you should not ask questions here using links, put the code in the question)
0000817c <main>:
817c: e59f500c ldr r5, [pc, #12] ; 8190 <loop+0x4>
8180: e3a01064 mov r1, #100 ; 0x64
8184: e3540064 cmp r4, #100 ; 0x64
8188: 0bffffe4 bleq 8120 <flash>
0000818c <loop>:
818c: eafffffe b 818c <loop>
8190: 000081a0 .word 0x000081a0
Disassembly of section .data:
000081a0 <variable>:
81a0: 00000064 .word 0x00000064
...
You are moving 100 into r1 but comparing r4, which has not been initialized, at least in this code, so what happens is unpredictable. If you replace that with ldr r4,[r5] it should work as desired, since r5 holds the address of the word that contains 100 and you then read from that address into r4.
I assume you have verified that if you simply bl flash it works (not a conditional but always go there) as desired?
In this bare metal mode you definitely have access to read/write memory, no worries there.
David
Memory is normally initialized as part of the C runtime code. If you are writing bare-metal assembly without including the functionality of the C runtime then your variables in RAM will not be initialized. You need to explicitly initialize the value of variable in your own code.
Finally found out! It is really subtle, and it's not my fault indeed. I had taken the makefile and linker script from Alex Chadwick's tutorial, and the linker script looked like this:
SECTIONS {
/*
* First and formost we need the .init section, containing the IVT.
*/
.init 0x0000 : {
*(.init)
}
/*
* We allow room for the ATAGs and the stack and then start our code at
* 0x8000.
*/
.text 0x8000 : {
*(.text)
}
/*
* Next we put the data.
*/
.data : {
*(.data)
}
/*
* Finally comes everything else. A fun trick here is to put all other
* sections into this section, which will be discarded by default.
*/
/DISCARD/ : {
*(*)
}
}
The .init section was based at 0x0000, and .text started at 0x8000. But the Pi actually loads kernel.img at address 0x8000 (so the real address of .init was 0x8000), which shifted the whole .text section and every section after it. As a result, the label addresses assumed at assemble and link time were wrong. Only PC-relative addressing could work, because the PC was set correctly. The solution is to start the image at 0x8000:
SECTIONS {
/*
* First and formost we need the .init section, containing the IVT.
*/
.init 0x8000 : {
*(.init)
}
.text : {
*(.text)
}
/*
* Next we put the data.
*/
.data : {
*(.data)
}
/*
* Finally comes everything else. A fun trick here is to put all other
* sections into this section, which will be discarded by default.
*/
/DISCARD/ : {
*(*)
}
}
I've just checked the template on his website and it's corrected now, so there is no point contacting him. I must have downloaded the template before this correction. Thank you guys for your attempts.

Printing memory accesses in GDB

I am new to gdb. I want to print the memory addresses used, in their actual sequence, during the execution of a C program. Let me explain my question with an example. Assume that we have the following C code with two functions, main() and test(). I know that, inside gdb, I can use "disassemble main" to disassemble the main() function, or "disassemble test" to disassemble the test() function separately. My question is: how can I disassemble these two functions as a single piece of code, so that I can see all the memory addresses used during execution and the sequence in which they are accessed? To be specific, as main() calls test() and test() also calls itself multiple times, I want to see something like example 2. I am also wondering whether the addresses shown in the gdb disassembler are virtual or physical memory addresses. Any help or guidance will be appreciated.
Example 1:
#include "stdio.h"
int test(int q)
{
if(q<16)
test(q+5);
return q;
}
void main()
{
unsigned int a=5;
unsigned int b=5;
unsigned int c=5;
test(a);
}
Example 2:
<Memory Address> <assembly instruction> <c instructions>
0x12546a mov //for unsigned int a=5;
0x12546b mov //for unsigned int b=5;
0x12546c mov //for unsigned int c=5;
0x12546d jmp //for test(q=a=5);
0x12546e cmpl //for if(q<16)
0x12546f jmp //for test(q+5);
0x12546d jmp //for test(q=10);
0x12546e cmpl //for if(q<16)
0x12546f jmp //for test(q+5);
0x12547a jmp //for test(q=15);
0x12547b cmpl //for if(q<16)
0x12547c jmp //for test(q+5);
0x12547d jmp //for test(q=20);
0x12547e cmpl //for if(q<16)
0x12547f jmp //for return q;
0x12548a jmp //for return q;
0x12548b jmp //for return q;
0x12548c jmp //for return q;
There's really no pretty way to do this. You're just going to have to step through the code:
(gdb) stepi
(gdb) x/i $pc
(gdb) info registers
(gdb) stepi
(gdb) x/i $pc
(gdb) info registers
.....
You could script that up so that it does it quickly and dumps the data to a file, but that's about all.
I suppose you may have more luck with valgrind. If there's no existing tool to do so, it is possible to add your own instrumentation to report memory accesses (and not only that), or alter an existing one.
E.g. see http://valgrind.org/docs/manual/lk-manual.html
--trace-mem=<no|yes> [default: no]
When enabled, Lackey prints the size and address of almost every memory access made by the program.
