I'm new to reverse engineering and can't understand why, in the statement v8 = objc_retain(a3, a2);, the objc_retain() function takes two parameters and even has a return value. As far as I know, objc_retain in the runtime library takes just one parameter and returns nothing. How does objc_retain work with two parameters?
char __cdecl -[NSString writeToAppFile:tag:userInfo:error:](NSString *self, SEL a2, id a3, id a4, id a5, id *a6)
{
NSString *v6; // r10
id v7; // r5
int v8; // r8
int v9; // r1
int v10; // r11
int v11; // r1
int v12; // r6
void *v13; // r0
void *v14; // r4
v6 = self;
v7 = a4;
v8 = objc_retain(a3, a2);
v10 = objc_retain(v7, v9);
v12 = objc_retain(a5, v11);
v13 = objc_msgSend(v6, "dataUsingEncoding:", 4);
v14 = (void *)objc_retainAutoreleasedReturnValue(v13);
LOBYTE(v7) = (unsigned int)objc_msgSend(v14, "writeToAppFile:tag:userInfo:error:", v8, v10, v12, a6);
objc_release(v12);
objc_release(v10);
objc_release(v8);
objc_release(v14);
return (char)v7;
}
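One possible reading of the decompiled line, sketched under assumptions: Clang's ARC documentation declares objc_retain as taking a single object pointer and returning that same pointer, which would explain why the result is kept in v8. The second argument is most likely a decompiler artifact: without a prototype, the tool lists whatever sits in the next argument register (r1 here, the register that a2, v9 and v11 occupy). The names MockObject, mock_retain and mock_release below are hypothetical stand-ins for illustration, not the runtime's implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object; not the real object layout. */
typedef struct {
    int refcount;
} MockObject;

/* Models the one-argument, value-returning shape: bump the count, hand the same pointer back. */
static MockObject *mock_retain(MockObject *obj)
{
    if (obj != NULL) {
        obj->refcount++;
    }
    return obj;
}

/* Matching release for the model; it returns nothing. */
static void mock_release(MockObject *obj)
{
    if (obj != NULL) {
        obj->refcount--;
    }
}
```

In this model, v8 = objc_retain(a3, a2) reads as "retain a3 and keep the same pointer in v8"; the later objc_release(v8) balances it.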
A problem has become quite popular in recent days: LLVM and GCC fail to optimize a trivial loop even when its range is obvious.
Godbolt for all the examples below: https://godbolt.org/z/b3PzrsE5e
The code below is not optimized: the generated assembly still contains a loop.
uint64_t sum1( uint64_t num ) {
uint64_t sum = 0;
for ( uint64_t j=0; j<=num; ++j ) {
sum += 1;
}
return sum;
}
produces
sum1(unsigned long): # #sum1(unsigned long)
xorl %eax, %eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
addq $1, %rax
cmpq %rdi, %rax
jbe .LBB0_1
retq
However, if one limits the range of the variable, for example with an AND mask, the loop is optimized away. The loop is also optimized if you change the condition j<=num to j<num+1.
uint64_t sum2( uint64_t num ) {
num &= 0xFFFFFFFFULL;
uint64_t sum = 0;
for ( uint64_t j=0; j<=num; ++j ) {
sum += 1;
}
return sum;
}
produces
sum2(unsigned long): # #sum2(unsigned long)
movl %edi, %eax
addq $1, %rax
retq
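The two-instruction result can be read as a closed form. The sketch below, with hypothetical helper names (sum2_loop, sum2_closed, le_loop_never_exits), compares the masked loop against that closed form and isolates the one input for which an unmasked j<=num loop has no finite trip count. Whether this edge case is what actually blocks the optimizer is an assumption, not something the listings above prove.

```c
#include <stdint.h>

/* Same loop as sum2: the mask guarantees num <= 0xFFFFFFFF, so j <= num must eventually fail. */
static uint64_t sum2_loop(uint64_t num)
{
    num &= 0xFFFFFFFFULL;
    uint64_t sum = 0;
    for (uint64_t j = 0; j <= num; ++j) {
        sum += 1;
    }
    return sum;
}

/* Closed form matching the movl/addq pair: zero-extend the low 32 bits, then add one. */
static uint64_t sum2_closed(uint64_t num)
{
    return (num & 0xFFFFFFFFULL) + 1;
}

/* Returns 1 when j <= num can never become false, i.e. num is the largest uint64_t. */
static int le_loop_never_exits(uint64_t num)
{
    return num == UINT64_MAX;
}
```

For the unmasked sum1, num + 1 wraps to 0 at num == UINT64_MAX while the loop itself would never exit, so the closed form and the loop disagree on exactly that input.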
Curbing the range with an if statement, on the other hand, has no effect:
uint64_t sum3( uint64_t num ) {
uint64_t sum = 0;
if ( num <= 0xFF ) {
for ( uint64_t j=0; j<=num; ++j ) {
sum += 1;
}
}
return sum;
}
This again produces assembly code with a loop.
sum3(unsigned long): # #sum3(unsigned long)
xorl %eax, %eax
cmpq $255, %rdi
ja .LBB3_2
.LBB3_1: # =>This Inner Loop Header: Depth=1
addq $1, %rax
cmpq %rdi, %rax
jbe .LBB3_1
.LBB3_2:
retq
For that matter, even __builtin_assume( num < 0x100ULL ) has no effect on the result.
I have looked into the LLVM code and traced this to the check that fails at
// lib/Transforms/Scalar/IndVarSimplify.cpp:1430
const SCEV *MaxExitCount = SE->getSymbolicMaxBackedgeTakenCount(L);
if (isa<SCEVCouldNotCompute>(MaxExitCount)) {
printf( "Could not compute\n");
return false;
}
...
which then ends up in
// lib/Analysis/ScalarEvolution.cpp:7253
const SCEV *ScalarEvolution::getExitCount(const Loop *L,
const BasicBlock *ExitingBlock,
ExitCountKind Kind) {
switch (Kind) {
case Exact:
case SymbolicMaximum:
return getBackedgeTakenInfo(L).getExact(ExitingBlock, this);
case ConstantMaximum:
return getBackedgeTakenInfo(L).getConstantMax(ExitingBlock, this);
};
llvm_unreachable("Invalid ExitCountKind!");
}
What I don't understand is why the bound cannot be inferred when the if statement makes it clear. Is this a feature that could be implemented? Am I on the right track?
I am trying the example at http://api.madewithmarmalade.com/ExampleArmASM.html on iOS. The program runs if I comment out the loop, and res is printed as 28. If I do not comment it out, the program aborts without printing res.
Any hints as to why, and how to fix it?
Thanks in advance.
My code is as follows:
#include <stdio.h>
#include <stdlib.h>
#define ARRAY_SIZE 512
#if defined __arm__ && defined __ARM_NEON__
static int computeSumNeon(const int a[])
{
// Computes the sum of all elements in the input array
int res = 0;
asm(".align 4 \n\t" //dennis warning avoiding
"vmov.i32 q8, #0 \n\t" //clear our accumulator register
"mov r3, #512 \n\t" //Loop condition n = ARRAY_SIZE
// ".loop1: \n\t" // No loop add 0-7 works as 28
"vld1.32 {d0, d1, d2, d3}, [%[input]]! \n\t" //load 8 elements into d0, d1, d2, d3 = q0, q1
"pld [%[input]] \n\t" // preload next set of elements
"vadd.i32 q8, q0, q8 \n\t" // q8 += q0
"vadd.i32 q8, q1, q8 \n\t" // q8 += q1
"subs r3, r3, #8 \n\t" // n -= 8
// "bne .loop1 \n\t" // n == 0?
"vpadd.i32 d0, d16, d17 \n\t" // d0[0] = d16[0] + d16[1], d0[1] = d17[0] + d17[1]
"vpaddl.u32 d0, d0 \n\t" // d0[0] = d0[0] + d0[1]
"vmov.32 %[result], d0[0] \n\t"
: [result] "=r" (res) , [input] "+r" (a)
:
: "q0", "q1", "q8", "r3");
return res;
}
#else
static int computeSumNeon(const int a[])
{
// Plain C fallback: sum all ARRAY_SIZE elements
int i, res = 0;
for (i = 0; i < ARRAY_SIZE; i++)
res += a[i];
return res;
}
#endif
...
@implementation AppDelegate
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
// Override point for customization after application launch.
//int* inp;
int inp[ARRAY_SIZE];
//posix_memalign((void**)&inp, 64, ARRAY_SIZE*sizeof(int)); // Align to cache line size (64bytes on a cortex A8)
// Initialise the array with consecutive integers.
int i;
for (i = 0; i < ARRAY_SIZE; i++)
{
inp[i] = i;
}
for (i = 0; i < ARRAY_SIZE; i++)
{
printf("%i,", inp[i]);
}
printf("\n\n sum 0-7:%i\n", 0+1+2+3+4+5+6+7);
int res = 0;
res = computeSumNeon(inp);
printf("res NEO :%i\n", res);
// free(inp); // error pointer being free was not allocated !!!
UISplitViewController *splitViewController = (UISplitViewController *)self.window.rootViewController;
UINavigationController *navigationController = [splitViewController.viewControllers lastObject];
navigationController.topViewController.navigationItem.leftBarButtonItem = splitViewController.displayModeButtonItem;
splitViewController.delegate = self;
return YES;
}
- (void)applicationWillResignActive:(UIApplication *)application {
...
==== assembly code generated
.align 1
.code 16 # #computeSumNeon
.thumb_func _computeSumNeon
_computeSumNeon:
Lfunc_begin3:
.loc 18 133 0 is_stmt 1 # ...
.cfi_startproc
# BB#0:
sub sp, #8
movs r1, #0
str r0, [sp, #4]
.loc 18 135 9 prologue_end # ...
Ltmp18:
str r1, [sp]
.loc 18 136 5 # ...
ldr r0, [sp, #4]
# InlineAsm Start
.align 4
vmov.i32 q8, #0x0
movw r3, #504
.loop1:
vld1.32 {d0, d1, d2, d3}, [r0]!
vadd.i32 q8, q0, q8
vadd.i32 q8, q1, q8
subs r3, #8
bne .loop1
vpadd.i32 d0, d16, d17
vpaddl.u32 d0, d0
vmov.32 r1, d0[0]
# InlineAsm End
str r1, [sp]
str r0, [sp, #4]
.loc 18 155 12 # ...
ldr r0, [sp]
.loc 18 155 5 is_stmt 0 # ...
add sp, #8
bx lr
Ltmp19:
Lfunc_end3:
.cfi_endproc
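For reference, the arithmetic the NEON routine is meant to perform can be modelled in portable C. This is a sketch only: sum_lanes_model is a hypothetical name, the four-element array stands in for the q8 accumulator, and the comments map each step to the instruction it imitates. It does not reproduce or diagnose the crash.

```c
#include <assert.h>
#include <stddef.h>

/* Portable model of the NEON routine: four 32-bit lanes stand in for q8. */
static int sum_lanes_model(const int *a, size_t n)
{
    int acc[4] = {0, 0, 0, 0};       /* vmov.i32 q8, #0 */

    assert(n % 8 == 0);              /* the asm consumes 8 ints per iteration */

    for (size_t i = 0; i < n; i += 8) {
        for (int lane = 0; lane < 4; lane++) {
            acc[lane] += a[i + lane];       /* vadd.i32 q8, q0, q8 */
            acc[lane] += a[i + 4 + lane];   /* vadd.i32 q8, q1, q8 */
        }
    }

    /* vpadd.i32 d0, d16, d17: pairwise sums of the low and high halves */
    int lo = acc[0] + acc[1];
    int hi = acc[2] + acc[3];

    /* vpaddl.u32 d0, d0: add the two remaining lanes */
    return lo + hi;
}
```

With the array filled with consecutive integers as in the question, one pass over 8 elements gives 28, the value reported when the loop is commented out, and the full 512 elements give 130816, which is a useful expected value when checking the assembly version.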
I have been trying to get the PRU to work in a way that makes sense to me, and at this point I am completely clueless. I can get the examples to work, but any time I make a change or try to write things from scratch I just beat my head against the wall. As a start, I just want to access any of the USR LEDs and turn them off and on at some speed or, as a first pass, turn on an LED and leave it on. Here is some PASM code I got off the internet (I will post the link when I find it):
.origin 0
.entrypoint START
#define PRU0_ARM_INTERRUPT 19
#define AM33XX
#define GPIO1 0x4804c000 //Trying to access the GPIO1
#define GPIO_CLEARDATAOUT 0x190 //writing 1 to the bit you want cleared in GPIO_DATAOUT register (what does that mean?)
#define GPIO_SETDATAOUT 0x194 //set a value for GPIO output pins, which pins am I even writing to? GPIO1?
#define GPIO_OE 0x134 //enable the pins output capabilities
START:
//clear that bit
lbco r0, c4, 4, 4 //This creates a constant offset and stores in c4, but why do you need that?
CLR r0, r0, 4 //if you copied the data why do you need to clear it?
SBCO r0, C4, 4, 4 //What is this for?
//MOV r1, 10
MOV r2, 0x00000000 //store address 0x00 into r2, why?
MOV r3, GPIO1 //Store GPIO1 address in r3
MOV r4, GPIO_OE //place address of GPIO_OE into r4
MOV r5, GPIO_SETDATAOUT //store address of GPIO_SETDATAOUT in r5
MOV r6, GPIO_CLEARDATAOUT //store address of GPIO_CLEARDATAOUT in r6
SBBO r2, r3, r4,4 //What is this even doing? Copying 4 bytes from r2 into r3+r4, but why do you want to copy that way and if not why not?
MOV r1, 10
MOV r2, 0xFFFFFFFF //Supposedly this turns the GPIO1 ON and OFF?
SBBO r2, r3, r6, 4 //and again the storage stuff?
HALT
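The address arithmetic and the set/clear register semantics in the listing can be modelled in plain C. This is a software sketch, not driver code: GpioBankModel and the helper functions are hypothetical, and the claim that the user LEDs sit on GPIO1 pins 21 to 24 is an assumption about the BeagleBone that should be checked against the board documentation.

```c
#include <assert.h>
#include <stdint.h>

#define GPIO1              0x4804c000u  /* base address, as in the PASM listing */
#define GPIO_OE            0x134u
#define GPIO_CLEARDATAOUT  0x190u
#define GPIO_SETDATAOUT    0x194u

/* Hypothetical software model of one GPIO bank; real code would write to mapped registers. */
typedef struct {
    uint32_t dataout;  /* current output levels, one bit per pin */
} GpioBankModel;

/* SBBO value, base, offset, 4 stores 4 bytes at base + offset. */
static uint32_t reg_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Writing a mask to SETDATAOUT raises exactly the pins whose bits are 1. */
static void write_setdataout(GpioBankModel *bank, uint32_t mask)
{
    bank->dataout |= mask;
}

/* Writing a mask to CLEARDATAOUT lowers exactly the pins whose bits are 1. */
static void write_cleardataout(GpioBankModel *bank, uint32_t mask)
{
    bank->dataout &= ~mask;
}

/* Mask for a single pin within the 32-bit bank. */
static uint32_t pin_mask(unsigned bit)
{
    assert(bit < 32);
    return (uint32_t)1 << bit;
}
```

Read this way, the final SBBO in the listing stores 0xFFFFFFFF at GPIO1 + GPIO_CLEARDATAOUT (r3 + r6), which drives every output pin of the bank low; turning a single LED on would instead mean writing that pin's mask to GPIO1 + GPIO_SETDATAOUT.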
I am also attaching the C code that I am using:
#include <stdio.h>
#include <pruss/prussdrv.h>
#include <pruss/pruss_intc_mapping.h>
#define PRU_NUM 0 //defining which PRU to use
int main() {
int ret;
tpruss_intc_initdata intc = PRUSS_INTC_INITDATA;
//initialize the PRU by using init command from prussdrv.h
ret = prussdrv_init();
if(ret != 0) {
printf("Error returned: %d\n",ret);
printf("PRU unable to be initialized");
return -1;
}
ret = prussdrv_open(PRU_EVTOUT_0);
if(ret != 0) {
printf("Error returned for prussdrv_open(): %d\n",ret);
printf("PRU can't open PRU_EVTOUT_0");
return -1;
}
//Map PRUS's INTC
ret = prussdrv_pruintc_init(&intc);
if (ret != 0) {
printf("Error returned for prussdrv_pruintc_int\n");
printf("PRU doesn't work");
return -1;
}
//load and execute binary on PRU
prussdrv_exec_program(PRU_NUM, "./ashwini_test.bin");
prussdrv_pru_wait_event(PRU_EVTOUT_0);
prussdrv_pru_clear_event(PRU_EVTOUT_0,PRU0_ARM_INTERRUPT);
/*Disable PRU and close memory mappings*/
prussdrv_pru_disable(PRU_NUM);
prussdrv_exit();
//prussdrv_pru_wait_event(PRU_EVTOUT_0);
return 0;
}
I have gone through the TRM, https://groups.google.com/forum/#!topic/beaglebone/98eF1wQE_QA, elinux, and derekmolloy. I just feel like I am missing something very basic about how the addressing scheme works or how to think about these things. Thanks again for your help!
When you say that's your PASM code... do you mean it's some code you got from somewhere else that you're trying to use? Because the comments on most lines asking what they do make it seem unlikely that it's actually your code...
Anyways, can't really answer unless you have a specific question, but there's plenty of info out there about how to use the GPIO subsystem on the BeagleBone's AM335x processor. I talked about it some in a post a while back here: https://graycat.io/tutorials/beaglebone-io-using-python-mmap/
I've also got a few documented PRU assembly examples here: https://github.com/alexanderhiam/PRU-stuffs
Part of the source code is
id (*old_objc_msgSend)(id, SEL, ...);
__attribute__((naked))
id new_objc_msgSend(id self, SEL op, ...) {
__asm__ __volatile__ (
".thumb\n"
"ldmia.w sp, {r2, r3}\n"
"b _old_objc_msgSend\n"
);
}
But the generated assembly is
Dump of assembler code for function _Z16new_objc_msgSendP11objc_objectP13objc_selectorz:
0x01a7ae9c <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+0>: stmia.w sp, {r2, r3}
0x01a7aea0 <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+4>: ldmia.w sp, {r2, r3}
0x01a7aea4 <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+8>: b.w 0x1a7af68 <_Z27new_initWithContentwithSizeP11objc_objectP13objc_selectorS0_6CGSize+188>
0x01a7aea8 <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+12>: bx lr
0x01a7aeaa <_Z16new_objc_msgSendP11objc_objectP13objc_selectorz+14>: nop
End of assembler dump.
It branches to a different address.
I am new to gdb. I want to print the memory addresses used, in their actual sequence, during execution of a C program. Let me explain my question with an example. Assume that we have the following C code with two functions, main() and test(). I know that, inside gdb, I can use "disassemble main" to disassemble the main() function, or "disassemble test" to disassemble the test() function separately. My question is: how can I disassemble these two functions as a single piece of code, so that I can see all the memory addresses used during execution and the sequence in which they are accessed? To be specific, since main() calls test() and test() also calls itself multiple times, I want to see something like Example 2. I am also wondering whether the addresses shown in the gdb disassembler are virtual or physical memory addresses. Any help or guidance will be appreciated.
Example 1:
#include "stdio.h"
int test(int q)
{
if(q<16)
test(q+5);
return q;
}
void main()
{
unsigned int a=5;
unsigned int b=5;
unsigned int c=5;
test(a);
}
Example 2:
<Memory Address> <assembly instruction> <c instructions>
0x12546a mov //for unsigned int a=5;
0x12546b mov //for unsigned int b=5;
0x12546c mov //for unsigned int c=5;
0x12546d jmp //for test(q=a=5);
0x12546e cmpl //for if(q<16)
0x12546f jmp //for test(q+5);
0x12546d jmp //for test(q=10);
0x12546e cmpl //for if(q<16)
0x12546f jmp //for test(q+5);
0x12547a jmp //for test(q=15);
0x12547b cmpl //for if(q<16)
0x12547c jmp //for test(q+5);
0x12547d jmp //for test(q=20);
0x12547e cmpl //for if(q<16)
0x12547f jmp //return q);
0x12548a jmp //return q);
0x12548b jmp //return q);
0x12548c jmp //return q);
There's really no pretty way to do this. You're just going to have to step through the code:
(gdb) stepi
(gdb) x/i $pc
(gdb) info registers
(gdb) stepi
(gdb) x/i $pc
(gdb) info registers
.....
You could script that up so that it does it quickly and dumps the data to a file, but that's about all.
I suppose you may have more luck with valgrind. If there's no existing tool to do so, it is possible to add your own instrumentation to report memory accesses (and not only that), or alter an existing one.
E.g. see http://valgrind.org/docs/manual/lk-manual.html
--trace-mem=<no|yes> [default: no]
When enabled, Lackey prints the size and address of almost every memory access made by the program.
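Short of an external tracer, the order of calls in Example 1 can also be logged by hand. The sketch below adds a small trace buffer to the recursive function; traced_test, record_call, trace_args and trace_len are hypothetical names, and the buffer records only call arguments, not instruction addresses, so it complements rather than replaces stepi or Lackey.

```c
#include <stddef.h>

#define MAX_TRACE 16

/* Hypothetical trace buffer: records the argument of every call, in order. */
static int trace_args[MAX_TRACE];
static size_t trace_len = 0;

/* Append one argument to the trace, silently dropping entries once the buffer is full. */
static void record_call(int q)
{
    if (trace_len < MAX_TRACE) {
        trace_args[trace_len++] = q;
    }
}

/* Same logic as test() in Example 1, with one logging call added at entry. */
static int traced_test(int q)
{
    record_call(q);
    if (q < 16)
        traced_test(q + 5);
    return q;
}
```

Calling traced_test(5) records the arguments 5, 10, 15 and 20, the same call order sketched in Example 2.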