How to call malloc in arm64 iOS assembly

I am trying to call malloc from an iOS arm64 assembly (.s) file.
However, when I call _test_malloc from a .m file, control never returns from _test_malloc()
(I am running this on an iPhone 5s).
What am I misunderstanding?
//test_malloc.s
.private_extern _test_malloc
.globl _test_malloc
.align 2
_test_malloc:
mov x0, #8
bl _malloc //wordPtr = malloc(8)
ret
//run_test_malloc.m
extern uint32_t* test_malloc();
static void run_test_malloc() {
uint32_t* ptr = test_malloc();
}

You're not saving the caller's link register contents. bl _malloc overwrites lr (x30) with the address of the instruction after the call, so the final ret branches back to itself instead of to your caller. You need to save lr in your function prologue and restore it in the function epilogue. Because you're saving something on your stack, you'll also need to adjust the stack pointer down so that it stays 16-byte aligned, as required by the ABI. You'll also need to set your frame pointer register and restore it after the function call.
I would recommend disassembling compiler-generated functions to see how this setup and teardown is done. It is simple template code that's done the same in nearly every function.

Here is how I fixed it:
.private_extern _test_malloc
.globl _test_malloc
.align 2
_test_malloc:
//function prolog
stp fp, lr, [sp, #-16]!
mov fp, sp
orr x0, xzr, #0x8
bl _malloc //malloc(8)
//function epilog
ldp fp, lr, [sp], #16
ret lr
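For comparison, here is a rough C counterpart of the routine, in case it helps when following the advice about reading compiler output. It is only a sketch: run_test_malloc is a hypothetical helper that is not part of the original post. Compiling a file like this with clang -S (unoptimized) should typically show the same kind of stp/ldp prologue and epilogue around the call to malloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* C counterpart of _test_malloc: allocate 8 bytes and return the pointer.
   The caller owns the memory and must free() it. */
uint32_t *test_malloc(void) {
    return malloc(8);
}

/* Hypothetical helper (not in the original post): exercises the
   allocation and returns 0 on success, -1 on failure. */
int run_test_malloc(void) {
    uint32_t *ptr = test_malloc();
    if (ptr == NULL) {
        return -1;
    }
    ptr[0] = 0x11111111u;   /* 8 bytes hold two uint32_t values */
    ptr[1] = 0x22222222u;
    int ok = (ptr[0] == 0x11111111u && ptr[1] == 0x22222222u);
    free(ptr);
    return ok ? 0 : -1;
}
```

A quick check such as assert(run_test_malloc() == 0); confirms that the call returns to its caller, which is exactly what the original assembly failed to do.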

Related

No memory source operand for crc32 in Delphi 11? [duplicate]

I am trying to assemble the code below using yasm. I have put 'here' comments where yasm reports the error "error: invalid size for operand 2". Why is this error happening ?
segment .data
a db 25
b dw 0xffff
c dd 3456
d dq -14
segment .bss
res resq 1
segment .text
global _start
_start:
movsx rax, [a] ; here
movsx rbx, [b] ; here
movsxd rcx, [c] ; here
mov rdx, [d]
add rcx, rdx
add rbx, rcx
add rax, rbx
mov [res], rax
ret
For most instructions, the width of the register operand implies the width of the memory operand, because both operands have to be the same size. e.g. mov rdx, [d] implies mov rdx, qword [d] because you used a 64-bit register.
But the same movsx / movzx mnemonics are used for the byte-source and word-source opcodes, so it's ambiguous unless the source is a register (like movzx eax, cl). Another example is crc32 r32, r/m8 vs. r/m16 vs. r/m32. (Unlike movsx/zx, its source size can be as wide as the operand-size.)
movsx / movzx with a memory source always need the width of the memory operand specified explicitly.
The movsxd mnemonic is supposed to imply a 32-bit source size. movsxd rcx, [c] assembles with NASM, but apparently not with YASM. YASM requires you to write dword, even though it doesn't accept byte, word, or qword there, and it doesn't accept movsx rcx, dword [c] either (i.e. it requires the movsxd mnemonic for 32-bit source operands).
In NASM, movsx rcx, dword [c] assembles to movsxd, but movsxd rcx, word [c] is still rejected. i.e. in NASM, plain movsx is fully flexible, but movsxd is still rigid. I'd still recommend using dword to make the width of the load explicit, for the benefit of humans.
movsx rax, byte [a]
movsx rbx, word [b]
movsxd rcx, dword [c]
Note that the "operand size" of the instruction (as determined by the operand-size prefix to make it 16-bit, or REX.W=1 to make it 64-bit) is the destination width for movsx / movzx. Different source sizes use different opcodes.
In case it's not obvious, there's no movzxd because 32-bit mov already zero-extends to 64-bit implicitly. movsxd eax, ecx is encodeable, but not recommended (use mov instead).
In AT&T syntax, you need to explicitly specify both the source and destination width in the mnemonic, like movsbq (%rsi), %rax. GAS won't let you write movsb (%rsi), %eax to infer a destination width (operand-size) because movsb/movsw/etc are the mnemonics for string-move instructions with implicit (%rsi), (%rdi) operands.
Fun fact: GAS and clang do allow it for things like movzb (%rsi), %eax as movzbl, but GAS only has extra logic to allow disambiguation (not just inferring size) based on operands when it's necessary, like movsd (%rsi), %xmm0 vs. movsd. (Clang12.0.1 actually does accept movsb (%rcx), %eax as movsbl, but GAS 2.36.1 doesn't, so for portability it's best to be explicit with sign-extension, and not a bad idea for zero-extension too.)
Other stuff about your source code:
NASM/YASM allow you to use the segment keyword instead of section, but really you're giving ELF section names, not executable segment names. Also, you can put read-only data in section .rodata (which is linked as part of the text segment). See also: What's the difference of section and segment in ELF file format.
You can't ret from _start. It's not a function, it's your ELF entry point. The first thing on the stack is argc, not a valid return address. Use this to exit cleanly:
xor edi,edi
mov eax, 231
syscall ; sys_exit_group(0)
See the x86 tag wiki for links to more useful guides (and debugging tips at the bottom).
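To see what the sign extension does to the values in the question, here is a small C sketch that mirrors the corrected instruction sequence. The function names are made up for illustration, and C integer conversions stand in for movsx / movsxd (sign extension) and movzx (zero extension).

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the question's .data section. */
static const int8_t  a = 25;     /* db 25                            */
static const int16_t b = -1;     /* dw 0xffff, read as a signed word */
static const int32_t c = 3456;   /* dd 3456                          */
static const int64_t d = -14;    /* dq -14                           */

/* movsx / movsxd: widen each narrow value with sign extension. */
int64_t sum_sign_extended(void) {
    int64_t rax = (int64_t)a;    /* movsx  rax, byte  [a] */
    int64_t rbx = (int64_t)b;    /* movsx  rbx, word  [b] */
    int64_t rcx = (int64_t)c;    /* movsxd rcx, dword [c] */
    int64_t rdx = d;             /* mov    rdx, [d]       */
    rcx += rdx;
    rbx += rcx;
    rax += rbx;
    return rax;                  /* value stored to [res] */
}

/* For contrast: zero-extending the same bits, as movzx would. */
int64_t sum_zero_extended(void) {
    int64_t rax = (int64_t)(uint8_t)a;
    int64_t rbx = (int64_t)(uint16_t)b;   /* 0xffff becomes 65535 */
    int64_t rcx = (int64_t)(uint32_t)c;
    return rax + rbx + rcx + d;
}
```

With sign extension the word 0xffff is read as -1 and the sum is 3466; zero-extending the same bits gives 69002, which is why the source width and the kind of extension both have to be stated.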

How llvm does O2 optimisation

I am trying to see the IR of a very simple loop
for (int i = 0; i < 15; i++){
a[b[i]]++;
}
When I compile with -O0 and look into the .ll file, I can see the instructions written step by step in the define i32 @main() function. However, when I compile with -O2 and look into the .ll file, there is only ret i32 0 in the define i32 @main() function. Also, some call instructions present in the .ll file compiled with -O0 are changed to tail call in the .ll file compiled with -O2.
Can anyone give a rather detailed explanation on how llvm does the -O2 compilation? Thanks.
We can use the Compiler Explorer at godbolt.org to look at your example. We'll use the following testbench code:
int test() {
int a[15] = {0};
int b[15] = {0};
for (int i = 0; i < 15; i++){
a[b[i]]++;
}
return 0;
}
Godbolt shows the x86 assembly, not the LLVM IR, but I've summarized it a bit to show what's going on. Here it is at -O0 -m32:
test():
# set up stack
.LBB0_1:
cmp dword ptr [ebp - 128], 15 # i < 15?
jge .LBB0_4 # no? then jump out of loop
mov eax, dword ptr [ebp - 128] # load i
mov eax, dword ptr [ebp + 4*eax - 124] # load b[i]
mov ecx, dword ptr [ebp + 4*eax - 64] # load a[b[i]]
add ecx, 1 # increment it
mov dword ptr [ebp + 4*eax - 64], ecx # store it back
mov eax, dword ptr [ebp - 128]
add eax, 1 # increment i
mov dword ptr [ebp - 128], eax
jmp .LBB0_1 # repeat
.LBB0_4:
# tear down stack
ret
This looks like we'd expect: the loop is clearly visible and it does all the steps we listed. If we compile at -O1 -m32 -march=i386, we see the loop is still there but it's much simpler:
test(): # @test()
# set up stack
.LBB0_1:
mov ecx, dword ptr [esp + 4*eax] # load b[i]
inc dword ptr [esp + 4*ecx + 60] # increment a[b[i]]
inc eax # increment i
cmp eax, 15 # compare == 15
jne .LBB0_1 # no? then loop
# tear down stack
ret
Clang now uses the inc instruction (useful), noticed it could use the eax register for the loop counter i (neat), and moved the condition check to the bottom of the loop (probably better). We can still recognize our original code, though. Now let's try with -O2 -m32 -march=i386:
test():
xor eax, eax # set the return value to 0
ret
That's it? Yes.
clang has detected that the a array can never be used outside of the function. This means doing the incrementing will never affect any other part of the program - and also that nobody will miss it when it's gone.
Removing the increment leaves a for loop with an empty body and no side effects, which can also be removed. In turn, removing the loop leaves an (for all intents and purposes) empty function.
This empty function is likely what you were seeing in the LLVM IR (ret i32 0).
This is not a very scientific description, and the steps clang takes might be different, but I hope the example clears it up a bit. If you want, you can read up on the as-if rule. I also recommend playing around on https://godbolt.org/ for a bit: see what happens when you move a and b outside the function, for example.
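Following the suggestion to move a and b outside the function, a minimal variant might look like the sketch below. The only change from the testbench is that the arrays are now at file scope. Because other code can then observe the increments, the compiler is expected to preserve their effect even at -O2, although how it implements the loop is up to the optimizer.

```c
#include <assert.h>

/* File-scope arrays are visible to the rest of the program, so the
   increments below are observable side effects. Both arrays are
   zero-initialized, like the locals in the original testbench. */
int a[15];
int b[15];

int test(void) {
    for (int i = 0; i < 15; i++) {
        a[b[i]]++;
    }
    return 0;
}
```

With b all zeros, every iteration increments a[0], so after one call a[0] is 15 and the other elements are still 0.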

How do I translate DOS assembly targeted for the small memory model to the large memory model?

I'm somewhat new to assembly language and wanted to understand how it works on an older system. I understand that the large memory model uses far pointers while the small memory model uses near pointers, and that the return address in the large model is 4 bytes instead of two, so the first parameter changes from [bp+4] to [bp+6]. However, in the process of adapting a graphics library from a small to a large model, there are other subtle things that I don't seem to understand. Running this code with a large memory model from C is supposed to clear the screen, but instead it hangs the system (it was assembled with TASM):
; void gr256cls( int color , int page );
COLOR equ [bp+6]
GPAGE equ [bp+8]
.MODEL LARGE,C
.186
public C gr256cls
.code
gr256cls PROC
push bp
mov bp,sp
push di
pushf
jmp skip_1
.386
mov ax,0A800h
mov es,ax
mov ax,0E000h
mov fs,ax
CLD
mov al,es:[bp+6]
mov ah,al
mov bx,ax
shl eax,16
mov ax,bx
cmp word ptr GPAGE,0
je short cls0
cmp word ptr GPAGE,2
je short cls0
jmp short skip_0
cls0:
mov bh,0
mov bl,1
call grph_cls256
skip_0:
cmp word ptr GPAGE,1
je short cls1
cmp word ptr GPAGE,2
je short cls1
jmp short skip_1
cls1:
mov bh,8
mov bl,9
call grph_cls256
skip_1:
.186
popf
pop di
pop bp
ret
.386
grph_cls256:
mov fs:[0004h],bh
mov fs:[0006h],bl
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,16384
mov di,0
rep stosd
add word ptr fs:[0004h],2
add word ptr fs:[0006h],2
mov cx,14848 ;=8192+6656
mov di,0
rep stosd
;; Freezes here.
ret
gr256cls ENDP
end
It hangs at the ret at the end of grph_cls256. In fact, even if I immediately ret from the beginning of that subroutine, it still hangs right after. Is there a comprehensive list of differences when coding assembly in the two models, so I can more easily understand what's happening?
EDIT: To clarify, this is the original source. This is not generated output; it's intended to be assembled and linked into a library.
I changed grph_cls256 to a procedure with PROC FAR and it now works without issue:
grph_cls256 PROC FAR
...
grph_cls256 ENDP
The issue had to do with how functions are expected to be called depending on the memory model. In the large memory model, all function calls are far. I hadn't declared this on the grph_cls256 subroutine, so the code assembled for calling it and returning from it did not push and pop matching values on the stack.
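The stack arithmetic behind both observations (the argument moving from [bp+4] to [bp+6], and the call/return mismatch) can be sketched in a few lines of C. This assumes the hang came from a near call paired with a far return, which the explanation above implies but does not spell out; the function and constant names are invented for illustration.

```c
#include <assert.h>

enum {
    SAVED_BP_BYTES = 2,   /* push bp                     */
    NEAR_RET_BYTES = 2,   /* a near call pushes IP only  */
    FAR_RET_BYTES  = 4    /* a far call pushes CS and IP */
};

/* Offset of the first stack argument from BP after
   "push bp / mov bp,sp". */
int first_arg_offset(int return_addr_bytes) {
    return SAVED_BP_BYTES + return_addr_bytes;
}

/* Bytes by which SP is off after a call/ret pair; 0 means balanced. */
int sp_imbalance(int pushed_by_call, int popped_by_ret) {
    return popped_by_ret - pushed_by_call;
}
```

Here first_arg_offset gives 4 for a near return address and 6 for a far one, and sp_imbalance(NEAR_RET_BYTES, FAR_RET_BYTES) is 2: a far return after a near call pops two bytes that were never pushed as part of the return address, so execution resumes at a bogus location.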

Understanding calling convention and stack pointer

I want to understand how I should use local variables and how to pass arguments to a function in x86. I have read a lot of guides, and they all say that the first parameter should be at [ebp+8], but it isn't here. What am I missing? What am I not understanding correctly?
number byte "724.5289",0
.code
main PROC
mov ebx,offset number ;making so that [ebx] = '7' atm
push ebx ;I push it on stack so I can access it inside the function
call rewrite
main ENDP
rewrite PROC
push ebp ; push ebp so we can retrieve later
mov ebp, esp ; use esp memory to retrieve parameters and
sub esp, 8 ; allocate data for local variable
lea ebx, [ebp-8]
lea eax, [ebp+8] ; i think here ebp+8 should point to the same now to which ebx did
;before function, but it does not, writechar prints some garbage ascii character
call writechar
call crlf
rewrite ENDP
END main
You pass a pointer as the argument to rewrite, and then pass the address of that argument on to writechar. That is, you take the address twice, which is one time too many :)
You want mov eax, [ebp+8] instead of lea eax, [ebp+8]
Also, you need to clean up the stack after yourself, which you don't do. Furthermore, make sure your assembler automatically emits a RET for the ENDP directive, otherwise you will be in trouble. You might want to write it out explicitly.
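The difference between the two instructions can be mimicked in C, where the parameter s plays the role of the stack slot at [ebp+8]. This is only an analogy with made-up function names, not a translation of the MASM code.

```c
#include <assert.h>

/* mov eax, [ebp+8]: load the pointer that is stored in the slot. */
char first_char_via_mov(const char *s) {
    const char *value = s;
    return value[0];
}

/* lea eax, [ebp+8]: compute the address of the slot itself, which is a
   pointer to the pointer. One more dereference is needed to get back
   to the string. */
char first_char_via_lea(const char *s) {
    const char **slot = &s;
    return (*slot)[0];
}
```

Both functions return '7' for the string "724.5289", but only because first_char_via_lea dereferences the slot before indexing. Handing the slot address straight to a routine that expects a character, as the question's code does, yields garbage.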

ARC and autorelease

autorelease is used for objects returned from a function, so that the caller doesn't take ownership and the callee will release the object in the future.
However, ARC is capable of tracking the caller's ownership and releasing the object after use; that is, it can behave just like a smart pointer in C++. With ARC, we could get rid of autorelease, because autorelease is non-deterministic.
The reason I ask this question is that I do see the returned object's dealloc called earlier in ARC than in non-ARC code. This leads me to think that ARC can behave like a smart pointer and can make autorelease useless. Is that true or possible? The only use I can think of for autorelease is in multi-threaded or network code, because it may not be easy to track ownership when the object is being passed around.
Thanks for your thoughts.
Here is a new edit to make things clear:
with autorelease
+ (MyClass*) myClass
{
return [[[MyClass alloc] init] autorelease];
}
- doSomething
{
MyClass *obj = [MyClass myClass];
}
With ARC:
+ (MyClass*) myClass
{
return [[MyClass alloc] init]; // no autorelease
}
- doSomething
{
MyClass *obj = [MyClass myClass];
// insert [obj release]
}
So, we really don't need autorelease.
Autorelease as a mechanism is still used by ARC; furthermore, ARC-compiled code is designed to interoperate seamlessly with MRC-compiled code, so the autorelease machinery is still around.
First, don't think in terms of reference counts but in terms of ownership interest - as long as there is a declared ownership interest in an object then the object lives, when there is no ownership interest it is destroyed. In MRC you declare ownership interest by using retain, or by creating a new object; and you relinquish ownership interest by using release.
Now when a callee method creates an object and wishes to return it to its caller, the callee is going away, so it needs to relinquish its ownership interest, and the caller needs to declare its own ownership interest or the object may be destroyed. But there is a problem: the callee finishes before the caller receives the object, so when the callee relinquishes its ownership interest the object may be destroyed before the caller has a chance to declare its interest - not good.
Two solutions are used to address this:
1) The method is declared to transfer ownership interest in its return value from the callee to the caller - this is the model used for init, copy, etc. methods. The callee never notifies that it is relinquishing its ownership interest, and the caller never declares ownership interest - by agreement the caller just takes over the ownership interest and the responsibility of relinquishing it later.
2) The method is declared to return a value in which the caller has no ownership interest, but in which someone else will maintain an ownership interest for some short period of time - usually until the end of the current run loop cycle. If the caller wants to use the return value longer than that, it must declare its own ownership interest, but otherwise it can rely on someone else having an ownership interest and hence the object staying around.
The question is who can that "someone" be who maintains the ownership interest? It cannot be the callee method as it is about to go away. Enter the "autorelease pool" - this is just an object to which anybody can transfer an ownership interest to so the object will stay around for a while. The autorelease pool will relinquish its ownership interest in all the objects transferred to it in this way when instructed to do so - usually at the end of the current run loop cycle.
Now if the above makes any sense (i.e. if I explained it clearly), you can see that method (2) is not really required, as you could always use method (1); but, and it's a crucial but, under MRC that is a lot more work for the programmer - every value received from a method comes with an ownership interest which must be managed and relinquished at some point - generate a string just to output it? Well, you then need to relinquish your interest in that temporary string... So (2) makes life a lot easier.
On the other hand, computers are just fast idiots, and counting things and inserting code to relinquish ownership interest on behalf of the intelligent programmers is something they are well suited to. So ARC doesn't need the autorelease pool. But it can make things easier and more efficient, and behind the scenes ARC optimises its use - look at the assembler output in Xcode and you'll see calls to routines with names similar to "retainAutoreleasedReturnValue"...
So you are right, it's not needed; however, it is still useful - but under ARC you can (usually) forget it even exists.
HTH more than it probably confuses!
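If it helps, the two solutions above can be modelled with a toy reference counter in plain C. This is only a sketch of the ownership idea: every name here (Obj, Pool, obj_retain, pool_drain and so on) is hypothetical and has nothing to do with the real Objective-C runtime API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the Objective-C runtime. */
typedef struct Obj { int refcount; } Obj;

int live_objects = 0;   /* number of objects not yet destroyed */

/* Creating an object gives the creator an ownership interest. */
Obj *obj_new(void) {
    Obj *o = malloc(sizeof *o);
    if (o == NULL) abort();
    o->refcount = 1;
    live_objects++;
    return o;
}

/* Declare an ownership interest. */
Obj *obj_retain(Obj *o) { o->refcount++; return o; }

/* Relinquish an ownership interest; destroy when none remain. */
void obj_release(Obj *o) {
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        live_objects--;
        free(o);
    }
}

enum { POOL_CAPACITY = 16 };
typedef struct Pool { Obj *items[POOL_CAPACITY]; int count; } Pool;

void pool_init(Pool *p) { p->count = 0; }

/* Transfer one ownership interest to the pool; release is deferred. */
Obj *obj_autorelease(Pool *p, Obj *o) {
    assert(p->count < POOL_CAPACITY);
    p->items[p->count++] = o;
    return o;
}

/* Drain: the pool relinquishes every interest handed to it. */
void pool_drain(Pool *p) {
    for (int i = 0; i < p->count; i++) obj_release(p->items[i]);
    p->count = 0;
}

/* Solution (1): ownership is transferred to the caller. */
Obj *make_owned(void) { return obj_new(); }

/* Solution (2): the pool holds the interest; the caller owns nothing. */
Obj *make_autoreleased(Pool *p) { return obj_autorelease(p, obj_new()); }
```

A caller that only needs the result briefly can use make_autoreleased and do nothing else, since the object goes away at the next pool_drain. A caller that wants to keep it must call obj_retain before the drain and obj_release later, while a result from make_owned must always be released by the caller.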
autorelease is used for objects returned from a function, so that the caller doesn't take ownership and the callee will release the object in the future.
If autoreleased, it will be added to the autorelease pool. When the autorelease pool is drained, the deferred release will be performed. A function/method does not need to return an autoreleased object (e.g. it could be an ivar which did not receive a retain/autorelease cycle).
However, ARC is capable of tracking the caller's ownership and releasing the object after use; that is, it can behave just like a smart pointer in C++. With ARC, we could get rid of autorelease, because autorelease is non-deterministic.
It has the potential to. There is no guarantee. The biggest 'problem' here is that the compiler does not know or care about the memory mechanics of the object returned by an arbitrary call. It cannot assume how an object is returned, because ARC is a newer addition that postdates MRC. This is important because it makes ARC programs compatible with programs which use manual retain/release. For example, Foundation.framework may use ARC, or it may use MRC, or it may use both. It may also call into APIs which were built using older toolchains. So this has the benefit of keeping a ton of existing code usable.
The reason I ask this question is that I do see the returned object's dealloc called earlier in ARC than in non-ARC code.
There's an optional way to return an object -- see CRD's answer (+1) about assembly and the calls the compiler inserts to perform reference count operations e.g. retainAutoreleasedReturnValue.
In any event, there is no guarantee that lifetimes will always be reduced in ARC. A programmer who understands execution of their program can minimize lifetimes and ref count operations because ARC has stricter lifetime and ownership requirements.
This leads me to think that ARC can behave like a smart pointer and can make autorelease useless. Is that true or possible?
In theory, I don't see why autorelease pools could not be done away with for a new system. However, I think there's too much existing code that relies on autorelease pools to lift that restriction -- I think they would need to phase in a new executable format (as was the case with ObjC Garbage Collection) and review a ton of existing APIs and programs for such a significant transition to succeed. Also, a few APIs would probably just need to be removed. APIs may need some strengthening concerning ownership to accomplish this, but most of that is complete in programs which have already been migrated to ARC. Heck, even the compiler could (be extended to) internally use a form of smart pointers for passing and returning objc types and autorelease pools could be eliminated in such a system. Again, that would require a lot of code to be migrated. So such an upgrade would be like an ARC V2.
The only use I can think of for autorelease is in multi-threaded or network code, because it may not be easy to track ownership when the object is being passed around.
Not an issue - autorelease pools are thread local. I don't see an issue beyond that in such a system (unless you are relying on a race condition, which is obviously a bad idea).
Difference between ARC and autorelease explained in code :
ARC :
-somefunc {
id obj = [NSArray array];
NSLog(@"%@", obj);
// ARC now calls release for the first object
id obj2 = [NSArray array];
NSLog(@"%@", obj2);
// ARC now calls release for the second object
}
Autorelease :
-somefunc {
id obj = [NSArray array];
NSLog(@"%@", obj);
id obj2 = [NSArray array];
NSLog(@"%@", obj2);
}
// Objects are released some time after this
Basically ARC works once a variable isn't used anymore in a scope, while autorelease waits until it reaches the main loop and then calls release on all objects in the pool. ARC is used inside the scope, autorelease is used outside the scope of the function.
autorelease is still used under ARC. ARC just makes the call for you and is clever about short-circuiting it. Here is a demonstration of exactly how that works, which I'll copy here in case that blog post ever disappears; all due credit to Matt Galloway.
So consider the following method:
void foo() {
@autoreleasepool {
NSNumber *number = [NSNumber numberWithInt:0];
NSLog(@"number = %p", number);
}
}
This is entirely contrived, of course, but it should let us see what’s
going on. In non-ARC land we would assume here that number would be
allocated inside numberWithInt: and returned autoreleased. So when the
autorelease pool is next drained, it will be released. So let’s see if
that’s what happened (as usual, this is ARMv7 instructions):
.globl _foo
.align 2
.code 16
.thumb_func _foo
_foo:
push {r4, r7, lr}
add r7, sp, #4
blx _objc_autoreleasePoolPush
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
movs r2, #0
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
mov r4, r0
movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
LPC0_0:
add r1, pc
movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
LPC0_1:
add r0, pc
ldr r1, [r1]
ldr r0, [r0]
blx _objc_msgSend
mov r1, r0
movw r0, :lower16:(L__unnamed_cfstring_-(LPC0_2+4))
movt r0, :upper16:(L__unnamed_cfstring_-(LPC0_2+4))
LPC0_2:
add r0, pc
blx _NSLog
mov r0, r4
blx _objc_autoreleasePoolPop
pop {r4, r7, pc}
Well, yes. That’s exactly what’s happening. We can see the call to
push an autorelease pool then a call to numberWithInt: then a call to
pop an autorelease pool. Exactly what we’d expect. Now let’s look at
the exact same code compiled under ARC:
.globl _foo
.align 2
.code 16
.thumb_func _foo
_foo:
push {r4, r5, r7, lr}
add r7, sp, #8
blx _objc_autoreleasePoolPush
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
movs r2, #0
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
mov r4, r0
movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
LPC0_0:
add r1, pc
movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
LPC0_1:
add r0, pc
ldr r1, [r1]
ldr r0, [r0]
blx _objc_msgSend
@ InlineAsm Start
mov r7, r7 @ marker for objc_retainAutoreleaseReturnValue
@ InlineAsm End
blx _objc_retainAutoreleasedReturnValue
mov r5, r0
movw r0, :lower16:(L__unnamed_cfstring_-(LPC0_2+4))
movt r0, :upper16:(L__unnamed_cfstring_-(LPC0_2+4))
mov r1, r5
LPC0_2:
add r0, pc
blx _NSLog
mov r0, r5
blx _objc_release
mov r0, r4
blx _objc_autoreleasePoolPop
pop {r4, r5, r7, pc}
Notice the calls to objc_retainAutoreleasedReturnValue and
objc_release. What’s happening there is that ARC has determined for us
that it doesn’t really need to worry about the autorelease pool that’s
in place, because it can simply tell the autorelease to not happen
(with the call to objc_retainAutoreleasedReturnValue) and then release
the object later itself. This is desirable as it means the autorelease
logic doesn’t have to happen.
Note that the autorelease pool is still required to be pushed and
popped because ARC can’t know what’s going on in the calls to
numberWithInt: and NSLog to know if objects will be put into the pool
there. If it did know that they didn’t autorelease anything then it
could actually get rid of the push and pop. Perhaps that kind of
logic will come in future versions although I’m not quite sure how the
semantics of that would work though.
Now let’s consider another example which is where we want to use
number outside of the scope of the autorelease pool block. This should
show us why ARC is a wonder to work with. Consider the following code:
void bar() {
NSNumber *number;
@autoreleasepool {
number = [NSNumber numberWithInt:0];
NSLog(@"number = %p", number);
}
NSLog(@"number = %p", number);
}
You might be (correctly) thinking that this is going to cause problems
even though it looks perfectly innocuous. It’s a problem because
number will be allocated inside the autorelease pool block, will be
deallocated when the autorelease pool pops but is then used after it’s
been deallocated. Uh oh! Let’s see if we’re right by compiling it
without ARC enabled:
.globl _bar
.align 2
.code 16
.thumb_func _bar
_bar:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
blx _objc_autoreleasePoolPush
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
movs r2, #0
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
mov r4, r0
movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
LPC1_0:
add r1, pc
movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
LPC1_1:
add r0, pc
ldr r1, [r1]
ldr r0, [r0]
blx _objc_msgSend
movw r6, :lower16:(L__unnamed_cfstring_-(LPC1_2+4))
movt r6, :upper16:(L__unnamed_cfstring_-(LPC1_2+4))
LPC1_2:
add r6, pc
mov r5, r0
mov r1, r5
mov r0, r6
blx _NSLog
mov r0, r4
blx _objc_autoreleasePoolPop
mov r0, r6
mov r1, r5
blx _NSLog
pop {r4, r5, r6, r7, pc}
Obviously no calls to retain, release or autorelease as we’d expect
since we haven’t made any explicitly and we’re not using ARC. We can
see here that it’s been compiled exactly as we’d expect from our
reasoning before. So let’s see what it looks like when ARC gives us a
helping hand:
.globl _bar
.align 2
.code 16
.thumb_func _bar
_bar:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
blx _objc_autoreleasePoolPush
movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
movs r2, #0
movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
mov r4, r0
movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
LPC1_0:
add r1, pc
movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
LPC1_1:
add r0, pc
ldr r1, [r1]
ldr r0, [r0]
blx _objc_msgSend
@ InlineAsm Start
mov r7, r7 @ marker for objc_retainAutoreleaseReturnValue
@ InlineAsm End
blx _objc_retainAutoreleasedReturnValue
movw r6, :lower16:(L__unnamed_cfstring_-(LPC1_2+4))
movt r6, :upper16:(L__unnamed_cfstring_-(LPC1_2+4))
LPC1_2:
add r6, pc
mov r5, r0
mov r1, r5
mov r0, r6
blx _NSLog
mov r0, r4
blx _objc_autoreleasePoolPop
mov r0, r6
mov r1, r5
blx _NSLog
mov r0, r5
blx _objc_release
pop {r4, r5, r6, r7, pc}
Round of applause for ARC please! Notice that it’s realised we’re
using number outside of the scope of the autorelease pool block so
it’s retained the return value from numberWithInt: just as it did
before, but this time it’s placed the release at the end of the bar
function rather than before the autorelease pool is popped. That will
have saved us a crash in some code that we might have thought was
correct but actually had a subtle memory management bug.
However, ARC is capable of tracking the caller's ownership and releasing
the object after use; that is, it can behave just like a smart pointer in
C++. With ARC, we could get rid of autorelease, because autorelease is
non-deterministic.
You are confusing ARC with reference counting. Objective-C has always relied on reference counting for memory management. ARC continues this tradition and simply eliminates the need for the programmer to manually insert appropriate calls to -retain, -release, and -autorelease. Under ARC the compiler inserts these calls for you, but the reference counting mechanism remains the same as it has always been.
ARC does not eliminate the need for autorelease, but it may be able to avoid it in situations where a human would typically have used it.

Resources