Julia massively outperforms Delphi. Obsolete asm code by Delphi compiler? - delphi

I wrote a simple for loop in Delphi.
The same program is 7.6 times faster in Julia 1.6.
procedure TfrmTester.btnForLoopClick(Sender: TObject);
VAR
i, Total, Big, Small: Integer;
s: string;
begin
TimerStart;
Total:= 0;
Big := 0;
Small:= 0;
for i:= 1 to 1000000000 DO //1 billion
begin
Total:= Total+1;
if Total > 500000
then Big:= Big+1
else Small:= Small+1;
end;
s:= TimerElapsedS;
//here code to show Big/Small on the screen
end;
The ASM code seems decent to me:
TesterForm.pas.111: TimerStart;
007BB91D E8DE7CF9FF call TimerStart
TesterForm.pas.113: Total:= 0;
007BB922 33C0 xor eax,eax
007BB924 8945F4 mov [ebp-$0c],eax
TesterForm.pas.114: Big := 0;
007BB927 33C0 xor eax,eax
007BB929 8945F0 mov [ebp-$10],eax
TesterForm.pas.115: Small:= 0;
007BB92C 33C0 xor eax,eax
007BB92E 8945EC mov [ebp-$14],eax
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB931 C745F801000000 mov [ebp-$08],$00000001
TesterForm.pas.118: Total:= Total+1;
007BB938 FF45F4 inc dword ptr [ebp-$0c]
TesterForm.pas.119: if Total > 500000
007BB93B 817DF420A10700 cmp [ebp-$0c],$0007a120
007BB942 7E05 jle $007bb949
TesterForm.pas.120: then Big:= Big+1
007BB944 FF45F0 inc dword ptr [ebp-$10]
007BB947 EB03 jmp $007bb94c
TesterForm.pas.121: else Small:= Small+1;
007BB949 FF45EC inc dword ptr [ebp-$14]
TesterForm.pas.122: end;
007BB94C FF45F8 inc dword ptr [ebp-$08]
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB94F 817DF801CA9A3B cmp [ebp-$08],$3b9aca01
007BB956 75E0 jnz $007bb938
TesterForm.pas.124: s:= TimerElapsedS;
007BB958 8D45E8 lea eax,[ebp-$18]
How can it be that Delphi has such a pathetic score compared with Julia?
Can I do anything to improve the code generated by the compiler?
Info
My Delphi 10.4.2 program is a 32-bit Windows (Win32) build. Of course, I run it in "Release" mode :)
But the ASM code above is for the "Debug" version because I don't know how to pause the execution of the program when I run an optimized EXE file. But the difference between a Release and a Debug exe is pretty small (1.8 vs 1.5 sec). Julia does it in 195ms.
More discussions
I do have to mention that when you run the code in Julia for the first time, its time is ridiculously high, because Julia is JIT-compiled, so it has to compile the code first. The compilation time (since it is "one-time") was not included in the measurement.
Also, as AmigoJack commented, Delphi code will run pretty much everywhere, while Julia code will probably only run in computers that have a modern CPU to support all those new/fancy instructions. I do have small tools that I produced back in 2004 and still run today.
Whatever code Julia produces cannot be delivered to "customers" unless they have Julia installed.
Anyway, all this being said, it is sad that the Delphi compiler is so outdated.
I ran other tests: finding the shortest and longest string in a list of strings is 10x faster in Delphi than in Julia. Allocating small blocks of memory (10000x10000x4 bytes) has the same speed.
As AhnLab mentioned, I ran pretty "dry" tests. I guess a full program that performs more complex/realistic tasks needs to be written to see whether Julia still outperforms Delphi 7x at the end.
Update
Ok, the Julia code seems totally alien to me. Seems to use more modern ops:
; ┌ # Julia_vs_Delphi.jl:4 within `for_fun`
pushq %rbp
movq %rsp, %rbp
subq $96, %rsp
vmovdqa %xmm11, -16(%rbp)
vmovdqa %xmm10, -32(%rbp)
vmovdqa %xmm9, -48(%rbp)
vmovdqa %xmm8, -64(%rbp)
vmovdqa %xmm7, -80(%rbp)
vmovdqa %xmm6, -96(%rbp)
movq %rcx, %rax
; │ # Julia_vs_Delphi.jl:8 within `for_fun`
; │┌ # range.jl:5 within `Colon`
; ││┌ # range.jl:354 within `UnitRange`
; │││┌ # range.jl:359 within `unitrange_last`
testq %rdx, %rdx
; │└└└
jle L80
; │ # Julia_vs_Delphi.jl within `for_fun`
movq %rdx, %rcx
sarq $63, %rcx
andnq %rdx, %rcx, %r9
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
cmpq $8, %r9
jae L93
; │ # Julia_vs_Delphi.jl within `for_fun`
movl $1, %r10d
xorl %edx, %edx
xorl %r11d, %r11d
jmp L346
L80:
xorl %edx, %edx
xorl %r11d, %r11d
xorl %r9d, %r9d
jmp L386
L93:
movabsq $9223372036854775800, %r8 # imm = 0x7FFFFFFFFFFFFFF8
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
andq %r9, %r8
leaq 1(%r8), %r10
movabsq $.rodata.cst32, %rcx
vmovdqa (%rcx), %ymm1
vpxor %xmm0, %xmm0, %xmm0
movabsq $.rodata.cst8, %rcx
vpbroadcastq (%rcx), %ymm2
movabsq $1023787240, %rcx # imm = 0x3D05C0E8
vpbroadcastq (%rcx), %ymm3
movabsq $1023787248, %rcx # imm = 0x3D05C0F0
vpbroadcastq (%rcx), %ymm5
vpcmpeqd %ymm6, %ymm6, %ymm6
movabsq $1023787256, %rcx # imm = 0x3D05C0F8
vpbroadcastq (%rcx), %ymm7
movq %r8, %rcx
vpxor %xmm4, %xmm4, %xmm4
vpxor %xmm8, %xmm8, %xmm8
vpxor %xmm9, %xmm9, %xmm9
nopw %cs:(%rax,%rax)
; │ # Julia_vs_Delphi.jl within `for_fun`
L224:
vpaddq %ymm2, %ymm1, %ymm10
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
vpxor %ymm3, %ymm1, %ymm11
vpcmpgtq %ymm11, %ymm5, %ymm11
vpxor %ymm3, %ymm10, %ymm10
vpcmpgtq %ymm10, %ymm5, %ymm10
vpsubq %ymm11, %ymm0, %ymm0
vpsubq %ymm10, %ymm4, %ymm4
vpaddq %ymm11, %ymm8, %ymm8
vpsubq %ymm6, %ymm8, %ymm8
vpaddq %ymm10, %ymm9, %ymm9
vpsubq %ymm6, %ymm9, %ymm9
vpaddq %ymm7, %ymm1, %ymm1
addq $-8, %rcx
jne L224
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
vpaddq %ymm8, %ymm9, %ymm1
vextracti128 $1, %ymm1, %xmm2
vpaddq %xmm2, %xmm1, %xmm1
vpshufd $238, %xmm1, %xmm2 # xmm2 = xmm1[2,3,2,3]
vpaddq %xmm2, %xmm1, %xmm1
vmovq %xmm1, %r11
vpaddq %ymm0, %ymm4, %ymm0
vextracti128 $1, %ymm0, %xmm1
vpaddq %xmm1, %xmm0, %xmm0
vpshufd $238, %xmm0, %xmm1 # xmm1 = xmm0[2,3,2,3]
vpaddq %xmm1, %xmm0, %xmm0
vmovq %xmm0, %rdx
cmpq %r8, %r9
je L386
L346:
leaq 1(%r9), %r8
nop
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
; │┌ # operators.jl:378 within `>`
; ││┌ # int.jl:83 within `<`
L352:
xorl %ecx, %ecx
cmpq $500000, %r10 # imm = 0x7A120
seta %cl
cmpq $500001, %r10 # imm = 0x7A121
; │└└
adcq $0, %rdx
addq %rcx, %r11
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
; │┌ # range.jl:837 within `iterate`
incq %r10
; ││┌ # promotion.jl:468 within `==`
cmpq %r10, %r8
; │└└
jne L352
; │ # Julia_vs_Delphi.jl:17 within `for_fun`
L386:
movq %r9, (%rax)
movq %rdx, 8(%rax)
movq %r11, 16(%rax)
vmovaps -96(%rbp), %xmm6
vmovaps -80(%rbp), %xmm7
vmovaps -64(%rbp), %xmm8
vmovaps -48(%rbp), %xmm9
vmovaps -32(%rbp), %xmm10
vmovaps -16(%rbp), %xmm11
addq $96, %rsp
popq %rbp
vzeroupper
retq
nopw %cs:(%rax,%rax)

Let's start by noting that an optimizing compiler has no obligation to actually execute the loop. At present both Delphi and Julia emit code that really runs through it, but a future compiler version could just skip the loop and assign the final values. Microbenchmarks are tricky.
The difference seems to be that Julia makes use of SIMD instructions, which makes perfect sense for such a loop. The vectorized loop in the Julia listing advances its counter by 8 per pass (addq $-8, %rcx), so a speedup of roughly 8x is plausible, depending on your CPU.
You could have a look at this blog post for thoughts on SIMD in Delphi.
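To make the SIMD point concrete, here is a minimal C sketch of the same loop with the if/else rewritten as arithmetic, which is the shape a vectorizer can work with. The names Counts and count_branch_free are made up for this illustration, and whether a particular compiler vectorizes it depends on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical result type, not part of the question's code. */
typedef struct {
    int64_t total;
    int64_t big;
    int64_t small;
} Counts;

/* Same counting loop as in the question, but without a branch:
   the comparison result (0 or 1) is added directly to the counters. */
Counts count_branch_free(int64_t n, int64_t threshold)
{
    Counts c = {0, 0, 0};
    for (int64_t i = 1; i <= n; ++i) {
        c.total += 1;
        int64_t is_big = (c.total > threshold); /* 1 if above threshold, else 0 */
        c.big += is_big;
        c.small += 1 - is_big;
    }
    return c;
}
```

Because every iteration does the same arithmetic regardless of the comparison, several iterations can be processed side by side in one vector register.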
Although this is not the main point of the answer, I'll expand a bit on the possibility of removing the loop altogether. I don't know for sure what the Delphi specification says, but in many compiled languages, including Julia ("just-ahead-of-time"), the compiler could simply figure out the state of the variables after the loop and replace the loop with that state. Have a look at the following C++ code (compiler explorer):
#include <cstdio>
void loop() {
long total = 0, big = 0, small = 0;
for (long i = 0; i < 100; ++i) {
total++;
if (total > 50) {
big++;
} else {
small++;
}
}
std::printf("%ld %ld %ld", total, big, small);
}
This is the assembly that clang trunk outputs:
loop(): # @loop()
lea rdi, [rip + .L.str]
mov esi, 100
mov edx, 50
mov ecx, 50
xor eax, eax
jmp printf@PLT # TAILCALL
.L.str:
.asciz "%ld %ld %ld"
As you can see, there is no loop, just the result. For longer loops clang stops doing this optimization, but that is just a limitation of that compiler; other compilers could behave differently, and I'm sure there is a heavily optimizing compiler out there that handles much more complex situations.
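As a rough sketch of what "replace the loop with that state" means for the loop in the question, the final values can be written in closed form. This is hand-written C, not compiler output; LoopState and loop_closed_form are hypothetical names, and it assumes n and threshold are non-negative.

```c
#include <stdint.h>

/* Hypothetical result type for this illustration. */
typedef struct {
    int64_t total;
    int64_t big;
    int64_t small;
} LoopState;

/* Final state of the counting loop after n iterations, computed
   without iterating. Assumes n >= 0 and threshold >= 0. */
LoopState loop_closed_form(int64_t n, int64_t threshold)
{
    LoopState s;
    s.total = n;                                /* incremented once per iteration */
    s.small = (n < threshold) ? n : threshold;  /* iterations with total <= threshold */
    s.big = n - s.small;                        /* all remaining iterations */
    return s;
}
```

With n = 100 and threshold = 50 this yields 100, 50 and 50, the same constants that appear in the clang output above.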

Related

Optimization bug in Apple's LLVM, or bug in code?

I have some iOS C++ code that compiles correctly on my local machine (LLVM 9.0) but compiles incorrectly on my build server (LLVM 10.0). The project is generated via CMake (same version on both) so the code being compiled is the same, with the same compiler settings.
After finally realizing that some critical values weren't being updated on the LLVM10 version I investigated the assembly and found out it was completely skipping part of the code.
void SceneDisplay::SetSize(const math::Vec2 &Size)
{
m_Size = Size;
m_ScreenWidth = int(m_Size.x * float(GraphicsUtil::WIDTH));
m_ScreenHeight = int(m_Size.y * float(GraphicsUtil::HEIGHT));
UpdateOffsetScale();
}
m_Size is initialized to 1.0,1.0 in the class constructor. This works fine and everything is perfect with LLVM9 - with LLVM10 we get the following disassembly:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movq __ZN12GraphicsUtil6HEIGHTE@GOTPCREL(%rip), %rax
movq __ZN12GraphicsUtil5WIDTHE@GOTPCREL(%rip), %rcx
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
movq -8(%rbp), %rsi
Ltmp2347:
movq -16(%rbp), %rdi
movq (%rdi), %rdi
movq %rdi, 56(%rsi)
movl (%rcx), %edx
movl %edx, 12(%rsi)
movl (%rax), %edx
movl %edx, 16(%rsi)
movq (%rsi), %rax
movq %rsi, %rdi
callq *136(%rax)
addq $16, %rsp
popq %rbp
retq
As you can see the assignment of the two member variables is completely 'optimized' to just assume that m_Size.x and m_Size.y are 1.0 - thus just copying the values of GraphicsUtil::WIDTH and HEIGHT.
I fixed this by changing the code to use "Size" instead of "m_Size" for those assignments, as well as making them volatile just in case. But I'm wondering if there is a legitimate compiler error here or I'm missing something?
Edit: It should be noted that m_Size is nearly never 1.0,1.0
Edit2: The correct assembly for the assignments, as generated on my machine (different arch though, not able to get the same arch as above right now)
str x8, [x0, #56]
lsr x9, x8, #32
fmov s0, w8
adrp x8, __ZN12GraphicsUtil5WIDTHE@GOTPAGE
ldr x8, [x8, __ZN12GraphicsUtil5WIDTHE@GOTPAGEOFF]
ldr s1, [x8]
ucvtf s1, s1
fmul s0, s0, s1
fcvtzs w8, s0
str w8, [x0, #12]
fmov s0, w9
adrp x8, __ZN12GraphicsUtil6HEIGHTE@GOTPAGE
ldr x8, [x8, __ZN12GraphicsUtil6HEIGHTE@GOTPAGEOFF]
ldr s1, [x8]
ucvtf s1, s1
fmul s0, s0, s1
fcvtzs w8, s0
str w8, [x0, #16]
After making a minimal test case I was able to confirm it's definitely a compiler bug.
Conditions: No other piece of code modifies m_Size, m_Size is initialized math::Vec2 m_Size{1.0, 1.0};. It works perfectly on every version of LLVM I could find before 10.0, seems some sort of regression occurred at that version.
Have submitted to Apple's LLVM team and llvm.org.
Thanks for comments.

What do these 2 lines of assembly code do?

I am in the middle of phase 2 for bomb lab and I can't seem to figure out how these two lines of assembly affect the code overall and how they play a role in the loop going on.
Here are the two lines of code:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
and here is the entire code:
Dump of assembler code for function phase_2:
0x08048ba4 <+0>: push %ebp
0x08048ba5 <+1>: mov %esp,%ebp
0x08048ba7 <+3>: push %ebx
0x08048ba8 <+4>: sub $0x34,%esp
0x08048bab <+7>: lea -0x20(%ebp),%eax
0x08048bae <+10>: mov %eax,0x4(%esp)
0x08048bb2 <+14>: mov 0x8(%ebp),%eax
0x08048bb5 <+17>: mov %eax,(%esp)
0x08048bb8 <+20>: call 0x804922f <read_six_numbers>
0x08048bbd <+25>: cmpl $0x0,-0x20(%ebp)
0x08048bc1 <+29>: jns 0x8048be3 <phase_2+63>
0x08048bc3 <+31>: call 0x80491ed <explode_bomb>
0x08048bc8 <+36>: jmp 0x8048be3 <phase_2+63>
0x08048bca <+38>: mov %ebx,%eax
0x08048bcc <+40>: add -0x24(%ebp,%ebx,4),%eax
0x08048bd0 <+44>: cmp %eax,-0x20(%ebp,%ebx,4)
0x08048bd4 <+48>: je 0x8048bdb <phase_2+55>
0x08048bd6 <+50>: call 0x80491ed <explode_bomb>
0x08048bdb <+55>: inc %ebx
0x08048bdc <+56>: cmp $0x6,%ebx
0x08048bdf <+59>: jne 0x8048bca <phase_2+38>
0x08048be1 <+61>: jmp 0x8048bea <phase_2+70>
0x08048be3 <+63>: mov $0x1,%ebx
0x08048be8 <+68>: jmp 0x8048bca <phase_2+38>
0x08048bea <+70>: add $0x34,%esp
0x08048bed <+73>: pop %ebx
0x08048bee <+74>: pop %ebp
0x08048bef <+75>: ret
I noticed the inc instruction that increments %ebx by 1, and that %ebx is copied into %eax in the loop. But the add and cmp trip me up every time. If %eax is 1 going into the add and cmp, what does %eax come out as? Thanks! I also know that once %ebx gets to 5 the loop is over and the function returns.
You got a list of 6 numbers. This means you can compare at most 5 pairs of numbers. So the loop that uses %ebx does 5 iterations.
In each iteration the value at the lower address is added to the current loop count, and then compared with the value at the next higher address. As long as they match the bomb won't explode!
This loops 5 times:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
These numbers are used:
with %ebx=1 numbers are at -0x20(%ebp) and -0x1C(%ebp)
with %ebx=2 numbers are at -0x1C(%ebp) and -0x18(%ebp)
with %ebx=3 numbers are at -0x18(%ebp) and -0x14(%ebp)
with %ebx=4 numbers are at -0x14(%ebp) and -0x10(%ebp)
with %ebx=5 numbers are at -0x10(%ebp) and -0x0C(%ebp)
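The addresses in the list follow from the AT&T operand form disp(%ebp,%ebx,4), which means disp + %ebp + 4*%ebx. A small C sketch of that arithmetic, with operand_offset as a made-up helper name that returns the offset relative to %ebp:

```c
#include <stdint.h>

/* Offset from %ebp addressed by the operand disp(%ebp,%ebx,4).
   Hypothetical helper for illustration only. */
int32_t operand_offset(int32_t disp, int32_t ebx)
{
    return disp + 4 * ebx; /* scale factor 4 = size of one int */
}
```

For %ebx=1, operand_offset(-0x24, 1) gives -0x20 for the add and operand_offset(-0x20, 1) gives -0x1C for the cmp, matching the first row of the list.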
Those two instructions work on memory at two adjacent locations, indexed by ebp and ebx. The mov just before them copies the loop counter ebx into eax, so the add instruction computes the loop counter plus the previous number, and the comparison instruction checks whether that sum is equal to the current number. So something like:
for (i = 1; i < 6; i++) {
if (array[i] != array[i-1] + i)
explode_bomb();
}
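Reading the listing directly, mov %ebx,%eax followed by the add and cmp requires each number to equal the previous number plus the loop counter, and the cmpl/jns before the loop requires the first number to be non-negative. A hedged C sketch of that check (phase2_passes is a hypothetical name, and it assumes explode_bomb never returns):

```c
/* Returns 1 if the six numbers would pass the checks visible in the
   phase_2 disassembly, 0 if explode_bomb would be called. */
int phase2_passes(const int nums[6])
{
    if (nums[0] < 0) /* cmpl $0x0,-0x20(%ebp) ; jns */
        return 0;
    for (int i = 1; i < 6; ++i) { /* %ebx runs from 1 to 5 */
        /* unsigned arithmetic mirrors the 32-bit wraparound of add */
        if ((unsigned)nums[i] != (unsigned)nums[i - 1] + (unsigned)i)
            return 0;
    }
    return 1;
}
```

Under this reading, a sequence such as 0 1 3 6 10 15 satisfies every comparison.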

Displaying environment variables in assembly language

I am trying to understand how assembly works by writing a basic program that displays environment variables, like this C code:
int main(int ac, char **av, char **env)
{
int x;
int y;
y = -1;
while (env[++y])
{
x = -1;
while (env[y][++x])
{
write(1, &(env[y][x]), 1);
}
}
return (0);
}
I compiled that with gcc -S (on cygwin64) to see how it is done, and wrote it my own way (similar but not the same), but it did not work...
$>gcc my_av.s && ./a.exe
HOMEPATH=\Users\hadrien▒2▒p
My assembly code :
.file "test.c"
.LC0:
.ascii "\n\0"
.LC1:
.ascii "\033[1;31m.\033[0m\0"
.LC2:
.ascii "\033[1;31m#\033[0m\0"
.LCtest0:
.ascii "\033[1;32mdebug\033[0m\0"
.LCtest1:
.ascii "\033[1;31mdebug\033[0m\0"
.LCtest2:
.ascii "\033[1;34mdebug\033[0m\0"
.def main; .scl 2; .type 32; .endef
main:
/* main setup */
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movl %ecx, 16(%rbp) /* int argc */
movq %rdx, 24(%rbp) /* char **argv */
movq %r8, 32(%rbp) /* char **env */
/* print a newline */
/* set up write arguments */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
/* write */
call write
/* start of the code */
movl $-1, -8(%rbp) /* y = -1 */
jmp .Loop_1_condition
.Loop_1_body:
movl $-1, -4(%rbp)
jmp .Loop_2_condition
.Loop_2_body:
/* print the character */
movl $1, %r8d
movl $1, %ecx
call write
.Loop_2_condition:
addl $1, -4(%rbp) /* ++x */
movl -8(%rbp), %eax
cltq
addq 32(%rbp), %rax
movq (%rax), %rax
movq %rax, %rdx
movl -4(%rbp), %eax
cltq
addq %rdx, %rax
movq %rax, %rdx
movq (%rax), %rax
cmpq $0, %rax
jne .Loop_2_body
/* print a newline */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
.Loop_1_condition:
addl $1, -8(%rbp) /* ++y */
movl -8(%rbp), %eax
cltq /* sign-extend eax to 64 bits */
addq 32(%rbp), %rax
movq (%rax), %rax
cmpq $0, %rax
jne .Loop_1_body
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
/* end of the program */
movl $0, %eax /* return (0) */
addq $48, %rsp
popq %rbp
ret
.def write; .scl 2; .type 32; .endef
Could someone please explain what is wrong with this code?
Also, while trying to solve the problem I tried replacing $0 with $97 in the cmpq operation, thinking it would stop on the 'a' character, but it didn't... Why?
You have a few issues. In this code (loop2) you have:
addq %rdx, %rax
movq %rax, %rdx
movq (%rax), %rax
cmpq $0, %rax
movq (%rax), %rax reads 8 bytes (the next 8 characters) into %rax, but you are only interested in the first character. One way to handle this is to compare only the least significant byte of %rax with 0, using cmpb and the %al register:
cmpb $0, %al
The biggest issue, though, is understanding that char **env is a pointer to an array of char *. You first need to get the base pointer of the array, then index that base pointer with y. The indexing looks something like basepointer + (y * 8); you multiply y by 8 because each pointer is 8 bytes wide. The pointer at that location is the char * for a particular environment string. You can then index each character in that string until you find the terminating NUL (0) character.
I've amended the code slightly and added comments on the few lines I changed:
.file "test.c"
.LC0:
.ascii "\x0a\0"
.LC1:
.ascii "\033[1;31m.\033[0m\0"
.LC2:
.ascii "\033[1;31m#\033[0m\0"
.LCtest0:
.ascii "\033[1;32mdebug\033[0m\0"
.LCtest1:
.ascii "\033[1;31mdebug\033[0m\0"
.LCtest2:
.ascii "\033[1;34mdebug\033[0m\0"
.def main; .scl 2; .type 32; .endef
main:
/* main setup */
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movl %ecx, 16(%rbp) /* int argc */
movq %rdx, 24(%rbp) /* char **argv */
movq %r8, 32(%rbp) /* char **env */
/* print a newline */
/* set up write arguments */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
/* write */
call write
/* start of the code */
movl $-1, -8(%rbp) /* y = -1 */
jmp .Loop_1_condition
.Loop_1_body:
movl $-1, -4(%rbp)
jmp .Loop_2_condition
.Loop_2_body:
/* print the character */
movl $1, %r8d
movl $1, %ecx
call write
.Loop_2_condition:
addl $1, -4(%rbp) /* ++x */
movl -8(%rbp), %eax /* get y index */
cltq
movq 32(%rbp), %rbx /* get envp (pointer to element 0 of char * array) */
movq (%rbx,%rax,8), %rdx /* get pointer at envp+y*8
pointers are 8 bytes wide */
movl -4(%rbp), %eax /* get x */
cltq
leaq (%rdx, %rax), %rdx /* Get current character's address */
cmpb $0, (%rdx) /* Compare current byte to char 0
using cmpq will compare the next 8 bytes */
jne .Loop_2_body
/* print a newline */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
.Loop_1_condition:
addl $1, -8(%rbp) /* ++y */
movl -8(%rbp), %eax
cltq /* sign-extend eax to 64 bits */
movq 32(%rbp), %rbx /* get envp (pointer to element 0 of char * array) */
movq (%rbx,%rax,8), %rax /* get pointer at envp+y*8
pointers are 8 bytes wide */
cmpq $0, %rax /* Compare to NULL ptr */
jne .Loop_1_body
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
/* end of the program */
movl $0, %eax /* return (0) */
addq $48, %rsp
popq %rbp
ret
.def write; .scl 2; .type 32; .endef
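For comparison, the same two levels of indexing can be written in C. This is only a sketch: env_total_length is a hypothetical helper that counts characters instead of calling write, so that the pointer-sized comparison in the outer loop and the byte-sized comparison in the inner loop stand out.

```c
#include <stddef.h>

/* Walks a NULL-terminated array of C strings, as main's env parameter is,
   and returns the total number of characters in all strings. */
size_t env_total_length(char *const *envp)
{
    size_t total = 0;
    /* Outer loop: envp[y] is a pointer, loaded from envp + y * sizeof(char *).
       The loop ends on a NULL pointer (a pointer-sized compare). */
    for (size_t y = 0; envp[y] != NULL; ++y) {
        const char *s = envp[y];
        /* Inner loop: s[x] is a single byte, loaded from s + x.
           The loop ends on the NUL byte (a byte-sized compare). */
        for (size_t x = 0; s[x] != '\0'; ++x)
            ++total;
    }
    return total;
}
```

The outer test corresponds to the cmpq against a NULL pointer, and the inner test corresponds to the cmpb against a zero byte.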

getting error "dyld_sim`dyld_fatal_error" after app starts

dyld_sim`dyld_fatal_error:
0x103e63000 <+0>: int3
-> 0x103e63001 <+1>: nop
My app compiles and builds successfully, but it ends with the above error.
There are no other messages (error logs).
I set breakpoints in AppDelegate's didFinishLaunchingWithOptions method and also in main.m, but execution never stops there.
My app's first view is always visible, and the error occurs only after it appears.
I couldn't find anything regarding this error. How can I solve it? Any specific suggestions?
I also tried this:
change the framework type from Required to Optional.
But nothing works!
And yes, I'm using CocoaPods.
Update:
My question does not match the suggested duplicate; the error names in the two titles differ.
Error which I'm getting - dyld_sim`dyld_fatal_error
Error in duplicate (suggestion) question - dyld`dyld_fatal_error:
Update 2:
Update 3:
Crash log
dyld_sim`dyldbootstrap::rebaseDyld:
0x10f95c002 <+0>: pushq %rbp
0x10f95c003 <+1>: movq %rsp, %rbp
0x10f95c006 <+4>: pushq %r15
0x10f95c008 <+6>: pushq %r14
0x10f95c00a <+8>: pushq %r13
0x10f95c00c <+10>: pushq %r12
0x10f95c00e <+12>: pushq %rbx
0x10f95c00f <+13>: subq $0x18, %rsp
0x10f95c013 <+17>: movq %rsi, %rbx
0x10f95c016 <+20>: movq %rdi, %r14
0x10f95c019 <+23>: movl 0x10(%r14), %r13d
0x10f95c01d <+27>: addq $0x20, %r14
0x10f95c021 <+31>: xorl %eax, %eax
0x10f95c023 <+33>: movq %rax, -0x30(%rbp)
0x10f95c027 <+37>: xorl %eax, %eax
0x10f95c029 <+39>: movq %rax, -0x38(%rbp)
0x10f95c02d <+43>: xorl %r12d, %r12d
0x10f95c030 <+46>: xorl %r15d, %r15d
-> 0x10f95c033 <+49>: movl (%r14), %eax
0x10f95c036 <+52>: cmpl $0xb, %eax
0x10f95c039 <+55>: jne 0x10f95c043 ; <+65>
0x10f95c03b <+57>: movq %r14, %r12
0x10f95c03e <+60>: jmp 0x10f95c0cc ; <+202>
0x10f95c043 <+65>: cmpl $0x19, %eax
0x10f95c046 <+68>: jne 0x10f95c0cc ; <+202>
0x10f95c04c <+74>: leaq 0x8(%r14), %rdi
0x10f95c050 <+78>: leaq 0x192c0(%rip), %rsi ; "__LINKEDIT"
0x10f95c057 <+85>: callq 0x10f9751a2 ; strcmp
0x10f95c05c <+90>: testl %eax, %eax
0x10f95c05e <+92>: movq -0x30(%rbp), %rax
0x10f95c062 <+96>: cmoveq %r14, %rax
0x10f95c066 <+100>: movq %rax, -0x30(%rbp)
0x10f95c06a <+104>: leaq 0x48(%r14), %rax
0x10f95c06e <+108>: movl 0x40(%r14), %ecx
0x10f95c072 <+112>: leaq (%rcx,%rcx,4), %rcx
0x10f95c076 <+116>: shlq $0x4, %rcx
0x10f95c07a <+120>: leaq 0x48(%r14,%rcx), %rcx
0x10f95c07f <+125>: jmp 0x10f95c085 ; <+131>
0x10f95c081 <+127>: addq $0x50, %rax
0x10f95c085 <+131>: cmpq %rcx, %rax
0x10f95c088 <+134>: jae 0x10f95c0b3 ; <+177>
0x10f95c08a <+136>: movzbl 0x40(%rax), %edx
0x10f95c08e <+140>: cmpl $0x6, %edx
0x10f95c091 <+143>: jne 0x10f95c081 ; <+127>
0x10f95c093 <+145>: movq 0x28(%rax), %rdx
0x10f95c097 <+149>: shrq $0x3, %rdx
0x10f95c09b <+153>: testl %edx, %edx
0x10f95c09d <+155>: je 0x10f95c081 ; <+127>
0x10f95c09f <+157>: movq 0x20(%rax), %rsi
0x10f95c0a3 <+161>: addq %rbx, %rsi
0x10f95c0a6 <+164>: addq %rbx, (%rsi)
0x10f95c0a9 <+167>: addq $0x8, %rsi
0x10f95c0ad <+171>: decl %edx
0x10f95c0af <+173>: jne 0x10f95c0a6 ; <+164>
0x10f95c0b1 <+175>: jmp 0x10f95c081 ; <+127>
0x10f95c0b3 <+177>: cmpq $0x0, -0x38(%rbp)
0x10f95c0b8 <+182>: jne 0x10f95c0cc ; <+202>
0x10f95c0ba <+184>: testb $0x2, 0x3c(%r14)
0x10f95c0bf <+189>: movl $0x0, %eax
0x10f95c0c4 <+194>: cmovneq %r14, %rax
0x10f95c0c8 <+198>: movq %rax, -0x38(%rbp)
0x10f95c0cc <+202>: movl 0x4(%r14), %eax
0x10f95c0d0 <+206>: addq %rax, %r14
0x10f95c0d3 <+209>: incl %r15d
0x10f95c0d6 <+212>: cmpl %r13d, %r15d
0x10f95c0d9 <+215>: jne 0x10f95c033 ; <+49>
0x10f95c0df <+221>: movl 0x48(%r12), %esi
0x10f95c0e4 <+226>: movl 0x4c(%r12), %edx
0x10f95c0e9 <+231>: testq %rdx, %rdx
0x10f95c0ec <+234>: je 0x10f95c13d ; <+315>
0x10f95c0ee <+236>: movq -0x38(%rbp), %rax
0x10f95c0f2 <+240>: movq 0x18(%rax), %rax
0x10f95c0f6 <+244>: addq %rbx, %rax
0x10f95c0f9 <+247>: movq -0x30(%rbp), %rcx
0x10f95c0fd <+251>: movq %rcx, %rdi
0x10f95c100 <+254>: movq 0x18(%rdi), %rcx
0x10f95c104 <+258>: addq %rbx, %rcx
0x10f95c107 <+261>: addq %rsi, %rcx
0x10f95c10a <+264>: subq 0x28(%rdi), %rcx
0x10f95c10e <+268>: leaq (%rcx,%rdx,8), %rdx
0x10f95c112 <+272>: movl 0x4(%rcx), %esi
0x10f95c115 <+275>: movl %esi, %edi
0x10f95c117 <+277>: andl $0x6000000, %edi
0x10f95c11d <+283>: cmpl $0x6000000, %edi
0x10f95c123 <+289>: jne 0x10f95c14c ; <+330>
0x10f95c125 <+291>: cmpl $0x10000000, %esi
0x10f95c12b <+297>: jae 0x10f95c15f ; <+349>
0x10f95c12d <+299>: movslq (%rcx), %rsi
0x10f95c130 <+302>: addq %rbx, (%rax,%rsi)
0x10f95c134 <+306>: addq $0x8, %rcx
0x10f95c138 <+310>: cmpq %rdx, %rcx
0x10f95c13b <+313>: jb 0x10f95c112 ; <+272>
0x10f95c13d <+315>: addq $0x18, %rsp
0x10f95c141 <+319>: popq %rbx
0x10f95c142 <+320>: popq %r12
0x10f95c144 <+322>: popq %r13
0x10f95c146 <+324>: popq %r14
0x10f95c148 <+326>: popq %r15
0x10f95c14a <+328>: popq %rbp
0x10f95c14b <+329>: retq
0x10f95c14c <+330>: movl $0x8, %edi
0x10f95c151 <+335>: callq 0x10f9710ea ; __cxa_allocate_exception
0x10f95c156 <+340>: leaq 0x191c5(%rip), %rcx ; "relocation in dyld has wrong size"
0x10f95c15d <+347>: jmp 0x10f95c170 ; <+366>
0x10f95c15f <+349>: movl $0x8, %edi
0x10f95c164 <+354>: callq 0x10f9710ea ; __cxa_allocate_exception
0x10f95c169 <+359>: leaq 0x191d4(%rip), %rcx ; "relocation in dyld has wrong type"
0x10f95c170 <+366>: movq %rcx, (%rax)
0x10f95c173 <+369>: leaq 0x24c56(%rip), %rcx ; typeinfo for char const*
0x10f95c17a <+376>: xorl %edx, %edx
0x10f95c17c <+378>: movq %rax, %rdi
0x10f95c17f <+381>: movq %rcx, %rsi
0x10f95c182 <+384>: callq 0x10f971354 ; __cxa_throw
I had this issue after deleting a bunch of things to try and fix another issue. I was able to fix it by reverting the following:
In Project>Build Settings>Runpath Search Paths, add the following (using the + icon, values are comma separated):
$(inherited), @executable_path/Frameworks, @loader_path/Frameworks
Unfortunately, SO text editor made me write it in a code block.

Memory transfer intel assembly AT&T

I have a problem moving a string bytewise from one memory address to another. I've been at this for hours and tried several different strategies. I'm new to Intel assembly, so I need some tips and insight to help me solve the problem.
The getText routine is supposed to transfer n (found in %rsi) bytes from ibuf to the address in %rdi. counterI is the offset used to indicate where to start the transfer, and after the routine is over it should point to the next byte that wasn't transferred. If there aren't n bytes available, it should cancel the transfer and return the actual number of bytes transferred in %rax.
getText:
movq $ibuf, %r10
#in rsi is the number of bytes to be transferred
#rdi contains the memory address of the memory space to transfer to
movq $0, %r8 #start with offset 0
movq $0, %rax #zero return register
movq (counterI), %r11
cmpb $0, (%r10, %r11, 1) #check if ibuf+counterI=NULL
jne MOVE #if so call and read to ibuf
call inImage
MOVE:
cmpq $0,%rsi #if number of bytes to read is 0
je EXIT #exit
movq counterI, %r9
movq $0, %r9 #used for debugging only, should not be 0
movb (%r10, %r9, 1), %bl #loads one byte to rdi from ibuf
movb %bl, (%rdi, %r8, 1)
incq counterI #increase pointer offset
decq %rsi #dec number of bytes to read
incq %r8 #inc offset in write buffer
movq %r8, %rax #returns number of bytes written to buf
movq (counterI), %r9
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
movb $0, (%rdi, %r8, 1) #move NULL to buf+%r8?
ret
movq counterI, %r9
movq $0, %r9 #used for debugging only, should not be 0
The second instruction makes the first useless, but given the remark I understand you will remove it. Better still, you can remove both if you change every occurrence of %R9 into %R11.
movzbq (%r10, %r9, 1), %r10 #loads one byte+zeroes to rdi from ibuf
movq %r10, (%rdi, %r8, 1) #HERE IS THE PROBLEM I THINK
Here is a dangerous construct. You're first using %R10 as an address, but then you drop a zero-extended data byte into it. Later in the code you use %R10 as an address again, but sadly the address is no longer in there! The solution is to move into a different register and not to bother with the zero extension.
movb (%r10, %r9, 1), %bl #loads one byte to rdi from ibuf
movb %bl, (%rdi, %r8, 1)
The following code can be shortened, because MOVE already starts by testing %rsi for zero:
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
as
cmpb $0, (%r10, %r9, 1) #check if ibuf+offset is NULL
jne MOVE
EXIT:
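For reference, the behaviour described in the question can be modelled in C. This is a sketch under assumptions: get_text is a hypothetical name, the refill via inImage is left out, and dst is assumed to have room for n bytes plus a terminating NUL.

```c
#include <stddef.h>

/* Copies up to n bytes from ibuf, starting at offset *counter, into dst.
   Stops early at a NUL byte in ibuf, NUL-terminates dst, advances *counter
   past the copied bytes, and returns the number of bytes copied. */
size_t get_text(char *dst, size_t n, const char *ibuf, size_t *counter)
{
    size_t copied = 0;
    while (copied < n && ibuf[*counter] != '\0') {
        dst[copied] = ibuf[*counter]; /* one byte at a time, like movb */
        ++*counter;                   /* incq counterI */
        ++copied;                     /* incq %r8 */
    }
    dst[copied] = '\0';               /* movb $0, (%rdi, %r8, 1) */
    return copied;                    /* value returned in %rax */
}
```

Keeping the source pointer, the offset and the copied byte in separate variables is the C equivalent of not reusing %r10 for both the address and the data.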
