Why doesn't LLVM remember an integer range when the variable has been constrained by an if statement? (clang)

A problem that has become quite popular in recent days concerns the inability of LLVM/GCC to optimize a trivial loop even when the range is obvious.
Godbolt for all the examples below: https://godbolt.org/z/b3PzrsE5e
The code below is not optimized: the generated assembly still contains a loop.
uint64_t sum1( uint64_t num ) {
    uint64_t sum = 0;
    for ( uint64_t j=0; j<=num; ++j ) {
        sum += 1;
    }
    return sum;
}
produces
sum1(unsigned long): # @sum1(unsigned long)
xorl %eax, %eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
addq $1, %rax
cmpq %rdi, %rax
jbe .LBB0_1
retq
However, if one limits the range of the variable, for example with an AND mask, the loop is optimized away. You can also easily get it optimized by changing the condition j<=num to j<num+1.
uint64_t sum2( uint64_t num ) {
    num &= 0xFFFFFFFFULL;
    uint64_t sum = 0;
    for ( uint64_t j=0; j<=num; ++j ) {
        sum += 1;
    }
    return sum;
}
produces
sum2(unsigned long): # @sum2(unsigned long)
movl %edi, %eax
addq $1, %rax
retq
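Both workarounds have one thing in common: they make the loop finite for every input. In sum1 the body runs num + 1 times, except when num == UINT64_MAX, where j <= num is always true and the loop never ends, so no closed form exists for that input. The AND mask in sum2 rules that value out, and j < num + 1 turns it into a zero-trip loop because the bound wraps to 0. The C sketch below only spells out this arithmetic; the function names are illustrative, and it makes no claim about how LLVM's passes reason internally.

```c
#include <assert.h>
#include <stdint.h>

/* The loop from sum1. For num == UINT64_MAX it would never terminate,
   so this reference version refuses that input. */
static uint64_t sum_loop(uint64_t num)
{
    assert(num < UINT64_MAX);
    uint64_t sum = 0;
    for (uint64_t j = 0; j <= num; ++j) {
        sum += 1;
    }
    return sum;
}

/* The closed form an optimizer would like to emit for sum1: the trip count
   is num + 1, which matches the loop only while num + 1 does not wrap. */
static uint64_t sum_closed_form(uint64_t num)
{
    assert(num < UINT64_MAX);
    return num + 1;
}

/* Closed form of the j < num + 1 variant, valid for EVERY input:
   for num == UINT64_MAX the bound wraps to 0 and the body runs zero times,
   which is exactly what the wrapped num + 1 returns. */
static uint64_t sum_rewritten_closed_form(uint64_t num)
{
    return num + 1;
}
```

For example, sum_loop(255) and sum_closed_form(255) both return 256, while sum_rewritten_closed_form(UINT64_MAX) returns 0, matching a loop whose bound wrapped to 0.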
Curbing the range with an if statement, on the other hand, has no effect:
uint64_t sum3( uint64_t num ) {
    uint64_t sum = 0;
    if ( num <= 0xFF ) {
        for ( uint64_t j=0; j<=num; ++j ) {
            sum += 1;
        }
    }
    return sum;
}
This again produces assembly with a loop:
sum3(unsigned long): # @sum3(unsigned long)
xorl %eax, %eax
cmpq $255, %rdi
ja .LBB3_2
.LBB3_1: # =>This Inner Loop Header: Depth=1
addq $1, %rax
cmpq %rdi, %rax
jbe .LBB3_1
.LBB3_2:
retq
For that matter, even __builtin_assume( num < 0x100ULL ) has no effect on the result.
I have looked into the LLVM code and traced this to the failing check at
// lib/Transforms/Scalar/IndVarSimplify.cpp:1430
const SCEV *MaxExitCount = SE->getSymbolicMaxBackedgeTakenCount(L);
if (isa<SCEVCouldNotCompute>(MaxExitCount)) {
    printf( "Could not compute\n");
    return false;
}
...
which then ends up in
// lib/Analysis/ScalarEvolution.cpp:7253
const SCEV *ScalarEvolution::getExitCount(const Loop *L,
                                          const BasicBlock *ExitingBlock,
                                          ExitCountKind Kind) {
  switch (Kind) {
  case Exact:
  case SymbolicMaximum:
    return getBackedgeTakenInfo(L).getExact(ExitingBlock, this);
  case ConstantMaximum:
    return getBackedgeTakenInfo(L).getConstantMax(ExitingBlock, this);
  };
  llvm_unreachable("Invalid ExitCountKind!");
}
What I don't understand is why the bound cannot be inferred when the if statement makes it clear. Is this a feature that could be implemented? Am I on the right track?

Related

Julia massively outperforms Delphi. Is the Delphi compiler generating obsolete asm code?

I wrote a simple for loop in Delphi.
The same program is 7.6 times faster in Julia 1.6.
procedure TfrmTester.btnForLoopClick(Sender: TObject);
VAR
  i, Total, Big, Small: Integer;
  s: string;
begin
  TimerStart;
  Total:= 0;
  Big := 0;
  Small:= 0;
  for i:= 1 to 1000000000 DO //1 billion
  begin
    Total:= Total+1;
    if Total > 500000
    then Big:= Big+1
    else Small:= Small+1;
  end;
  s:= TimerElapsedS;
  //here code to show Big/Small on the screen
end;
The ASM code seems decent to me:
TesterForm.pas.111: TimerStart;
007BB91D E8DE7CF9FF call TimerStart
TesterForm.pas.113: Total:= 0;
007BB922 33C0 xor eax,eax
007BB924 8945F4 mov [ebp-$0c],eax
TesterForm.pas.114: Big := 0;
007BB927 33C0 xor eax,eax
007BB929 8945F0 mov [ebp-$10],eax
TesterForm.pas.115: Small:= 0;
007BB92C 33C0 xor eax,eax
007BB92E 8945EC mov [ebp-$14],eax
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB931 C745F801000000 mov [ebp-$08],$00000001
TesterForm.pas.118: Total:= Total+1;
007BB938 FF45F4 inc dword ptr [ebp-$0c]
TesterForm.pas.119: if Total > 500000
007BB93B 817DF420A10700 cmp [ebp-$0c],$0007a120
007BB942 7E05 jle $007bb949
TesterForm.pas.120: then Big:= Big+1
007BB944 FF45F0 inc dword ptr [ebp-$10]
007BB947 EB03 jmp $007bb94c
TesterForm.pas.121: else Small:= Small+1;
007BB949 FF45EC inc dword ptr [ebp-$14]
TesterForm.pas.122: end;
007BB94C FF45F8 inc dword ptr [ebp-$08]
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB94F 817DF801CA9A3B cmp [ebp-$08],$3b9aca01
007BB956 75E0 jnz $007bb938
TesterForm.pas.124: s:= TimerElapsedS;
007BB958 8D45E8 lea eax,[ebp-$18]
How can it be that Delphi has such a pathetic score compared with Julia?
Can I do anything to improve the code generated by the compiler?
Info
My Delphi 10.4.2 program is 32-bit (Win32). Of course, I run it in "Release" mode :)
However, the ASM code above is from the "Debug" build, because I don't know how to pause execution when running an optimized EXE. Either way, the difference between the Release and Debug executables is fairly small (1.8 vs 1.5 s). Julia does it in 195 ms.
More discussion
I should mention that the first time you run the code in Julia, the time is ridiculously high: Julia is JIT-compiled, so it has to compile the code first. The compilation time (a one-time cost) was not included in the measurement.
Also, as AmigoJack commented, Delphi code will run pretty much everywhere, while Julia code will probably only run on computers with a modern CPU that supports all those new/fancy instructions. I have small tools that I wrote back in 2004 that still run today.
Whatever code Julia produces cannot be delivered to "customers" unless they have Julia installed.
Anyway, all that being said, it is sad that the Delphi compiler is so outdated.
I ran other tests: finding the shortest and longest string in a list of strings is 10x faster in Delphi than in Julia, and allocating small blocks of memory (10000x10000x4 bytes) runs at the same speed in both.
As AhnLab mentioned, my tests are pretty "dry". I guess a full program that performs more complex/realistic tasks would need to be written to see whether Julia still outperforms Delphi by 7x.
Update
OK, the Julia code looks totally alien to me. It seems to use more modern instructions:
; ┌ # Julia_vs_Delphi.jl:4 within `for_fun`
pushq %rbp
movq %rsp, %rbp
subq $96, %rsp
vmovdqa %xmm11, -16(%rbp)
vmovdqa %xmm10, -32(%rbp)
vmovdqa %xmm9, -48(%rbp)
vmovdqa %xmm8, -64(%rbp)
vmovdqa %xmm7, -80(%rbp)
vmovdqa %xmm6, -96(%rbp)
movq %rcx, %rax
; │ # Julia_vs_Delphi.jl:8 within `for_fun`
; │┌ # range.jl:5 within `Colon`
; ││┌ # range.jl:354 within `UnitRange`
; │││┌ # range.jl:359 within `unitrange_last`
testq %rdx, %rdx
; │└└└
jle L80
; │ # Julia_vs_Delphi.jl within `for_fun`
movq %rdx, %rcx
sarq $63, %rcx
andnq %rdx, %rcx, %r9
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
cmpq $8, %r9
jae L93
; │ # Julia_vs_Delphi.jl within `for_fun`
movl $1, %r10d
xorl %edx, %edx
xorl %r11d, %r11d
jmp L346
L80:
xorl %edx, %edx
xorl %r11d, %r11d
xorl %r9d, %r9d
jmp L386
L93: movabsq $9223372036854775800, %r8 # imm = 0x7FFFFFFFFFFFFFF8
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
andq %r9, %r8
leaq 1(%r8), %r10
movabsq $.rodata.cst32, %rcx
vmovdqa (%rcx), %ymm1
vpxor %xmm0, %xmm0, %xmm0
movabsq $.rodata.cst8, %rcx
vpbroadcastq (%rcx), %ymm2
movabsq $1023787240, %rcx # imm = 0x3D05C0E8
vpbroadcastq (%rcx), %ymm3
movabsq $1023787248, %rcx # imm = 0x3D05C0F0
vpbroadcastq (%rcx), %ymm5
vpcmpeqd %ymm6, %ymm6, %ymm6
movabsq $1023787256, %rcx # imm = 0x3D05C0F8
vpbroadcastq (%rcx), %ymm7
movq %r8, %rcx
vpxor %xmm4, %xmm4, %xmm4
vpxor %xmm8, %xmm8, %xmm8
vpxor %xmm9, %xmm9, %xmm9
nopw %cs:(%rax,%rax)
; │ # Julia_vs_Delphi.jl within `for_fun`
L224:
vpaddq %ymm2, %ymm1, %ymm10
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
vpxor %ymm3, %ymm1, %ymm11
vpcmpgtq %ymm11, %ymm5, %ymm11
vpxor %ymm3, %ymm10, %ymm10
vpcmpgtq %ymm10, %ymm5, %ymm10
vpsubq %ymm11, %ymm0, %ymm0
vpsubq %ymm10, %ymm4, %ymm4
vpaddq %ymm11, %ymm8, %ymm8
vpsubq %ymm6, %ymm8, %ymm8
vpaddq %ymm10, %ymm9, %ymm9
vpsubq %ymm6, %ymm9, %ymm9
vpaddq %ymm7, %ymm1, %ymm1
addq $-8, %rcx
jne L224
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
vpaddq %ymm8, %ymm9, %ymm1
vextracti128 $1, %ymm1, %xmm2
vpaddq %xmm2, %xmm1, %xmm1
vpshufd $238, %xmm1, %xmm2 # xmm2 = xmm1[2,3,2,3]
vpaddq %xmm2, %xmm1, %xmm1
vmovq %xmm1, %r11
vpaddq %ymm0, %ymm4, %ymm0
vextracti128 $1, %ymm0, %xmm1
vpaddq %xmm1, %xmm0, %xmm0
vpshufd $238, %xmm0, %xmm1 # xmm1 = xmm0[2,3,2,3]
vpaddq %xmm1, %xmm0, %xmm0
vmovq %xmm0, %rdx
cmpq %r8, %r9
je L386
L346:
leaq 1(%r9), %r8
nop
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
; │┌ # operators.jl:378 within `>`
; ││┌ # int.jl:83 within `<`
L352:
xorl %ecx, %ecx
cmpq $500000, %r10 # imm = 0x7A120
seta %cl
cmpq $500001, %r10 # imm = 0x7A121
; │└└
adcq $0, %rdx
addq %rcx, %r11
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
; │┌ # range.jl:837 within `iterate`
incq %r10
; ││┌ # promotion.jl:468 within `==`
cmpq %r10, %r8
; │└└
jne L352
; │ # Julia_vs_Delphi.jl:17 within `for_fun`
L386:
movq %r9, (%rax)
movq %rdx, 8(%rax)
movq %r11, 16(%rax)
vmovaps -96(%rbp), %xmm6
vmovaps -80(%rbp), %xmm7
vmovaps -64(%rbp), %xmm8
vmovaps -48(%rbp), %xmm9
vmovaps -32(%rbp), %xmm10
vmovaps -16(%rbp), %xmm11
addq $96, %rsp
popq %rbp
vzeroupper
retq
nopw %cs:(%rax,%rax)
Let's start by noting that there is no reason for an optimizing compiler to actually execute the loop. At present, Delphi and Julia both output assembly that actually runs through the loop, but a compiler could skip the loop entirely and just assign the final values. Microbenchmarks are tricky.
The difference seems to be that Julia makes use of SIMD instructions, which make perfect sense for such a loop (a ~8x speedup is plausible, depending on your CPU).
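To make the SIMD point concrete, here is a simplified C model of what the vectorized loop does; it is an illustration, not a transcription of Julia's output. The counting is split across LANES independent accumulators with a branch-free update, followed by a scalar tail loop and a horizontal sum, which is the same overall shape as the assembly above (a vpcmpgtq main loop, a scalar remainder loop, and a vextracti128/vpshufd reduction). LANES = 4 models one 256-bit register of 64-bit integers; Julia's loop actually processes two such registers, 8 values, per iteration.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* one 256-bit register holds four 64-bit integers */

/* Scalar reference: Total runs 1..n; values above threshold count as Big. */
static void count_scalar(int64_t n, int64_t threshold, int64_t *big, int64_t *small)
{
    int64_t b = 0, s = 0;
    for (int64_t total = 1; total <= n; ++total) {
        if (total > threshold)
            ++b;
        else
            ++s;
    }
    *big = b;
    *small = s;
}

/* Lane-split version: LANES independent accumulators, branch-free update. */
static void count_lanes(int64_t n, int64_t threshold, int64_t *big, int64_t *small)
{
    assert(n >= 0);
    int64_t b[LANES] = {0}, s[LANES] = {0};
    int64_t total = 1;
    /* Main loop: LANES consecutive values per trip, no dependency between lanes. */
    for (; total + (LANES - 1) <= n; total += LANES) {
        for (int lane = 0; lane < LANES; ++lane) {
            /* 0 or 1 here; a vector compare yields 0 or -1 instead */
            int64_t is_big = (total + lane) > threshold;
            b[lane] += is_big;
            s[lane] += 1 - is_big;
        }
    }
    /* Scalar tail for the last n % LANES values. */
    int64_t bt = 0, st = 0;
    for (; total <= n; ++total) {
        if (total > threshold)
            ++bt;
        else
            ++st;
    }
    /* Horizontal reduction: add the per-lane counters together. */
    for (int lane = 0; lane < LANES; ++lane) {
        bt += b[lane];
        st += s[lane];
    }
    *big = bt;
    *small = st;
}
```

count_lanes(1003, 500, &big, &small) gives big = 503 and small = 500, the same as count_scalar; a vectorizer can make this transformation because the lanes do not depend on each other.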
You could have a look at this blog post for thoughts on SIMD in Delphi.
Although this is not the main point of the answer, I'll expand a bit on the possibility of removing the loop altogether. I don't know for sure what the Delphi specification says, but in many compiled languages, including Julia ("just-ahead-of-time"), the compiler could simply figure out the state of the variables after the loop and replace the loop with that state. Have a look at the following C++ code (Compiler Explorer):
#include <cstdio>
void loop() {
    long total = 0, big = 0, small = 0;
    for (long i = 0; i < 100; ++i) {
        total++;
        if (total > 50) {
            big++;
        } else {
            small++;
        }
    }
    std::printf("%ld %ld %ld", total, big, small);
}
This is the assembly that clang trunk outputs:
loop(): # @loop()
lea rdi, [rip + .L.str]
mov esi, 100
mov edx, 50
mov ecx, 50
xor eax, eax
jmp printf@PLT # TAILCALL
.L.str:
.asciz "%ld %ld %ld"
As you can see, there is no loop, just the result. For longer loops clang stops doing this optimization, but that is just a limitation of the compiler; other compilers could do it differently, and I'm sure there is a heavily optimizing compiler out there that handles much more complex situations.
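The same reasoning applies to the Delphi loop in the question. Total takes each value 1..n exactly once, so the final counts have a closed form: Small = min(n, 500000) and Big = n - Small. A C sketch, assuming n >= 0 (closed_form is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Closed form of the loop: Total runs 1..n, values > threshold are "big",
   the rest are "small". No iteration is needed. */
static void closed_form(int64_t n, int64_t threshold,
                        int64_t *total, int64_t *big, int64_t *small)
{
    assert(n >= 0 && threshold >= 0);
    *total = n;
    *small = n < threshold ? n : threshold; /* values 1..min(n, threshold) */
    *big = n - *small;                      /* values threshold+1..n */
}
```

closed_form(100, 50, ...) reproduces the 100, 50, 50 that clang computes above, and closed_form(1000000000, 500000, ...) gives Big = 999500000 and Small = 500000 without running a billion iterations.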

Why does my app crash if the year is less than 1948 or more than 2086?

My Xcode app crashes if the user inputs a year below 1948 or above, for example, 2086. Has anyone had this issue before, and how can I resolve it?
The user selects their date of birth and the app displays data such as the number of seconds between today's date and their date of birth.
I am new to Xcode (Xcode 8), so I'm still learning. Thanks in advance.
@IBAction func clickScreenBtn(_ sender: Any) {
    let birthDay = UserDefaults.standard.object(forKey: "birthday") as! Date
    var component = NSCalendar.current.dateComponents([.day], from: birthDay, to: Date())
    type = type + 1
    switch type {
    case 0:
        unitLabel.text = "Days"
        timeLabel.text = ""
        timeLabel.text = timeLabel.text?.appendingFormat("%d", component.day!)
        break
    case 4:
        type = 0
        unitLabel.text = "Days"
        timeLabel.text = ""
        timeLabel.text = timeLabel.text?.appendingFormat("%d", component.day!)
        break
    case 1:
        component = NSCalendar.current.dateComponents([.hour], from: birthDay, to: Date())
        unitLabel.text = "Hours"
        timeLabel.text = ""
        timeLabel.text = timeLabel.text?.appendingFormat("%d", component.hour!)
        break
    case 2:
        component = NSCalendar.current.dateComponents([.minute], from: birthDay, to: Date())
        unitLabel.text = "Minutes"
        timeLabel.text = ""
        timeLabel.text = timeLabel.text?.appendingFormat("%d", component.minute!)
        break
    case 3:
        component = NSCalendar.current.dateComponents([.second], from: birthDay, to: Date())
        unitLabel.text = "Seconds"
        timeLabel.text = ""
        timeLabel.text = timeLabel.text?.appendingFormat("%d", component.second!)
        break
    default:
        break
    }
}
Error:
0x108142301 <+9425>: callq 0x1082a95c4 ; symbol stub for: Foundation.Calendar.dateComponents (Swift.Set<Foundation.Calendar.Component>, from : Foundation.Date, to : Foundation.Date) -> Foundation.DateComponents
0x108142306 <+9430>: movq -0x710(%rbp), %rdi
0x10814230d <+9437>: movq %rax, -0x730(%rbp)
0x108142314 <+9444>: callq 0x10813d8a0 ; swift_rt_swift_release
0x108142319 <+9449>: movq -0x730(%rbp), %rax
0x108142320 <+9456>: movq %rax, -0x8(%rbp)
0x108142324 <+9460>: movq -0x218(%rbp), %rdi
0x10814232b <+9467>: callq 0x10813d8a0 ; swift_rt_swift_release
0x108142330 <+9472>: movq 0x1e3081(%rip), %rax ; (void *)0x000000010e87f068: swift_isaMask
0x108142337 <+9479>: movq -0x168(%rbp), %rdx
0x10814233e <+9486>: movq (%rdx), %rdi
0x108142341 <+9489>: andq (%rax), %rdi
0x108142344 <+9492>: movq %rdi, -0x738(%rbp)
0x10814234b <+9499>: movq %rdx, %rdi
0x10814234e <+9502>: movq -0x738(%rbp), %rax
0x108142355 <+9509>: callq *0x80(%rax)
0x10814235b <+9515>: movq %rax, -0x70(%rbp)
0x10814235f <+9519>: movq -0x230(%rbp), %rax
0x108142366 <+9526>: movsd -0x1d8(%rbp), %xmm0 ; xmm0 = mem[0],zero
0x10814236e <+9534>: movd %xmm0, %rax
0x108142373 <+9539>: cmpq $0x0, -0x70(%rbp)
0x108142378 <+9544>: jne 0x108142409 ; <+9689> at MainViewController.swift
0x10814237e <+9550>: movq -0x230(%rbp), %rax
0x108142385 <+9557>: movsd -0x1d8(%rbp), %xmm0 ; xmm0 = mem[0],zero
0x10814238d <+9565>: movd %xmm0, %rax
0x108142392 <+9570>: movq -0x230(%rbp), %rax
0x108142399 <+9577>: movsd -0x1d8(%rbp), %xmm0 ; xmm0 = mem[0],zero
0x1081423a1 <+9585>: movd %xmm0, %rax
0x1081423a6 <+9590>: movq -0x230(%rbp), %rax
0x1081423ad <+9597>: movsd -0x1d8(%rbp), %xmm0 ; xmm0 = mem[0],zero
0x1081423b5 <+9605>: movd %xmm0, %rax
0x1081423ba <+9610>: leaq 0x1b13f9(%rip), %rdi ; "fatal error"
0x1081423c1 <+9617>: movl $0xb, %eax
0x1081423c6 <+9622>: movl %eax, %esi
0x1081423c8 <+9624>: movl $0x2, %eax
0x1081423cd <+9629>: leaq 0x1b13ac(%rip), %rcx ; "unexpectedly found nil while unwrapping an Optional value"
0x1081423d4 <+9636>: movl $0x39, %edx
0x1081423d9 <+9641>: movl %edx, %r8d
0x1081423dc <+9644>: xorl %edx, %edx
0x1081423de <+9646>: movl %edx, -0x73c(%rbp)
0x1081423e4 <+9652>: movl %eax, %edx
0x1081423e6 <+9654>: movl %eax, %r9d
0x1081423e9 <+9657>: movl $0x0, (%rsp)
0x1081423f0 <+9664>: callq 0x1082a95f4 ; symbol stub for: function signature specialization <preserving fragile attribute, Arg[2] = Dead, Arg[3] = Dead> of Swift._fatalErrorMessage (Swift.StaticString, Swift.StaticString, file : Swift.StaticString, line : Swift.UInt, flags : Swift.UInt32) -> Swift.Never
0x1081423f5 <+9669>: movq -0x230(%rbp), %rcx
0x1081423fc <+9676>: movsd -0x1d8(%rbp), %xmm0 ; xmm0 = mem[0],zero
0x108142404 <+9684>: movd %xmm0, %rcx
0x108142409 <+9689>: leaq -0x70(%rbp), %rax
0x10814240d <+9693>: movq -0x70(%rbp), %rcx
0x108142411 <+9697>: movq %rcx, %rdi
0x108142414 <+9700>: movq %rax, -0x748(%rbp)
0x10814241b <+9707>: movq %rcx, -0x750(%rbp)
0x108142422 <+9714>: callq 0x1082a9858 ; symbol stub for: objc_retain
0x108142427 <+9719>: leaq 0x1b1398(%rip), %rdi ; "Seconds"
If you get rid of all the !s and replace them with some actual error checking, you should be able to find out where the error is occurring rather quickly. For example, this version will log whatever is going wrong to the console:
@IBAction func clickScreenBtn(_ sender: Any) {
    guard let birthDay = UserDefaults.standard.object(forKey: "birthday") as? Date else {
        print("Couldn't get birthday")
        return
    }
    let currentDate = Date()
    guard let day = Calendar.current.dateComponents([.day], from: birthDay, to: currentDate).day else {
        print("Couldn't get day")
        return
    }
    type = type + 1
    switch type {
    case 0:
        unitLabel.text = "Days"
        timeLabel.text = "\(day)"
    case 4:
        type = 0
        unitLabel.text = "Days"
        timeLabel.text = "\(day)"
    case 1:
        guard let hour = Calendar.current.dateComponents([.hour], from: birthDay, to: currentDate).hour else {
            print("Couldn't get hour")
            return
        }
        unitLabel.text = "Hours"
        timeLabel.text = "\(hour)"
    case 2:
        guard let minute = Calendar.current.dateComponents([.minute], from: birthDay, to: currentDate).minute else {
            print("Couldn't get minute")
            return
        }
        unitLabel.text = "Minutes"
        timeLabel.text = "\(minute)"
    case 3:
        guard let second = Calendar.current.dateComponents([.second], from: birthDay, to: currentDate).second else {
            print("Couldn't get second")
            return
        }
        unitLabel.text = "Seconds"
        timeLabel.text = "\(second)"
    default:
        break
    }
}
Once you know where the issue is actually occurring, it should be much easier to fix it.
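Once the failing line is known, one hypothesis worth testing (an assumption; neither the question nor the answer confirms it) is that the seconds value no longer fits in a signed 32-bit integer. The disassembly above hits the "unexpectedly found nil" fatal error right before the "Seconds" string is loaded, 2^31 seconds is about 68 years, and both reported cutoffs, 1948 and 2086, lie roughly 69 years either side of 2017, when Xcode 8 was still current. The C sketch below only checks that arithmetic; SECONDS_PER_YEAR is the average Gregorian year (365.2425 days), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Average Gregorian year: 365.2425 days * 86400 s = 31,556,952 s. */
#define SECONDS_PER_YEAR INT64_C(31556952)

/* Approximate number of seconds in a whole number of years. */
static int64_t approx_seconds(int64_t years)
{
    assert(years >= 0);
    return years * SECONDS_PER_YEAR;
}

/* True if value can be stored in a signed 32-bit integer. */
static bool fits_in_int32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

For example, approx_seconds(68) is 2,145,872,736, just under INT32_MAX (2,147,483,647), while approx_seconds(69) is 2,177,429,688, just over it.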

Displaying environment variables in assembly language

I am trying to understand how assembly works by writing a basic program that displays the environment variables, like this C code:
#include <unistd.h>
int main(int ac, char **av, char **env)
{
    int x;
    int y;
    y = -1;
    while (env[++y])
    {
        x = -1;
        while (env[y][++x])
        {
            write(1, &(env[y][x]), 1);
        }
    }
    return (0);
}
I compiled that with gcc -S (on Cygwin64) to see how it is done, then wrote my own version (similar, but not the same), but it did not work:
$>gcc my_av.s && ./a.exe
HOMEPATH=\Users\hadrien▒2▒p
My assembly code:
.file "test.c"
.LC0:
.ascii "\n\0"
.LC1:
.ascii "\033[1;31m.\033[0m\0"
.LC2:
.ascii "\033[1;31m#\033[0m\0"
.LCtest0:
.ascii "\033[1;32mdebug\033[0m\0"
.LCtest1:
.ascii "\033[1;31mdebug\033[0m\0"
.LCtest2:
.ascii "\033[1;34mdebug\033[0m\0"
.def main; .scl 2; .type 32; .endef
main:
/* main setup */
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movl %ecx, 16(%rbp) /* int argc */
movq %rdx, 24(%rbp) /* char **argv */
movq %r8, 32(%rbp) /* char **env */
/* newline */
/* write init */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
/* write */
call write
/* start of the code */
movl $-1, -8(%rbp) /* y = -1 */
jmp .Loop_1_condition
.Loop_1_body:
movl $-1, -4(%rbp)
jmp .Loop_2_condition
.Loop_2_body:
/* print the character */
movl $1, %r8d
movl $1, %ecx
call write
.Loop_2_condition:
addl $1, -4(%rbp) /* ++x */
movl -8(%rbp), %eax
cltq
addq 32(%rbp), %rax
movq (%rax), %rax
movq %rax, %rdx
movl -4(%rbp), %eax
cltq
addq %rdx, %rax
movq %rax, %rdx
movq (%rax), %rax
cmpq $0, %rax
jne .Loop_2_body
/* newline */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
.Loop_1_condition:
addl $1, -8(%rbp) /* ++y */
movl -8(%rbp), %eax
cltq /* sign-extend eax to 64 bits */
addq 32(%rbp), %rax
movq (%rax), %rax
cmpq $0, %rax
jne .Loop_1_body
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
/* end of program */
movl $0, %eax /* return (0) */
addq $48, %rsp
popq %rbp
ret
.def write; .scl 2; .type 32; .endef
Could someone explain to me what is wrong with this code, please?
Also, while trying to solve the problem, I tried replacing $0 with $97 in the cmpq instruction, thinking it would stop at an 'a' character, but it didn't. Why?
You have a few issues. In this code (loop2) you have:
addq %rdx, %rax
movq %rax, %rdx
movq (%rax), %rax
cmpq $0, %rax
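movq (%rax), %rax reads the next 8 characters into %rax, but you are only interested in the first one. This is also why replacing $0 with $97 didn't work: cmpq compares all 8 bytes, so it only matches when the string is 'a' followed by seven NUL bytes. One way to fix this is to compare only the least significant byte of %rax with 0, using cmpb on the %al register: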
cmpb $0, %al
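A small C model of the two loads makes the width difference concrete. It assumes a little-endian CPU such as x86-64, and load8 and low_byte_equals are hypothetical helper names: the first reads eight characters as one integer, the second compares only the byte that lands in %al.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models movq (%rax), %rax: eight bytes starting at p loaded as one integer.
   On a little-endian CPU such as x86-64, p[0] lands in the lowest byte (%al). */
static uint64_t load8(const char *p)
{
    assert(p != NULL);
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Models cmpb $c, %al after that load: only the lowest byte is compared. */
static int low_byte_equals(const char *p, unsigned char c)
{
    return (unsigned char)(load8(p) & 0xFF) == c;
}
```

With the bytes "abcdefg\0", load8 is not 97 even though the first character is 'a', while low_byte_equals(p, 'a') is true.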
The biggest issue, though, is understanding that char **env is a pointer to an array of char *. You first need to get the base pointer of the array, and then index that base pointer with y. The indexing looks something like basepointer + (y * 8); you need to multiply y by 8 because each pointer is 8 bytes wide. The pointer at that location is the char * for a particular environment string. Then you can index each character in that string until you find the NUL (0) terminating character.
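In C terms, the fixed assembly computes env[y] by hand. The sketch below makes that address arithmetic explicit; nth_string and print_strings are hypothetical helper names, and the byte offset y * sizeof(char *) is the y * 8 described above on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Equivalent of base[y], computed the way the fixed assembly does it:
   byte address of element 0 plus y * sizeof(char *) (8 on x86-64). */
static char *nth_string(char **base, long y)
{
    assert(base != NULL && y >= 0);
    char *bytes = (char *)base;
    return *(char **)(bytes + y * (long)sizeof(char *));
}

/* Walks the NULL-terminated array and writes each string one byte at a time,
   stopping at the NUL terminator (a one-byte compare, like cmpb). */
static void print_strings(char **strings)
{
    for (long y = 0; nth_string(strings, y) != NULL; ++y) {
        const char *s = nth_string(strings, y);
        for (long x = 0; s[x] != '\0'; ++x)
            write(1, &s[x], 1);
        write(1, "\n", 1);
    }
}
```

From main(int argc, char **argv, char **envp), a three-argument form that GCC supports on Unix-like systems and Cygwin as an extension rather than standard C, you would call print_strings(envp).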
I've amended the code slightly and added comments on the few lines I changed:
.file "test.c"
.LC0:
.ascii "\x0a\0"
.LC1:
.ascii "\033[1;31m.\033[0m\0"
.LC2:
.ascii "\033[1;31m#\033[0m\0"
.LCtest0:
.ascii "\033[1;32mdebug\033[0m\0"
.LCtest1:
.ascii "\033[1;31mdebug\033[0m\0"
.LCtest2:
.ascii "\033[1;34mdebug\033[0m\0"
.def main; .scl 2; .type 32; .endef
main:
/* main setup */
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movl %ecx, 16(%rbp) /* int argc */
movq %rdx, 24(%rbp) /* char **argv */
movq %r8, 32(%rbp) /* char **env */
/* newline */
/* write init */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
/* write */
call write
/* start of the code */
movl $-1, -8(%rbp) /* y = -1 */
jmp .Loop_1_condition
.Loop_1_body:
movl $-1, -4(%rbp)
jmp .Loop_2_condition
.Loop_2_body:
/* print the character */
movl $1, %r8d
movl $1, %ecx
call write
.Loop_2_condition:
addl $1, -4(%rbp) /* ++x */
movl -8(%rbp), %eax /* get y index */
cltq
movq 32(%rbp), %r10 /* get envp (pointer to element 0 of char * array);
%r10 is volatile in the Windows x64 ABI, while %rbx would have to be preserved */
movq (%r10,%rax,8), %rdx /* get pointer at envp+y*8
pointers are 8 bytes wide */
movl -4(%rbp), %eax /* get x */
cltq
leaq (%rdx, %rax), %rdx /* Get current character's address */
cmpb $0, (%rdx) /* Compare current byte to char 0
using cmpq will compare the next 8 bytes */
jne .Loop_2_body
/* newline */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
.Loop_1_condition:
addl $1, -8(%rbp) /* ++y */
movl -8(%rbp), %eax
cltq /* sign-extend eax to 64 bits */
movq 32(%rbp), %r10 /* get envp (pointer to element 0 of char * array) */
movq (%r10,%rax,8), %rax /* get pointer at envp+y*8
pointers are 8 bytes wide */
cmpq $0, %rax /* Compare to NULL ptr */
jne .Loop_1_body
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
/* end of program */
movl $0, %eax /* return (0) */
addq $48, %rsp
popq %rbp
ret
.def write; .scl 2; .type 32; .endef

Memory transfer intel assembly AT&T

I have a problem moving a string byte by byte from one memory address to another. I've been at this for hours and tried several strategies. I'm new to Intel assembly, so I need some tips and insight to help me solve the problem.
The getText routine is supposed to transfer n bytes (n is in %rsi) from ibuf to the address in %rdi. counterI is the offset that indicates where to start the transfer, and after the routine finishes it should point to the next byte that wasn't transferred. If there aren't n bytes available, it should stop the transfer early and return the number of bytes actually transferred in %rax.
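As a reference for what the routine should do, here is a C sketch of getText's specified behavior. It is an illustration under assumptions, not a translation of the assembly: ibuf and counterI are hypothetical globals standing in for the program's, ibuf holds sample data, and the inImage refill call is left out. It copies at most n bytes, stops early at a NUL, NUL-terminates the destination as the EXIT code does, and leaves counterI at the first byte not transferred.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the assembly program's globals (sample data). */
static char ibuf[64] = "hello";
static size_t counterI = 0;

/* Copies up to n bytes from ibuf (starting at counterI) to dest, stopping at a NUL.
   dest must have room for n + 1 bytes because the copy is NUL-terminated.
   Returns the number of bytes actually copied. */
static size_t get_text(char *dest, size_t n)
{
    assert(dest != NULL);
    size_t copied = 0;
    while (copied < n && ibuf[counterI] != '\0') {
        dest[copied++] = ibuf[counterI++]; /* one byte at a time, like movb */
    }
    dest[copied] = '\0';
    return copied;
}
```

With ibuf holding "hello" and counterI = 1, get_text(out, 3) copies "ell", returns 3, and leaves counterI at 4; a following get_text(out, 10) copies only "o" and returns 1.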
getText:
movq $ibuf, %r10
#in rsi is the number of bytes to be transfered
#rdi contains the memory adress for the memory space to transfer to
movq $0, %r8 #start with offset 0
movq $0, %rax #zero return register
movq (counterI), %r11
cmpb $0, (%r10, %r11, 1) #check if ibuf+counterI=NULL
jne MOVE #if so call and read to ibuf
call inImage
MOVE:
cmpq $0,%rsi #if number of bytes to read is 0
je EXIT #exit
movq counterI, %r9
movq $0, %r9 #used for debugging only, should not be 0
movb (%r10, %r9, 1), %bl #loads one byte from ibuf into %bl
movb %bl, (%rdi, %r8, 1)
incq counterI #increase pointer offset
decq %rsi #dec number of bytes to read
incq %r8 #inc offset in write buffer
movq %r8, %rax #returns number of bytes written to buf
movq (counterI), %r9
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
movb $0, (%rdi, %r8, 1) #move NULL to buf+%r8?
ret
movq counterI, %r9
movq $0, %r9 #used for debugging only, should not be 0
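The second instruction makes the first one useless, but given the comment I understand you will remove it. Better still, you can remove both if you change every occurrence of %r9 into %r11, provided %r11 still holds counterI when MOVE is reached: %r11 is caller-saved, so reload it after call inImage.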
movzbq (%r10, %r9, 1), %r10 #loads one byte+zeroes to rdi from ibuf
movq %r10, (%rdi, %r8, 1) #HERE IS THE PROBLEM I THINK
This is a dangerous construct. You first use %r10 as an address, but then drop a zero-extended data byte into it. Later in the code you use %r10 as an address again, but sadly the address won't be there anymore! In addition, movq stores 8 bytes, so every store also writes the 7 bytes after the intended one. The solution is to load into a different register and not bother with the zero extension:
movb (%r10, %r9, 1), %cl #load one byte from ibuf (%cl rather than %bl: %rbx is callee-saved in the System V ABI)
movb %cl, (%rdi, %r8, 1)
The following code can be shortened, because MOVE already begins by exiting when %rsi is 0:
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
as
cmpb $0, (%r10, %r9, 1) #check if ibuf+offset is NULL
jne MOVE
EXIT:

Calling a function crashes when the stack pointer is changed with inline assembly

I have written some code that switches the current stack by modifying the stack pointer with inline assembly. Although I can call functions and create local variables, calls to println! and some functions from std::rt make the application terminate abnormally with signal 4 (illegal instruction) in the playpen. How should I improve the code to prevent crashes?
#![feature(asm, box_syntax)]
#[allow(unused_assignments)]
#[inline(always)]
unsafe fn get_sp() -> usize {
    let mut result = 0usize;
    asm!("
        movq %rsp, $0
        "
        :"=r"(result):::"volatile"
    );
    result
}
#[inline(always)]
unsafe fn set_sp(value: usize) {
    asm!("
        movq $0, %rsp
        "
        ::"r"(value)::"volatile"
    );
}
#[inline(never)]
unsafe fn foo() {
    println!("Hello World!");
}
fn main() {
    unsafe {
        let mut stack = box [0usize; 500];
        let len = stack.len();
        stack[len-1] = get_sp();
        set_sp(std::mem::transmute(stack.as_ptr().offset((len as isize)-1)));
        foo();
        asm!("
            movq (%rsp), %rsp
            "
            ::::"volatile"
        );
    }
}
Debugging the program with rust-lldb on x86_64 on OS X yields a backtrace roughly 300K frames deep, repeating these lines over and over:
frame #299995: 0x00000001000063c4 a`rt::util::report_overflow::he556d9d2b8eebb88VbI + 36
frame #299996: 0x0000000100006395 a`rust_stack_exhausted + 37
frame #299997: 0x000000010000157f a`__morestack + 13
__morestack is implemented in assembly for each platform, such as i386 and x86_64; the i386 variant has a more detailed description that I think you will want to read carefully. This piece stuck out to me:
Each Rust function contains an LLVM-generated prologue that compares the stack space required for the current function to the space remaining in the current stack segment, maintained in a platform-specific TLS slot.
Here are the first instructions of the foo function:
a`foo::h5f80496ac1ee3d43zaa:
0x1000013e0: cmpq %gs:0x330, %rsp
0x1000013e9: ja 0x100001405 ; foo::h5f80496ac1ee3d43zaa + 37
0x1000013eb: movabsq $0x48, %r10
0x1000013f5: movabsq $0x0, %r11
-> 0x1000013ff: callq 0x100001572 ; __morestack
As you can see, I am about to call into __morestack, so the comparison check failed.
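I believe this indicates that you cannot manipulate the stack pointer and then call arbitrary Rust functions: every prologue compares %rsp with the limit stored in the TLS slot, and that limit still describes the original stack, not the one you switched to.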
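The check described in the quoted comment can be modeled in a few lines of C. This is a hypothetical, simplified model, not LLVM's generated code: stack_limit stands in for the TLS slot (%gs:0x330 in the prologue above), and since x86-64 stacks grow downward, a frame fits only if at least frame_size bytes lie between %rsp and the limit. A stack placed where the recorded limit does not describe it, for example at a lower address than the limit, fails the check, which corresponds to the call into __morestack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread TLS slot holding the lowest
   usable address of the current stack segment. */
static _Thread_local uintptr_t stack_limit;

/* Simplified split-stack prologue check. Stacks grow downward on x86-64,
   so a frame fits only if at least frame_size bytes lie between sp and the
   limit. Returning false corresponds to calling __morestack. */
static bool frame_fits(uintptr_t sp, uintptr_t frame_size)
{
    return sp > stack_limit && sp - stack_limit >= frame_size;
}
```

With stack_limit = 0x10000, frame_fits(0x20000, 0x48) is true, while frame_fits(0x8000, 0x48), a stack below the recorded limit, is false.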
As a side note, let's look at your get_sp assembly:
movq %rsp, $0
Checking the semantics of movq in Intel's documentation:
Copies a quadword from the source operand (second operand) to the destination operand (first operand).
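That description uses Intel operand order, though. The %-prefixed registers show that this inline assembly is written in AT&T syntax, where the source operand comes first, so movq %rsp, $0 does copy %rsp into the output operand; this particular instruction is not backwards, even though the stack switching itself has the problems described above.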

Resources