Memory transfer intel assembly AT&T - memory

I have a problem moving a string bytewise from one memory address to another. I have been at this for hours and tried several different strategies. I'm new to Intel assembly, so I need some tips and insight to help me solve the problem.
The getText routine is supposed to transfer n bytes (n is in %rsi) from ibuf to the address in %rdi. counterI is the offset that indicates where to start the transfer, and after the routine is over it should point to the next byte that wasn't transferred. If there aren't n bytes available, it should stop the transfer and return the actual number of bytes transferred in %rax.
getText:
movq $ibuf, %r10
#rsi holds the number of bytes to be transferred
#rdi holds the address of the memory to transfer to
movq $0, %r8 #start with offset 0
movq $0, %rax #zero return register
movq (counterI), %r11
cmpb $0, (%r10, %r11, 1) #check if ibuf+counterI=NULL
jne MOVE #if so call and read to ibuf
call inImage
MOVE:
cmpq $0,%rsi #if number of bytes to read is 0
je EXIT #exit
movq counterI, %r9
movq $0, %r9 #used for debugging only, should not be 0
movb (%r10, %r9, 1), %bl #loads one byte to rdi from ibuf
movb %bl, (%rdi, %r8, 1)
incq counterI #increase pointer offset
decq %rsi #dec number of bytes to read
incq %r8 #inc offset in write buffer
movq %r8, %rax #returns number of bytes written to buf
movq (counterI), %r9
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
movb $0, (%rdi, %r8, 1) #move NULL to buf+%r8?
ret

movq counterI, %r9
movq $0, %r9 #used for debugging only, should not be 0
The second instruction makes the first useless, but given the comment I understand you will remove it. Better still, you can remove both if you change every occurrence of %R9 into %R11.
movzbq (%r10, %r9, 1), %r10 #loads one byte+zeroes to rdi from ibuf
movq %r10, (%rdi, %r8, 1) #HERE IS THE PROBLEM I THINK
This is a dangerous construct. You first use %R10 as an address, but then you load a zero-extended data byte into it. Later in the code you use %R10 as an address again, but the address is no longer in there! The solution is to move the byte into a different register and not to bother with the zero extension.
movb (%r10, %r9, 1), %bl #loads one byte to rdi from ibuf
movb %bl, (%rdi, %r8, 1)
The following code can be shortened, because the test of %rsi at the top of MOVE already handles the case where no bytes remain:
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
as
cmpb $0, (%r10, %r9, 1) #check if ibuf+offset is NULL
jne MOVE
EXIT:
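For reference, the intended behaviour of getText can be sketched in C. This is only an illustration of the logic described in the question, not the asker's code: ibuf and counterI mirror the question's names, the buffer size and contents are assumptions, get_text is a made-up name, and the refill call to inImage is left out.

```c
#include <stddef.h>

/* Stand-ins for the question's globals; size and contents are assumptions. */
char ibuf[64] = "hello world";
size_t counterI = 0;

/* Copy up to n bytes from ibuf + counterI into dst, stopping early at the
   NUL that ends the input. dst is always NUL-terminated, so it needs room
   for n + 1 bytes. Returns the number of bytes actually copied. */
size_t get_text(char *dst, size_t n)
{
    size_t copied = 0;
    while (copied < n && ibuf[counterI] != '\0') {
        /* one byte at a time, like the movb pair that goes through %bl */
        dst[copied++] = ibuf[counterI++];
    }
    dst[copied] = '\0';
    return copied;
}
```

With ibuf holding "hello world" and counterI at 0, get_text(out, 5) copies "hello" and leaves counterI at 5, so the next call continues with " world".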

Related

Julia massively outperforms Delphi. Obsolete asm code by Delphi compiler?

I wrote a simple for loop in Delphi.
The same program is 7.6 times faster in Julia 1.6.
procedure TfrmTester.btnForLoopClick(Sender: TObject);
VAR
i, Total, Big, Small: Integer;
s: string;
begin
TimerStart;
Total:= 0;
Big := 0;
Small:= 0;
for i:= 1 to 1000000000 DO //1 billion
begin
Total:= Total+1;
if Total > 500000
then Big:= Big+1
else Small:= Small+1;
end;
s:= TimerElapsedS;
//here code to show Big/Small on the screen
end;
The ASM code seems decent to me:
TesterForm.pas.111: TimerStart;
007BB91D E8DE7CF9FF call TimerStart
TesterForm.pas.113: Total:= 0;
007BB922 33C0 xor eax,eax
007BB924 8945F4 mov [ebp-$0c],eax
TesterForm.pas.114: Big := 0;
007BB927 33C0 xor eax,eax
007BB929 8945F0 mov [ebp-$10],eax
TesterForm.pas.115: Small:= 0;
007BB92C 33C0 xor eax,eax
007BB92E 8945EC mov [ebp-$14],eax
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB931 C745F801000000 mov [ebp-$08],$00000001
TesterForm.pas.118: Total:= Total+1;
007BB938 FF45F4 inc dword ptr [ebp-$0c]
TesterForm.pas.119: if Total > 500000
007BB93B 817DF420A10700 cmp [ebp-$0c],$0007a120
007BB942 7E05 jle $007bb949
TesterForm.pas.120: then Big:= Big+1
007BB944 FF45F0 inc dword ptr [ebp-$10]
007BB947 EB03 jmp $007bb94c
TesterForm.pas.121: else Small:= Small+1;
007BB949 FF45EC inc dword ptr [ebp-$14]
TesterForm.pas.122: end;
007BB94C FF45F8 inc dword ptr [ebp-$08]
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB94F 817DF801CA9A3B cmp [ebp-$08],$3b9aca01
007BB956 75E0 jnz $007bb938
TesterForm.pas.124: s:= TimerElapsedS;
007BB958 8D45E8 lea eax,[ebp-$18]
How can it be that Delphi has such a pathetic score compared with Julia?
Can I do anything to improve the code generated by the compiler?
Info
My Delphi 10.4.2 program is 32-bit (Win32). Of course, I run in "Release" mode :)
But the ASM code above is for the "Debug" version because I don't know how to pause the execution of the program when I run an optimized EXE file. But the difference between a Release and a Debug exe is pretty small (1.8 vs 1.5 sec). Julia does it in 195ms.
More discussions
I do have to mention that when you run the code in Julia for the first time, its time is ridiculously high, because Julia is JIT-compiled, so it has to compile the code first. The compilation time (since it is "one-time") was not included in the measurement.
Also, as AmigoJack commented, Delphi code will run pretty much everywhere, while Julia code will probably only run on computers that have a modern CPU supporting all those new/fancy instructions. I have small tools that I wrote back in 2004 that still run today.
Whatever code Julia produces cannot be delivered to "customers" unless they have Julia installed.
Anyway, all that being said, it is sad that the Delphi compiler is so outdated.
I ran other tests: finding the shortest and longest string in a list of strings is 10x faster in Delphi than in Julia. Allocating small blocks of memory (10000x10000x4 bytes) runs at the same speed.
As AhnLab mentioned, I run pretty "dry" tests. I guess a full program that performs more complex/realistic tasks needs to be written, to see whether Julia still outperforms Delphi 7x at the end of it.
Update
Ok, the Julia code seems totally alien to me. Seems to use more modern ops:
; ┌ # Julia_vs_Delphi.jl:4 within `for_fun`
pushq %rbp
movq %rsp, %rbp
subq $96, %rsp
vmovdqa %xmm11, -16(%rbp)
vmovdqa %xmm10, -32(%rbp)
vmovdqa %xmm9, -48(%rbp)
vmovdqa %xmm8, -64(%rbp)
vmovdqa %xmm7, -80(%rbp)
vmovdqa %xmm6, -96(%rbp)
movq %rcx, %rax
; │ # Julia_vs_Delphi.jl:8 within `for_fun`
; │┌ # range.jl:5 within `Colon`
; ││┌ # range.jl:354 within `UnitRange`
; │││┌ # range.jl:359 within `unitrange_last`
testq %rdx, %rdx
; │└└└
jle L80
; │ # Julia_vs_Delphi.jl within `for_fun`
movq %rdx, %rcx
sarq $63, %rcx
andnq %rdx, %rcx, %r9
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
cmpq $8, %r9
jae L93
; │ # Julia_vs_Delphi.jl within `for_fun`
movl $1, %r10d
xorl %edx, %edx
xorl %r11d, %r11d
jmp L346
L80:
xorl %edx, %edx
xorl %r11d, %r11d
xorl %r9d, %r9d
jmp L386
L93: movabsq $9223372036854775800, %r8 # imm = 0x7FFFFFFFFFFFFFF8
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
andq %r9, %r8
leaq 1(%r8), %r10
movabsq $.rodata.cst32, %rcx
vmovdqa (%rcx), %ymm1
vpxor %xmm0, %xmm0, %xmm0
movabsq $.rodata.cst8, %rcx
vpbroadcastq (%rcx), %ymm2
movabsq $1023787240, %rcx # imm = 0x3D05C0E8
vpbroadcastq (%rcx), %ymm3
movabsq $1023787248, %rcx # imm = 0x3D05C0F0
vpbroadcastq (%rcx), %ymm5
vpcmpeqd %ymm6, %ymm6, %ymm6
movabsq $1023787256, %rcx # imm = 0x3D05C0F8
vpbroadcastq (%rcx), %ymm7
movq %r8, %rcx
vpxor %xmm4, %xmm4, %xmm4
vpxor %xmm8, %xmm8, %xmm8
vpxor %xmm9, %xmm9, %xmm9
nopw %cs:(%rax,%rax)
; │ # Julia_vs_Delphi.jl within `for_fun`
L224:
vpaddq %ymm2, %ymm1, %ymm10
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
vpxor %ymm3, %ymm1, %ymm11
vpcmpgtq %ymm11, %ymm5, %ymm11
vpxor %ymm3, %ymm10, %ymm10
vpcmpgtq %ymm10, %ymm5, %ymm10
vpsubq %ymm11, %ymm0, %ymm0
vpsubq %ymm10, %ymm4, %ymm4
vpaddq %ymm11, %ymm8, %ymm8
vpsubq %ymm6, %ymm8, %ymm8
vpaddq %ymm10, %ymm9, %ymm9
vpsubq %ymm6, %ymm9, %ymm9
vpaddq %ymm7, %ymm1, %ymm1
addq $-8, %rcx
jne L224
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
vpaddq %ymm8, %ymm9, %ymm1
vextracti128 $1, %ymm1, %xmm2
vpaddq %xmm2, %xmm1, %xmm1
vpshufd $238, %xmm1, %xmm2 # xmm2 = xmm1[2,3,2,3]
vpaddq %xmm2, %xmm1, %xmm1
vmovq %xmm1, %r11
vpaddq %ymm0, %ymm4, %ymm0
vextracti128 $1, %ymm0, %xmm1
vpaddq %xmm1, %xmm0, %xmm0
vpshufd $238, %xmm0, %xmm1 # xmm1 = xmm0[2,3,2,3]
vpaddq %xmm1, %xmm0, %xmm0
vmovq %xmm0, %rdx
cmpq %r8, %r9
je L386
L346:
leaq 1(%r9), %r8
nop
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
; │┌ # operators.jl:378 within `>`
; ││┌ # int.jl:83 within `<`
L352:
xorl %ecx, %ecx
cmpq $500000, %r10 # imm = 0x7A120
seta %cl
cmpq $500001, %r10 # imm = 0x7A121
; │└└
adcq $0, %rdx
addq %rcx, %r11
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
; │┌ # range.jl:837 within `iterate`
incq %r10
; ││┌ # promotion.jl:468 within `==`
cmpq %r10, %r8
; │└└
jne L352
; │ # Julia_vs_Delphi.jl:17 within `for_fun`
L386:
movq %r9, (%rax)
movq %rdx, 8(%rax)
movq %r11, 16(%rax)
vmovaps -96(%rbp), %xmm6
vmovaps -80(%rbp), %xmm7
vmovaps -64(%rbp), %xmm8
vmovaps -48(%rbp), %xmm9
vmovaps -32(%rbp), %xmm10
vmovaps -16(%rbp), %xmm11
addq $96, %rsp
popq %rbp
vzeroupper
retq
nopw %cs:(%rax,%rax)
Let's start by noting that there is no reason for an optimizing compiler to actually perform the loop. At present Delphi and Julia both emit assembly that really runs through the loop, but a future compiler version could just skip the loop and assign the final values. Microbenchmarks are tricky.
The difference seems to be that Julia makes use of SIMD instructions, which suits a loop like this. In the listing above, the loop at L224 works on 256-bit ymm registers holding four 64-bit values each and steps its counter by 8 per pass, so a speedup of roughly 8x is plausible, depending on your CPU.
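To see why this loop vectorizes well, note that the if/else can be expressed as arithmetic on a comparison result, which is roughly what the vpcmpgtq/vpsubq pairs in the Julia output do. Below is a minimal C sketch of the same loop written that way. The function and struct names are made up for illustration, and whether a given compiler actually vectorizes it is not guaranteed.

```c
#include <stdint.h>

typedef struct { int64_t total, big, small; } counts_t;

/* Same logic as the Delphi loop, but with the branch replaced by adding
   the 0/1 result of the comparison to the counters. */
counts_t count_branchless(int64_t n, int64_t threshold)
{
    counts_t c = {0, 0, 0};
    for (int64_t i = 1; i <= n; ++i) {
        c.total += 1;
        int64_t is_big = c.total > threshold;  /* 1 if true, 0 if false */
        c.big   += is_big;
        c.small += 1 - is_big;
    }
    return c;
}
```

Because every iteration now does the same straight-line work, several iterations can be packed into the lanes of one vector register instead of being executed one at a time.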
You could have a look at this blog post for thoughts on SIMD in Delphi.
Although this is not the main point of the answer, I'll expand a bit on the possibility to remove the loop altogether. I don't know for sure what the Delphi specification says but in many compiled languages, including Julia ("just-ahead-of-time"), the compiler could simply figure out the state of the variables after the loop and replace the loop with that state. Have a look at the following C++ code (compiler explorer):
#include <cstdio>
void loop() {
long total = 0, big = 0, small = 0;
for (long i = 0; i < 100; ++i) {
total++;
if (total > 50) {
big++;
} else {
small++;
}
}
std::printf("%ld %ld %ld", total, big, small);
}
This is the assembly that clang trunk outputs:
loop(): # #loop()
lea rdi, [rip + .L.str]
mov esi, 100
mov edx, 50
mov ecx, 50
xor eax, eax
jmp printf@PLT # TAILCALL
.L.str:
.asciz "%ld %ld %ld"
As you can see, there is no loop, just the result. For longer loops clang stops doing this optimization, but that is just a limitation of this compiler; other compilers could behave differently, and I'm sure there is a heavily optimizing compiler out there that handles much more complex situations.
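What the optimizer did there amounts to computing the end state directly. For this particular loop the closed form is easy to write by hand. The sketch below uses a hypothetical helper name and assumes non-negative inputs.

```c
#include <stdint.h>

/* Final values of the counting loop without running it: total ends at n,
   the first `threshold` iterations go to small, the remainder to big.
   Assumes n >= 0 and threshold >= 0. */
void loop_result(int64_t n, int64_t threshold,
                 int64_t *total, int64_t *big, int64_t *small)
{
    *total = n;
    *small = n < threshold ? n : threshold;
    *big   = n - *small;
}
```

For n = 100 and threshold = 50 this gives 100, 50 and 50, the same three constants that clang loads into esi, edx and ecx before the tail call.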

Warning once only: UITableView was told to layout its visible cells and other contents without being in the view hierarchy?

I am adding a tableView to a viewController.
When I open this screen, I get the warning below once.
[TableView] Warning once only: UITableView was told to layout its
visible cells and other contents without being in the view hierarchy
(the table view or one of its superviews has not been added to a
window). This may cause bugs by forcing views inside the table view to
load and perform layout without accurate information (e.g. table view
bounds, trait collection, layout margins, safe area insets, etc), and
will also cause unnecessary performance overhead due to extra layout
passes. Make a symbolic breakpoint at
UITableViewAlertForLayoutOutsideViewHierarchy to catch this in the
debugger and see what caused this to occur, so you can avoid this
action altogether if possible, or defer it until the table view has
been added to a window. Table view: <UITableView: 0x7fe2ba8d5a00;
frame = (-47 -80; 394 553); clipsToBounds = YES; autoresize = RM+BM;
gestureRecognizers = <NSArray: 0x600003447480>; layer = <CALayer:
0x600003aa2be0>; contentOffset: {0, 0}; contentSize: {394, 176};
adjustedContentInset: {0, 0, 0, 0}; dataSource:
<Project.MondayToSundayViewController: 0x7fe2b9da05c0>>
When I added a symbolic breakpoint for UITableViewAlertForLayoutOutsideViewHierarchy, the following was shown in the console.
> UIKitCore`UITableViewAlertForLayoutOutsideViewHierarchy:
-> 0x7fff48260c7e <+0>: pushq %rbp
0x7fff48260c7f <+1>: movq %rsp, %rbp
0x7fff48260c82 <+4>: pushq %r14
0x7fff48260c84 <+6>: pushq %rbx
0x7fff48260c85 <+7>: subq $0x30, %rsp
0x7fff48260c89 <+11>: callq *0x3e51a6a1(%rip) ; (void *)0x00007fff51411350: objc_retain
0x7fff48260c8f <+17>: movq %rax, %rbx
0x7fff48260c92 <+20>: leaq 0x416a8dbf(%rip), %rdi ; _UIInternalPreference_UITableViewEnableAlertForLayoutOutsideViewHierarchy
0x7fff48260c99 <+27>: leaq 0x3e5a2140(%rip), %rsi ; @"UITableViewEnableAlertForLayoutOutsideViewHierarchy"
0x7fff48260ca0 <+34>: callq 0x7fff482a7529 ; _UIInternalPreferenceUsesDefault
0x7fff48260ca5 <+39>: testb %al, %al
0x7fff48260ca7 <+41>: jne 0x7fff48260cb3 ; <+53>
0x7fff48260ca9 <+43>: movb 0x416a8dad(%rip), %al ; _UIInternalPreference_UITableViewEnableAlertForLayoutOutsideViewHierarchy + 4
0x7fff48260caf <+49>: andb $0x1, %al
0x7fff48260cb1 <+51>: je 0x7fff48260d2d ; <+175>
0x7fff48260cb3 <+53>: leaq 0x416c3ffe(%rip), %rax ; _UIApplicationLinkedOnVersion
0x7fff48260cba <+60>: movl (%rax), %eax
0x7fff48260cbc <+62>: testl %eax, %eax
0x7fff48260cbe <+64>: je 0x7fff48260d3e ; <+192>
0x7fff48260cc0 <+66>: cmpl $0xd0000, %eax ; imm = 0xD0000
0x7fff48260cc5 <+71>: jb 0x7fff48260d2d ; <+175>
0x7fff48260cc7 <+73>: movq %rbx, %rdi
0x7fff48260cca <+76>: callq 0x7fff486252fa ; symbol stub for: objc_opt_class
0x7fff48260ccf <+81>: movq 0x41607402(%rip), %rsi ; "_isInternalTableView"
0x7fff48260cd6 <+88>: movq %rax, %rdi
0x7fff48260cd9 <+91>: callq *0x3e51a641(%rip) ; (void *)0x00007fff513f7780: objc_msgSend
0x7fff48260cdf <+97>: testb %al, %al
0x7fff48260ce1 <+99>: jne 0x7fff48260d2d ; <+175>
0x7fff48260ce3 <+101>: movq 0x3e519b46(%rip), %rax ; (void *)0x00007fff89ea06a0: _NSConcreteStackBlock
0x7fff48260cea <+108>: movq %rax, -0x38(%rbp)
0x7fff48260cee <+112>: movl $0xc2000000, %eax ; imm = 0xC2000000
0x7fff48260cf3 <+117>: movq %rax, -0x30(%rbp)
0x7fff48260cf7 <+121>: leaq 0x46728(%rip), %rax ; __UITableViewAlertForLayoutOutsideViewHierarchy_block_invoke
0x7fff48260cfe <+128>: movq %rax, -0x28(%rbp)
0x7fff48260d02 <+132>: leaq 0x3e51d237(%rip), %rax ; __block_descriptor_40_e8_32s_e5_v8?0l
0x7fff48260d09 <+139>: movq %rax, -0x20(%rbp)
0x7fff48260d0d <+143>: movq %rbx, %rdi
0x7fff48260d10 <+146>: callq *0x3e51a61a(%rip) ; (void *)0x00007fff51411350: objc_retain
0x7fff48260d16 <+152>: movq %rax, -0x18(%rbp)
0x7fff48260d1a <+156>: cmpq $-0x1, 0x416ca5c6(%rip) ; _UIInternalPreference_TableViewReorderingUsesDragAndDrop_block_invoke_10.__s_category + 7
0x7fff48260d22 <+164>: jne 0x7fff48260d52 ; <+212>
0x7fff48260d24 <+166>: movq %rax, %rdi
0x7fff48260d27 <+169>: callq *0x3e51a5fb(%rip) ; (void *)0x00007fff51411000: objc_release
0x7fff48260d2d <+175>: movq %rbx, %rdi
0x7fff48260d30 <+178>: addq $0x30, %rsp
0x7fff48260d34 <+182>: popq %rbx
0x7fff48260d35 <+183>: popq %r14
0x7fff48260d37 <+185>: popq %rbp
0x7fff48260d38 <+186>: jmpq *0x3e51a5ea(%rip) ; (void *)0x00007fff51411000: objc_release
0x7fff48260d3e <+192>: movl $0xd0000, %edi ; imm = 0xD0000
0x7fff48260d43 <+197>: callq 0x7fff48093724 ; _UIApplicationLinkedOnOrAfter
0x7fff48260d48 <+202>: testb %al, %al
0x7fff48260d4a <+204>: jne 0x7fff48260cc7 ; <+73>
0x7fff48260d50 <+210>: jmp 0x7fff48260d2d ; <+175>
0x7fff48260d52 <+212>: leaq 0x416ca58f(%rip), %rdi ; UITableViewAlertForLayoutOutsideViewHierarchy.once
0x7fff48260d59 <+219>: leaq -0x38(%rbp), %r14
0x7fff48260d5d <+223>: movq %r14, %rsi
0x7fff48260d60 <+226>: callq 0x7fff48624f8e ; symbol stub for: dispatch_once
0x7fff48260d65 <+231>: movq 0x20(%r14), %rax
0x7fff48260d69 <+235>: jmp 0x7fff48260d24 ; <+166>
Can you please help me to remove the warning?
You added constraints before you added your view to the view hierarchy. That is why you get this warning.
Make sure you do things in this order, for example:
view.addSubview(tableView) // 1
// 2 and start adding constraints..
tableView.translatesAutoresizingMaskIntoConstraints = false
...
In my case the answers provided didn't help. Changing my DiffableDataSource snapshot implementation from this:
dataSource.apply(snapShot, animatingDifferences: true)
to this:
dataSource.apply(snapShot, animatingDifferences: false)
Fixed the error.
If you set the recommended symbolic breakpoint:
UITableViewAlertForLayoutOutsideViewHierarchy
it will stop at the breakpoint inside some obscure assembly code, which isn't helpful to most mortals: besides being obtuse, it is code that runs well after the fact.
But you can look at the stack backtrace in the Xcode panel (alternatively, type bt at the lldb prompt).
The backtrace will show you exactly which table reload flow caused the problem, which is usually a huge clue. It is typically enough to let you deduce, directly or indirectly, the problem and the solution.
Side note: If you don't see the backtrace in Xcode's left-hand panel, make sure there's no text in the text field at the bottom of that panel. It's a text filter, and if nothing matches the text you've entered, nothing appears in the panel! This happens to me often enough to point it out here, for example when I use the filter to find a view in the view hierarchy by entering the view's address (which I copy and paste from an Auto Layout constraint warning in the Xcode log window). That lets me see where the offending view is in the very useful 3D UIView visualizer tool in Xcode, which you can open by clicking the "triple-squares" button below the main window.
Case study of how I just debugged a problem:
I got the warning because I called tableView.reloadData() in a function that I was calling from viewWillAppear(). And it happened:
After adding constraints
tableView had been added as a subview of viewController's view
Note: There's another answer to this question that says this happens when you forget to add the view as a subview. But you get a different message when you make the more common mistake of forgetting to add a view to the hierarchy before activating its related NSLayoutConstraints. Since you usually hit that error while writing your constraint code programmatically, it's usually easier to figure out what's happening.
So with the debugger stopped at the aforementioned breakpoint, I verified that the tableView's window was nil, e.g. by typing p tableView.window at the lldb prompt.
Then I asked myself how it was possible that the tableView was not in the window hierarchy, and it dawned on me that invoking this kind of thing from viewWillAppear() is a bad idea because, at that point, the view hierarchy is about to appear but has not appeared yet.
So I scratched my head, because I definitely need my table reloaded each time the view controller loads, to preselect a certain cell in the table, so viewDidLoad() is not the right place either. :-(
I realized viewDidAppear() was a better place, and it works.
I've just solved this problem, and it's really tricky. It took me half a day.
The solution is:
1. Put your UITableView inside a UIView, and constrain the UITableView to be the same size as the UIView.
2. Then change the UIView's Y and height by monitoring the scrolling coordinates.
This will give you the effect you want. Don't change the size of the UITableView directly, just wrap it one layer.
Check if you are trying to access tableView properties in
viewDidAppear
then change that to
viewWillAppear
This way you know the tableView is definitely present.
Another option is to remove all ? (optionals), changing "tableView?." to "tableView." This way it will crash right where the problem is.

Optimization bug in Apple's LLVM, or bug in code?

I have some iOS C++ code that compiles correctly on my local machine (LLVM 9.0) but compiles incorrectly on my build server (LLVM 10.0). The project is generated via CMake (same version on both) so the code being compiled is the same, with the same compiler settings.
After finally realizing that some critical values weren't being updated on the LLVM10 version I investigated the assembly and found out it was completely skipping part of the code.
void SceneDisplay::SetSize(const math::Vec2 &Size)
{
m_Size = Size;
m_ScreenWidth = int(m_Size.x * float(GraphicsUtil::WIDTH));
m_ScreenHeight = int(m_Size.y * float(GraphicsUtil::HEIGHT));
UpdateOffsetScale();
}
m_Size is initialized to 1.0,1.0 in the class constructor. This works fine and everything is perfect with LLVM9 - with LLVM10 we get the following disassembly:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movq __ZN12GraphicsUtil6HEIGHTE@GOTPCREL(%rip), %rax
movq __ZN12GraphicsUtil5WIDTHE@GOTPCREL(%rip), %rcx
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
movq -8(%rbp), %rsi
Ltmp2347:
movq -16(%rbp), %rdi
movq (%rdi), %rdi
movq %rdi, 56(%rsi)
movl (%rcx), %edx
movl %edx, 12(%rsi)
movl (%rax), %edx
movl %edx, 16(%rsi)
movq (%rsi), %rax
movq %rsi, %rdi
callq *136(%rax)
addq $16, %rsp
popq %rbp
retq
As you can see the assignment of the two member variables is completely 'optimized' to just assume that m_Size.x and m_Size.y are 1.0 - thus just copying the values of GraphicsUtil::WIDTH and HEIGHT.
I fixed this by changing the code to use "Size" instead of "m_Size" for those assignments, as well as making them volatile just in case. But I'm wondering if there is a legitimate compiler error here or I'm missing something?
Edit: It should be noted that m_Size is nearly never 1.0,1.0
Edit2: The correct assembly for the assignments, as generated on my machine (different arch though, not able to get the same arch as above right now)
str x8, [x0, #56]
lsr x9, x8, #32
fmov s0, w8
adrp x8, __ZN12GraphicsUtil5WIDTHE@GOTPAGE
ldr x8, [x8, __ZN12GraphicsUtil5WIDTHE@GOTPAGEOFF]
ldr s1, [x8]
ucvtf s1, s1
fmul s0, s0, s1
fcvtzs w8, s0
str w8, [x0, #12]
fmov s0, w9
adrp x8, __ZN12GraphicsUtil6HEIGHTE@GOTPAGE
ldr x8, [x8, __ZN12GraphicsUtil6HEIGHTE@GOTPAGEOFF]
ldr s1, [x8]
ucvtf s1, s1
fmul s0, s0, s1
fcvtzs w8, s0
str w8, [x0, #16]
After making a minimal test case I was able to confirm it's definitely a compiler bug.
Conditions: No other piece of code modifies m_Size, m_Size is initialized math::Vec2 m_Size{1.0, 1.0};. It works perfectly on every version of LLVM I could find before 10.0, seems some sort of regression occurred at that version.
Have submitted to Apple's LLVM team and llvm.org.
Thanks for comments.

What do these 2 lines of assembly code do?

I am in the middle of phase 2 of the bomb lab and I can't figure out how these two lines of assembly affect the code overall and what role they play in the loop.
Here are the two lines of code:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
and here is the entire code:
Dump of assembler code for function phase_2:
0x08048ba4 <+0>: push %ebp
0x08048ba5 <+1>: mov %esp,%ebp
0x08048ba7 <+3>: push %ebx
0x08048ba8 <+4>: sub $0x34,%esp
0x08048bab <+7>: lea -0x20(%ebp),%eax
0x08048bae <+10>: mov %eax,0x4(%esp)
0x08048bb2 <+14>: mov 0x8(%ebp),%eax
0x08048bb5 <+17>: mov %eax,(%esp)
0x08048bb8 <+20>: call 0x804922f <read_six_numbers>
0x08048bbd <+25>: cmpl $0x0,-0x20(%ebp)
0x08048bc1 <+29>: jns 0x8048be3 <phase_2+63>
0x08048bc3 <+31>: call 0x80491ed <explode_bomb>
0x08048bc8 <+36>: jmp 0x8048be3 <phase_2+63>
0x08048bca <+38>: mov %ebx,%eax
0x08048bcc <+40>: add -0x24(%ebp,%ebx,4),%eax
0x08048bd0 <+44>: cmp %eax,-0x20(%ebp,%ebx,4)
0x08048bd4 <+48>: je 0x8048bdb <phase_2+55>
0x08048bd6 <+50>: call 0x80491ed <explode_bomb>
0x08048bdb <+55>: inc %ebx
0x08048bdc <+56>: cmp $0x6,%ebx
0x08048bdf <+59>: jne 0x8048bca <phase_2+38>
0x08048be1 <+61>: jmp 0x8048bea <phase_2+70>
0x08048be3 <+63>: mov $0x1,%ebx
0x08048be8 <+68>: jmp 0x8048bca <phase_2+38>
0x08048bea <+70>: add $0x34,%esp
0x08048bed <+73>: pop %ebx
0x08048bee <+74>: pop %ebp
0x08048bef <+75>: ret
I noticed the inc instruction that increments %ebx by 1, and that %ebx is copied into %eax in the loop. But the add and cmp trip me up every time. If %eax is 1 going into the add and cmp, what does %eax come out as? Thanks! I also know that once %ebx gets to 5 the loop is over and the function ends.
You have a list of 6 numbers. This means you can compare at most 5 pairs of numbers, so the loop that uses %ebx does 5 iterations.
In each iteration the value at the lower address is added to the current loop count, and the result is then compared with the value at the next higher address. So with %eax = 1 going in, %eax comes out as 1 plus the first number, and that must equal the second number. As long as they match, the bomb won't explode!
This loops 5 times:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
These numbers are used:
with %ebx=1 numbers are at -0x20(%ebp) and -0x1C(%ebp)
with %ebx=2 numbers are at -0x1C(%ebp) and -0x18(%ebp)
with %ebx=3 numbers are at -0x18(%ebp) and -0x14(%ebp)
with %ebx=4 numbers are at -0x14(%ebp) and -0x10(%ebp)
with %ebx=5 numbers are at -0x10(%ebp) and -0x0C(%ebp)
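The table above can be turned into a small self-contained C check. It is a reconstruction from the disassembly, not the bomb's source: phase_2_ok is a hypothetical name, and it returns false where the real code would call explode_bomb.

```c
#include <stdbool.h>

/* nums[0] must be non-negative (cmpl $0x0 / jns), and every later entry
   must equal the previous entry plus its own index (mov / add / cmp). */
bool phase_2_ok(const int nums[6])
{
    if (nums[0] < 0)
        return false;
    for (int i = 1; i < 6; ++i) {
        if (nums[i] != nums[i - 1] + i)
            return false;
    }
    return true;
}
```

Starting from 0, for example, the sequence 0 1 3 6 10 15 passes: each number is the previous one plus 1, 2, 3, 4 and 5 in turn.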
Those two instructions deal with memory at two locations, indexed by ebp and ebx. The mov %ebx,%eax just before them reloads eax with the loop counter on every pass, so the add instruction computes the counter plus the previous number (it is not a running total), and the comparison instruction checks whether that sum equals the next number. So something like:
for (i = 1; i < 6; i++) {
if (array[i-1] + i != array[i])
explode_bomb();
}

Displaying environment variables in assembly language

I am trying to understand how assembly works by writing a basic program that displays environment variables, like this
C code:
#include <unistd.h>
int main(int ac, char **av, char **env)
{
int x;
int y;
y = -1;
while (env[++y])
{
x = -1;
while (env[y][++x])
{
write(1, &(env[y][x]), 1);
}
}
return (0);
}
I compiled that with gcc -S (on cygwin64) to see how it is done, and wrote it my own way (similar but not the same), but it did not work...
$>gcc my_av.s && ./a.exe
HOMEPATH=\Users\hadrien▒2▒p
My assembly code :
.file "test.c"
.LC0:
.ascii "\n\0"
.LC1:
.ascii "\033[1;31m.\033[0m\0"
.LC2:
.ascii "\033[1;31m#\033[0m\0"
.LCtest0:
.ascii "\033[1;32mdebug\033[0m\0"
.LCtest1:
.ascii "\033[1;31mdebug\033[0m\0"
.LCtest2:
.ascii "\033[1;34mdebug\033[0m\0"
.def main; .scl 2; .type 32; .endef
main:
/* main setup */
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movl %ecx, 16(%rbp) /* int argc */
movq %rdx, 24(%rbp) /* char **argv */
movq %r8, 32(%rbp) /* char **env */
/* newline */
/* write init */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
/* write */
call write
/* start of the code */
movl $-1, -8(%rbp) /* y = -1 */
jmp .Loop_1_condition
.Loop_1_body:
movl $-1, -4(%rbp)
jmp .Loop_2_condition
.Loop_2_body:
/* print the character */
movl $1, %r8d
movl $1, %ecx
call write
.Loop_2_condition:
addl $1, -4(%rbp) /* ++x */
movl -8(%rbp), %eax
cltq
addq 32(%rbp), %rax
movq (%rax), %rax
movq %rax, %rdx
movl -4(%rbp), %eax
cltq
addq %rdx, %rax
movq %rax, %rdx
movq (%rax), %rax
cmpq $0, %rax
jne .Loop_2_body
/* newline */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
.Loop_1_condition:
addl $1, -8(%rbp) /* ++y */
movl -8(%rbp), %eax
cltq /* sign-extend eax to 64 bits */
addq 32(%rbp), %rax
movq (%rax), %rax
cmpq $0, %rax
jne .Loop_1_body
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
/* end of the program */
movl $0, %eax /* return (0) */
addq $48, %rsp
popq %rbp
ret
.def write; .scl 2; .type 32; .endef
Could someone please explain to me what is wrong with this code?
Also, while trying to solve the problem I tried replacing $0 with $97 in the cmpq operation, thinking it would stop on the 'a' character, but it didn't... Why?
You have a few issues. In this code (loop2) you have:
addq %rdx, %rax
movq %rax, %rdx
movq (%rax), %rax
cmpq $0, %rax
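movq (%rax), %rax has read the next 8 characters into %rax, and cmpq compares all 8 bytes at once. That is also why comparing with $97 did not stop on 'a': the 8 bytes would have to be 'a' followed by seven zero bytes. You are only interested in the first character. One way to achieve this is to compare the least significant byte of %rax with 0. You can use cmpb with the %al register: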
cmpb $0, %al
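The effect is easy to reproduce in C. The sketch below loads 8 bytes the way movq does and then compares either all of them or only the low byte. The helper names are made up, and it assumes a little-endian machine such as x86-64, where the first character lands in the low byte.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes starting at p, like movq (%rax), %rax. */
static uint64_t load8(const char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Like cmpq $c, %rax: true only if all 8 bytes together equal c. */
int qword_equals(const char *p, unsigned char c)
{
    return load8(p) == (uint64_t)c;
}

/* Like cmpb $c, %al: looks at the first character only. */
int first_byte_equals(const char *p, unsigned char c)
{
    return (unsigned char)load8(p) == c;
}
```

For a buffer that starts with "abcdefgh", first_byte_equals(buf, 'a') is true while qword_equals(buf, 'a') is false, because the other seven bytes take part in the 64-bit comparison.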
The biggest issue, though, is understanding that char **env is a pointer to an array of char *. You first need to get the base pointer of the array, then index that base pointer with y. The indexing looks something like basepointer + (y * 8). You need to multiply y by 8 because each pointer is 8 bytes wide. The pointer at that location is the char * for a particular environment string. Then you can index each character in that string until you find the terminating NUL (0) character.
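In C terms, the addressing described here looks like the sketch below. The function names are illustrative, and the only assumption is the usual layout of a NULL-terminated array of char * pointers, each sizeof(char *) bytes wide (8 on x86-64).

```c
#include <stddef.h>

/* env[y] written out as the byte arithmetic behind a scaled-index load:
   base + y * sizeof(char *). */
const char *env_entry(char *const *env, size_t y)
{
    const char *base = (const char *)env;
    return *(char *const *)(base + y * sizeof(char *));
}

/* Total number of characters in all entries, walking the array the same
   way as the two nested loops: outer loop until a NULL pointer, inner
   loop until a NUL byte. */
size_t env_total_chars(char *const *env)
{
    size_t total = 0;
    for (size_t y = 0; env_entry(env, y) != NULL; ++y) {
        const char *s = env_entry(env, y);
        for (size_t x = 0; s[x] != '\0'; ++x)
            ++total;
    }
    return total;
}
```

Note the two different end tests: the outer loop compares a whole pointer against NULL, while the inner loop compares a single byte against 0.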
I've amended the code slightly and added comments on the few lines I changed:
.file "test.c"
.LC0:
.ascii "\x0a\0"
.LC1:
.ascii "\033[1;31m.\033[0m\0"
.LC2:
.ascii "\033[1;31m#\033[0m\0"
.LCtest0:
.ascii "\033[1;32mdebug\033[0m\0"
.LCtest1:
.ascii "\033[1;31mdebug\033[0m\0"
.LCtest2:
.ascii "\033[1;34mdebug\033[0m\0"
.def main; .scl 2; .type 32; .endef
main:
/* main setup */
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movl %ecx, 16(%rbp) /* int argc */
movq %rdx, 24(%rbp) /* char **argv */
movq %r8, 32(%rbp) /* char **env */
/* newline */
/* write init */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
/* write */
call write
/* start of the code */
movl $-1, -8(%rbp) /* y = -1 */
jmp .Loop_1_condition
.Loop_1_body:
movl $-1, -4(%rbp)
jmp .Loop_2_condition
.Loop_2_body:
/* print the character */
movl $1, %r8d
movl $1, %ecx
call write
.Loop_2_condition:
addl $1, -4(%rbp) /* ++x */
movl -8(%rbp), %eax /* get y index */
cltq
movq 32(%rbp), %rcx /* get envp (pointer to element 0 of char * array);
%rcx is a scratch register, unlike callee-saved %rbx */
movq (%rcx,%rax,8), %rdx /* get pointer at envp+y*8
pointers are 8 bytes wide */
movl -4(%rbp), %eax /* get x */
cltq
leaq (%rdx, %rax), %rdx /* Get current character's address */
cmpb $0, (%rdx) /* Compare current byte to char 0
using cmpq will compare the next 8 bytes */
jne .Loop_2_body
/* newline */
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
.Loop_1_condition:
addl $1, -8(%rbp) /* ++y */
movl -8(%rbp), %eax
cltq /* sign-extend eax to 64 bits */
movq 32(%rbp), %rcx /* get envp (pointer to element 0 of char * array) */
movq (%rcx,%rax,8), %rax /* get pointer at envp+y*8
pointers are 8 bytes wide */
cmpq $0, %rax /* Compare to NULL ptr */
jne .Loop_1_body
movl $1, %r8d /* write size */
movl $1, %ecx /* standard output */
leaq .LC0(%rip), %rdx
call write
/* end of the program */
movl $0, %eax /* return (0) */
addq $48, %rsp
popq %rbp
ret
.def write; .scl 2; .type 32; .endef

Resources