What do these 2 lines of assembly code do?

I am in the middle of phase 2 for bomb lab and I can't seem to figure out how these two lines of assembly affect the code overall and how they play a role in the loop going on.
Here are the two lines of code:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
and here is the entire code:
Dump of assembler code for function phase_2:
0x08048ba4 <+0>: push %ebp
0x08048ba5 <+1>: mov %esp,%ebp
0x08048ba7 <+3>: push %ebx
0x08048ba8 <+4>: sub $0x34,%esp
0x08048bab <+7>: lea -0x20(%ebp),%eax
0x08048bae <+10>: mov %eax,0x4(%esp)
0x08048bb2 <+14>: mov 0x8(%ebp),%eax
0x08048bb5 <+17>: mov %eax,(%esp)
0x08048bb8 <+20>: call 0x804922f <read_six_numbers>
0x08048bbd <+25>: cmpl $0x0,-0x20(%ebp)
0x08048bc1 <+29>: jns 0x8048be3 <phase_2+63>
0x08048bc3 <+31>: call 0x80491ed <explode_bomb>
0x08048bc8 <+36>: jmp 0x8048be3 <phase_2+63>
0x08048bca <+38>: mov %ebx,%eax
0x08048bcc <+40>: add -0x24(%ebp,%ebx,4),%eax
0x08048bd0 <+44>: cmp %eax,-0x20(%ebp,%ebx,4)
0x08048bd4 <+48>: je 0x8048bdb <phase_2+55>
0x08048bd6 <+50>: call 0x80491ed <explode_bomb>
0x08048bdb <+55>: inc %ebx
0x08048bdc <+56>: cmp $0x6,%ebx
0x08048bdf <+59>: jne 0x8048bca <phase_2+38>
0x08048be1 <+61>: jmp 0x8048bea <phase_2+70>
0x08048be3 <+63>: mov $0x1,%ebx
0x08048be8 <+68>: jmp 0x8048bca <phase_2+38>
0x08048bea <+70>: add $0x34,%esp
0x08048bed <+73>: pop %ebx
0x08048bee <+74>: pop %ebp
0x08048bef <+75>: ret
I noticed the inc instruction that increments %ebx by 1, and that %ebx is copied into %eax in the loop. But the add and cmp trip me up every time. If %eax is 1 going into the add and cmp, what value of %eax comes out? Thanks! I also know that once %ebx gets to 5 then the loop is over and it ends the entire code.

You have a list of 6 numbers. This means you can compare at most 5 pairs of numbers. So the loop that uses %ebx does 5 iterations.
In each iteration the value at the lower address is added to the current loop count (%ebx, copied into %eax just before), and the sum is then compared with the value at the next higher address. As long as they match the bomb won't explode!
This loops 5 times:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
These numbers are used:
with %ebx=1 numbers are at -0x20(%ebp) and -0x1C(%ebp)
with %ebx=2 numbers are at -0x1C(%ebp) and -0x18(%ebp)
with %ebx=3 numbers are at -0x18(%ebp) and -0x14(%ebp)
with %ebx=4 numbers are at -0x14(%ebp) and -0x10(%ebp)
with %ebx=5 numbers are at -0x10(%ebp) and -0x0C(%ebp)
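To make the pairing above concrete, here is a minimal Python sketch of the check the loop appears to perform, based only on the disassembly in the question. The names phase_2_ok and build_answer are made up for illustration, the six integers are assumed to be what read_six_numbers stores starting at -0x20(%ebp), and 32-bit wraparound is ignored.

```python
def phase_2_ok(nums):
    """Return True if six numbers would get past the checks in phase_2.

    Sketch only: models the disassembly shown in the question and
    ignores 32-bit overflow.
    """
    if len(nums) != 6 or nums[0] < 0:      # cmpl $0x0,-0x20(%ebp) / jns
        return False
    for ebx in range(1, 6):                # %ebx runs 1..5, loop ends at 6
        eax = ebx + nums[ebx - 1]          # mov %ebx,%eax / add -0x24(%ebp,%ebx,4),%eax
        if nums[ebx] != eax:               # cmp %eax,-0x20(%ebp,%ebx,4)
            return False
    return True


def build_answer(first):
    """Build a sequence in which each number is the previous one plus its index."""
    nums = [first]
    for i in range(1, 6):
        nums.append(nums[-1] + i)
    return nums


print(build_answer(1))                 # [1, 2, 4, 7, 11, 16]
print(phase_2_ok(build_answer(1)))     # True
```

So if %eax is 1 (that is, %ebx is 1) going into the add, it comes out as 1 plus the first number, and that sum has to equal the second number.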

Those two instructions are dealing with memory at two locations, indexed by ebp and ebx. In particular, the add instruction adds the previous array element to the loop counter (which the preceding mov copied into eax), and the comparison instruction checks whether that sum is equal to the current element. Note that eax is reloaded from ebx on every iteration, so this is not a running total. So something like:
int i;
for (i = 1; i < 6; i++) {
    if (array[i - 1] + i != array[i])
        explode_bomb();
}

Related

Odd Crash in Swift, related to setting sublayers to nil

This is a follow on to this question. In this routine,
func tableView(_ tableView: UITableView, didSelectRowAt indexPath: IndexPath) {
Tracker.track("getting row \(indexPath.row)")
let ptv = tableView as? NovilloTableView
if ptv!.uiType == .textTable {
let gp = Projects.currentProject?.getPaths(type: PaletteView.getCurrentPane())
GitPaths.currentGitPath = gp![indexPath.row]
// NotificationCenter.default.post(name: NNames.updateWebText.nn(), object: nil)
return
}
let svgs = Projects.currentProject!.getPaths(type : PaletteView.getCurrentPane())
var gitPath = svgs[indexPath.row]
Tracker.track("gitpath is \(gitPath)")
var gitPaths = GitPaths.getMediaBoundingBoxes(paths: [gitPath])
guard let pathArrays = gitPath.parseForRegBeziers() else { return }
let rslt = pathArrays.0
let regBeziers = pathArrays.1
gitPath.boundingBox = gitPath.getBoundsParamsForPaths(src: regBeziers.isEmpty ? rslt : regBeziers)
GitPaths.currentGitPath = gitPath
// Tracker.track("sending notification")
NotificationCenter.default.post(name: NNames.updateMedia.nn(), object: nil, userInfo: ["path" : gitPath])
Tracker.track("completed didSelect")
return
}
…the main thread logical path I'm following is the one that ends at the bottom with Tracker.track("completed didSelect"). I'm getting a crash if I execute the notification call, which stops here:
libobjc.A.dylib`objc_msgSend:
0x18002ec00 <+0>: cmp x0, #0x0
0x18002ec04 <+4>: b.le 0x18002ec6c ; <+108>
0x18002ec08 <+8>: ldr x14, [x0]
0x18002ec0c <+12>: and x16, x14, #0x7ffffffffffff8
0x18002ec10 <+16>: mov x15, x16
-> 0x18002ec14 <+20>: ldr x10, [x16, #0x10]
0x18002ec18 <+24>: lsr x11, x10, #48
0x18002ec1c <+28>: and x10, x10, #0xffffffffffff
0x18002ec20 <+32>: and w12, w1, w11
0x18002ec24 <+36>: add x13, x10, x12, lsl #4
0x18002ec28 <+40>: ldp x17, x9, [x13], #-0x10
0x18002ec2c <+44>: cmp x9, x1
0x18002ec30 <+48>: b.ne 0x18002ec3c ; <+60>
0x18002ec34 <+52>: eor x17, x17, x16
0x18002ec38 <+56>: br x17
0x18002ec3c <+60>: cbz x9, 0x18002eea0 ; _objc_msgSend_uncached
0x18002ec40 <+64>: cmp x13, x10
0x18002ec44 <+68>: b.hs 0x18002ec28 ; <+40>
0x18002ec48 <+72>: add x13, x10, w11, uxtw #4
0x18002ec4c <+76>: add x12, x10, x12, lsl #4
0x18002ec50 <+80>: ldp x17, x9, [x13], #-0x10
0x18002ec54 <+84>: cmp x9, x1
0x18002ec58 <+88>: b.eq 0x18002ec34 ; <+52>
0x18002ec5c <+92>: cmp x9, #0x0
0x18002ec60 <+96>: ccmp x13, x12, #0x0, ne
0x18002ec64 <+100>: b.hi 0x18002ec50 ; <+80>
0x18002ec68 <+104>: b 0x18002eea0 ; _objc_msgSend_uncached
0x18002ec6c <+108>: b.eq 0x18002ec90 ; <+144>
0x18002ec70 <+112>: and x10, x0, #0x7
0x18002ec74 <+116>: asr x11, x0, #55
0x18002ec78 <+120>: cmp x10, #0x7
0x18002ec7c <+124>: csel x12, x11, x10, eq
0x18002ec80 <+128>: adrp x10, 232550
0x18002ec84 <+132>: add x10, x10, #0xa00 ; objc_debug_taggedpointer_classes
0x18002ec88 <+136>: ldr x16, [x10, x12, lsl #3]
0x18002ec8c <+140>: b 0x18002ec10 ; <+16>
0x18002ec90 <+144>: mov x1, #0x0
0x18002ec94 <+148>: movi d0, #0000000000000000
0x18002ec98 <+152>: movi d1, #0000000000000000
0x18002ec9c <+156>: movi d2, #0000000000000000
0x18002eca0 <+160>: movi d3, #0000000000000000
0x18002eca4 <+164>: ret
0x18002eca8 <+168>: nop
0x18002ecac <+172>: nop
0x18002ecb0 <+176>: nop
0x18002ecb4 <+180>: nop
0x18002ecb8 <+184>: nop
0x18002ecbc <+188>: nop
According to another post on Stack Overflow, that message has come up when functions that need to be visible to Objective-C aren't marked with @objc, but as you can see, this one is (below).
This wasn't happening at first, and I'm not sure why, but the function called by the Notification is this:
@objc func updateMedia(notification : Notification) {
let path = (notification.userInfo?["path"] ?? GitPaths.currentGitPath!) as? GitPaths
Tracker.track("sublayers: \(mediaDisplay!.layer.sublayers == nil)")
mediaDisplay!.layer.sublayers = nil
mediaDisplay!.mask = nil
// Tracker.track("render beziers for \(path)")
// path!.renderBeziers(tgt: mediaDisplay!, path : path) //, data:["style" : "media"])
// refreshMediaInfo()
// updateSelectedMedia( src : GitPaths.currentGitPath! )
// return
}
I've commented most lines out to see where the crash can be induced, and it's the line mediaDisplay!.layer.sublayers = nil. If I comment this line out, the function executes correctly; if I include it, it will crash, but not as that line executes. The whole function returns, and the crash happens at the end of the function that posted the Notification in the first place, which is the one at the top of this post. Tracker.track() is just a way to print messages in a formatted way, and isn't a contributor to this. So basically, after the Notification returns, nothing else happens; if I step through, it gets to the final bracket of the function, just before returning control to the user.
I've checked that the object mediaDisplay exists, and it does, because it's actually doing what is being asked: when uncommented, the line path!.renderBeziers(tgt: mediaDisplay!, path : path) draws a bunch of Bezier paths into that view, which, as this screenshot taken after the crash shows, it does successfully. In other words, the line that causes the crash doesn't stop all the other code behind path!.renderBeziers(tgt: mediaDisplay!, path : path) from doing its job when I uncomment it and run the same thing. The table in the palette is the object that initiates all this, btw.
The view has a big question mark hanging over it; it is a subclass of a WKWebView, which is the big change here. I'm using it here as for regular UIView capability, of acting as a container for a bunch of CAShapeLayers. This is working exactly as it was before, when it was a UIView.
The reason for the change is that I want to be able to display html content in the same view as the CAShapeLayers, as a way of having html interleaved between different drawn elements on the screen; think of text with the dark purple shape behind it and the lighter one in front. In this, I'm following a question I asked which was answered here.
In any case, referring to the container in the next line, where the mask is set to nil does not cause the crash; so it seems to have to do with the layer of the WKWebView, and the sublayers of it. They exist, and I've checked that, but setting them to nil seems to blow this up, in this weird way.
I'm sure I'm missing something; I haven't used WebViews before, so I'm expecting that maybe that's the issue; but it's not intuitive to me what could be going wrong, and I've tried multiple strategies for debugging this. The one that has gotten me the closest to pinpointing the problem is what I've shown here, where I can locate it in the one line; but it seems pretty unproblematic to me...am I missing something obvious?
Thanks in advance for your ideas and insights.
It would appear that @Larme was on the right track: I eventually traced it to the line where the view's layer's sublayers were set to nil. This was the problem. Iterating through the sublayers if present and removing them individually from the parent layer caused the crash to disappear.
The same problem cropped up in a second view, also a WKWebView, where applying the same solution caused a similar crash. In both cases, the error message was entirely unhelpful. In the second case, I simply commented out all the code related to sublayer, and things worked fine. I suspect that this might cause problems at a later stage when I need to update the view with other sublayer information, but I am not in a situation to test that right now.
I'm travelling right now without access to my original project, so sorry for no code to show; but the basic iteration through the layer's sublayers should not be too hard to work out.

Julia massively outperforms Delphi. Obsolete asm code by Delphi compiler?

I wrote a simple for loop in Delphi.
The same program is 7.6 times faster in Julia 1.6.
procedure TfrmTester.btnForLoopClick(Sender: TObject);
VAR
i, Total, Big, Small: Integer;
s: string;
begin
TimerStart;
Total:= 0;
Big := 0;
Small:= 0;
for i:= 1 to 1000000000 DO //1 billion
begin
Total:= Total+1;
if Total > 500000
then Big:= Big+1
else Small:= Small+1;
end;
s:= TimerElapsedS;
//here code to show Big/Small on the screen
end;
The ASM code seems decent to me:
TesterForm.pas.111: TimerStart;
007BB91D E8DE7CF9FF call TimerStart
TesterForm.pas.113: Total:= 0;
007BB922 33C0 xor eax,eax
007BB924 8945F4 mov [ebp-$0c],eax
TesterForm.pas.114: Big := 0;
007BB927 33C0 xor eax,eax
007BB929 8945F0 mov [ebp-$10],eax
TesterForm.pas.115: Small:= 0;
007BB92C 33C0 xor eax,eax
007BB92E 8945EC mov [ebp-$14],eax
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB931 C745F801000000 mov [ebp-$08],$00000001
TesterForm.pas.118: Total:= Total+1;
007BB938 FF45F4 inc dword ptr [ebp-$0c]
TesterForm.pas.119: if Total > 500000
007BB93B 817DF420A10700 cmp [ebp-$0c],$0007a120
007BB942 7E05 jle $007bb949
TesterForm.pas.120: then Big:= Big+1
007BB944 FF45F0 inc dword ptr [ebp-$10]
007BB947 EB03 jmp $007bb94c
TesterForm.pas.121: else Small:= Small+1;
007BB949 FF45EC inc dword ptr [ebp-$14]
TesterForm.pas.122: end;
007BB94C FF45F8 inc dword ptr [ebp-$08]
TesterForm.pas.116: for i:= 1 to 1000000000 DO //1 billion
007BB94F 817DF801CA9A3B cmp [ebp-$08],$3b9aca01
007BB956 75E0 jnz $007bb938
TesterForm.pas.124: s:= TimerElapsedS;
007BB958 8D45E8 lea eax,[ebp-$18]
How can it be that Delphi has such a pathetic score compared with Julia?
Can I do anything to improve the code generated by the compiler?
Info
My Delphi 10.4.2 program is Win32 bit. Of course, I run in "Release" mode :)
But the ASM code above is for the "Debug" version because I don't know how to pause the execution of the program when I run an optimized EXE file. But the difference between a Release and a Debug exe is pretty small (1.8 vs 1.5 sec). Julia does it in 195ms.
More discussions
I do have to mention that when you run the code in Julia for the first time, its time is ridiculously high, because Julia is JIT-compiled, so it has to compile the code first. The compilation time (since it is "one-time") was not included in the measurement.
Also, as AmigoJack commented, Delphi code will run pretty much everywhere, while Julia code will probably only run on computers that have a modern CPU to support all those new/fancy instructions. I do have small tools that I produced back in 2004 and still run today.
Whatever code Julia produces cannot be delivered to "customers" unless they have Julia installed.
Anyway, all this being said, it is sad that the Delphi compiler is so outdated.
I ran other tests, finding the shortest and longest string in a list of strings is 10x faster in Delphi than Julia. Allocating small blocks of memory (10000x10000x4 bytes) has the same speed.
As AhnLab mentioned, I run pretty "dry" tests. I guess a full program that performs more complex/realistic tasks needs to be written and see at the end of the program if Julia still outperforms Delphi 7x.
Update
Ok, the Julia code seems totally alien to me. Seems to use more modern ops:
; ┌ # Julia_vs_Delphi.jl:4 within `for_fun`
pushq %rbp
movq %rsp, %rbp
subq $96, %rsp
vmovdqa %xmm11, -16(%rbp)
vmovdqa %xmm10, -32(%rbp)
vmovdqa %xmm9, -48(%rbp)
vmovdqa %xmm8, -64(%rbp)
vmovdqa %xmm7, -80(%rbp)
vmovdqa %xmm6, -96(%rbp)
movq %rcx, %rax
; │ # Julia_vs_Delphi.jl:8 within `for_fun`
; │┌ # range.jl:5 within `Colon`
; ││┌ # range.jl:354 within `UnitRange`
; │││┌ # range.jl:359 within `unitrange_last`
testq %rdx, %rdx
; │└└└
jle L80
; │ # Julia_vs_Delphi.jl within `for_fun`
movq %rdx, %rcx
sarq $63, %rcx
andnq %rdx, %rcx, %r9
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
cmpq $8, %r9
jae L93
; │ # Julia_vs_Delphi.jl within `for_fun`
movl $1, %r10d
xorl %edx, %edx
xorl %r11d, %r11d
jmp L346
L80:
xorl %edx, %edx
xorl %r11d, %r11d
xorl %r9d, %r9d
jmp L386
L93:
movabsq $9223372036854775800, %r8 # imm = 0x7FFFFFFFFFFFFFF8
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
andq %r9, %r8
leaq 1(%r8), %r10
movabsq $.rodata.cst32, %rcx
vmovdqa (%rcx), %ymm1
vpxor %xmm0, %xmm0, %xmm0
movabsq $.rodata.cst8, %rcx
vpbroadcastq (%rcx), %ymm2
movabsq $1023787240, %rcx # imm = 0x3D05C0E8
vpbroadcastq (%rcx), %ymm3
movabsq $1023787248, %rcx # imm = 0x3D05C0F0
vpbroadcastq (%rcx), %ymm5
vpcmpeqd %ymm6, %ymm6, %ymm6
movabsq $1023787256, %rcx # imm = 0x3D05C0F8
vpbroadcastq (%rcx), %ymm7
movq %r8, %rcx
vpxor %xmm4, %xmm4, %xmm4
vpxor %xmm8, %xmm8, %xmm8
vpxor %xmm9, %xmm9, %xmm9
nopw %cs:(%rax,%rax)
; │ # Julia_vs_Delphi.jl within `for_fun`
L224:
vpaddq %ymm2, %ymm1, %ymm10
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
vpxor %ymm3, %ymm1, %ymm11
vpcmpgtq %ymm11, %ymm5, %ymm11
vpxor %ymm3, %ymm10, %ymm10
vpcmpgtq %ymm10, %ymm5, %ymm10
vpsubq %ymm11, %ymm0, %ymm0
vpsubq %ymm10, %ymm4, %ymm4
vpaddq %ymm11, %ymm8, %ymm8
vpsubq %ymm6, %ymm8, %ymm8
vpaddq %ymm10, %ymm9, %ymm9
vpsubq %ymm6, %ymm9, %ymm9
vpaddq %ymm7, %ymm1, %ymm1
addq $-8, %rcx
jne L224
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
vpaddq %ymm8, %ymm9, %ymm1
vextracti128 $1, %ymm1, %xmm2
vpaddq %xmm2, %xmm1, %xmm1
vpshufd $238, %xmm1, %xmm2 # xmm2 = xmm1[2,3,2,3]
vpaddq %xmm2, %xmm1, %xmm1
vmovq %xmm1, %r11
vpaddq %ymm0, %ymm4, %ymm0
vextracti128 $1, %ymm0, %xmm1
vpaddq %xmm1, %xmm0, %xmm0
vpshufd $238, %xmm0, %xmm1 # xmm1 = xmm0[2,3,2,3]
vpaddq %xmm1, %xmm0, %xmm0
vmovq %xmm0, %rdx
cmpq %r8, %r9
je L386
L346:
leaq 1(%r9), %r8
nop
; │ # Julia_vs_Delphi.jl:10 within `for_fun`
; │┌ # operators.jl:378 within `>`
; ││┌ # int.jl:83 within `<`
L352:
xorl %ecx, %ecx
cmpq $500000, %r10 # imm = 0x7A120
seta %cl
cmpq $500001, %r10 # imm = 0x7A121
; │└└
adcq $0, %rdx
addq %rcx, %r11
; │ # Julia_vs_Delphi.jl:13 within `for_fun`
; │┌ # range.jl:837 within `iterate`
incq %r10
; ││┌ # promotion.jl:468 within `==`
cmpq %r10, %r8
; │└└
jne L352
; │ # Julia_vs_Delphi.jl:17 within `for_fun`
L386:
movq %r9, (%rax)
movq %rdx, 8(%rax)
movq %r11, 16(%rax)
vmovaps -96(%rbp), %xmm6
vmovaps -80(%rbp), %xmm7
vmovaps -64(%rbp), %xmm8
vmovaps -48(%rbp), %xmm9
vmovaps -32(%rbp), %xmm10
vmovaps -16(%rbp), %xmm11
addq $96, %rsp
popq %rbp
vzeroupper
retq
nopw %cs:(%rax,%rax)
Let's start by noting that there is no reason for an optimizing compiler to actually perform the loop. At present both Delphi and Julia output assembly that actually runs through the loop, but the compilers could in the future just skip the loop and assign the final values. Microbenchmarks are tricky.
The difference seems to be that Julia makes use of SIMD instructions, which makes perfect sense for such a loop (a ~8x speedup is plausible, depending on your CPU).
You could have a look at this blog post for thoughts on SIMD in Delphi.
Although this is not the main point of the answer, I'll expand a bit on the possibility to remove the loop altogether. I don't know for sure what the Delphi specification says but in many compiled languages, including Julia ("just-ahead-of-time"), the compiler could simply figure out the state of the variables after the loop and replace the loop with that state. Have a look at the following C++ code (compiler explorer):
#include <cstdio>
void loop() {
long total = 0, big = 0, small = 0;
for (long i = 0; i < 100; ++i) {
total++;
if (total > 50) {
big++;
} else {
small++;
}
}
std::printf("%ld %ld %ld", total, big, small);
}
this is the assembler clang trunk outputs:
loop(): # @loop()
lea rdi, [rip + .L.str]
mov esi, 100
mov edx, 50
mov ecx, 50
xor eax, eax
jmp printf@PLT # TAILCALL
.L.str:
.asciz "%ld %ld %ld"
As you can see, there is no loop, just the result. For longer loops clang stops doing this optimization, but that's just a limitation of the compiler; other compilers could do it differently, and I'm sure there is a heavily optimizing compiler out there that handles much more complex situations.
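The reason a compiler is allowed to do this is that the loop's final state has a simple closed form. The following is a small Python sketch of that idea, not of what any particular compiler actually does; the function names and the threshold parameter are made up for illustration, and n is assumed to be non-negative.

```python
def loop_counts(n, threshold=500_000):
    """Run the loop from the question literally."""
    total = big = small = 0
    for _ in range(n):
        total += 1
        if total > threshold:
            big += 1
        else:
            small += 1
    return total, big, small


def closed_form(n, threshold=500_000):
    """Compute the same three values without looping (assumes n >= 0)."""
    small = min(n, threshold)   # iterations where total <= threshold
    big = n - small             # all remaining iterations
    return n, big, small


# Same constants as the C++ example: 100 iterations, threshold 50.
print(closed_form(100, 50))     # (100, 50, 50)
```

With the constants from the C++ example this gives the same three values that clang loads into esi, edx and ecx before the tail call to printf.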

What would be the best approach patch-finding the pointer of a certain function on the XNU Kernel?

I am currently working on an iOS Jailbreak for iOS 13.7.
As part of the jailbreak, I need to do a series of patches to the XNU Kernel live in the memory.
Of course, the kernel is protected by kASLR, KPP / KTRR, and other memory watchdogs that would trigger a Kernel Panic if something is modified.
As luck would have it, KTRR (Kernel Text Read-only Region) can only protect, well, static data that is not supposed to change (i.e. the TEXT section and constants). The variables can still be altered.
I am building a PatchFinder which is supposed to locate a function or a variable in the XNU memory based on tell-tale symbols and I am wondering what would be the most effective approach for this.
I am currently adapting on top of the PatchFinder made publicly available back in the iOS 8 era by in7egal which looks like this:
uint32_t find_cs_enforcement_disable_amfi(uint32_t region, uint8_t* kdata, size_t ksize)
{
// Find a function referencing cs_enforcement_disable_amfi
const uint8_t search_function[] = {0x20, 0x68, 0x40, 0xF4, 0x40, 0x70, 0x20, 0x60, 0x00, 0x20, 0x90, 0xBD};
uint8_t* ptr = memmem(kdata, ksize, search_function, sizeof(search_function));
if(!ptr)
return 0;
// Only LDRB in there should try to dereference cs_enforcement_disable_amfi
uint16_t* ldrb = find_last_insn_matching(region, kdata, ksize, (uint16_t*) ptr, insn_is_ldrb_imm);
if(!ldrb)
return 0;
// Weird, not the right one.
if(insn_ldrb_imm_imm(ldrb) != 0 || insn_ldrb_imm_rt(ldrb) > 12)
return 0;
// See what address that LDRB is dereferencing
return find_pc_rel_value(region, kdata, ksize, ldrb, insn_ldrb_imm_rn(ldrb));
}
I wonder if there is any faster way or a more reliable way to locate the cs_enforcement_disable_amfi.
Once found by the PatchFinder in the XNU Kernel memory, it's used like this:
uint32_t cs_enforcement_disable_amfi = find_cs_enforcement_disable_amfi(kernel_base, kdata, ksize);
printf("cs_enforcement_disable_amfi is at=0x%08x\n",cs_enforcement_disable_amfi);
if (cs_enforcement_disable_amfi){
char patch[] ="\x00\xbf\x00\xbf\x00\xbf\x00\xbf\x00\xbf";
kern_return_t kernret = vm_write(proccessTask, cs_enforcement_disable_amfi+kernel_base, patch, sizeof(patch)-1);
if (kernret == KERN_SUCCESS){
printf("Successfully patched cs_enforcement_disable_amfi\n");
}
}
So the PatchFinder has to be able to reliably return the pointer to cs_enforcement_disable_amfi otherwise I am blindly writing to an invalid (or valid but different) address which almost certainly will trigger memory corruption.
The current code does return a valid pointer to cs_enforcement_disable_amfi most of the time, but randomly panics the kernel about 10-15% of the time which means the address it returns 10-15% of the time is invalid. Not sure how to make it more reliable.
The variable you're looking for doesn't exist anymore.
The bytes in your first snippet make up Thumb instructions, which find this function in AMFI in a 32bit kernelcache:
0x8074ad04 90b5 push {r4, r7, lr}
0x8074ad06 01af add r7, sp, 4
0x8074ad08 0d48 ldr r0, [0x8074ad40]
0x8074ad0a 7844 add r0, pc
0x8074ad0c 0078 ldrb r0, [r0]
0x8074ad0e 0128 cmp r0, 1
0x8074ad10 03d1 bne 0x8074ad1a
0x8074ad12 0020 movs r0, 0
0x8074ad14 00f04efa bl 0x8074b1b4
0x8074ad18 30b9 cbnz r0, 0x8074ad28
0x8074ad1a 7c69 ldr r4, [r7, 0x14]
0x8074ad1c 002c cmp r4, 0
0x8074ad1e 05d0 beq 0x8074ad2c
0x8074ad20 2068 ldr r0, [r4]
0x8074ad22 40f44070 orr r0, r0, 0x300
0x8074ad26 2060 str r0, [r4]
0x8074ad28 0020 movs r0, 0
0x8074ad2a 90bd pop {r4, r7, pc}
Given the magic constant 0x300 and the fact that AMFI's __TEXT_EXEC segment is quite small, we can easily find this in other kernels, including 64bit ones.
This is what it looks like on an iPhone 5s on 8.4:
0xffffff800268d2e4 f44fbea9 stp x20, x19, [sp, -0x20]!
0xffffff800268d2e8 fd7b01a9 stp x29, x30, [sp, 0x10]
0xffffff800268d2ec fd430091 add x29, sp, 0x10
0xffffff800268d2f0 f30307aa mov x19, x7
0xffffff800268d2f4 e8fc1110 adr x8, section.com.apple.driver.AppleMobileFileIntegrity.10.__DATA.__bss
0xffffff800268d2f8 1f2003d5 nop
0xffffff800268d2fc 08054039 ldrb w8, [x8, 1]
0xffffff800268d300 a8000037 tbnz w8, 0, 0xffffff800268d314
0xffffff800268d304 130100b4 cbz x19, 0xffffff800268d324
0xffffff800268d308 680240b9 ldr w8, [x19]
0xffffff800268d30c 08051832 orr w8, w8, 0x300
0xffffff800268d310 680200b9 str w8, [x19]
0xffffff800268d314 00008052 mov w0, 0
0xffffff800268d318 fd7b41a9 ldp x29, x30, [sp, 0x10]
0xffffff800268d31c f44fc2a8 ldp x20, x19, [sp], 0x20
0xffffff800268d320 c0035fd6 ret
But by the time of iOS 11, the variable is gone:
0xfffffff006245d84 f44fbea9 stp x20, x19, [sp, -0x20]!
0xfffffff006245d88 fd7b01a9 stp x29, x30, [sp, 0x10]
0xfffffff006245d8c fd430091 add x29, sp, 0x10
0xfffffff006245d90 f30307aa mov x19, x7
0xfffffff006245d94 130100b4 cbz x19, 0xfffffff006245db4
0xfffffff006245d98 680240b9 ldr w8, [x19]
0xfffffff006245d9c 08051832 orr w8, w8, 0x300
0xfffffff006245da0 680200b9 str w8, [x19]
0xfffffff006245da4 00008052 mov w0, 0
0xfffffff006245da8 fd7b41a9 ldp x29, x30, [sp, 0x10]
0xfffffff006245dac f44fc2a8 ldp x20, x19, [sp], 0x20
0xfffffff006245db0 c0035fd6 ret
Looking at iOS 12.0b1, we can learn the signature of that function:
_vnode_check_exec(ucred*, vnode*, vnode*, label*, label*, label*, componentname*, unsigned int*, void*, unsigned long)
So yeah, finding this function is really easy:
Find AMFI's __TEXT_EXEC segment.
Find an orr wN, wN, 0x300 in it.
But that won't help you unless you defeat kernel integrity.
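The second step can be sketched as a scan over 32-bit instruction words. This is only an illustration in Python: the mask and value below are derived from the single encoding shown in the listings above (bytes 08 05 18 32 for orr w8, w8, 0x300), the function name find_orr_0x300 is hypothetical, and the input is assumed to be the raw bytes of the segment, already extracted and 4-byte aligned.

```python
import struct

# Derived from the listing: 08 05 18 32 is the little-endian word 0x32180508.
# Bits 0-4 hold the destination register and bits 5-9 the source register,
# so everything above bit 9 is kept fixed and the registers are left free.
ORR_MASK = 0xFFFFFC00
ORR_BITS = 0x32180400


def find_orr_0x300(code):
    """Return byte offsets of every 'orr wN, wN, 0x300' in an A64 code blob."""
    hits = []
    for off in range(0, len(code) - 3, 4):
        (insn,) = struct.unpack_from("<I", code, off)
        rd = insn & 0x1F
        rn = (insn >> 5) & 0x1F
        if insn & ORR_MASK == ORR_BITS and rd == rn:
            hits.append(off)
    return hits


# Five instructions copied from the iOS 11 listing above.
sample = bytes.fromhex("f30307aa130100b4680240b908051832680200b9")
print(find_orr_0x300(sample))   # [12]
```

A match only tells you where the instruction is; whether there is exactly one match in a given kernel is something you would have to check yourself.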

Optimization bug in Apple's LLVM, or bug in code?

I have some iOS C++ code that compiles correctly on my local machine (LLVM 9.0) but compiles incorrectly on my build server (LLVM 10.0). The project is generated via CMake (same version on both) so the code being compiled is the same, with the same compiler settings.
After finally realizing that some critical values weren't being updated on the LLVM10 version I investigated the assembly and found out it was completely skipping part of the code.
void SceneDisplay::SetSize(const math::Vec2 &Size)
{
m_Size = Size;
m_ScreenWidth = int(m_Size.x * float(GraphicsUtil::WIDTH));
m_ScreenHeight = int(m_Size.y * float(GraphicsUtil::HEIGHT));
UpdateOffsetScale();
}
m_Size is initialized to 1.0,1.0 in the class constructor. This works fine and everything is perfect with LLVM9 - with LLVM10 we get the following disassembly:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movq __ZN12GraphicsUtil6HEIGHTE@GOTPCREL(%rip), %rax
movq __ZN12GraphicsUtil5WIDTHE@GOTPCREL(%rip), %rcx
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
movq -8(%rbp), %rsi
Ltmp2347:
movq -16(%rbp), %rdi
movq (%rdi), %rdi
movq %rdi, 56(%rsi)
movl (%rcx), %edx
movl %edx, 12(%rsi)
movl (%rax), %edx
movl %edx, 16(%rsi)
movq (%rsi), %rax
movq %rsi, %rdi
callq *136(%rax)
addq $16, %rsp
popq %rbp
retq
As you can see the assignment of the two member variables is completely 'optimized' to just assume that m_Size.x and m_Size.y are 1.0 - thus just copying the values of GraphicsUtil::WIDTH and HEIGHT.
I fixed this by changing the code to use "Size" instead of "m_Size" for those assignments, as well as making them volatile just in case. But I'm wondering if there is a legitimate compiler error here or I'm missing something?
Edit: It should be noted that m_Size is nearly never 1.0,1.0
Edit2: The correct assembly for the assignments, as generated on my machine (different arch though, not able to get the same arch as above right now)
str x8, [x0, #56]
lsr x9, x8, #32
fmov s0, w8
adrp x8, __ZN12GraphicsUtil5WIDTHE@GOTPAGE
ldr x8, [x8, __ZN12GraphicsUtil5WIDTHE@GOTPAGEOFF]
ldr s1, [x8]
ucvtf s1, s1
fmul s0, s0, s1
fcvtzs w8, s0
str w8, [x0, #12]
fmov s0, w9
adrp x8, __ZN12GraphicsUtil6HEIGHTE@GOTPAGE
ldr x8, [x8, __ZN12GraphicsUtil6HEIGHTE@GOTPAGEOFF]
ldr s1, [x8]
ucvtf s1, s1
fmul s0, s0, s1
fcvtzs w8, s0
str w8, [x0, #16]
After making a minimal test case I was able to confirm it's definitely a compiler bug.
Conditions: No other piece of code modifies m_Size, m_Size is initialized math::Vec2 m_Size{1.0, 1.0};. It works perfectly on every version of LLVM I could find before 10.0, seems some sort of regression occurred at that version.
Have submitted to Apple's LLVM team and llvm.org.
Thanks for comments.

Memory transfer intel assembly AT&T

I have a problem moving a string bytewise from one memory address to another. I've been at this for hours and tried some different strategies. I'm new to Intel assembly, so I need some tips and insight to help me solve the problem.
The getText routine is supposed to transfer n (found in %rsi) bytes from ibuf to the address in %rdi. counterI is the offset used to indicate where to start the transfer, and after the routine is over it should point to the next byte that wasn't transferred. If there aren't n bytes it should cancel the transfer and return the actual number of bytes transferred in %rax.
getText:
movq $ibuf, %r10
#in rsi is the number of bytes to be transferred
#rdi contains the memory address for the memory space to transfer to
movq $0, %r8 #start with offset 0
movq $0, %rax #zero return register
movq (counterI), %r11
cmpb $0, (%r10, %r11, 1) #check if ibuf+counterI=NULL
jne MOVE #if so call and read to ibuf
call inImage
MOVE:
cmpq $0,%rsi #if number of bytes to read is 0
je EXIT #exit
movq counterI, %r9
movq $0, %r9 #used for debugging only should not be 0
movb (%r10, %r9, 1), %bl #loads one byte to rdi from ibuf
movb %bl, (%rdi, %r8, 1)
incq counterI #increase pointer offset
decq %rsi #dec number of bytes to read
incq %r8 #inc offset in write buffer
movq %r8, %rax #returns number of bytes written to buf
movq (counterI), %r9
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
movb $0, (%rdi, %r8, 1) #move NULL to buf+%r8?
ret
movq counterI, %r9
movq $0, %r9 #used for debugging only should not be 0
The second instruction makes the first useless, but given the remark I understand you will remove it. Better still, you can remove both if you change every occurrence of %R9 into %R11.
movzbq (%r10, %r9, 1), %r10 #loads one byte+zeroes to rdi from ibuf
movq %r10, (%rdi, %r8, 1) #HERE IS THE PROBLEM I THINK
Here is a dangerous construct. You first use %R10 as an address but then drop a zero-extended data byte into it. Later in the code you use %R10 as an address again, but sadly the address is no longer in there! The solution is to move into a different register and not to bother with the zero extension.
movb (%r10, %r9, 1), %bl #loads one byte to rdi from ibuf
movb %bl, (%rdi, %r8, 1)
The following code can be shortened
cmpb $0, (%r10, %r9,1) #check if ibuf+offset is NULL
je EXIT #if so exit
cmpq $0, %rsi #can be cleaned up later
jne MOVE
EXIT:
as follows, because the code at MOVE already begins by testing whether %rsi is 0:
cmpb $0, (%r10, %r9, 1) #check if ibuf+offset is NULL
jne MOVE
EXIT:
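When debugging a routine like this, it can help to have a small reference model of the intended behaviour to compare against. The Python sketch below models only what the question describes (copy up to n bytes from ibuf starting at counterI, stop at a NUL, terminate the output with a NUL). The function name get_text and its return tuple are made up for illustration, and the refill through inImage is deliberately left out.

```python
def get_text(ibuf, counter, n):
    """Model of the intended getText behaviour (without the inImage refill).

    Returns (output buffer incl. terminating NUL, new counter, bytes copied).
    """
    out = bytearray()
    # Stop when n bytes are copied, the buffer ends, or a NUL byte is reached.
    while n > 0 and counter < len(ibuf) and ibuf[counter] != 0:
        out.append(ibuf[counter])   # movb (%r10,%r11,1),%bl / movb %bl,(%rdi,%r8,1)
        counter += 1                # incq counterI
        n -= 1                      # decq %rsi
    copied = len(out)               # value returned in %rax
    out.append(0)                   # movb $0,(%rdi,%r8,1)
    return bytes(out), counter, copied


print(get_text(b"hello\x00", 0, 3))   # (b'hel\x00', 3, 3)
print(get_text(b"hi\x00", 0, 5))      # (b'hi\x00', 2, 2)
```

Unlike the assembly, the model tests for NUL before copying each byte, so it never copies a terminator that sits at the starting offset.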
