I've been researching a game's engine for the last few months and, after extracting the DX9 bytecode, have begun examining the game's shaders. I have mainly worked with OpenGL, so some of the syntax and conventions are new to me.
Anyway, this is a 5x5 gaussian blur pixel shader (3.0). Can anyone explain what is going on with the texture coords declaration? Does this mean 4 coordinates are being sent into the pixel shader? Or is the same texture coordinate being reused?
def c0, 0.120259, 0.162103, 0.089216, 0.000000
dcl_texcoord0 v0.xy
dcl_texcoord1 v1
dcl_texcoord2 v2
dcl_texcoord3 v3
dcl_texcoord4 v4
dcl_2d s0
tex_pp r0, v1.zwzw, s0
mul r0, r0, c0.x
tex_pp r1, v1, s0
mad_pp r0, r1, c0.y, r0
tex_pp r1, v2, s0
mad_pp r0, r1, c0.x, r0
tex_pp r1, v2.zwzw, s0
mad_pp r0, r1, c0.x, r0
tex_pp r1, v3, s0
mad_pp r0, r1, c0.x, r0
tex_pp r1, v3.zwzw, s0
mad_pp r0, r1, c0.z, r0
tex_pp r1, v4, s0
mad_pp r0, r1, c0.z, r0
tex_pp r1, v4.zwzw, s0
mad_pp r0, r1, c0.z, r0
tex_pp r1, v0, s0
mad_pp oC0, r1, c0.z, r0
end
The dcl_texcoord ASM instruction declares an input register that carries data. For a pixel shader in D3D9, this data is written by the vertex stage. So yes, this shader does use 5 sets of texture coordinates (0..4). The first coordinate uses only its first two components, which is why it is declared with .xy explicitly (this reduces the bandwidth required between the stages). The other four each supply a second sample from their upper two components (.zw) as well, so the shader samples the texture 9 times in total.
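As a rough check of that count, here is a small C sketch. It is not part of the game's code, and the helper names are made up for illustration. It counts one fetch for the .xy-only register plus two for each of the four full registers, and sums the c0 weights the way the listing applies them (c0.y once, c0.x and c0.z four times each, read off the mul/mad instructions).

```c
/* Weights copied from the shader's `def c0` line. */
static const float C0_X = 0.120259f; /* applied to four taps in the listing */
static const float C0_Y = 0.162103f; /* applied to one tap */
static const float C0_Z = 0.089216f; /* applied to four taps */

/* Hypothetical helper: number of texture fetches, given how many input
   registers carry one UV pair (.xy only) and how many carry two
   (.xy plus .zw). */
int tap_count(int single_uv_regs, int packed_uv_regs)
{
    return single_uv_regs + 2 * packed_uv_regs;
}

/* Hypothetical helper: total weight of all taps as used in the listing. */
float blur_weight_sum(void)
{
    return C0_Y + 4.0f * C0_X + 4.0f * C0_Z;
}
```

With one .xy-only register and four packed registers, tap_count gives 9, and the weights add up to roughly 1, which is what you would expect from a normalized blur kernel.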
I'm trying to trigger a DMA channel to perform a memory-to-memory transfer within the SRAM. The channel throws up the busy flag, but there is no change to the transfer count.
I have the following code:
;reset dma
ldr r0, *resets_clr ;clear reset
mov r1, 4
str r1, (r0, 0)
ldr r0, *resets_rw ;confirm reset clear
*rst_dma
ldr r2, (r0, 2)
and r2, r1
beq *rst_dma
;start dma channel
ldr r0, *dma_rw
adr r1, *var
str r1, (r0, 0) ;read addr
ldr r1, *sram_addr
str r1, (r0, 1) ;write addr
mov r1, 8
str r1, (r0, 2) ;trans count
mov r1, 47 ;incr_write | size_word | high_priority | en
str r1, (r0, 3) ;ctrl trig
;wait for transfer marked complete
mov r2, 1
lsl r2, r2, 24
*busy_dma
ldr r1, (r0, 3)
and r1, r2
bne *busy_dma
;test
ldr r0, *sram_addr
ldr r1, (r0, 0)
mov lr, pc
cmp r1, 7
beq *led_on
With the following word-sized values:
resets_rw 0x4000c000
resets_clr 0x4000f000
var 7
dma_rw 0x50000000
sram_addr 0x20001000
The chip hangs in the busy_dma loop, which waits until the channel drops its busy flag. If I run without the loop, the memory test fails. The channel throws no AHB error.
I already typed the question out so might as well answer it.
The channel control register has a field to select a transfer request signal (TREQ_SEL), which resets to the first data request channel, DREQ_PIO0_TX0. If you're not using the DMA with a peripheral, and don't care about timing the transfer pace, you should set the field to 0x3f for unpaced transfers.
mov r1, 63 ;treq_sel unpaced transfer
lsl r1, r1, 15
add r1, 47 ;incr_write | size_word | high_priority | en
str r1, (r0, 3) ;ctrl trig
Without this, the DMA will just sit there, waiting for a non-existent signal.
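For anyone writing this in C instead, the same value can be built with a small helper. This is only a sketch: the bit positions are taken from the assembly above (TREQ_SEL shifted left by 15, busy flag at bit 24), the 6-bit field width is assumed from the 0x3f value, and the macro and function names are made up. Check them against your chip's datasheet.

```c
#include <stdint.h>

/* Assumed from the assembly above: TREQ_SEL starts at bit 15 and
   0x3f selects unpaced transfers. */
#define TREQ_SEL_SHIFT 15u
#define TREQ_SEL_MASK  (0x3fu << TREQ_SEL_SHIFT)
#define TREQ_UNPACED   0x3fu
/* Assumed from the wait loop: `mov r2, 1` then `lsl r2, r2, 24`. */
#define DMA_BUSY_BIT   (1u << 24)

/* Hypothetical helper: replace the TREQ_SEL field in a control word. */
uint32_t ctrl_with_treq(uint32_t flags, uint32_t treq)
{
    return (flags & ~TREQ_SEL_MASK) |
           ((treq << TREQ_SEL_SHIFT) & TREQ_SEL_MASK);
}

/* Hypothetical helper: test the busy flag in a value read back from
   the control register. */
int dma_is_busy(uint32_t ctrl)
{
    return (ctrl & DMA_BUSY_BIT) != 0;
}
```

ctrl_with_treq(47, TREQ_UNPACED) produces 0x1F802F, the same word the four instructions above store to the trigger register.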
I am currently using Keil uVision4 and am trying to implement a linked-list search that walks a preset list and stops only when it either reaches the end or finds a value that matches what is in register r0. While debugging, I noticed that on the first pass through the loop, the initial LDR r0, [r0] loads the value of the first node into r0, but on the second pass the same LDR r0, [r0] loads 0x00000000 into r0. I'm trying to figure out how to make it advance to the next node in the list rather than return a zero value.
AREA question1, CODE, READONLY
ENTRY
;--------------------------------------------------------------------------
LDR r1, =0x12347777
ADR r0, List ;r0 points to the first element in list
Loop LDR r4, [r0] ;places the next pointer in r0
CMP r0, r1
LDR r0, [r0, #4]
BEQ Store
CMP r0, #0x00 ;checks if it is the end of the linked list
BNE Loop ;if its not the end of the list, then continue reading the next node
LDR r2, =0xF0F0F0F0 ;Failure, set register r2
B fin
Store MOV r2, #0xFFFFFFFF ;Success, set register r2
LDR r3, [r0] ;Success, store pointer in r3
fin B fin
;---------------------------------------------------------------------------
AREA question1, DATA, READWRITE
List DCD 0x12341111, Item5
Item2 DCD 0x12342222, Item3
Item3 DCD 0x12343333, Item4
Item4 DCD 0x12344444, Item6
Item5 DCD 0x12345555, Item2
Item6 DCD 0x12346666, Item7
Item7 DCD 0x12347777, 0x00 ;terminator
;---------------------------------------------------------------------------
END
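For reference, the traversal the question describes can be sketched in C. This is an illustration of the intended logic under the node layout shown in the DCD table (a value word followed by a next pointer), not a fix of the assembly itself. The struct, variable, and function names are made up. The point is that each step compares the value loaded from the node (not the node's address) and then follows the pointer stored in the second word, which sits at offset 4 on a 32-bit target.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the DCD table: a value word, then a next pointer. */
struct node {
    uint32_t value;
    const struct node *next;
};

/* The list from the question, defined back to front so that every
   next pointer refers to an already defined node. */
static const struct node item7 = { 0x12347777u, NULL };
static const struct node item6 = { 0x12346666u, &item7 };
static const struct node item4 = { 0x12344444u, &item6 };
static const struct node item3 = { 0x12343333u, &item4 };
static const struct node item2 = { 0x12342222u, &item3 };
static const struct node item5 = { 0x12345555u, &item2 };
static const struct node list  = { 0x12341111u, &item5 };

/* Returns the first node whose value equals target, or NULL when the
   terminator (a null next pointer) is reached without a match. */
const struct node *find_value(const struct node *head, uint32_t target)
{
    for (const struct node *n = head; n != NULL; n = n->next) {
        if (n->value == target)
            return n;
    }
    return NULL;
}
```

Calling find_value(&list, 0x12347777u) walks List, Item5, Item2, Item3, Item4, Item6 and returns the address of Item7. A value that is not in the list returns NULL.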
I have developed an application for the iPad. It was already developed for the iPhone and works great there, but when moving to the iPad I merged the pages together, so the main page now has more than one table and a lot of objects.
Opening the main page works sometimes, but when I go to another page and then try to go back to the main page, the app crashes and shows this:
libobjc.A.dylib`objc_autorelease:
0x3b61d660: cbz r0, 0x3b61d67a ; objc_autorelease + 26
0x3b61d662: ldr r1, [r0]
0x3b61d664: movs r2, #2
0x3b61d666: ldr r1, [r1, #16]
0x3b61d668: bfi r1, r2, #0, #2
0x3b61d66c: ldrb r1, [r1]
0x3b61d66e: tst.w r1, #2
0x3b61d672: bne 0x3b61d67e ; objc_autorelease + 30
0x3b61d674: movs r1, #0
0x3b61d676: b.w 0x3b61e230 ; -[NSObject autorelease]
0x3b61d67a: movs r0, #0
0x3b61d67c: bx lr
0x3b61d67e: movw r1, #5170
0x3b61d682: movt r1, #503
0x3b61d686: add r1, pc
0x3b61d688: ldr r1, [r1]
0x3b61d68a: b.w 0x3b60d5c0 ; objc_msgSend
0x3b61d68e: nop
I have read many articles. Some of them say the cause is releasing objects that have already been released. Others were no help at all, telling me to look at allocations, but I don't know how to use that. I tried, but nothing works.
Can anyone help me with this issue?
Overreleased objects are called zombies. If you look around on the net, you can find some help on how to use the profiler (Apple Instruments) to track them down.
Here are a couple of Apple links about zombie hunting.
Finding Messages Sent To Deallocated Objects
Eradicating Zombies with the Zombies Trace Template
The following code works fine on linux-x86 and darwin-x86, but not on ios-armv7.
The right output should be:
m[0]: 0.500000, v: 0.500000
m[1]: 0.500000, v: 0.500000
m[2]: 0.500000, v: 0.500000
m[3]: 0.500000, v: 0.500000
m[4]: 0.500000, v: 0.500000
But on ios-armv7 I get the wrong output:
m[0]: 0.500000, v: 0.500000
m[1]: 0.500000, v: 0.000000
m[2]: 0.500000, v: 0.000000
m[3]: 0.500000, v: 0.000000
m[4]: 0.500000, v: 0.000000
I also found some strange behavior when it's built for ios-armv7:
[a] if I remove the function 'func' and move its body into 'main', it works fine
[b] if I declare the array 'm[5]' as 'double m[5]', it works fine
[c] if I set the variable 'v' with 'v = 0.5' or 'v = sqrt(2.0f/8)', it works fine
[d] if the gcc optimization option is '-O0', it works fine, but with '-O1' or '-O2' the wrong output occurs
My iPad 1 is jailbroken, so I can cross-compile an executable on my MacBook Air, 'scp' it to the iPad, and run it there. The details follow:
1. cross-compile an executable on the Mac:
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/llvm-gcc -O1 -Wall -arch armv7 -mcpu=cortex-a8 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk -I/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/usr/include -D__IPHONE_OS__ -miphoneos-version-min=4.0 foo.c
2. 'scp' the executable to iPad
scp a.out mobile@192.168.1.106:~
3. 'ssh' to iPad
ssh mobile@192.168.1.106 #the default password is 'alpine'
4. run a.out on iPad
./a.out
#include <stdio.h>
#include <math.h>
int
func(int n) /* [a] */
{
int i;
float m[5]; /* [b] */
double v;
v = sqrt(2.0f/n); /* [c] */
for(i=0;i<5;++i) {
m[i]=v;
printf("m[%d]: %f, v: %f\n", i, m[i], v);
}
return 0;
}
int
main(int argc, char **argv)
{
return func(8);
}
You can also find the whole code at https://gist.github.com/ashun/5992120
The following is the assembly. You can find the differences with the help of the command 'vim -d'.
assembly of the previous code, declare the array 'm[5]' as 'double m[5]'
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__textcoal_nt,coalesced,pure_instructions
.section __TEXT,__const_coal,coalesced
.section __TEXT,__picsymbolstub4,symbol_stubs,none,16
.section __TEXT,__StaticInit,regular,pure_instructions
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _func
.align 2
.code 16
.thumb_func _func
_func:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
str r8, [sp, #-4]!
sub sp, #8
vmov.f32 s0, #2.000000e+00
movw r8, :lower16:(L_.str-(LPC0_0+4))
vmov s2, r0
movt r8, :upper16:(L_.str-(LPC0_0+4))
vcvt.f32.s32 d1, d1
LPC0_0:
add r8, pc
movs r4, #0
vdiv.f32 s0, s0, s2
vsqrt.f32 s0, s0
vcvt.f64.f32 d16, s0
vmov r5, r6, d16
LBB0_1:
mov r1, r4
mov r0, r8
mov r2, r5
mov r3, r6
vstr.64 d16, [sp]
adds r4, #1
blx _printf
cmp r4, #5
bne LBB0_1
movs r0, #0
add sp, #8
ldr r8, [sp], #4
pop {r4, r5, r6, r7, pc}
.globl _main
.align 2
.code 16
.thumb_func _main
_main:
push {r7, lr}
mov r7, sp
movs r0, #8
bl _func
movs r0, #0
pop {r7, pc}
.section __TEXT,__cstring,cstring_literals
L_.str:
.asciz "m[%d]: %f, v: %f\n"
.subsections_via_symbols
assembly of the previous code, declare the array 'm[5]' as 'double m[5]'
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__textcoal_nt,coalesced,pure_instructions
.section __TEXT,__const_coal,coalesced
.section __TEXT,__picsymbolstub4,symbol_stubs,none,16
.section __TEXT,__StaticInit,regular,pure_instructions
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _func
.align 2
.code 16
.thumb_func _func
_func:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
str r8, [sp, #-4]!
vpush {d8}
sub sp, #8
vmov.f32 s0, #2.000000e+00
movw r8, :lower16:(L_.str-(LPC0_0+4))
vmov s2, r0
movt r8, :upper16:(L_.str-(LPC0_0+4))
vcvt.f32.s32 d1, d1
LPC0_0:
add r8, pc
movs r4, #0
vdiv.f32 s0, s0, s2
vcvt.f64.f32 d16, s0
vsqrt.f64 d8, d16
vmov r5, r6, d8
LBB0_1:
mov r1, r4
mov r0, r8
mov r2, r5
mov r3, r6
vstr.64 d8, [sp]
adds r4, #1
blx _printf
cmp r4, #5
bne LBB0_1
movs r0, #0
add sp, #8
vpop {d8}
ldr r8, [sp], #4
pop {r4, r5, r6, r7, pc}
.globl _main
.align 2
.code 16
.thumb_func _main
_main:
push {r7, lr}
mov r7, sp
movs r0, #8
bl _func
movs r0, #0
pop {r7, pc}
.section __TEXT,__cstring,cstring_literals
L_.str:
.asciz "m[%d]: %f, v: %f\n"
.subsections_via_symbols
It is not clear which assembly goes wrong as both are marked as 'declare the array m[5] as double m[5]' and unfortunately I don't have the hardware nor the cross-compiler to reproduce your problem.
Remarkably the loops in both assembly codes are very similar. The only difference being that v is located in d16 in the first and in d8 in the second. The v that is passed to the printf is located in (r5,r6) in both loops and correctly copied to (r2,r3) before calling printf. For variadic functions the floating point registers shall not be used to pass parameters, contrary to non-variadic functions. Thus both loops look correct.
The only explanation I can think of is a mismatch between the ABI used for the compiled code and the ABI of the library containing printf, especially since the compiled code comes from a cross compiler and I'm assuming printf comes from a dynamic library on the system. As printf is called in conformance with the EABI for ARM, I think the bug is in the library's printf.
If your cross compiler allows static linking, you may try that as you will be using a library that corresponds with the compiler. Of course the application becomes bigger, but it could at least confirm suspicion on the implementation of the printf. You may want to check if the library is compiled with an EABI complying compiler. If you can step through the printf on a debugger on the iPad, then you should be able to determine where the printf is taking its floating point parameter from. It should take it from (r2,r3).
Unfortunately I cannot give a conclusive answer, but I hope my pointers for further investigation are helpful.
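As background for the point about variadic calls, here is a short C sketch of the language rule involved. It is standard C behavior, not anything specific to iOS, and the function names are made up. A float passed through the `...` of a variadic function is promoted to double, so a callee such as printf always reads a double for %f, whether the caller started from m[i] (float) or v (double).

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` floating-point variadic arguments. */
double sum_args(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i) {
        /* A float argument arrives already promoted to double, so it
           must be read back as double. */
        total += va_arg(ap, double);
    }
    va_end(ap);
    return total;
}

/* Mirrors the printf call in the question: one float, one double. */
double demo(void)
{
    float m = 0.5f;
    double v = 0.5;
    return sum_args(2, m, v);
}
```

demo() returns 1.0. By the time the call is made, both arguments are doubles, which is why the two listings convert the value with vcvt.f64.f32 before the loop.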
I wrote this arm assembler code calling pthread to implement multi-threading features. I wrote two similar files, but this one is quite tricky.
The main function is:
main:
stmfd sp!, {fp,lr}
add fp, sp, #4
sub sp, sp, #8
sub r3, sp, #8
mov r0, r3
mov r1, #0
ldr r2, .l_thrd1
mov r3, #0
bl pthread_create
ldr r3, [fp, #-8]
mov r0, r3
mov r1, #0
bl pthread_join
...
Using objdump, the related disassembled code is:
00405468 <pthread_join>:
405468: e5903068 ldr r3, [r0, #104] ; 0x68
40546c: e92d45f0 push {r4, r5, r6, r7, r8, sl, lr}
405470: e3530000 cmp r3, #0
405474: e24dd014 sub sp, sp, #20
405478: e1a05000 mov r5, r0
40547c: e1a06001 mov r6, r1
405480: ba00004a blt 4055b0 <pthread_join+0x148>
405484: e590321c ldr r3, [r0, #540] ; 0x21c
....
It looks normal, except that it causes a segmentation fault. The qemu.log looks messy around this point:
----------------
IN: pthread_join
INST: isa=[0] opk=[JMP_OP] src={-,-,-,-} dst={-,-} shift={-,-,-} c=[1] s=[-] imm=[24,74] rotate_reg=[-] vfp={-,-,-,-} vfp_val={-,-,-,-} ###
0x00405468: e5903068 ### ldr r3, [r0, #104]
0x0040546c: e92d45f0 ### push {r4, r5, r6, r7, r8, sl, lr}
0x00405470: e3530000 ### cmp r3, #0 ; 0x0
0x00405474: e24dd014 ### sub sp, sp, #20 ; 0x14
0x00405478: e1a05000 ### mov r5, r0
0x0040547c: e1a06001 ### mov r6, r1
0x00405480: ba00004a ### b.lt 0x4055b0
----------------
IN: pthread_join
INST: isa=[0] opk=[JMP_OP] src={-,-,-,-} dst={-,-} shift={-,-,-} c=[1] s=[-] imm=[24,74] rotate_reg=[-] vfp={-,-,-,-} vfp_val={-,-,-,-} ###
0x00405468: e5903068 ### ldr r3, [r0, #104]
0x0040546c: e92d45f0 ### push----------------
IN: start_thread
INST: isa=[0] opk=[JMP_OP] src={-,-,-,-} dst={-,-} shift={-,-,-} c=[0] s=[-] imm=[24,4148] rotate_reg=[-] vfp={-,-,-,-} vfp_val={-,-,-,-} ###
0x00404274: e7802003 ### str{r4, r5 , r6, r7r2, ,r8, sl[r0, , lrr3}]
0x00405470: e3530000 ###
....
Obviously, pthread_join has been entered twice, and the second time the 'push' instruction seems not to have been fully executed. The registers also seem normal. I just do not get it.
Another file of mine runs in the right order, and the two are coded almost the same.
Nobody has answered the question, so I will answer it myself.
The problem was that the stack pointer (r13) was unintentionally saved to memory and then changed by another thread. As a result r13 pointed to a different memory address, which caused the segmentation fault.
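For comparison, here is a C-level sketch of what the main routine appears to be doing: create one thread, then join it. It is not the original program, and worker and run_one_thread are made-up names. The thread handle lives in an ordinary local variable, so the compiler manages its stack slot and the stack pointer itself is never written to memory that another thread can reach. On older toolchains this needs to be linked with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread body: increments the int it is given and hands
   the same pointer back as its result. */
static void *worker(void *arg)
{
    int *value = arg;
    *value += 1;
    return arg;
}

/* Returns 0 on success. The handle is a local in this frame, and the
   same variable is passed to both pthread_create and pthread_join. */
int run_one_thread(int *out)
{
    pthread_t tid;
    void *result = NULL;

    int rc = pthread_create(&tid, NULL, worker, out);
    if (rc != 0)
        return rc;

    rc = pthread_join(tid, &result);
    if (rc != 0)
        return rc;

    return result == out ? 0 : -1;
}
```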