I'm trying the example at http://api.madewithmarmalade.com/ExampleArmASM.html on iOS. The program runs if I comment out the loop, and res is printed as 28. But if I don't comment it out, the program crashes (abends) without printing res.
Any hints on why this happens and how to fix it?
Thanks in advance.
My code is as follows:
#include <stdio.h>
#include <stdlib.h>
#define ARRAY_SIZE 512
#if defined __arm__ && defined __ARM_NEON__
static int computeSumNeon(const int a[])
{
// Computes the sum of all elements in the input array
int res = 0;
asm(".align 4 \n\t" //dennis warning avoiding
"vmov.i32 q8, #0 \n\t" //clear our accumulator register
"mov r3, #512 \n\t" //Loop condition n = ARRAY_SIZE
// ".loop1: \n\t" // No loop add 0-7 works as 28
"vld1.32 {d0, d1, d2, d3}, [%[input]]! \n\t" //load 8 elements into d0, d1, d2, d3 = q0, q1
"pld [%[input]] \n\t" // preload next set of elements
"vadd.i32 q8, q0, q8 \n\t" // q8 += q0
"vadd.i32 q8, q1, q8 \n\t" // q8 += q1
"subs r3, r3, #8 \n\t" // n -= 8
// "bne .loop1 \n\t" // n == 0?
"vpadd.i32 d0, d16, d17 \n\t" // d0[0] = d16[0] + d16[1], d0[1] = d17[0] + d17[1]
"vpaddl.u32 d0, d0 \n\t" // d0[0] = d0[0] + d0[1]
"vmov.32 %[result], d0[0] \n\t"
: [result] "=r" (res) , [input] "+r" (a)
:
: "q0", "q1", "q8", "r3");
return res;
}
#else
static int computeSumNeon(const int a[])
{
int i, res = 0;
for (i = 0; i < ARRAY_SIZE; i++)
res += a[i];
return res;
}
#endif
...
@implementation AppDelegate
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
// Override point for customization after application launch.
//int* inp;
int inp[ARRAY_SIZE];
//posix_memalign((void**)&inp, 64, ARRAY_SIZE*sizeof(int)); // Align to cache line size (64bytes on a cortex A8)
// Initialise the array with consecutive integers.
int i;
for (i = 0; i < ARRAY_SIZE; i++)
{
inp[i] = i;
}
for (i = 0; i < ARRAY_SIZE; i++)
{
printf("%i,", inp[i]);
}
printf("\n\n sum 0-7:%i\n", 0+1+2+3+4+5+6+7);
int res = 0;
res = computeSumNeon(inp);
printf("res NEO :%i\n", res);
// free(inp); // error: pointer being freed was not allocated (inp is a stack array, not heap memory)
UISplitViewController *splitViewController = (UISplitViewController *)self.window.rootViewController;
UINavigationController *navigationController = [splitViewController.viewControllers lastObject];
navigationController.topViewController.navigationItem.leftBarButtonItem = splitViewController.displayModeButtonItem;
splitViewController.delegate = self;
return YES;
}
- (void)applicationWillResignActive:(UIApplication *)application {
...
==== assembly code generated
.align 1
.code 16 # #computeSumNeon
.thumb_func _computeSumNeon
_computeSumNeon:
Lfunc_begin3:
.loc 18 133 0 is_stmt 1 # ...
.cfi_startproc
# BB#0:
sub sp, #8
movs r1, #0
str r0, [sp, #4]
.loc 18 135 9 prologue_end # ...
Ltmp18:
str r1, [sp]
.loc 18 136 5 # ...
ldr r0, [sp, #4]
# InlineAsm Start
.align 4
vmov.i32 q8, #0x0
movw r3, #504
.loop1:
vld1.32 {d0, d1, d2, d3}, [r0]!
vadd.i32 q8, q0, q8
vadd.i32 q8, q1, q8
subs r3, #8
bne .loop1
vpadd.i32 d0, d16, d17
vpaddl.u32 d0, d0
vmov.32 r1, d0[0]
# InlineAsm End
str r1, [sp]
str r0, [sp, #4]
.loc 18 155 12 # ...
ldr r0, [sp]
.loc 18 155 5 is_stmt 0 # ...
add sp, #8
bx lr
Ltmp19:
Lfunc_end3:
.cfi_endproc
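A variant that may be worth trying (a hedged sketch, not a confirmed diagnosis of the crash): use a GNU-style numeric local label (1: / bne 1b) instead of the named label .loop1, let the compiler pick the loop-counter register through an operand instead of hard-coding r3, and declare the condition flags ("cc") and memory as clobbered, since subs changes the flags and the asm reads the array through a pointer. The plain C loop is kept as a fallback so the same file also builds on non-NEON targets. Clobbers are listed by D register (q0 = d0/d1, q1 = d2/d3, q8 = d16/d17); the pld from the original is omitted.

```c
#include <assert.h>

#define ARRAY_SIZE 512

/* Sums ARRAY_SIZE ints. On ARM with NEON the loop is inline assembly using a
   numeric local label ("1:" / "bne 1b"); elsewhere a plain C loop is used. */
static int computeSumNeon(const int *a)
{
    int res = 0;
    /* The NEON loop consumes 8 elements per iteration. */
    assert(ARRAY_SIZE > 0 && ARRAY_SIZE % 8 == 0);
#if defined __arm__ && defined __ARM_NEON__
    int n = ARRAY_SIZE; /* loop counter, register chosen by the compiler */
    __asm__ volatile(
        "vmov.i32   q8, #0                        \n\t" /* accumulator = 0 */
        "1:                                       \n\t"
        "vld1.32    {d0, d1, d2, d3}, [%[input]]! \n\t" /* load 8 ints, advance pointer */
        "vadd.i32   q8, q0, q8                    \n\t" /* q8 += q0 */
        "vadd.i32   q8, q1, q8                    \n\t" /* q8 += q1 */
        "subs       %[n], %[n], #8                \n\t" /* n -= 8, sets flags */
        "bne        1b                            \n\t" /* loop while n != 0 */
        "vpadd.i32  d0, d16, d17                  \n\t" /* fold 4 lanes into 2 */
        "vpaddl.u32 d0, d0                        \n\t" /* fold 2 lanes into 1 */
        "vmov.32    %[result], d0[0]              \n\t"
        : [result] "=r" (res), [input] "+r" (a), [n] "+r" (n)
        :
        : "d0", "d1", "d2", "d3", "d16", "d17", "cc", "memory");
#else
    for (int i = 0; i < ARRAY_SIZE; i++)
        res += a[i];
#endif
    return res;
}
```

It is called exactly as in the question (res = computeSumNeon(inp);). For inputs 0..511 both paths should return 130816.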
I am following a tutorial to make floating hearts like Periscope.
Link to Tutorial
To give a basic idea, I am posting the code below:
let heartHeight: CGFloat = 18.0
let heartsFile = "heart-bubbles.sks"
class HeartBubblesScene : SKScene {
var emitter: SKEmitterNode?
override func didMoveToView(view: SKView) {
scaleMode = .ResizeFill // make scene's size == view's size
//backgroundColor = UIColor.clearColor()
}
func beginBubbling() {
emitter = SKEmitterNode(fileNamed: heartsFile)
let x = floor(size.width / 2.0)
let y = heartHeight
emitter!.position = CGPointMake(x, y)
emitter!.name = "heart-bubbles"
emitter!.targetNode = self
emitter?.numParticlesToEmit = 1
addChild(emitter!)
emitter?.resetSimulation()
}
}
In my viewDidLoad, I have code like this to present the scene:
heartBubblesView.presentScene(heartBubblesScene)
where heartBubblesView is an SKView connected through an outlet.
The issue arises when I navigate back and forth to that view controller: it suddenly crashes and shows me the following:
SpriteKit`std::__1::__tree_iterator*, int> std::__1::__tree, std::__1::allocator >::find:
0x29fd0f4c <+0>: ldr r3, [r0, #4]!
0x29fd0f50 <+4>: cbz r3, 0x29fd0f82 ; <+54>
0x29fd0f52 <+6>: ldr.w r12, [r1]
0x29fd0f56 <+10>: mov r9, r0
-> 0x29fd0f58 <+12>: ldr r2, [r3, #0x10]
0x29fd0f5a <+14>: cmp r2, r12
0x29fd0f5c <+16>: bhs 0x29fd0f66 ; <+26>
0x29fd0f5e <+18>: ldr r3, [r3, #0x4]
0x29fd0f60 <+20>: cmp r3, #0x0
0x29fd0f62 <+22>: bne 0x29fd0f58 ; <+12>
0x29fd0f64 <+24>: b 0x29fd0f70 ; <+36>
0x29fd0f66 <+26>: ldr r2, [r3]
0x29fd0f68 <+28>: mov r9, r3
0x29fd0f6a <+30>: cmp r2, #0x0
0x29fd0f6c <+32>: mov r3, r2
0x29fd0f6e <+34>: bne 0x29fd0f58 ; <+12>
0x29fd0f70 <+36>: cmp r9, r0
0x29fd0f72 <+38>: beq 0x29fd0f82 ; <+54>
0x29fd0f74 <+40>: ldr.w r2, [r9, #0x10]
0x29fd0f78 <+44>: ldr r1, [r1]
0x29fd0f7a <+46>: cmp r1, r2
0x29fd0f7c <+48>: it lo
0x29fd0f7e <+50>: movlo r9, r0
0x29fd0f80 <+52>: b 0x29fd0f84 ; <+56>
0x29fd0f82 <+54>: mov r9, r0
0x29fd0f84 <+56>: mov r0, r9
0x29fd0f86 <+58>: bx lr
Then I tried adding a deinit to the class above:
deinit
{
emitter?.targetNode = nil
emitter!.removeAllChildren()
}
That does not work either.
I'm not sure why, but adding the following deinitializer to my view controller, which removes all children of that SKScene, seems to resolve the issue:
deinit
{
heartBubblesScene.removeAllChildren()
}
I'm facing a hard problem. When I run my app on an iPhone 5s for the first time, the phone reboots (the Apple logo appears), but when I run the app again, it runs fine. It only happens on the first run and only on the iPhone 5s (the iPhone 4s and iPhone 5 don't have this issue). When it reboots, the Xcode console shows this error:
dyld`_dyld_start:
0x2befd028: mov r8, sp
0x2befd02c: sub sp, sp, #16
0x2befd030: bic sp, sp, #7
0x2befd034: ldr r3, [pc, #112] ; _dyld_start + 132
0x2befd038: sub r0, pc, #8
0x2befd03c: ldr r3, [r0, r3]
0x2befd040: sub r3, r0, r3
0x2befd044: ldr r0, [r8]
0x2befd048: ldr r1, [r8, #4]
0x2befd04c: add r2, r8, #8
0x2befd050: ldr r4, [pc, #88] ; _dyld_start + 136
0x2befd054: add r4, r4, pc
0x2befd058: str r4, [sp]
0x2befd05c: add r4, sp, #12
0x2befd060: str r4, [sp, #4]
0x2befd064: blx 0x2befd0d0 ; dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*)
0x2befd068: ldr r5, [sp, #12]
0x2befd06c: cmp r5, #0
0x2befd070: bne 0x2befd07c ; _dyld_start + 84
0x2befd074: add sp, r8, #4
0x2befd078: bx r0
0x2befd07c: mov lr, r5
0x2befd080: mov r5, r0
0x2befd084: ldr r0, [r8, #4]
0x2befd088: add r1, r8, #8
0x2befd08c: add r2, r1, r0, lsl #2
0x2befd090: add r2, r2, #4
0x2befd094: mov r3, r2
0x2befd098: ldr r4, [r3]
0x2befd09c: add r3, r3, #4
0x2befd0a0: cmp r4, #0
0x2befd0a4: bne 0x2befd098 ; _dyld_start + 112
0x2befd0a8: bx r5
0x2befd0ac: strheq r3, [r2], -r0
0x2befd0b0: .long 0xffffefa4 ; unknown opcode
This is the code that runs at app startup:
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
// Override point for customization after application launch.
// Add the tab bar controller's current view as a subview of the window
if (UI_USER_INTERFACE_IDIOM() == UIUserInterfaceIdiomPhone) {
loginViewControl = [[LoginTab alloc] initWithNibName:@"LoginTab" bundle:nil];
} else {
loginViewControl = [[LoginTab alloc] initWithNibName:@"LoginTab~ipad" bundle:nil];
}
// loginViewControl = [[LoginTab alloc] init];
UINavigationController *objNavigationController=[[[UINavigationController alloc]initWithRootViewController:loginViewControl]autorelease];
self.window.rootViewController = objNavigationController;
[self.window makeKeyAndVisible];
completeTab = [[CompletedTab alloc] init];
// photoButton.frame = CGRectMake(0, 430, 160, 49);
return YES;
}
I searched the internet but could not find a solution for this issue. Does anybody know about this issue and how to solve it? Thanks so much.
I wrote some very simple code intended to run on a bare-metal Raspberry Pi. My code consists of gpio.s (with the function "flash", which turns an LED on and off) and main.s, shown below.
.section .init
.globl _start
_start:
mov sp, $0x8000
b main
.section .text
.globl main
main:
ldr r5, =variable
ldr r4, [r5]
cmp r4, $100
bleq flash
loop:
b loop
.section .data
.align 4
.globl variable
variable:
.word 100
So r4 should be loaded with 100, the comparison should set the EQ condition, and the LED should flash! But it doesn't. Why?
Apart from that example, the function "flash" works on its own, and the code above also works if I add these lines after "ldr r5, =variable":
mov r1, $100
str r1, [r5]
So it seems that memory is accessible but doesn't get initialized. I would be grateful for any explanation.
Disassembly:
./build/output.elf: file format elf32-littlearm
Disassembly of section .init:
00000000 <_start>:
0: e3a0d902 mov sp, #32768 ; 0x8000
4: ea00205c b 817c <main>
Disassembly of section .text:
00008000 <getGpioAddr>:
8000: e59f0170 ldr r0, [pc, #368] ; 8178 <flash2+0x14>
8004: e1a0f00e mov pc, lr
00008008 <setGpioFunct>:
8008: e3500035 cmp r0, #53 ; 0x35
800c: 93510007 cmpls r1, #7 ; 0x7
8010: 83a00001 movhi r0, #1 ; 0x1
8014: 81a0f00e movhi pc, lr
8018: e92d0030 push {r4, r5}
801c: e1a02001 mov r2, r1
8020: e1a01000 mov r1, r0
8024: e92d4000 push {lr}
8028: ebfffff4 bl 8000 <getGpioAddr>
802c: e8bd4000 pop {lr}
8030: e3a04000 mov r4, #0 ; 0x0
00008034 <subTen>:
8034: e351000a cmp r1, #10 ; 0xa
8038: 2241100a subcs r1, r1, #10 ; 0xa
803c: 22844001 addcs r4, r4, #1 ; 0x1
8040: 2afffffb bcs 8034 <subTen>
8044: e3a05004 mov r5, #4 ; 0x4
8048: e0030594 mul r3, r4, r5
804c: e0800003 add r0, r0, r3
8050: e3a05003 mov r5, #3 ; 0x3
8054: e0030591 mul r3, r1, r5
8058: e1a02312 lsl r2, r2, r3
805c: e3e0430e mvn r4, #939524096 ; 0x38000000
8060: e3a05009 mov r5, #9 ; 0x9
8064: e0451001 sub r1, r5, r1
8068: e3a05003 mov r5, #3 ; 0x3
806c: e0030591 mul r3, r1, r5
8070: e1a04374 ror r4, r4, r3
8074: e5905000 ldr r5, [r0]
8078: e0055004 and r5, r5, r4
807c: e1855002 orr r5, r5, r2
8080: e5805000 str r5, [r0]
8084: e8bd0030 pop {r4, r5}
8088: e3a00000 mov r0, #0 ; 0x0
808c: e1a0f00e mov pc, lr
00008090 <setPin>:
8090: e3500035 cmp r0, #53 ; 0x35
8094: 83a00001 movhi r0, #1 ; 0x1
8098: 81a0f00e movhi pc, lr
809c: e92d0020 push {r5}
80a0: e3500020 cmp r0, #32 ; 0x20
80a4: 22401020 subcs r1, r0, #32 ; 0x20
80a8: 31a01000 movcc r1, r0
80ac: 23a02020 movcs r2, #32 ; 0x20
80b0: 33a0201c movcc r2, #28 ; 0x1c
80b4: e92d4000 push {lr}
80b8: ebffffd0 bl 8000 <getGpioAddr>
80bc: e8bd4000 pop {lr}
80c0: e3a05001 mov r5, #1 ; 0x1
80c4: e1a05115 lsl r5, r5, r1
80c8: e7805002 str r5, [r0, r2]
80cc: e3a00000 mov r0, #0 ; 0x0
80d0: e8bd0020 pop {r5}
80d4: e1a0f00e mov pc, lr
000080d8 <clearPin>:
80d8: e3500035 cmp r0, #53 ; 0x35
80dc: 83a00001 movhi r0, #1 ; 0x1
80e0: 81a0f00e movhi pc, lr
80e4: e92d0020 push {r5}
80e8: e3500020 cmp r0, #32 ; 0x20
80ec: 22401020 subcs r1, r0, #32 ; 0x20
80f0: 31a01000 movcc r1, r0
80f4: 23a0202c movcs r2, #44 ; 0x2c
80f8: 33a02028 movcc r2, #40 ; 0x28
80fc: e92d4000 push {lr}
8100: ebffffbe bl 8000 <getGpioAddr>
8104: e8bd4000 pop {lr}
8108: e3a05001 mov r5, #1 ; 0x1
810c: e1a05115 lsl r5, r5, r1
8110: e7805002 str r5, [r0, r2]
8114: e3a00000 mov r0, #0 ; 0x0
8118: e8bd0020 pop {r5}
811c: e1a0f00e mov pc, lr
00008120 <flash>:
8120: e92d4013 push {r0, r1, r4, lr}
8124: e3a00010 mov r0, #16 ; 0x10
8128: e3a01001 mov r1, #1 ; 0x1
812c: ebffffb5 bl 8008 <setGpioFunct>
8130: e3a00010 mov r0, #16 ; 0x10
8134: ebffffe7 bl 80d8 <clearPin>
8138: eb000004 bl 8150 <wait>
813c: e3a00010 mov r0, #16 ; 0x10
8140: ebffffd2 bl 8090 <setPin>
8144: eb000001 bl 8150 <wait>
8148: e8bd4013 pop {r0, r1, r4, lr}
814c: e1a0f00e mov pc, lr
00008150 <wait>:
8150: e3a0583f mov r5, #4128768 ; 0x3f0000
00008154 <loop>:
8154: e2455001 sub r5, r5, #1 ; 0x1
8158: e3550000 cmp r5, #0 ; 0x0
815c: 1afffffc bne 8154 <loop>
8160: e1a0f00e mov pc, lr
00008164 <flash2>:
8164: e92d4000 push {lr}
8168: ebffffec bl 8120 <flash>
816c: ebffffeb bl 8120 <flash>
8170: e8bd4000 pop {lr}
8174: e1a0f00e mov pc, lr
8178: 20200000 .word 0x20200000
0000817c <main>:
817c: e59f500c ldr r5, [pc, #12] ; 8190 <loop+0x4>
8180: e5954000 ldr r4, [r5]
8184: e3540064 cmp r4, #100 ; 0x64
8188: 0bffffe4 bleq 8120 <flash>
0000818c <loop>:
818c: eafffffe b 818c <loop>
8190: 00008194 .word 0x00008194
Disassembly of section .data:
00008194 <variable>:
8194: 00000064 .word 0x00000064
Linker scripts, makefile etc. taken from: http://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/ok01.html
From your link (you should not ask questions here using links; put the code in the question):
0000817c <main>:
817c: e59f500c ldr r5, [pc, #12] ; 8190 <loop+0x4>
8180: e3a01064 mov r1, #100 ; 0x64
8184: e3540064 cmp r4, #100 ; 0x64
8188: 0bffffe4 bleq 8120 <flash>
0000818c <loop>:
818c: eafffffe b 818c <loop>
8190: 000081a0 .word 0x000081a0
Disassembly of section .data:
000081a0 <variable>:
81a0: 00000064 .word 0x00000064
...
You are moving 100 into r1 but comparing r4, which has not been initialized (at least not in this code), so what happens is unpredictable. If you replace that with ldr r4,[r5] it should work as desired: r5 gets the address of the word that contains 100, and you then read from that address into r4.
I assume you have verified that an unconditional bl flash (not bleq, so it always goes there) works as desired?
In this bare-metal mode you definitely have read/write access to memory, no worries there.
David
Memory is normally initialized as part of the C runtime code. If you are writing bare-metal assembly without including the functionality of the C runtime then your variables in RAM will not be initialized. You need to explicitly initialize the value of variable in your own code.
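As an illustration of what that runtime step typically looks like, here is a minimal sketch in the style of a crt0 startup routine. The function name and its parameters are hypothetical: real startup code takes these addresses from symbols defined in the linker script, while here they are ordinary arguments so the sketch runs anywhere. Whether such a copy is needed at all depends on whether the load address of .data differs from the address the code was linked to use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical startup routine: copy the initial values of .data from where
   they sit in the loaded image (data_load) to where the code expects them
   (data_start..data_end), then zero .bss (bss_start..bss_end). */
static void init_sections(const uint32_t *data_load, uint32_t *data_start,
                          uint32_t *data_end, uint32_t *bss_start,
                          uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end);
    while (data_start < data_end)
        *data_start++ = *data_load++; /* copy initialized data */
    while (bss_start < bss_end)
        *bss_start++ = 0;             /* zero uninitialized data */
}
```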
Finally found it! It was really subtle, and not my fault after all. I had taken the makefile and linker script from Alex Chadwick's tutorial, and the linker script looked like this:
SECTIONS {
/*
* First and foremost we need the .init section, containing the IVT.
*/
.init 0x0000 : {
*(.init)
}
/*
* We allow room for the ATAGs and the stack and then start our code at
* 0x8000.
*/
.text 0x8000 : {
*(.text)
}
/*
* Next we put the data.
*/
.data : {
*(.data)
}
/*
* Finally comes everything else. A fun trick here is to put all other
* sections into this section, which will be discarded by default.
*/
/DISCARD/ : {
*(*)
}
}
The .init section was based at 0x0000, and .text started at 0x8000. But kernel.img is actually loaded at address 0x8000 by the Pi (so the real address of .init was 0x8000), which means the whole .text section (as well as the sections following it) was shifted. Because of that, the label addresses assumed at assemble/link time were wrong. Only PC-relative addressing worked, because the PC was set correctly. The solution is to start the image at 0x8000:
SECTIONS {
/*
* First and foremost we need the .init section, containing the IVT.
*/
.init 0x8000 : {
*(.init)
}
.text : {
*(.text)
}
/*
* Next we put the data.
*/
.data : {
*(.data)
}
/*
* Finally comes everything else. A fun trick here is to put all other
* sections into this section, which will be discarded by default.
*/
/DISCARD/ : {
*(*)
}
}
I've just checked the template on his website and it has been corrected, so there is no point contacting him. I must have downloaded the template before the correction. Thank you all for your attempts.
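The shift described in this answer can be checked with a little arithmetic, using the link-time addresses from the disassembly in the question. A minimal sketch follows; the macro and function names are hypothetical, and the claim that the word at 0x8194 does not hold 100 assumes objcopy -O binary zero-filled the gap between .init (0x0 to 0x7) and .text (from 0x8000) when producing kernel.img.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses from the disassembly in the question (old linker script). */
#define LINK_BASE     0x00000000u /* .init was linked at 0x0000 */
#define LOAD_BASE     0x00008000u /* the Pi loads kernel.img at 0x8000 */
#define MAIN_LINK     0x0000817cu /* link-time address of main */
#define VARIABLE_LINK 0x00008194u /* link-time address of variable */

/* RAM address where something linked at link_addr actually ends up when the
   image is loaded at LOAD_BASE instead of LINK_BASE. */
static uint32_t runtime_addr(uint32_t link_addr)
{
    assert(link_addr >= LINK_BASE);
    return link_addr - LINK_BASE + LOAD_BASE;
}
```

The PC-relative b main lands on the real main at runtime_addr(MAIN_LINK) = 0x1017c, but ldr r5, =variable loads the absolute literal 0x8194, which is only offset 0x194 into the loaded image (inside the .init/.text gap), while variable itself sits at runtime_addr(VARIABLE_LINK) = 0x10194. With the corrected script, LINK_BASE equals LOAD_BASE and runtime_addr becomes the identity.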
I'd like some insight into how constant memory is allocated (using CUDA 4.2). I know that the total available constant memory is 64KB. But when is this memory actually allocated on the device? Does this limit apply to each kernel, to each CUDA context, or to the whole application?
Let's say there are several kernels in a .cu file, each using less than 64KB of constant memory, but the total constant memory usage is more than 64KB. Is it possible to call these kernels sequentially? What happens if they are called concurrently using different streams?
What happens if there is a large CUDA dynamic library with lots of kernels each using different amounts of constant memory?
What happens if there are two applications each requiring more than half of the available constant memory? The first application runs fine, but when will the second app fail? At app start, at cudaMemcpyToSymbol() calls or at kernel execution?
Parallel Thread Execution ISA Version 3.1 section 5.1.3 discusses constant banks.
Constant memory is restricted in size, currently limited to 64KB which
can be used to hold statically-sized constant variables. There is an
additional 640KB of constant memory, organized as ten independent 64KB
regions. The driver may allocate and initialize constant buffers in
these regions and pass pointers to the buffers as kernel function
parameters. Since the ten regions are not contiguous, the driver
must ensure that constant buffers are allocated so that each buffer
fits entirely within a 64KB region and does not span a region
boundary.
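To make the "must not span a region boundary" rule concrete, here is a toy bump allocator. It is entirely hypothetical: the names, the flat 0..640KB offset space, and the bump strategy are assumptions for illustration, not how the CUDA driver is actually implemented.

```c
#include <assert.h>
#include <stdint.h>

#define REGION_SIZE 0x10000u /* each constant region is 64KB */
#define NUM_REGIONS 10u      /* ten independent regions, 640KB in total */

/* Hypothetical allocator state over a flat 0..640KB offset space. */
struct const_alloc {
    uint32_t next; /* next free offset */
};

/* Returns the offset of a buffer of `size` bytes that lies entirely inside
   one 64KB region, or UINT32_MAX if no such placement is left. */
static uint32_t const_alloc_buffer(struct const_alloc *a, uint32_t size)
{
    assert(a->next <= NUM_REGIONS * REGION_SIZE);
    if (size == 0 || size > REGION_SIZE)
        return UINT32_MAX; /* can never fit inside a single region */

    uint32_t region_end = (a->next / REGION_SIZE + 1) * REGION_SIZE;
    if (a->next + size > region_end)
        a->next = region_end; /* would straddle a boundary: move to next region */
    if (a->next + size > NUM_REGIONS * REGION_SIZE)
        return UINT32_MAX; /* all ten regions exhausted */

    uint32_t offset = a->next;
    a->next += size;
    return offset;
}
```

For example, after a 0xC000-byte buffer at offset 0, a 0x8000-byte request does not fit in the remaining 0x4000 bytes of the first region, so it is placed at 0x10000 instead of straddling the boundary.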
A simple program can be used to illustrate the use of constant memory.
__constant__ int kd_p1;
__constant__ short kd_p2;
__constant__ char kd_p3;
__constant__ double kd_p4;
__constant__ float kd_floats[8];
__global__ void parameters(int p1, short p2, char p3, double p4, int* pp1, short* pp2, char* pp3, double* pp4)
{
*pp1 = p1;
*pp2 = p2;
*pp3 = p3;
*pp4 = p4;
return;
}
__global__ void constants(int* pp1, short* pp2, char* pp3, double* pp4)
{
*pp1 = kd_p1;
*pp2 = kd_p2;
*pp3 = kd_p3;
*pp4 = kd_p4;
return;
}
Compile this for compute_30,sm_30 and run cuobjdump -sass <executable or obj> to disassemble it. You should see:
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 32bit
identifier = c:/dev/constant_banks/kernel.cu
code for sm_30
Function : _Z10parametersiscdPiPsPcPd
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44]; // stack pointer
/*0010*/ /*0x40001de428004005*/ MOV R0, c [0x0] [0x150]; // pp1
/*0018*/ /*0x50009de428004005*/ MOV R2, c [0x0] [0x154]; // pp2
/*0020*/ /*0x0001dde428004005*/ MOV R7, c [0x0] [0x140]; // p1
/*0028*/ /*0x13f0dc4614000005*/ LDC.U16 R3, c [0x0] [0x144]; // p2
/*0030*/ /*0x60011de428004005*/ MOV R4, c [0x0] [0x158]; // pp3
/*0038*/ /*0x70019de428004005*/ MOV R6, c [0x0] [0x15c]; // pp4
/*0048*/ /*0x20021de428004005*/ MOV R8, c [0x0] [0x148]; // p4
/*0050*/ /*0x30025de428004005*/ MOV R9, c [0x0] [0x14c]; // p4
/*0058*/ /*0x1bf15c0614000005*/ LDC.U8 R5, c [0x0] [0x146]; // p3
/*0060*/ /*0x0001dc8590000000*/ ST [R0], R7; // *pp1 = p1
/*0068*/ /*0x0020dc4590000000*/ ST.U16 [R2], R3; // *pp2 = p2
/*0070*/ /*0x00415c0590000000*/ ST.U8 [R4], R5; // *pp3 = p3
/*0078*/ /*0x00621ca590000000*/ ST.64 [R6], R8; // *pp4 = p4
/*0088*/ /*0x00001de780000000*/ EXIT;
/*0090*/ /*0xe0001de74003ffff*/ BRA 0x90;
/*0098*/ /*0x00001de440000000*/ NOP CC.T;
/*00a0*/ /*0x00001de440000000*/ NOP CC.T;
/*00a8*/ /*0x00001de440000000*/ NOP CC.T;
/*00b0*/ /*0x00001de440000000*/ NOP CC.T;
/*00b8*/ /*0x00001de440000000*/ NOP CC.T;
...........................................
Function : _Z9constantsPiPsPcPd
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44]; // stack pointer
/*0010*/ /*0x00001de428004005*/ MOV R0, c [0x0] [0x140]; // pp1
/*0018*/ /*0x10009de428004005*/ MOV R2, c [0x0] [0x144]; // pp2
/*0020*/ /*0x0001dde428004c00*/ MOV R7, c [0x3] [0x0]; // kd_p1
/*0028*/ /*0x13f0dc4614000c00*/ LDC.U16 R3, c [0x3] [0x4]; // kd_p2
/*0030*/ /*0x20011de428004005*/ MOV R4, c [0x0] [0x148]; // pp3
/*0038*/ /*0x30019de428004005*/ MOV R6, c [0x0] [0x14c]; // pp4
/*0048*/ /*0x20021de428004c00*/ MOV R8, c [0x3] [0x8]; // kd_p4
/*0050*/ /*0x30025de428004c00*/ MOV R9, c [0x3] [0xc]; // kd_p4
/*0058*/ /*0x1bf15c0614000c00*/ LDC.U8 R5, c [0x3] [0x6]; // kd_p3
/*0060*/ /*0x0001dc8590000000*/ ST [R0], R7;
/*0068*/ /*0x0020dc4590000000*/ ST.U16 [R2], R3;
/*0070*/ /*0x00415c0590000000*/ ST.U8 [R4], R5;
/*0078*/ /*0x00621ca590000000*/ ST.64 [R6], R8;
/*0088*/ /*0x00001de780000000*/ EXIT;
/*0090*/ /*0xe0001de74003ffff*/ BRA 0x90;
/*0098*/ /*0x00001de440000000*/ NOP CC.T;
/*00a0*/ /*0x00001de440000000*/ NOP CC.T;
/*00a8*/ /*0x00001de440000000*/ NOP CC.T;
/*00b0*/ /*0x00001de440000000*/ NOP CC.T;
/*00b8*/ /*0x00001de440000000*/ NOP CC.T;
.....................................
I annotated to the right of the SASS.
On sm_30 you can see that kernel parameters are passed in constant bank 0, starting at offset 0x140.
User defined __constant__ variables are defined in constant bank 3.
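The bank 0 offsets in both listings follow ordinary C struct layout rules. The host-side C model below is an assumption inferred from this one SASS dump, not a documented ABI, and the struct names are hypothetical; it reproduces the c [0x0] [...] offsets with static_assert, treating device pointers as 32 bits wide as in this 32-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset at which sm_30 kernel parameters start in constant bank 0,
   as observed in the SASS listing. */
#define PARAM_BASE 0x140u

/* Model of parameters(int, short, char, double, int*, short*, char*, double*). */
struct parameters_args {
    int32_t p1;
    int16_t p2;
    char p3;
    _Alignas(8) double p4;       /* 8-byte aligned even on 32-bit x86 hosts */
    uint32_t pp1, pp2, pp3, pp4; /* 32-bit device pointers */
};

/* Model of constants(int*, short*, char*, double*). */
struct constants_args {
    uint32_t pp1, pp2, pp3, pp4;
};

/* Each check matches a c [0x0] [...] operand in the SASS listing. */
static_assert(PARAM_BASE + offsetof(struct parameters_args, p1) == 0x140, "p1");
static_assert(PARAM_BASE + offsetof(struct parameters_args, p2) == 0x144, "p2");
static_assert(PARAM_BASE + offsetof(struct parameters_args, p3) == 0x146, "p3");
static_assert(PARAM_BASE + offsetof(struct parameters_args, p4) == 0x148, "p4");
static_assert(PARAM_BASE + offsetof(struct parameters_args, pp1) == 0x150, "pp1");
static_assert(PARAM_BASE + offsetof(struct parameters_args, pp4) == 0x15c, "pp4");
static_assert(PARAM_BASE + offsetof(struct constants_args, pp1) == 0x140, "pp1");
static_assert(PARAM_BASE + offsetof(struct constants_args, pp4) == 0x14c, "pp4");
```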
If you execute cuobjdump --dump-elf <executable or obj> you can find other interesting constant information.
32bit elf: abi=6, sm=30, flags = 0x1e011e
Sections:
Index Offset Size ES Align Type Flags Link Info Name
1 34 142 0 1 STRTAB 0 0 0 .shstrtab
2 176 19b 0 1 STRTAB 0 0 0 .strtab
3 314 d0 10 4 SYMTAB 0 2 a .symtab
4 3e4 50 0 4 CUDA_INFO 0 3 b .nv.info._Z9constantsPiPsPcPd
5 434 30 0 4 CUDA_INFO 0 3 0 .nv.info
6 464 90 0 4 CUDA_INFO 0 3 a .nv.info._Z10parametersiscdPiPsPcPd
7 4f4 160 0 4 PROGBITS 2 0 a .nv.constant0._Z10parametersiscdPiPsPcPd
8 654 150 0 4 PROGBITS 2 0 b .nv.constant0._Z9constantsPiPsPcPd
9 7a8 30 0 8 PROGBITS 2 0 0 .nv.constant3
a 7d8 c0 0 4 PROGBITS 6 3 a00000b .text._Z10parametersiscdPiPsPcPd
b 898 c0 0 4 PROGBITS 6 3 a00000c .text._Z9constantsPiPsPcPd
.section .strtab
.section .shstrtab
.section .symtab
index value size info other shndx name
0 0 0 0 0 0 (null)
1 0 0 3 0 a .text._Z10parametersiscdPiPsPcPd
2 0 0 3 0 7 .nv.constant0._Z10parametersiscdPiPsPcPd
3 0 0 3 0 b .text._Z9constantsPiPsPcPd
4 0 0 3 0 8 .nv.constant0._Z9constantsPiPsPcPd
5 0 0 3 0 9 .nv.constant3
6 0 4 1 0 9 kd_p1
7 4 2 1 0 9 kd_p2
8 6 1 1 0 9 kd_p3
9 8 8 1 0 9 kd_p4
10 16 32 1 0 9 kd_floats
11 0 192 12 10 a _Z10parametersiscdPiPsPcPd
12 0 192 12 10 b _Z9constantsPiPsPcPd
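The .nv.constant3 entries can be cross-checked the same way. The sketch below models the __constant__ variables as a C struct in declaration order with natural alignment; that the compiler lays them out this way is an observation from this symbol table (values are decimal, the section size 30 is hex), not a guarantee, and the struct name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of .nv.constant3: kd_p1, kd_p2, kd_p3, kd_p4, kd_floats in order. */
struct constant_bank3 {
    int32_t kd_p1;
    int16_t kd_p2;
    char kd_p3;
    _Alignas(8) double kd_p4;
    float kd_floats[8];
};

/* Offsets from the symbol table, total size from the section header. */
static_assert(offsetof(struct constant_bank3, kd_p2) == 4, "kd_p2");
static_assert(offsetof(struct constant_bank3, kd_p3) == 6, "kd_p3");
static_assert(offsetof(struct constant_bank3, kd_p4) == 8, "kd_p4");
static_assert(offsetof(struct constant_bank3, kd_floats) == 16, "kd_floats");
static_assert(sizeof(struct constant_bank3) == 0x30, ".nv.constant3 size");
```

The same offsets appear in the SASS as c [0x3] [0x4], [0x6] and [0x8].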
The kernel parameter constant bank is versioned per launch so that concurrent kernels can be executed. The compiler and user constants are per CUmodule. It is the developer's responsibility to manage the coherency of this data; for example, the developer has to ensure that a cudaMemcpyToSymbol update is performed in a safe manner.
I am trying to save images to the photo album. I am doing this with the following code:
CGSize targetSize =self.view.frame.size;
UIGraphicsBeginImageContext(targetSize);
//UIGraphicsBeginImageContextWithOptions(targetSize, NO, 2.0);
// Also tried this but no improvement
UIImage* image1 = mkImage1.image;
UIImage* image2 = mkImage2.image;
UIImage* image3 = mkImage3.image;
CGRect rectImage1 = CGRectMake(mkImage1.frame.origin.x , mkImage1.frame.origin.y , mkImage1.frame.size.width , mkImage1.frame.size.height );
CGRect rectImage2 = CGRectMake(mkImage2.frame.origin.x , mkImage2.frame.origin.y , mkImage2.frame.size.width , mkImage2.frame.size.height );
CGRect rectImage3 = CGRectMake(mkImage3.frame.origin.x , mkImage3.frame.origin.y , mkImage3.frame.size.width , mkImage3.frame.size.height );
[image1 drawInRect:rectImage1]; // crashing line
[image2 drawInRect:rectImage2];
[image3 drawInRect:rectImage3];
tempImage = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();
UIImageWriteToSavedPhotosAlbum(tempImage, nil,nil,nil);
My app crashes when I try to draw image1. However, if I comment out that line, all the other images are saved properly without any issue.
My log gives me this info:
ImageIO`ImageIO_ABGR_TO_ARGB_8Bit:
0x330abc70: push {r4, r5, r6, r7, lr}
0x330abc72: add r7, sp, #12
0x330abc74: push.w {r8, r10, r11}
0x330abc78: ldr r1, [r0]
0x330abc7a: ldr r2, [r0, #12]
0x330abc7c: cmp.w r2, r1, lsl #2
0x330abc80: blo 0x330abd06 ; ImageIO_ABGR_TO_ARGB_8Bit + 150
0x330abc82: ldr r3, [r0, #24]
0x330abc84: lsls r2, r1, #2
0x330abc86: cmp r3, r2
0x330abc88: blo 0x330abd06 ; ImageIO_ABGR_TO_ARGB_8Bit + 150
0x330abc8a: ldr r2, [r0, #4]
0x330abc8c: cmp r2, #0
0x330abc8e: beq 0x330abd06 ; ImageIO_ABGR_TO_ARGB_8Bit + 150
0x330abc90: bic lr, r1, #7
0x330abc94: ldr r3, [r0, #8]
0x330abc96: ldr.w r12, [r0, #20]
0x330abc9a: sub.w r4, r1, lr
0x330abc9e: asrs r5, r1, #3
0x330abca0: mov r6, r12
0x330abca2: mov r8, r3
0x330abca4: cbz r5, 0x330abcbe ; ImageIO_ABGR_TO_ARGB_8Bit + 78
0x330abca6: mov r8, r3
0x330abca8: mov r9, r5
0x330abcaa: mov r6, r12
0x330abcac: vld4.8 {d0, d1, d2, d3}, [r8]!
0x330abcb0: vswp d0, d2
0x330abcb4: vst4.8 {d0, d1, d2, d3}, [r6]! // EXC_BAD_ACCESS
0x330abcb8: subs.w r9, r9, #1
0x330abcbc: bne 0x330abeac ; slab_dealloc + 132
0x330abcbe: cmp lr, r1
0x330abcc0: bge 0x330abcf8 ; ImageIO_ABGR_TO_ARGB_8Bit + 136
0x330abcc2: add.w r8, r8, #2
0x330abcc6: adds r6, #2
0x330abcc8: mov r9, r4
0x330abcca: ldrb.w r11, [r8]
0x330abcce: subs.w r9, r9, #1
0x330abcd2: ldrb r10, [r8, #-2]
0x330abcd6: strb r11, [r6, #-2]
0x330abcda: ldrb r11, [r8, #-1]
0x330abcde: strb r11, [r6, #-1]
//*************///
Thread 1, Queue : com.apple.main-thread
0x32ac2526 in -[UIImage drawInRect:] ()
0x00018e32 in -[MyAppViewController saveToAlbum]
When I convert that image from PNG to JPG, everything works fine.
But I want to save it in PNG format only.