Error in the output of my flex file - token

I've written a .l file to read "c17.isc" and print its contents, but the output contains errors and I don't know why. Below are the file I plan to read, the flex file, and the execution result.
This is the c17.isc file.
Each data line has these fields:
number gate_name gate_type output_number input_number fault
A line containing "from" is a fanout branch.
A line with only 2 numbers is the input list of the gate on the previous line.
*c17 iscas example (to test conversion program only)
*---------------------------------------------------
*
*
* total number of lines in the netlist .............. 17
* simplistically reduced equivalent fault set size = 22
* lines from primary input gates ....... 5
* lines from primary output gates ....... 2
* lines from interior gate outputs ...... 4
* lines from ** 3 ** fanout stems ... 6
*
* avg_fanin = 2.00, max_fanin = 2
* avg_fanout = 2.00, max_fanout = 2
*
*
*
*
*
1 1gat inpt 1 0 >sa1
2 2gat inpt 1 0 >sa1
3 3gat inpt 2 0 >sa0 >sa1
8 8fan from 3gat >sa1
9 9fan from 3gat >sa1
6 6gat inpt 1 0 >sa1
7 7gat inpt 1 0 >sa1
10 10gat nand 1 2 >sa1
1 8
11 11gat nand 2 2 >sa0 >sa1
9 6
14 14fan from 11gat >sa1
15 15fan from 11gat >sa1
16 16gat nand 2 2 >sa0 >sa1
2 14
20 20fan from 16gat >sa1
21 21fan from 16gat >sa1
19 19gat nand 1 2 >sa1
15 7
22 22gat nand 0 2 >sa0 >sa1
10 20
23 23gat nand 0 2 >sa0 >sa1
21 19
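For comparison, each gate line follows the fixed column layout described above, so a single gate header can also be read with sscanf. This is a minimal sketch, not a fix for the flex rules; struct gate_line and parse_gate_line are hypothetical names, and the field widths are assumptions. Fanout lines and input-list lines do not produce all 5 fields, so they are rejected here and would need separate handling.

```c
#include <stdio.h>
#include <string.h>

/* One gate header from c17.isc, e.g. "10 10gat nand 1 2 >sa1".
 * Field names follow the column list above (hypothetical struct). */
struct gate_line {
    int number;          /* e.g. 10 */
    char name[32];       /* e.g. "10gat" */
    char type[16];       /* e.g. "nand", "inpt" */
    int output_number;   /* fanout count */
    int input_number;    /* fanin count: that many numbers follow on the next line */
};

/* Returns 1 for a gate header, 0 for a comment line, a fanout ("from")
 * line, an input-list line, or anything malformed. */
int parse_gate_line(const char *line, struct gate_line *g)
{
    if (line[0] == '*')
        return 0;                               /* comment line */
    if (sscanf(line, "%d %31s %15s %d %d",
               &g->number, g->name, g->type,
               &g->output_number, &g->input_number) != 5)
        return 0;                               /* fewer than 5 fields matched */
    return strcmp(g->type, "from") != 0;        /* belt and braces for fanouts */
}
```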
This is the flex code I've written.
First, this is the declaration file, declare.h:
# include <stdio.h>
# include <string.h>
# include <stdlib.h>
# define INPT 1
# define NOR 2
# define NAND 3
# define NOT 4
# define XOR 5
# define AND 6
# define BUFF 7
# define FROM 8
Second, this is the flex file:
%{
# include "declare.h"
/*gi=1,it's input;gi=7,it's fanout;otherwise,it's gate*/
int gi=-1;
int inum=0;
int val;
struct{
char *symbol;
int val;
} symtab[]={
"inpt", INPT,
"nor", NOR,
"nand", NAND,
"not", NOT,
"xor", XOR,
"and", AND,
"buff", BUFF,
"from",FROM,
"0",0
};
extern FILE *yyin;
%}
%start A B C D E
DIGITS [0-9]+
BLANK [ \t\n]+
ALPHA [a-z]+
%%
"*".*\n {ECHO; BEGIN A;}
<A>{BLANK}{DIGITS} {printf("num=%s\t",yytext); BEGIN B;}
<B>{BLANK}{DIGITS}{ALPHA} {printf("name=%s",yytext); BEGIN C;}
<C>{BLANK}{DIGITS} {printf("op=%s\t",yytext);BEGIN D;}
<C>{BLANK}{DIGITS}{ALPHA} {ECHO; BEGIN A;}
<D>{BLANK}{DIGITS} {inum=atoi(yytext);
printf("ip=%s\t",yytext);
if(gi==1)
{BEGIN A;}
if(gi!=1)
{BEGIN E;}
}
<E>{BLANK}{DIGITS} {inum--;
if(inum<0)
{printf("num=%s\t",yytext); BEGIN B;}
else
{printf("il=%s\t",yytext); BEGIN E;}
}
{ALPHA} {gi=lookup(yytext);
if(gi!=0) printf("\tty=%d\t",gi);
else ECHO;
}
{BLANK}">sa"[0-1] {val=atoi(&yytext[yyleng-1]);printf("\tfl=%d",val);}
{BLANK} ;
%%
lookup(s)
char* s;
{int i;
for (i=0;symtab[i].val!=0;i++)
{
if(strcmp(symtab[i].symbol,s)==0)
break;
}
return(symtab[i].val);
}
main()
{
FILE *x=fopen("c17.isc","r");
yyin=x;
yylex();
}
This is the execution result. I've marked the wrong places with *. Basically, the errors occur at the lines with input lists.
For example, the first wrong line below should start with "num=10", and the second wrong line should be "il=1 il=8", etc.
I handle the input lists in start condition E of the flex file, but I don't know why it doesn't work.
num=1 name=1gat ty=1 op=1 ip=0 fl=1
num=2 name=2gat ty=1 op=1 ip=0 fl=1
num=3 name=3gat ty=1 op=2 ip=0 fl=0 fl=1
num=8 name=8fan ty=8 3gat fl=1
num=9 name=9fan ty=8 3gat fl=1
num=6 name=6gat ty=1 op=1 ip=0 fl=1
num=7 name=7gat ty=1 op=1 ip=0 fl=1
**il=10** name=10gat ty=3 op=1 ip=2 fl=1
**num=1** il=8
**il=11** name=11gat ty=3 op=2 ip=2 fl=0 fl=1
**num=9** il=6
**num=4** ...
**num=5** ...
**il=16** ...
**num=2** il=14
**num=0** ...
**num=1** ...
**il=19** ...
**num=15** il=7
**il=22** ...
**il=23** ...

This adaptation of your code appears to work as you intended. There are various changes, most notably printing some newlines and labelling which rule recognizes each num= part (num1= is printed from start condition A, num2= from start condition E).
%{
#include "declare.h"
/* gi==1 (INPT): primary input; gi==8 (FROM): fanout branch; otherwise: gate */
static int gi = -1;
static int inum = 0;
extern int lookup(const char *s);
struct
{
char *symbol;
int val;
} symtab[]=
{
{ "inpt", INPT },
{ "nor", NOR },
{ "nand", NAND },
{ "not", NOT },
{ "xor", XOR },
{ "and", AND },
{ "buff", BUFF },
{ "from", FROM },
{ "0", 0 },
};
extern FILE *yyin;
%}
%start A B C D E
DIGITS [0-9]+
BLANK [ \t\n]+
ALPHA [a-z]+
%%
"*".*\n {ECHO; BEGIN A;}
<A>{DIGITS} {printf("\nnum1=%s\t", yytext); BEGIN B;}
<B>{DIGITS}{ALPHA} {printf(" name=%s\t", yytext); BEGIN C;}
<C>{DIGITS} {printf(" op=%s\t", yytext); BEGIN D;}
<C>{DIGITS}{ALPHA} {ECHO; BEGIN A;}
<D>{DIGITS} {
inum=atoi(yytext);
printf(" ip=%s\t", yytext);
if (gi==1)
{BEGIN A;}
if (gi!=1)
{BEGIN E;}
}
<E>{DIGITS} {inum--;
if (inum<0)
{printf("\nnum2=%s\t", yytext); BEGIN B;}
else
{printf(" il=%s\t", yytext); BEGIN E;}
}
{ALPHA} {
gi = lookup(yytext);
if (gi!=0) printf(" ty=%d (%s)\t", gi, yytext);
else { printf("Lookup failed: "); ECHO; }
}
">sa"[0-1] {int val=atoi(&yytext[yyleng-1]);printf(" fl=%d", val);}
{BLANK} ;
. { printf("Unmatched: %s\n", yytext); }
%%
int lookup(const char *s)
{
int i;
for (i = 0; symtab[i].val != 0; i++)
{
if (strcmp(symtab[i].symbol, s) == 0)
break;
}
return(symtab[i].val);
}
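int main(void)
{
FILE *x = fopen("c17.isc", "r");
if (x == NULL) /* report a missing or unreadable input file */
{
perror("c17.isc");
return 1;
}
yyin = x;
yylex();
putchar('\n');
fclose(x);
return 0;
}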
For your sample input, the output is:
*c17 iscas example (to test conversion program only)
*---------------------------------------------------
*
*
* total number of lines in the netlist .............. 17
* simplistically reduced equivalent fault set size = 22
* lines from primary input gates ....... 5
* lines from primary output gates ....... 2
* lines from interior gate outputs ...... 4
* lines from ** 3 ** fanout stems ... 6
*
* avg_fanin = 2.00, max_fanin = 2
* avg_fanout = 2.00, max_fanout = 2
*
*
*
*
*
num1=1 name=1gat ty=1 (inpt) op=1 ip=0 fl=1
num1=2 name=2gat ty=1 (inpt) op=1 ip=0 fl=1
num1=3 name=3gat ty=1 (inpt) op=2 ip=0 fl=0 fl=1
num1=8 name=8fan ty=8 (from) 3gat fl=1
num1=9 name=9fan ty=8 (from) 3gat fl=1
num1=6 name=6gat ty=1 (inpt) op=1 ip=0 fl=1
num1=7 name=7gat ty=1 (inpt) op=1 ip=0 fl=1
num1=10 name=10gat ty=3 (nand) op=1 ip=2 fl=1 il=1 il=8
num2=11 name=11gat ty=3 (nand) op=2 ip=2 fl=0 fl=1 il=9 il=6
num2=14 name=14fan ty=8 (from) 11gat fl=1
num1=15 name=15fan ty=8 (from) 11gat fl=1
num1=16 name=16gat ty=3 (nand) op=2 ip=2 fl=0 fl=1 il=2 il=14
num2=20 name=20fan ty=8 (from) 16gat fl=1
num1=21 name=21fan ty=8 (from) 16gat fl=1
num1=19 name=19gat ty=3 (nand) op=1 ip=2 fl=1 il=15 il=7
num2=22 name=22gat ty=3 (nand) op=0 ip=2 fl=0 fl=1 il=10 il=20
num2=23 name=23gat ty=3 (nand) op=0 ip=2 fl=0 fl=1 il=21 il=19
The line with num1=10 has il=1 and il=8 associated with it, which seems to reflect the data. (I modified the printout to include the type name as well as the type number.)
I'm not sure which changes are the significant ones. Dropping the {BLANK} prefix from the rules that match digits and names simplifies things, I think (it is very common for scanners to ignore whitespace almost entirely, as the separate "{BLANK} ;" rule does here).

I'm not sure I understand your scenario correctly, but it looks like you are doing all the work of parsing the file using Flex and regular expressions?
The usual way is to use Flex to generate a scanner (the function yylex) that just identifies the tokens. A token can be a single number or gate name. The scanner then returns as soon as it has found a token. So the scanner transforms the input (the sequence of characters on your file) to a sequence of tokens.
Then you use a parser generator, typically Bison, to generate a parser, which compares those individual tokens to the grammar, and the larger structure of your input is then handled on the parser level.
It gets very complicated when you are trying to do it all in Flex, which isn't really suited for it.
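To illustrate that split, here is a minimal sketch written in plain C rather than Flex and Bison, so it is an analogy, not the usual toolchain: next_token plays the role of yylex and returns one token at a time, while parse_record holds the knowledge of how many input numbers follow a gate. All names (next_token, parse_record, struct record) are hypothetical, and the 8-entry input limit is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Token kinds returned by the scanner (hypothetical names). */
enum tok { T_EOF, T_NUM, T_WORD, T_FAULT };

/* Scanner state: read position and the text of the current token. */
struct scanner {
    const char *p;
    char text[32];
};

/* The scanner's only job: skip blanks and '*' comment lines, return ONE token. */
static enum tok next_token(struct scanner *s)
{
    for (;;) {
        while (isspace((unsigned char)*s->p))
            s->p++;
        if (*s->p != '*')
            break;
        while (*s->p != '\0' && *s->p != '\n')  /* comment runs to end of line */
            s->p++;
    }
    if (*s->p == '\0')
        return T_EOF;

    const char *start = s->p;
    while (*s->p != '\0' && !isspace((unsigned char)*s->p))
        s->p++;
    size_t n = (size_t)(s->p - start);
    if (n >= sizeof s->text) n = sizeof s->text - 1;  /* truncate overlong tokens */
    memcpy(s->text, start, n);
    s->text[n] = '\0';

    if (s->text[0] == '>')
        return T_FAULT;                         /* ">sa0" or ">sa1" */
    for (size_t i = 0; i < n; i++)
        if (!isdigit((unsigned char)s->text[i]))
            return T_WORD;                      /* "10gat", "nand", "from" */
    return T_NUM;
}

/* One parsed record; inputs[] is filled only for real gates. */
struct record {
    int number;
    char name[32];
    int is_fanout;
    int fanin;
    int inputs[8];                              /* assumed maximum fan-in */
};

/* Parser state: the scanner plus one token of lookahead. */
struct parser {
    struct scanner s;
    enum tok cur;
};

static void advance(struct parser *ps) { ps->cur = next_token(&ps->s); }
static void parser_init(struct parser *ps, const char *text) { ps->s.p = text; advance(ps); }

/* The parser knows the record structure, including how many input
 * numbers follow a gate. Returns 1 for a record, 0 at EOF or on error. */
static int parse_record(struct parser *ps, struct record *r)
{
    if (ps->cur != T_NUM) return 0;
    r->number = atoi(ps->s.text);
    advance(ps);
    if (ps->cur != T_WORD) return 0;
    strcpy(r->name, ps->s.text);
    advance(ps);
    if (ps->cur != T_WORD) return 0;            /* gate type */
    r->is_fanout = strcmp(ps->s.text, "from") == 0;
    advance(ps);
    r->fanin = 0;
    if (r->is_fanout) {
        if (ps->cur != T_WORD) return 0;        /* stem name, e.g. "3gat" */
        advance(ps);
    } else {
        if (ps->cur != T_NUM) return 0;         /* output_number */
        advance(ps);
        if (ps->cur != T_NUM) return 0;         /* input_number */
        r->fanin = atoi(ps->s.text);
        advance(ps);
    }
    while (ps->cur == T_FAULT)                  /* skip >sa0 / >sa1 */
        advance(ps);
    if (r->fanin > 8) return 0;
    for (int i = 0; i < r->fanin; i++) {        /* the input-list line */
        if (ps->cur != T_NUM) return 0;
        r->inputs[i] = atoi(ps->s.text);
        advance(ps);
    }
    return 1;
}
```

To the scanner, the input list on the line after a gate is just more numbers; only parse_record, which remembers input_number, decides that they belong to the preceding gate. That bookkeeping is what the start conditions in the flex file were trying to do.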

Related

TSL237 sensor on ESP8266 Wemos D1 Mini

I'm trying to read a TSL237 light sensor using my ESP8266 Wemos D1 Mini board. I found code written for an Arduino Uno (copied below) and loaded it onto my board. I first used pin D0 (GPIO16) for the sensor data input, then pin D1 (GPIO5). In both cases I get the same useless output; one loop of it is copied below. Any ideas what I'm doing wrong?
CODE:
#define TSL237 2
volatile unsigned long pulse_cnt = 0;
void setup() {
attachInterrupt(0, add_pulse, RISING);
pinMode(TSL237, INPUT);
Serial.begin(9600);
}
void add_pulse(){
pulse_cnt++;
return;
}
unsigned long Frequency() {
pulse_cnt = 0;
delay(10000);// this delay controls pulse_cnt read out. longer delay == higher number
// DO NOT change this delay; it will void calibration.
unsigned long frequency = pulse_cnt;
return (frequency);
pulse_cnt = 0;
}
void loop() {
unsigned long frequency = Frequency();
Serial.println(frequency);
delay(5000);
}
OUTPUT:
14:26:59.605 -> ets Jan 8 2013,rst cause:2, boot mode:(3,6)
14:26:59.605 ->
14:26:59.605 -> load 0x4010f000, len 3460, room 16
14:26:59.605 -> tail 4
14:26:59.605 -> chksum 0xcc
14:26:59.605 -> load 0x3fff20b8, len 40, room 4
14:26:59.605 -> tail 4
14:26:59.605 -> chksum 0xc9
14:26:59.605 -> csum 0xc9
14:26:59.605 -> v00041fe0
14:26:59.605 -> ~ld
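Separately from the problem in the output above, note what the sketch measures once it runs: Frequency() clears the counter, waits delay(10000), and returns the number of rising edges seen in that 10 000 ms window, i.e. pulses per 10 s rather than Hz. Converting to Hz means dividing by the window length in seconds. A minimal helper, with a hypothetical name and assuming the same window length is passed in:

```c
/* Convert a pulse count gathered over gate_ms milliseconds into Hz.
 * gate_ms corresponds to the delay(10000) window in Frequency();
 * a zero-length window returns 0.0 instead of dividing by zero. */
double pulses_to_hz(unsigned long pulse_count, unsigned long gate_ms)
{
    if (gate_ms == 0)
        return 0.0;
    return (double)pulse_count * 1000.0 / (double)gate_ms;
}
```

For example, a count of 25000 over the 10000 ms window corresponds to 2500 Hz.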

How do I go about making this work with a #include? It works fine when dropped straight into the code

I have a block of code that I want to #include in my z/OS Metal C program, it works fine when it's just part of the program, but when I put it into a .h file and #include it, the code won't compile.
I have successfully gotten this code to work without #include. I'm sure I'm overlooking something having to do with #include...
This code works:
#pragma margins(2,72)
*#if 0!=0
Test DSECT
Test# DS A
TestINT DS F
TestChar DS C
.ago end
*#endif
*struct Test {
* void *Test1;
* int TestInt;
* char TestChar;
*};
*#if 0!=0
.end
MEND
*#endif
#pragma nomargins
This gives compiler output like this:
207 |#pragma margins(2,72)
207 +
208 |#if 0!=0
214 |#endif
215 |struct Test {
216 | void *Test1;
5650ZOS V2.1.1 z/OS XL C 'SSAF.METALC.C(CKKTHING)'
* * * * * S O U R C E * * * * *
LINE STMT
*...+....1....+....2....+....3....+....4....+....5....+....6....+
217 | int TestInt;
218 | char TestChar;
219 |};
220 |#if 0!=0
223 |#endif
224 |#pragma nomargins
But, when I put the code into an #include file like this:
EDIT SSAF.METALC.H(CKKTEST)
Command ===>
****** **************************
000001 *#if 0!=0
000002 Test DSECT
000003 Test# DS A
000004 TestINT DS F
000005 TestChar DS C
000006 .ago end
000007 *#endif
000008 *struct Test {
000009 * void *Test1;
000010 * int TestInt;
000011 * char TestChar;
000012 *};
000013 *#if 0!=0
000014 .end
000015 MEND
000016 *#endif
****** **************************
and include it in my Metal C program:
EDIT SSAF.METALC.C(CKLTHING) - 01.00
Command ===>
000205 #include"ckkprolg.h"
000206
000207 #pragma margins(2,72)
000208 #include"ckktest.h"
000209 #pragma nomargins
I get a bunch of error messages:
205 |#include"ckkprolg.h" /* Include assembler macros needed
206 | for Metal C prolog and epilog */
207 |#pragma margins(2,72)
207 +
208 |#include"ckktest.h"
*=ERROR===========> CCN3275 Unexpected text 'struct' encountered.
*=ERROR===========> CCN3166 Definition of function Test requires parentheses.
*=ERROR===========> CCN3275 Unexpected text 'void' encountered.
5650ZOS V2.1.1 z/OS XL C 'SSAF.METALC.C(CKLTHING)' 10/04/2019
* * * * * S O U R C E * * * * *
LINE STMT
*...+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....8....+....9...
*=ERROR===========> CCN3045 Undeclared identifier Test1.
*=ERROR===========> CCN3275 Unexpected text 'int' encountered.
*=ERROR===========> CCN3045 Undeclared identifier TestInt.
*=ERROR===========> CCN3275 Unexpected text 'char' encountered.
*=ERROR===========> CCN3045 Undeclared identifier TestChar.
*=ERROR===========> CCN3046 Syntax error.
*=ERROR===========> CCN3273 Missing type in declaration of theESTAEXStatic.
209 |#pragma nomargins
The include file is missing #pragma margins. Since it is a file-level directive, it must appear in each source file, including each include file. IBM Knowledge Center says: "The setting specified by the #pragma margins directive applies only to the source file or include file in which it is found. It has no effect on other include files."

How is the output 47?

#include<stdio.h>
#include<conio.h>
#define FIRST_PART 7
#define LAST_PART 5
#define ALL_PARTS FIRST_PART+LAST_PART
int main()
{
printf ("The Square root of all parts is %d", ALL_PARTS * ALL_PARTS) ;
getch();
return(0);
}
In the above code, FIRST_PART is defined as 7,
LAST_PART is defined as 5,
and ALL_PARTS is defined as FIRST_PART+LAST_PART (which should be 12).
But when I print ALL_PARTS * ALL_PARTS, I get 47 as the output! (I expected the answer to be 144.)
Can anyone explain how?
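The output is 47 because a macro is a textual substitution, so ALL_PARTS * ALL_PARTS expands to
FIRST_PART + LAST_PART * FIRST_PART + LAST_PART
Multiplication has higher precedence than addition,
so this is 7 + 5 * 7 + 5
= 7 + 35 + 5
= 47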
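A minimal sketch of the usual fix: parenthesize the macro body so that the sum is evaluated as a unit. ALL_PARTS_BAD and ALL_PARTS_GOOD are hypothetical names used to put both versions side by side.

```c
#define FIRST_PART 7
#define LAST_PART 5

/* As in the question: expands textually, no grouping. */
#define ALL_PARTS_BAD FIRST_PART + LAST_PART
/* Parenthesized: the sum is evaluated before any surrounding operator. */
#define ALL_PARTS_GOOD (FIRST_PART + LAST_PART)

/* Expands to 7 + 5 * 7 + 5 = 7 + 35 + 5 = 47 */
int square_bad(void) { return ALL_PARTS_BAD * ALL_PARTS_BAD; }

/* Expands to (7 + 5) * (7 + 5) = 144 */
int square_good(void) { return ALL_PARTS_GOOD * ALL_PARTS_GOOD; }
```

The same rule applies to function-like macros: wrap each use of a parameter, and the whole body, in parentheses.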

iOS Neon assembler sample questions

I'm trying the example from http://api.madewithmarmalade.com/ExampleArmASM.html on iOS. The program runs if I comment out the loop, and res is printed as 28. But if I leave the loop in, it crashes without printing res.
Any hint why, and how to fix it?
Thanks in advance.
My code is as follows:
#include <stdio.h>
#include <stdlib.h>
#define ARRAY_SIZE 512
#if defined __arm__ && defined __ARM_NEON__
static int computeSumNeon(const int a[])
{
// Computes the sum of all elements in the input array
int res = 0;
asm(".align 4 \n\t" //dennis warning avoiding
"vmov.i32 q8, #0 \n\t" //clear our accumulator register
"mov r3, #512 \n\t" //Loop condition n = ARRAY_SIZE
// ".loop1: \n\t" // No loop add 0-7 works as 28
"vld1.32 {d0, d1, d2, d3}, [%[input]]! \n\t" //load 8 elements into d0, d1, d2, d3 = q0, q1
"pld [%[input]] \n\t" // preload next set of elements
"vadd.i32 q8, q0, q8 \n\t" // q8 += q0
"vadd.i32 q8, q1, q8 \n\t" // q8 += q1
"subs r3, r3, #8 \n\t" // n -= 8
// "bne .loop1 \n\t" // n == 0?
"vpadd.i32 d0, d16, d17 \n\t" // d0[0] = d16[0] + d16[1], d0[1] = d17[0] + d17[1]
"vpaddl.u32 d0, d0 \n\t" // d0[0] = d0[0] + d0[1]
"vmov.32 %[result], d0[0] \n\t"
: [result] "=r" (res) , [input] "+r" (a)
:
: "q0", "q1", "q8", "r3");
return res;
}
#else
static int computeSumNeon(const int a[])
{
int i, res = 0;
for (i = 0; i < ARRAY_SIZE; i++)
res += a[i];
return res;
}
#endif
...
@implementation AppDelegate
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
// Override point for customization after application launch.
//int* inp;
int inp[ARRAY_SIZE];
//posix_memalign((void**)&inp, 64, ARRAY_SIZE*sizeof(int)); // Align to cache line size (64bytes on a cortex A8)
// Initialise the array with consecutive integers.
int i;
for (i = 0; i < ARRAY_SIZE; i++)
{
inp[i] = i;
}
for (i = 0; i < ARRAY_SIZE; i++)
{
printf("%i,", inp[i]);
}
printf("\n\n sum 0-7:%i\n", 0+1+2+3+4+5+6+7);
int res = 0;
res = computeSumNeon(inp);
printf("res NEO :%i\n", res);
// free(inp); // error pointer being free was not allocated !!!
UISplitViewController *splitViewController = (UISplitViewController *)self.window.rootViewController;
UINavigationController *navigationController = [splitViewController.viewControllers lastObject];
navigationController.topViewController.navigationItem.leftBarButtonItem = splitViewController.displayModeButtonItem;
splitViewController.delegate = self;
return YES;
}
- (void)applicationWillResignActive:(UIApplication *)application {
...
==== assembly code generated
.align 1
.code 16 # #computeSumNeon
.thumb_func _computeSumNeon
_computeSumNeon:
Lfunc_begin3:
.loc 18 133 0 is_stmt 1 # ...
.cfi_startproc
# BB#0:
sub sp, #8
movs r1, #0
str r0, [sp, #4]
.loc 18 135 9 prologue_end # ...
Ltmp18:
str r1, [sp]
.loc 18 136 5 # ...
ldr r0, [sp, #4]
# InlineAsm Start
.align 4
vmov.i32 q8, #0x0
movw r3, #504
.loop1:
vld1.32 {d0, d1, d2, d3}, [r0]!
vadd.i32 q8, q0, q8
vadd.i32 q8, q1, q8
subs r3, #8
bne .loop1
vpadd.i32 d0, d16, d17
vpaddl.u32 d0, d0
vmov.32 r1, d0[0]
# InlineAsm End
str r1, [sp]
str r0, [sp, #4]
.loc 18 155 12 # ...
ldr r0, [sp]
.loc 18 155 5 is_stmt 0 # ...
add sp, #8
bx lr
Ltmp19:
Lfunc_end3:
.cfi_endproc
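The two results in the question can be checked against a plain-C model of the lane arithmetic. This is an emulation written for illustration (the function name is hypothetical), not NEON code: four 32-bit accumulator lanes stand in for q8, each iteration adds the 8 elements that vld1.32 loads into q0 and q1, and the final steps mirror vpadd.i32 and vpaddl.u32. It assumes n is a multiple of 8, as the subs/bne loop does. Without the loop only the first 8 elements are summed, giving 0+1+...+7 = 28; with the loop over all 512 elements of inp[i] = i, the expected result is 511 * 512 / 2 = 130816.

```c
#include <stdint.h>

/* Plain-C model of the inline assembly's lane arithmetic (not NEON code).
 * n must be a multiple of 8, matching the subs r3, r3, #8 / bne loop. */
int32_t sum_like_neon(const int32_t *a, int n)
{
    uint32_t q8[4] = {0, 0, 0, 0};                  /* vmov.i32 q8, #0 */
    for (int i = 0; i < n; i += 8) {
        for (int lane = 0; lane < 4; lane++) {
            q8[lane] += (uint32_t)a[i + lane];      /* vadd.i32 q8, q0, q8 */
            q8[lane] += (uint32_t)a[i + 4 + lane];  /* vadd.i32 q8, q1, q8 */
        }
    }
    /* vpadd.i32 d0, d16, d17: d16 holds lanes 0-1, d17 holds lanes 2-3 */
    uint32_t d0_0 = q8[0] + q8[1];
    uint32_t d0_1 = q8[2] + q8[3];
    /* vpaddl.u32 d0, d0 widens the pair sum to 64 bits;
     * vmov.32 %[result], d0[0] then reads the low 32 bits */
    uint64_t wide = (uint64_t)d0_0 + d0_1;
    return (int32_t)(uint32_t)wide;
}
```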

How does CUDA constant memory allocation work?

I'd like to get some insight into how constant memory is allocated (using CUDA 4.2). I know that the total available constant memory is 64KB. But when is this memory actually allocated on the device? Does this limit apply to each kernel, each CUDA context, or the whole application?
Let's say there are several kernels in a .cu file, each using less than 64KB of constant memory, but together using more than 64KB. Is it possible to call these kernels sequentially? What happens if they are called concurrently using different streams?
What happens if there is a large CUDA dynamic library with lots of kernels each using different amounts of constant memory?
What happens if there are two applications each requiring more than half of the available constant memory? The first application runs fine, but when will the second app fail? At app start, at cudaMemcpyToSymbol() calls or at kernel execution?
Parallel Thread Execution ISA Version 3.1 section 5.1.3 discusses constant banks.
Constant memory is restricted in size, currently limited to 64KB which
can be used to hold statically-sized constant variables. There is an
additional 640KB of constant memory, organized as ten independent 64KB
regions. The driver may allocate and initialize constant buffers in
these regions and pass pointers to the buffers as kernel function
parameters. Since the ten regions are not contiguous, the driver
must ensure that constant buffers are allocated so that each buffer
fits entirely within a 64KB region and does not span a region
boundary.
A simple program can be used to illustrate the use of constant memory.
__constant__ int kd_p1;
__constant__ short kd_p2;
__constant__ char kd_p3;
__constant__ double kd_p4;
__constant__ float kd_floats[8];
__global__ void parameters(int p1, short p2, char p3, double p4, int* pp1, short* pp2, char* pp3, double* pp4)
{
*pp1 = p1;
*pp2 = p2;
*pp3 = p3;
*pp4 = p4;
return;
}
__global__ void constants(int* pp1, short* pp2, char* pp3, double* pp4)
{
*pp1 = kd_p1;
*pp2 = kd_p2;
*pp3 = kd_p3;
*pp4 = kd_p4;
return;
}
Compile this for compute_30/sm_30 and run cuobjdump -sass <executable or obj> to disassemble it. You should see:
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 32bit
identifier = c:/dev/constant_banks/kernel.cu
code for sm_30
Function : _Z10parametersiscdPiPsPcPd
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44]; // stack pointer
/*0010*/ /*0x40001de428004005*/ MOV R0, c [0x0] [0x150]; // pp1
/*0018*/ /*0x50009de428004005*/ MOV R2, c [0x0] [0x154]; // pp2
/*0020*/ /*0x0001dde428004005*/ MOV R7, c [0x0] [0x140]; // p1
/*0028*/ /*0x13f0dc4614000005*/ LDC.U16 R3, c [0x0] [0x144]; // p2
/*0030*/ /*0x60011de428004005*/ MOV R4, c [0x0] [0x158]; // pp3
/*0038*/ /*0x70019de428004005*/ MOV R6, c [0x0] [0x15c]; // pp4
/*0048*/ /*0x20021de428004005*/ MOV R8, c [0x0] [0x148]; // p4
/*0050*/ /*0x30025de428004005*/ MOV R9, c [0x0] [0x14c]; // p4
/*0058*/ /*0x1bf15c0614000005*/ LDC.U8 R5, c [0x0] [0x146]; // p3
/*0060*/ /*0x0001dc8590000000*/ ST [R0], R7; // *pp1 = p1
/*0068*/ /*0x0020dc4590000000*/ ST.U16 [R2], R3; // *pp2 = p2
/*0070*/ /*0x00415c0590000000*/ ST.U8 [R4], R5; // *pp3 = p3
/*0078*/ /*0x00621ca590000000*/ ST.64 [R6], R8; // *pp4 = p4
/*0088*/ /*0x00001de780000000*/ EXIT;
/*0090*/ /*0xe0001de74003ffff*/ BRA 0x90;
/*0098*/ /*0x00001de440000000*/ NOP CC.T;
/*00a0*/ /*0x00001de440000000*/ NOP CC.T;
/*00a8*/ /*0x00001de440000000*/ NOP CC.T;
/*00b0*/ /*0x00001de440000000*/ NOP CC.T;
/*00b8*/ /*0x00001de440000000*/ NOP CC.T;
...........................................
Function : _Z9constantsPiPsPcPd
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44]; // stack pointer
/*0010*/ /*0x00001de428004005*/ MOV R0, c [0x0] [0x140]; // p1
/*0018*/ /*0x10009de428004005*/ MOV R2, c [0x0] [0x144]; // p2
/*0020*/ /*0x0001dde428004c00*/ MOV R7, c [0x3] [0x0]; // kd_p1
/*0028*/ /*0x13f0dc4614000c00*/ LDC.U16 R3, c [0x3] [0x4]; // kd_p2
/*0030*/ /*0x20011de428004005*/ MOV R4, c [0x0] [0x148]; // p3
/*0038*/ /*0x30019de428004005*/ MOV R6, c [0x0] [0x14c]; // p4
/*0048*/ /*0x20021de428004c00*/ MOV R8, c [0x3] [0x8]; // kd_p4
/*0050*/ /*0x30025de428004c00*/ MOV R9, c [0x3] [0xc]; // kd_p4
/*0058*/ /*0x1bf15c0614000c00*/ LDC.U8 R5, c [0x3] [0x6]; // kd_p3
/*0060*/ /*0x0001dc8590000000*/ ST [R0], R7;
/*0068*/ /*0x0020dc4590000000*/ ST.U16 [R2], R3;
/*0070*/ /*0x00415c0590000000*/ ST.U8 [R4], R5;
/*0078*/ /*0x00621ca590000000*/ ST.64 [R6], R8;
/*0088*/ /*0x00001de780000000*/ EXIT;
/*0090*/ /*0xe0001de74003ffff*/ BRA 0x90;
/*0098*/ /*0x00001de440000000*/ NOP CC.T;
/*00a0*/ /*0x00001de440000000*/ NOP CC.T;
/*00a8*/ /*0x00001de440000000*/ NOP CC.T;
/*00b0*/ /*0x00001de440000000*/ NOP CC.T;
/*00b8*/ /*0x00001de440000000*/ NOP CC.T;
.....................................
I added the annotations to the right of the SASS.
On sm_30 you can see that kernel parameters are passed in constant bank 0, starting at offset 0x140.
User-defined __constant__ variables are placed in constant bank 3.
If you run cuobjdump --dump-elf <executable or obj>, you can find more information about the constant banks.
32bit elf: abi=6, sm=30, flags = 0x1e011e
Sections:
Index Offset Size ES Align Type Flags Link Info Name
1 34 142 0 1 STRTAB 0 0 0 .shstrtab
2 176 19b 0 1 STRTAB 0 0 0 .strtab
3 314 d0 10 4 SYMTAB 0 2 a .symtab
4 3e4 50 0 4 CUDA_INFO 0 3 b .nv.info._Z9constantsPiPsPcPd
5 434 30 0 4 CUDA_INFO 0 3 0 .nv.info
6 464 90 0 4 CUDA_INFO 0 3 a .nv.info._Z10parametersiscdPiPsPcPd
7 4f4 160 0 4 PROGBITS 2 0 a .nv.constant0._Z10parametersiscdPiPsPcPd
8 654 150 0 4 PROGBITS 2 0 b .nv.constant0._Z9constantsPiPsPcPd
9 7a8 30 0 8 PROGBITS 2 0 0 .nv.constant3
a 7d8 c0 0 4 PROGBITS 6 3 a00000b .text._Z10parametersiscdPiPsPcPd
b 898 c0 0 4 PROGBITS 6 3 a00000c .text._Z9constantsPiPsPcPd
.section .strtab
.section .shstrtab
.section .symtab
index value size info other shndx name
0 0 0 0 0 0 (null)
1 0 0 3 0 a .text._Z10parametersiscdPiPsPcPd
2 0 0 3 0 7 .nv.constant0._Z10parametersiscdPiPsPcPd
3 0 0 3 0 b .text._Z9constantsPiPsPcPd
4 0 0 3 0 8 .nv.constant0._Z9constantsPiPsPcPd
5 0 0 3 0 9 .nv.constant3
6 0 4 1 0 9 kd_p1
7 4 2 1 0 9 kd_p2
8 6 1 1 0 9 kd_p3
9 8 8 1 0 9 kd_p4
10 16 32 1 0 9 kd_floats
11 0 192 12 10 a _Z10parametersiscdPiPsPcPd
12 0 192 12 10 b _Z9constantsPiPsPcPd
The kernel parameter constant bank is versioned per launch so that concurrent kernels can be executed. Compiler and user constants are per CUmodule. It is the responsibility of the developer to manage the coherency of this data; for example, the developer has to ensure that a cudaMemcpyToSymbol update is performed in a safe manner.
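The symbol table above lists each __constant__ variable's offset and size within .nv.constant3 (in decimal), and the section header gives the bank's total size as 0x30 = 48 bytes. Each variable sits at an offset that is a multiple of its own size, i.e. natural alignment. As an illustration only, assuming a common host ABI (4-byte int, 2-byte short, 8-byte double), a plain C struct with the same member types in the same order reproduces those offsets; this is an analogy, not documentation of how nvcc packs constant banks, and the struct name is hypothetical.

```c
#include <stddef.h>

/* Host-side struct mirroring the __constant__ declarations in order.
 * Comments give the offset/size reported by cuobjdump --dump-elf. */
struct constant3_layout {
    int    kd_p1;         /* offset 0,  size 4  */
    short  kd_p2;         /* offset 4,  size 2  */
    char   kd_p3;         /* offset 6,  size 1  */
    double kd_p4;         /* offset 8,  size 8 (one byte of padding before it) */
    float  kd_floats[8];  /* offset 16, size 32 */
};                        /* total 48 bytes = 0x30, the .nv.constant3 size */
```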
