FLEX scanner- unable to return the correct token number - flex-lexer

%option noyywrap
%{
#include<stdlib.h>
#define INTEGER 1
#define PLUS 2
struct data
{
int value;
}dataObj;
%}
%%
[0-9]+ dataObj.value=atoi(yytext);return INTEGER;
[+] return PLUS;
%%
int main()
{
int ret_value;
while(ret_value = yylex() !=0)
printf("value:%d \t token type:%d\n",dataObj.value,ret_value);
}
if i use expression 3+5 it is giving value of 3 and 5 as 1 which is correct but it is giving the value of [+] as 1 rather it should be 2. i am using flex version 2.5.4

As you explained, but not very clearly, in your comment, the fault is in the while loop. If you run your code as originally posted, using the debug enabled in flex, the following output is produced:
--(end of buffer or a NUL)
3+5
--accepting rule at line 13 ("3")
value:3 token type:1
--accepting rule at line 14 ("+")
value:3 token type:1
--accepting rule at line 13 ("5")
value:5 token type:1
--accepting default rule ("
")
--(end of buffer or a NUL)
--EOF (start condition 0)
If however, the main program is changed to this:
int main()
{
int ret_value;
while((ret_value = yylex()) !=0)
printf("value:%d \t token type:%d\n",dataObj.value,ret_value);
}
The following output is generated, which is correct:
--(end of buffer or a NUL)
3+5
--accepting rule at line 13 ("3")
value:3 token type:1
--accepting rule at line 14 ("+")
value:3 token type:2
--accepting rule at line 13 ("5")
value:5 token type:1
--(end of buffer or a NUL)
Your fault, as you surmised is a typo in the main program.

Related

How do I go about making this work with a #include? It works fine when dropped straight into the code

I have a block of code that I want to #include in my z/OS Metal C program, it works fine when it's just part of the program, but when I put it into a .h file and #include it, the code won't compile.
I have successfully gotten this code to work without #include. I'm sure I'm overlooking something having to do with #include...
This code works:
#pragma margins(2,72)
*#if 0!=0
Test DSECT
Test# DS A
TestINT DS F
TestChar DS C
.ago end
*#endif
*struct Test {
* void *Test1;
* int TestInt;
* char TestChar;
*};
*#if 0!=0
.end
MEND
*#endif
#pragma nomargins
Giving compiler output that looks like this:
207 |#pragma margins(2,72)
207 +
208 |#if 0!=0
214 |#endif
215 |struct Test {
216 | void *Test1;
5650ZOS V2.1.1 z/OS XL C 'SSAF.METALC.C(CKKTHING)'
* * * * * S O U R C E * * * * *
LINE STMT
*...+....1....+....2....+....3....+....4....+....5....+....6....+
217 | int TestInt;
218 | char TestChar;
219 |};
220 |#if 0!=0
223 |#endif
224 |#pragma nomargins
But, when I put the code into an #include file like this:
EDIT SSAF.METALC.H(CKKTEST)
Command ===>
****** **************************
000001 *#if 0!=0
000002 Test DSECT
000003 Test# DS A
000004 TestINT DS F
000005 TestChar DS C
000006 .ago end
000007 *#endif
000008 *struct Test {
000009 * void *Test1;
000010 * int TestInt;
000011 * char TestChar;
000012 *};
000013 *#if 0!=0
000014 .end
000015 MEND
000016 *#endif
****** **************************
and include it in my Metal C program:
EDIT SSAF.METALC.C(CKLTHING) - 01.00
Command ===>
000205 #include"ckkprolg.h"
000206
000207 #pragma margins(2,72)
000208 #include"ckktest.h"
000209 #pragma nomargins
I get a bunch of error messages:
205 |#include"ckkprolg.h" /* Include assembler macros needed
206 | for Metal C prolog and epilog */
207 |#pragma margins(2,72)
207 +
208 |#include"ckktest.h"
*=ERROR===========> CCN3275 Unexpected text 'struct' encountered.
*=ERROR===========> CCN3166 Definition of function Test requires parentheses.
*=ERROR===========> CCN3275 Unexpected text 'void' encountered.
5650ZOS V2.1.1 z/OS XL C 'SSAF.METALC.C(CKLTHING)' 10/04/2019
* * * * * S O U R C E * * * * *
LINE STMT
*...+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....8....+....9...
*=ERROR===========> CCN3045 Undeclared identifier Test1.
*=ERROR===========> CCN3275 Unexpected text 'int' encountered.
*=ERROR===========> CCN3045 Undeclared identifier TestInt.
*=ERROR===========> CCN3275 Unexpected text 'char' encountered.
*=ERROR===========> CCN3045 Undeclared identifier TestChar.
*=ERROR===========> CCN3046 Syntax error.
*=ERROR===========> CCN3273 Missing type in declaration of theESTAEXStatic.
209 |#pragma nomargins
The include file is missing #pragma margins. Since it is a file level directive, it needs to be present in each source file. Please see IBM Knowledge Center, which says, "The setting specified by the #pragma margins directive applies only to the source file or include file in which it is found. It has no effect on other include files."

Bison won't return correct tokens [duplicate]

flex code:
1 %option noyywrap nodefault yylineno case-insensitive
2 %{
3 #include "stdio.h"
4 #include "tp.tab.h"
5 %}
6
7 %%
8 "{" {return '{';}
9 "}" {return '}';}
10 ";" {return ';';}
11 "create" {return CREATE;}
12 "cmd" {return CMD;}
13 "int" {yylval.intval = 20;return INT;}
14 [a-zA-Z]+ {yylval.strval = yytext;printf("id:%s\n" , yylval.strval);return ID;}
15 [ \t\n]
16 <<EOF>> {return 0;}
17 . {printf("mistery char\n");}
18
bison code:
1 %{
2 #include "stdlib.h"
3 #include "stdio.h"
4 #include "stdarg.h"
5 void yyerror(char *s, ...);
6 #define YYDEBUG 1
7 int yydebug = 1;
8 %}
9
10 %union{
11 char *strval;
12 int intval;
13 }
14
15 %token <strval> ID
16 %token <intval> INT
17 %token CREATE
18 %token CMD
19
20 %type <strval> col_definition
21 %type <intval> create_type
22 %start stmt_list
23
24 %%
25 stmt_list:stmt ';'
26 | stmt_list stmt ';'
27 ;
28
29 stmt:create_cmd_stmt {/*printf("create cmd\n");*/}
30 ;
31
32 create_cmd_stmt:CREATE CMD ID'{'create_col_list'}' {printf("%s\n" , $3);}
33 ;
34 create_col_list:col_definition
35 | create_col_list col_definition
36 ;
37
38 col_definition:create_type ID ';' {printf("%d , %s\n" , $1, $2);}
39 ;
40
41 create_type:INT {$$ = $1;}
42 ;
43
44 %%
45 extern FILE *yyin;
46
47 void
48 yyerror(char *s, ...)
49 {
50 extern yylineno;
51 va_list ap;
52 va_start(ap, s);
53 fprintf(stderr, "%d: error: ", yylineno);
54 vfprintf(stderr, s, ap);
55 fprintf(stderr, "\n");
56 }
57
58 int main(int argc , char *argv[])
59 {
60 yyin = fopen(argv[1] , "r");
61 if(!yyin){
62 printf("open file %s failed\n" ,argv[1]);
63 return -1;
64 }
65
66 if(!yyparse()){
67 printf("parse work!\n");
68 }else{
69 printf("parse failed!\n");
70 }
71
72 fclose(yyin);
73 return 0;
74 }
75
test input file:
create cmd keeplive
{
int a;
int b;
};
test output:
root#VM-Ubuntu203001:~/test/tpp# ./a.out t1.tp
id:keeplive
id:a
20 , a;
id:b
20 , b;
keeplive
{
int a;
int b;
}
parse work!
I have two questions:
1) Why does the action at line 38 print the token ';'? For instance, "20 , a;" and "20 , b;"
2) Why does the action at line 32 print "keeplive
{
int a;
int b;
}" instead of simply "keeplive"?
Short answer:
yylval.strval = yytext;
You can't use yytext like that. The string it points to is private to the lexer and will change as soon as the flex action finishes. You need to do something like:
yylval.strval = strdup(yytext);
and then you need to make sure you free the memory afterwards.
Longer answer:
yytext is actually a pointer into the buffer containing the input. In order to make yytext work as though it were a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it does the action, and then replaces the original character when the action terminates. So strdup will work fine inside the action, but outside the action (in your bison code), you now have a pointer to the part of the buffer starting with the token. And it gets worse later, since flex will read the next part of the source into the same buffer, and now your pointer is to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
In almost all the lexers I've written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.

Why is $1 matching the whole grouping and not its token? [duplicate]

flex code:
1 %option noyywrap nodefault yylineno case-insensitive
2 %{
3 #include "stdio.h"
4 #include "tp.tab.h"
5 %}
6
7 %%
8 "{" {return '{';}
9 "}" {return '}';}
10 ";" {return ';';}
11 "create" {return CREATE;}
12 "cmd" {return CMD;}
13 "int" {yylval.intval = 20;return INT;}
14 [a-zA-Z]+ {yylval.strval = yytext;printf("id:%s\n" , yylval.strval);return ID;}
15 [ \t\n]
16 <<EOF>> {return 0;}
17 . {printf("mistery char\n");}
18
bison code:
1 %{
2 #include "stdlib.h"
3 #include "stdio.h"
4 #include "stdarg.h"
5 void yyerror(char *s, ...);
6 #define YYDEBUG 1
7 int yydebug = 1;
8 %}
9
10 %union{
11 char *strval;
12 int intval;
13 }
14
15 %token <strval> ID
16 %token <intval> INT
17 %token CREATE
18 %token CMD
19
20 %type <strval> col_definition
21 %type <intval> create_type
22 %start stmt_list
23
24 %%
25 stmt_list:stmt ';'
26 | stmt_list stmt ';'
27 ;
28
29 stmt:create_cmd_stmt {/*printf("create cmd\n");*/}
30 ;
31
32 create_cmd_stmt:CREATE CMD ID'{'create_col_list'}' {printf("%s\n" , $3);}
33 ;
34 create_col_list:col_definition
35 | create_col_list col_definition
36 ;
37
38 col_definition:create_type ID ';' {printf("%d , %s\n" , $1, $2);}
39 ;
40
41 create_type:INT {$$ = $1;}
42 ;
43
44 %%
45 extern FILE *yyin;
46
47 void
48 yyerror(char *s, ...)
49 {
50 extern yylineno;
51 va_list ap;
52 va_start(ap, s);
53 fprintf(stderr, "%d: error: ", yylineno);
54 vfprintf(stderr, s, ap);
55 fprintf(stderr, "\n");
56 }
57
58 int main(int argc , char *argv[])
59 {
60 yyin = fopen(argv[1] , "r");
61 if(!yyin){
62 printf("open file %s failed\n" ,argv[1]);
63 return -1;
64 }
65
66 if(!yyparse()){
67 printf("parse work!\n");
68 }else{
69 printf("parse failed!\n");
70 }
71
72 fclose(yyin);
73 return 0;
74 }
75
test input file:
create cmd keeplive
{
int a;
int b;
};
test output:
root#VM-Ubuntu203001:~/test/tpp# ./a.out t1.tp
id:keeplive
id:a
20 , a;
id:b
20 , b;
keeplive
{
int a;
int b;
}
parse work!
I have two questions:
1) Why does the action at line 38 print the token ';'? For instance, "20 , a;" and "20 , b;"
2) Why does the action at line 32 print "keeplive
{
int a;
int b;
}" instead of simply "keeplive"?
Short answer:
yylval.strval = yytext;
You can't use yytext like that. The string it points to is private to the lexer and will change as soon as the flex action finishes. You need to do something like:
yylval.strval = strdup(yytext);
and then you need to make sure you free the memory afterwards.
Longer answer:
yytext is actually a pointer into the buffer containing the input. In order to make yytext work as though it were a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it does the action, and then replaces the original character when the action terminates. So strdup will work fine inside the action, but outside the action (in your bison code), you now have a pointer to the part of the buffer starting with the token. And it gets worse later, since flex will read the next part of the source into the same buffer, and now your pointer is to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
In almost all the lexers I've written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.

ctags: To get C function end line number

is it possible via ctags to get the function end line number as well
"ctags -x --c-kinds=f filename.c"
Above command lists the function definition start line-numbers. Wanted a way to get the function end line numbers.
Other Approaches:
awk 'NR > first && /^}$/ { print NR; exit }' first=$FIRST_LINE filename.c
This needs the code to be properly formatted
Example:
filename.c
1 #include<stdio.h>
2 #include<stdlib.h>
3 int main()
4 {
5 const char *name;
6
7 int a=0
8 printf("name");
9 printf("sssss: %s",name);
10
11 return 0;
12 }
13
14 void code()
15 {
16 printf("Code \n");
17 }
18
19 int code2()
20 {
21 printf("code2 \n");
22 return 1
23 }
24
Input: filename and the function start line no.
Example:
Input: filename.c 3
Output: 12
Input : filename.c 19
Output : 23
Is there any better/simple way of doing this ?
C/C++ parser of Universal-ctags(https://ctags.io) has end: field.
jet#localhost tmp]$ cat -n foo.c
1 int
2 main( void )
3 {
4
5 }
6
7 int
8 bar (void)
9 {
10
11 }
12
13 struct x {
14 int y;
15 };
16
[jet#localhost tmp]$ ~/var/ctags/ctags --fields=+ne -o - --sort=no foo.c
main foo.c /^main( void )$/;" f line:2 typeref:typename:int end:5
bar foo.c /^bar (void)$/;" f line:8 typeref:typename:int end:11
x foo.c /^struct x {$/;" s line:13 file: end:15
y foo.c /^ int y;$/;" m line:14 struct:x typeref:typename:int file:
awk to the rescue!
doesn't handle curly braces within comments but should handle blocks within functions, please give it a try...
$ awk -v s=3 'NR>=s && /{/ {c++}
NR>=s && /}/ && c && !--c {print NR; exit}' file
finds the matching brace for the first one after the specified start line number s.

decodingTCAP message - dialoguePortion

I'm writing an simulator (for learning purposes) for complete M3UA-SCCP-TCAP-MAP stack (over SCTP). So far M3UA+SCCP stacks are OK.
M3UA Based on the RFC 4666 Sept 2006
SCCP Based on the ITU-T Q.711-Q716
TCAP Based on the ITU-T Q.771-Q775
But upon decoding TCAP part I got lost on dialoguePortion.
TCAP is asn.1 encoded, so everything is tag+len+data.
Wireshark decode it differently than my decoder.
Message is:
62434804102f00676b1e281c060700118605010101a011600f80020780a1090607040000010005036c1ba1190201010201163011800590896734f283010086059062859107
Basically, my message is BER-decoded as
Note: Format: hex(tag) + (BER splitted to CLS+PC+TAG in decimal) + hex(data)
62 ( 64 32 2 )
48 ( 64 0 8 ) 102f0067
6b ( 64 32 11 )
28 ( 0 32 8 )
06 ( 0 0 6 ) 00118605010101 OID=0.0.17.773.1.1.1
a0 ( 128 32 0 )
60 ( 64 32 0 )
80 ( 128 0 0 ) 0780
a1 ( 128 32 1 )
06 ( 0 0 6 ) 04000001000503 OID=0.4.0.0.1.0.5.3
6c ( 64 32 12 )
...
So I can see begin[2] message containing otid[8], dialogPortion[11] and componentPortion[12].
otid and ComponentPortion are decoded correctly. But not dialogPortion.
ASN for dialogPortion does not mention any of these codes.
Even more confusing, wireshark decode it differently (oid-as-dialogue is NOT in the dialoguePortion, but as a field after otid, which is NOT as described in ITU-T documentation - or not as I'm understanding it)
Wireshark decoded Transaction Capabilities Application Part
begin
Source Transaction ID
otid: 102f0067
oid: 0.0.17.773.1.1.1 (id-as-dialogue)
dialogueRequest
Padding: 7
protocol-version: 80 (version1)
1... .... = version1: True
application-context-name: 0.4.0.0.1.0.5.3 (locationInfoRetrievalContext-v3)
components: 1 item
...
I can't find any reference for Padding in dialoguePDU ASN.
Can someone point me in the right direction?
I would like to know how to properly decode this message
DialoguePDU format should be simple in this case:
dialogue-as-id OBJECT IDENTIFIER ::= {itu-t recommendation q 773 as(1) dialogue-as(1) version1(1)}
DialoguePDU ::= CHOICE {
dialogueRequest AARQ-apdu,
dialogueResponse AARE-apdu,
dialogueAbort ABRT-apdu
}
AARQ-apdu ::= [APPLICATION 0] IMPLICIT SEQUENCE {
protocol-version [0] IMPLICIT BIT STRING {version1(0)} DEFAULT {version1},
application-context-name [1] OBJECT IDENTIFIER,
user-information [30] IMPLICIT SEQUENCE OF EXTERNAL OPTIONAL
}
Wireshark is still wrong :-). But then... that is display. It displays values correctly - only in the wrong section. Probably some reason due to easier decoding.
What I was missing was definition of EXTERNAL[8]. DialoguePortion is declared as EXTERNAL...so now everything makes sense.
For your message, my very own decoder says:
begin [APPLICATION 2] (x67)
otid [APPLICATION 8] (x4) =102f0067h
dialoguePortion [APPLICATION 11] (x30)
EXTERNAL (x28)
direct-reference [OBJECT IDENTIFIER] (x7) =00118605010101h
encoding:single-ASN1-type [0] (x17)
dialogueRequest [APPLICATION 0] (x15)
protocol-version [0] (x2) = 80 {version1 (0) } spare bits= 7
application-context-name [1] (x9)
OBJECT IDENTIFIER (x7) =04000001000503h
components [APPLICATION 12] (x27)
invoke [1] (x25)
invokeID [INTEGER] (x1) =1d (01h)
operationCode [INTEGER] (x1) = (22) SendRoutingInfo
parameter [SEQUENCE] (x17)
msisdn [0] (x5) = 90896734f2h
Nature of Address: international number (1)
Numbering Plan Indicator: unknown (0)
signal= 9876432
interrogationType [3] (x1) = (0) basicCall
gmsc-Address [6] (x5) = 9062859107h
Nature of Address: international number (1)
Numbering Plan Indicator: unknown (0)
signal= 26581970
Now, wireshark's padding 7 and my spare bits=7 both refer to the the protocol-version field, defined in Q.773 as:
AARQ-apdu ::= [APPLICATION 0] IMPLICIT SEQUENCE {
protocol-version [0] IMPLICIT BIT STRING { version1 (0) }
DEFAULT { version1 },
application-context-name [1] OBJECT IDENTIFIER,
user-information [30] IMPLICIT SEQUENCE OF EXTERNAL
OPTIONAL }
the BIT STRING definition, assigns name to just the leading bit (version1)... the rest (7 bits) are not given a name and wireshark consider them as padding

Resources