Output produced for the given input using bottom-up parsing

I tried solving this question and got option (c), but a few textbooks give option (b) as the answer. Which one is correct? Please help me out!

GAAAAT is the correct answer; it is the output produced by a parser which honours the order of the actions in the translation rules (some of which occur in mid-rule).
Yacc/bison is one such parser, which makes it very easy to verify:
%{
#include <ctype.h>
#include <stdio.h>
void yyerror(const char* msg) {
fprintf(stderr, "%s\n", msg);
}
int yylex(void) {
int ch;
while ((ch = getchar()) != EOF) {
if (isalnum(ch)) return ch;
}
return 0;
}
%}
%%
S: 'p' { putchar('G'); } P
P: 'q' { putchar('A'); } Q
P: 'r' { putchar('T'); }
P: %empty { putchar('E'); }
Q: 's' { putchar('A'); } P
Q: %empty { putchar('O'); }
%%
int main(void) {
yyparse();
putchar('\n');
}
$ bison -o gate.c gate.y
$ gcc -std=c99 -Wall -o gate gate.c
$ ./gate<<<pqsqsr
GAAAAT
If we modify the grammar to put all of the actions at the end of their respective rule, we obtain answer (b). (Other than the grammar, everything is the same as the previous example, so I'm only showing the new translation rules.)
S: 'p' P { putchar('G'); }
P: 'q' Q { putchar('A'); }
P: 'r' { putchar('T'); }
P: %empty { putchar('E'); }
Q: 's' P { putchar('A'); }
Q: %empty { putchar('O'); }
$ bison -o gate_no_mra.c gate_no_mra.y
$ gcc -std=c99 -Wall -o gate_no_mra gate_no_mra.c
$ ./gate_no_mra<<<pqsqsr
TAAAAG

Related

Flex: match if preceded by a character/pattern

How can I match a pattern R only if it is preceded by another pattern S, without consuming S (so that the input matched by S is given back to lex)?
file.l :
%%
\\foo {
yytext++; // To remove the starting backslash
printf("%s\n", yytext);
}
\\ printf("backslash!\n");
.
%%
int main() {
yylex();
}
In the above example I want to accept foo only when it is preceded by a backslash \.
But my current implementation consumes the \, which the rule below is supposed to match.
To check run as :
lex file.l
gcc lex.yy.c -lfl
./a.out
Edit 1:
I tried using unput as suggested by @rici. But with my implementation, a string that contains both patterns is not recognized as containing both.
For example, bar\foo:
%%
\\foo {
yytext++; // To remove the starting backslash
printf("foo\n");
unput('\\');
}
bar\\ printf("bar\n");
.
%%
int main() {
yylex();
}
Edit 2:
Got the answer here. It uses a flex start condition.
Edit 3:
%x backslash
%%
<backslash>foo {
printf("foo\n");
BEGIN(INITIAL);
}
bar\\ {
BEGIN(backslash);
printf("bar\n");
}
.
%%
int main() {
yylex();
}

Difference "Flex and Bison" code in windows and linux

I am currently working through sample code from the O'Reilly book entitled "Flex and Bison". I am using the GNU C compiler for Windows, launched as gcc rather than the Linux cc command, together with a binary install of Flex and Bison for Windows.
The problem is that the code, if copied directly from the book, does not compile, and I have had to hack it a bit to get it to work.
Example from book
Example 1-1. Word count fb1-1.l
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}
I compiled the code in the following way using the Windows command line:
flex file.l
gcc lex.yy.c -o a.exe
The build failed, stating that yywrap() was not found. I added it and then the build worked, but the program never reached the printf in the main function; it just hung waiting for more input!
Here is my solution, which works but feels like a hack that I do not fully understand.
/* just like Unix wc */
%{
#include <string.h>
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
"." { return; }
. { chars++; }
%%
int yywrap(void)
{
return 1;
}
int main(void)
{
yylex();
printf("num lines is %8d, num words is %8d, num chars is %8d\n", lines, words, chars);
return 0;
}
I had to add a new rule to return from yylex(), which was not in the book; add yywrap(), without really knowing why; and include string.h, which was not present. My main question is: are there significant differences between flex for Windows and flex for Unix, and is it possible to run the original code with my gcc compiler and GNU flex without these hacks?
I do not understand what you have achieved with this:
#include <string.h>
"." { return; }
What I do know for sure is that if you run the flex-generated scanner without specifying an input file, it reads standard input, so you have to mark the end of input; otherwise it keeps waiting for more. I would suggest:
%{
#include <stdio.h>
#include <string.h> /* for strlen */
int chars = 0;
int words = 0;
int lines = 0;
%}
WORD [a-zA-Z]+
%%
{WORD} {
words++;
chars += strlen(yytext);
}
\n {
lines++;
/* chars++; why this? there was no columns here - it's a new line */
}
" " {
/* count the spaces */
chars++;
}
\t {
/* count the tabs */
chars += 4 /* or 8 */;
}
. {
printf("Error (unknown symbol):\t%c\n", yytext[0]);
chars++;
}
%%
int main()
{
/* iterate until end of input and even if errors - continue */
while(yylex()){ }
printf("lines:\t%8d\nwords:\t%8d\nchars:\t%8d\n", lines, words, chars);
return 0;
}
Generate the scanner with:
flex input.l
The output will be lex.yy.c. Then compile it:
gcc -o scanner.exe lex.yy.c -lfl
Create a text file with the input and run:
scanner.exe < in.txt > out.txt
The less-than sign redirects input from in.txt, and the greater-than sign redirects output to out.txt. Because reading a file ends with end-of-file, the scanner stops properly.

Flex (lexical analyzer) not recognizing or operator

I have a problem with flex. It doesn't recognize the or operator in this rule:
[0-9A-Za-z]+{CORRECT} | {CORRECT}[0-9A-Za-z]+ [0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
If I split it into three rules then it is recognized:
[0-9A-Za-z]+{CORRECT} {...}
{CORRECT}[0-9A-Za-z]+ { ...}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
To explain myself better the pattern I am trying to recognize is:
CORRECT [1-9]*_[1-9]*0
And in order for flex to recognize the CORRECT pattern only when it is not surrounded by other characters I have to add these three rules.
Full flex code:
%option noyywrap
%{
#include <stdio.h>
int num_lines=1;
%}
CORRECT [1-9]*_[1-9]*0
%%
{CORRECT} { printf("CORRECT TOKEN:%s\n",yytext); }
[0-9A-Za-z]+{CORRECT} { printf("ERROR %d:Unidentified symbol: %s\n",num_lines,yytext);}
{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext); }
"\n" { num_lines++; }
" "
"\t"
"\r"
. { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
%%
int main(int argc,char **argv)
{
++argv,--argc;
if(argc>0)
yyin=fopen(argv[0],"r");
else
yyin=stdin;
yylex();
}
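Whitespace is significant in a lex pattern. a | b is not the same as a|b: unquoted whitespace ends the pattern, and the rest of the line is taken as the action. In the troublesome pattern, you have whitespace that I don't think you intended. Written without whitespace, the alternation would be:
[0-9A-Za-z]+{CORRECT}|{CORRECT}[0-9A-Za-z]+|[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}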
That said, in my opinion, your 3-pattern solution is easier to read and maintain.

How to present error when #define is incomplete in Flex?

I'm trying to print an error message when the user enters an incomplete #define, e.g.:
#define A // this is wrong, an error message should appear
#define A 5 // this is the correct form, no error message should be printed
but it doesn't work. Here's the code:
%{
#include <stdio.h>
#include <string.h>
%}
%s FULLCOMMENT
%s LINECOMMENT
%s DEFINE
%s INCLUDE
%s PRINT
%s PSEUDO_C_CODE
STRING [^ \n]*
%%
<FULLCOMMENT>"*/" { BEGIN INITIAL; }
<FULLCOMMENT>. { /* Do Nothing */ }
<INCLUDE>"<stdio.h>"|"<stdlib.h>"|"<string.h>" { BEGIN INITIAL; }
<INCLUDE>. { printf("error\n"); return 0 ; }
<DEFINE>[ \t] { printf("eat a space within define\n"); }
<DEFINE>{STRING} { printf("eat string %s\n" , yytext);}
<DEFINE>\n { printf("eat a break line within define\n"); BEGIN INITIAL; }
"#include" { BEGIN INCLUDE; }
"#define" { printf("you gonna to define\n"); BEGIN DEFINE; }
"#define"+. { printf("error\n"); }
%%
int yywrap(void) { return 1; } // Callback at end of file
int main(void) {
yylex();
return 0 ;
}
Where did I go wrong?
Here is the output:
a@ubuntu:~/Desktop$ ./a.out
#define A 4
error
A 4
#define A
error
A
The rule "#define"+. matches one character more than the plain "#define" rule, and flex always prefers the longest match, so it wins even for correct input. You could instead write just this:
"#define"[[:blank:]]*$ { printf("error\n"); }
Consider using the -dvT options with flex to get detailed debug output. Also I am not sure if you need such extensive use of states for anything except maybe comments. But you would know better.

How does a Lex & Yacc parser output values?

So for a project that I'm working on, I am using Lex and Yacc to parse an FTP configuration file. The configuration files look something like this:
global {
num_daemons = 10
etc = /etc/ftpd
};
host "ftp-1.foobar.com" {
ftproot = /var/ftp/server1
max_out_bandwidth = 20.7
};
host "ftp-2.foobar.com" {
ftproot = /var/ftp/server2
exclude = /var/ftp/server2/private
};
host "ftp-3.foobar.com" {
ftproot = /var/ftp/server3
};
Now, my question is, how do I obtain this information in a usable way? Let's say I wanted to put things like the address after the host token into a struct. How would I do that? Also, how would I simply print out the values that I've parsed to the command line? Also, to run it, do I just cat the config file and pipe in the compiled c program? Thanks in advance for any help!
Here is my code:
%{
// tokens.l
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
int yyparse();
%}
%option noyywrap
%x OPTION
%x OPTID
%%
<INITIAL>global { return GLOBAL; }
<INITIAL>host { return HOST; }
<INITIAL>"[a-zA-z1-9./-]+" { return NAME; }
<INITIAL>\{ { return CURLY_OPEN; BEGIN OPTION; }
<INITIAL>\n { return EOLN; }
<INITIAL><<EOF>> { return EOFTOK; }
<OPTION>[a-zA-z1-9./-_]+ { return ID_NAME; BEGIN OPTID; }
<OPTION>[\t] {}
<OPTION>[\};] { return OPTION_CLOSE; BEGIN INITIAL;}
<OPTID>[a-zA-z1-9./-]+ { return ID_STRING; BEGIN OPTION; }
<OPTID>[0-9.]+ { return ID_NUM; BEGIN OPTION; }
<OPTID>[\n] { return EOLN; }
%%
int main(int argc, char **argv) {
// Where I am confused..
}
and my yacc file:
%{
// parse.y
#include <stdio.h>
#include <stdlib.h>
int yyerror(char *);
int yylex(void);
%}
%token ERROR EOLN EOFTOK
%token OPTION_CLOSE GLOBAL HOST NAME ID_NAME ID_STRING ID_NUM CURLY_OPEN
%%
input
: lines EOFTOK { YYACCEPT; }
;
lines
:
| lines line
;
line
: option
| opident
| OPTION_CLOSE
;
option
: GLOBAL CURLY_OPEN
| HOST NAME CURLY_OPEN
;
opident
: ID_NAME '=' ID_STRING
| ID_NAME '=' ID_NUM
;
%%
int yyerror(char *msg) {}
You would generally have variables which were accessible and set up before calling the parser, like a linked list of key/value pairs:
typedef struct sNode {
char *key;
char *val;
struct sNode *next;
} tNode;
tNode *lookupHead = NULL;
Then, in your Yacc code, something like:
opident
: ID_NAME '=' ID_STRING { addLookupStr ($1, $3); }
| ID_NAME '=' ID_NUM { other function call here }
;
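This executes the code whenever the rule is matched, replacing each $ variable with the value of the corresponding item in the rule: $1 is the value of the ID_NAME token, $2 is the =, and so on. For $1 and $3 to hold strings, the lexer has to store the matched text in yylval before returning the token, and the grammar has to declare the value type (for example with %union).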
The function would be something like:
void addLookupStr (char *newkey, char *newval) {
// Check for duplicate keys, then attempt to add. All premature returns
// should also be logging errors and setting error flags as needed.
tNode *curr = lookupHead;
while (curr != NULL) {
if (strcmp (curr->key, newkey) == 0)
return;
curr = curr->next;
}
if ((curr = malloc (sizeof (tNode))) == NULL)
return;
if ((curr->key = strdup (newkey)) == NULL) {
free (curr);
return;
}
if ((curr->val = strdup (newval)) == NULL) {
free (curr->key);
free (curr);
return;
}
// All possibly-failing ops complete, insert at head of list.
curr->next = lookupHead;
lookupHead = curr;
}
