I am trying to run a lex program that counts the vowels and consonants in a given string. The program accepts the input but prints no output, and it does not terminate.
Please help me find the mistake.
The code is below:
%{
#include<stdio.h>
int vowel=0;
int consonant=0;
%}
%%
[aeiouAEIOU] {vowel++ ;}
[a-zA-Z] {consonant++ ; }
%%
int yywrap()
{
return 1;
}
int main()
{
printf("Enter the string :");
yylex();
printf("Number of vowels are: %d\n",vowel);
printf("Number of consonants are: %d\n",consonant);
return 0;
}
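Finally, I got the output after a small addition to the code: a rule that makes yylex() return when it reads a newline. Without that rule, yylex() keeps reading until end of input, so the counts are printed only once the input ends (for example, after Ctrl+D at a Unix terminal).
The updated code is as follows: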
%{
#include<stdio.h>
int vowel=0;
int consonant=0;
%}
%%
[aeiouAEIOU] {vowel++ ;}
[a-zA-Z] {consonant++ ; }
"\n" return 0;
%%
int yywrap()
{
return 1;
}
int main()
{
printf("Enter the string :");
yylex();
printf("Number of vowels are: %d\n",vowel);
printf("Number of consonants are: %d\n",consonant);
return 0;
}
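As a cross-check for the scanner's counts, the same classification can be sketched in plain C. This is only an illustration, not part of the lex program: `struct counts` and `count_letters` are hypothetical names, and the loop stops at the first newline to mirror the added "\n" rule.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result type: one counter per scanner rule. */
struct counts { int vowels; int consonants; };

/* Counts vowels and consonants in the first line of s.
   Mirrors the three rules: vowel, any other letter, newline stops. */
struct counts count_letters(const char *s)
{
    struct counts c = {0, 0};
    for (; *s != '\0' && *s != '\n'; s++) {
        if (strchr("aeiouAEIOU", *s) != NULL)
            c.vowels++;                 /* like [aeiouAEIOU] */
        else if (isalpha((unsigned char)*s))
            c.consonants++;             /* like [a-zA-Z], tried second */
        /* other characters are skipped here; lex would echo them */
    }
    return c;
}
```

The order of the two tests matters for the same reason the order of the two lex rules does: a vowel also matches [a-zA-Z], and for matches of equal length lex picks the rule listed first.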
Hello, I'm trying to write a parser for matrix operations.
The lex part is
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[0-9]+ {sscanf (yytext, "%d", &yylval); return (NUM);}
[\+\-*^] {return (OPERATOR);}
[ \n\t] {return (yytext[0]);}
. ;
%%
And the yacc part is
%{
#include <stdio.h>
#include <stdlib.h>
#include "lex.yy.c"
int rows, cols;
int matrix[100];
int k = 0;
%}
%token OPERATOR NUM
%%
matrix: dimensions array { displayMatrix(rows, cols, matrix);
}
;
dimensions: NUM NUM {
rows = $1;
cols = $2;
}
;
array: '[' array
| NUM array {
matrix[k++] = $1;
}
| ']'
;
%%
void displayMatrix(int rows, int cols, int array)
{
printf("\nThe matrix introduced:\n");
int k = 0;
for(int i = 0; i < rows; i++)
{
for(int j = 0; j < cols; j++)
printf("%d "), matrix[k++];
printf("\n");
}
}
void yyerror()
{
printf("Invalid input");
}
int main()
{
printf("Introduce the first matrix:\n");
yyparse();
}
What I'm trying to do is give the input like this:
no_rows no_cols [ array representing the elements ]
for example 2 2 [1 2 3 4]
I need to store these values because I need to do operations like + or * on matrices. I don't know exactly how to store them. I tried something, but when I run my program it says "invalid input", and I don't have any examples of how to declare variables and store the values I get from Lex. Also, I don't know the size of my input array in advance, and I don't know how to assign each value.
The commands I used
lex matrix.l
yacc matrix.y
gcc y.tab.c -ll -ly
Thank you!
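One possible way to hold the values, sketched in plain C under some assumptions: the two dimension tokens are known before the first element arrives, and the parser calls a push function once per NUM in input order (with yacc that usually means a left-recursive rule such as `elements: elements NUM`, since the right-recursive `NUM array` rule above runs its actions in reverse order). `Matrix`, `matrix_init`, `matrix_push`, `matrix_at` and `matrix_free` are hypothetical names, not part of lex or yacc.

```c
#include <stdlib.h>

/* Hypothetical container for one parsed matrix. */
typedef struct {
    int rows, cols;
    int count;      /* elements stored so far */
    int *data;      /* rows * cols ints, row-major */
} Matrix;

/* Allocate storage once both dimensions are known. Returns 0 on success. */
int matrix_init(Matrix *m, int rows, int cols)
{
    if (rows <= 0 || cols <= 0) return -1;
    m->rows = rows;
    m->cols = cols;
    m->count = 0;
    m->data = malloc((size_t)rows * (size_t)cols * sizeof *m->data);
    return m->data ? 0 : -1;
}

/* Store the next element; meant to be called once per NUM, in input order. */
int matrix_push(Matrix *m, int value)
{
    if (m->count >= m->rows * m->cols) return -1;   /* too many elements */
    m->data[m->count++] = value;
    return 0;
}

/* Element at row r, column c (row-major indexing). */
int matrix_at(const Matrix *m, int r, int c)
{
    return m->data[r * m->cols + c];
}

void matrix_free(Matrix *m)
{
    free(m->data);
    m->data = NULL;
}
```

Separately from storage, the scanner as posted appears to return blanks as tokens and to discard '[' and ']' through the `. ;` rule, so the grammar never sees the input it expects; that alone would explain the "Invalid input" message.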
Given the following language described as:
formally: (identifier operator identifier+)*
in plain English: zero or more operations written as an identifier (the lvalue), then an operator, then one or more identifiers (the rvalue)
An example of a sequence of operations in that language would be, given the arbitrary operator #:
A # B C X # Y
Whitespace is not significant and it may also be written more clearly as:
A # B C
X # Y
How would you parse this with a yacc-like LALR parser?
What I tried so far
I know how to parse explicitly delimited operations, say A # B C ; X # Y but I would like to know if parsing the above input is feasible and how. Hereafter is a (non-functional) minimal example using Flex/Bison.
lex.l:
%{
#include "y.tab.h"
%}
%option noyywrap
%option yylineno
%%
[a-zA-Z][a-zA-Z0-9_]* { return ID; }
# { return OP; }
[ \t\r\n]+ ; /* ignore whitespace */
. { return ERROR; } /* any other character causes parse error */
%%
yacc.y:
%{
#include <stdio.h>
extern int yylineno;
void yyerror(const char *str);
int yylex();
%}
%define parse.lac full
%define parse.error verbose
%token ID OP ERROR
%left OP
%start opdefs
%%
opright:
| opright ID
;
opdef: ID OP ID opright
;
opdefs:
| opdefs opdef
;
%%
void yyerror(const char *str) {
fprintf(stderr, "error#%d: %s\n", yylineno, str);
}
int main(int argc, char *argv[]) {
yyparse();
}
Build with: $ flex lex.l && yacc -d yacc.y --report=all --verbose && gcc lex.yy.c y.tab.c
The issue: I cannot stop the parser from including the next lvalue identifier in the rvalue of the first operation.
$ ./a.out
A # B C X # Y
error#1: syntax error, unexpected OP, expecting $end or ID
The above is always parsed as: reduce(A # B reduce(C X)) # Y
I get the feeling I have to somehow put a condition on the lookahead token that says that if it is the operator, the last identifier should not be shifted and the current stack should be reduced:
A # B C X # Y
^ * // ^: current, *: lookahead
-> reduce 'A # B C' !
-> shift 'X' !
I tried all kinds of operator precedence arrangements but cannot get it to work.
I would be willing to accept a solution that does not apply to Bison as well.
A naïve grammar for that language is LALR(2), and bison does not generate LALR(2) parsers.
Any LALR(2) grammar can be mechanically modified to produce an LALR(1) grammar with a compatible parse tree, but I don't know of any automatic tool which does that.
It's possible but annoying to do the transformation by hand, but be aware that you will need to adjust the actions in order to recover the correct parse tree:
%{
typedef struct IdList { char* id; struct IdList* next; } IdList;
typedef struct Def { char* lhs; IdList* rhs; } Def;
typedef struct DefList { Def* def; struct DefList* next; } DefList;
%}
%union {
Def* def;
DefList* defs;
char* id;
}
%type <def> ophead
%type <defs> opdefs
%token <id> ID
%%
prog : opdefs { $1->def->rhs = IdList_reverse($1->def->rhs);
DefList_show(DefList_reverse($1)); }
ophead: ID '#' ID { $$ = Def_new($1);
$$->rhs = IdList_push($$->rhs, $3); }
opdefs: ophead { $$ = DefList_push(NULL, $1); }
| opdefs ID { $1->def->rhs = IdList_push($1->def->rhs, $2); $$ = $1; }
| opdefs ophead { $1->def->rhs = IdList_reverse($1->def->rhs);
$$ = DefList_push($1, $2); }
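The actions above call list helpers that are not shown. A minimal sketch of two of them, assuming a singly linked list where push prepends (which is why the grammar reverses each list once it is complete); the bodies are an illustration, not the answer's actual implementation:

```c
#include <stdlib.h>

typedef struct IdList { char *id; struct IdList *next; } IdList;

/* Prepend id to list. Returns the new head, or NULL if allocation fails.
   The node stores the pointer as given; it does not copy the string. */
IdList *IdList_push(IdList *list, char *id)
{
    IdList *node = malloc(sizeof *node);
    if (node == NULL) return NULL;
    node->id = id;
    node->next = list;
    return node;
}

/* Reverse the list in place and return the new head. */
IdList *IdList_reverse(IdList *list)
{
    IdList *prev = NULL;
    while (list != NULL) {
        IdList *next = list->next;
        list->next = prev;
        prev = list;
        list = next;
    }
    return prev;
}
```

Prepending keeps each push constant-time; DefList_push and DefList_reverse could presumably follow the same pattern.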
This precise problem is, ironically, part of bison itself, because productions do not require a ; terminator. Bison uses itself to generate a parser, and it solves this problem in the lexer rather than jumping through the hoops outlined above. In the lexer, once an ID is found, the scan continues up to the next non-whitespace character. If that is a :, then the lexer returns an identifier-definition token; otherwise, the non-whitespace character is returned to the input stream, and an ordinary identifier token is returned.
Here's one way of implementing that in the lexer:
%x SEEK_AT
%%
    /* See below for explanation, if needed */
    static int deferred_eof = 0;
    if (deferred_eof) { deferred_eof = 0; return 0; }
[[:alpha:]][[:alnum:]_]* yylval.id = strdup(yytext); BEGIN(SEEK_AT);
[[:space:]]+ ; /* ignore whitespace */
    /* Could be other rules here */
. return *yytext; /* Let the parser handle errors */
<SEEK_AT>{
[[:space:]]+ ; /* ignore whitespace */
"#" BEGIN(INITIAL); return ID_AT;
. BEGIN(INITIAL); yyless(0); return ID;
<<EOF>> BEGIN(INITIAL); deferred_eof = 1; return ID;
}
In the SEEK_AT start condition, we're only interested in #. If we find one, then the ID was the start of a def, and we return the correct token type. If we find anything else (other than whitespace), we return the character to the input stream using yyless, and return the ID token type. Note that yylval was already set from the initial scan of the ID, so there is no need to worry about it here.
The only complicated bit of the above code is the EOF handling. Once an EOF has been detected, it is not possible to reinsert it into the input stream, neither with yyless nor with unput. Nor is it legal to let the scanner read the EOF again. So it needs to be fully dealt with. Unfortunately, in the SEEK_AT start condition, fully dealing with EOF requires sending two tokens: first the already detected ID token, and then the 0 which yyparse will recognize as end of input. Without a push-parser, we cannot send two tokens from a single scanner action, so we need to register the fact of having received an EOF, and check for that on the next call to the scanner.
Indented code before the first rule is inserted at the top of the yylex function, so it can declare local variables and do whatever needs to be done before the scan starts. As written, this lexer is not re-entrant, but it is restartable because the persistent state is reset in the if (deferred_eof) action. To make it re-entrant, you'd only need to put deferred_eof in the yystate structure instead of making it a static local.
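The lookahead-for-# idea and the deferred EOF can be modelled without flex. The following is a toy hand-written scanner over a C string, only meant to show the control flow; `Scanner`, `next_token` and the `TOK_*` codes are hypothetical names. Keeping `deferred_eof` in the struct rather than in a static local is also what makes this version re-entrant.

```c
#include <ctype.h>

/* Hypothetical token codes; 0 means end of input, as yyparse expects. */
enum { TOK_EOF = 0, TOK_ID = 1, TOK_ID_AT = 2, TOK_OTHER = 3 };

typedef struct {
    const char *p;      /* next unread character */
    int deferred_eof;   /* set when end of input was hit while looking past an ID */
} Scanner;

static void skip_space(Scanner *s)
{
    while (isspace((unsigned char)*s->p)) s->p++;
}

int next_token(Scanner *s)
{
    /* Second half of the two-token EOF sequence. */
    if (s->deferred_eof) { s->deferred_eof = 0; return TOK_EOF; }

    skip_space(s);
    if (*s->p == '\0') return TOK_EOF;
    if (!isalpha((unsigned char)*s->p)) { s->p++; return TOK_OTHER; }

    /* Consume the identifier itself. */
    while (isalnum((unsigned char)*s->p) || *s->p == '_') s->p++;

    /* Equivalent of the SEEK_AT start condition: look past whitespace. */
    skip_space(s);
    if (*s->p == '#') { s->p++; return TOK_ID_AT; }   /* ID starts a definition */
    if (*s->p == '\0') s->deferred_eof = 1;           /* report EOF on the next call */
    return TOK_ID;                                    /* next char stays unread, like yyless(0) */
}
```

For the input A # B C X # Y this yields ID_AT, ID, ID, ID_AT, ID and then end of input, so the parser can tell from the token type alone where each definition starts. In a string-based model the end of input could simply be re-read; the flag is kept only to mirror the flex constraint described above.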
Following rici's useful comment and answer, here is what I came up with:
lex.l:
%{
#include <string.h> /* for strdup */
#include "y.tab.h"
%}
%option noyywrap
%option yylineno
%%
[a-zA-Z][a-zA-Z0-9_]* { yylval.a = strdup(yytext); return ID; }
# { return OP; }
[ \t\r\n]+ ; /* ignore whitespace */
. { return ERROR; } /* any other character causes parse error */
%%
yacc.y:
%{
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
extern int yylineno;
void yyerror(const char *str);
int yylex();
#define STR_OP " # "
#define STR_SPACE " "
char *concat3(const char *, const char *, const char *);
struct oplist {
char **ops;
size_t capacity, count;
} my_oplist = { NULL, 0, 0 };
int oplist_append(struct oplist *, char *);
void oplist_clear(struct oplist *);
void oplist_dump(struct oplist *);
%}
%union {
char *a;
}
%define parse.lac full
%define parse.error verbose
%token ID OP END ERROR
%start input
%%
opbase: ID OP ID {
char *s = concat3($<a>1, STR_OP, $<a>3);
free($<a>1);
free($<a>3);
assert(s && "opbase: allocation failed");
$<a>$ = s;
}
;
ops: opbase {
$<a>$ = $<a>1;
}
| ops opbase {
int r = oplist_append(&my_oplist, $<a>1);
assert(r == 0 && "ops: allocation failed");
$<a>$ = $<a>2;
}
| ops ID {
char *s = concat3($<a>1, STR_SPACE, $<a>2);
free($<a>1);
free($<a>2);
assert(s && "ops: allocation failed");
$<a>$ = s;
}
;
input: ops {
int r = oplist_append(&my_oplist, $<a>1);
assert(r == 0 && "input: allocation failed");
}
;
%%
char *concat3(const char *s1, const char *s2, const char *s3) {
size_t len = strlen(s1) + strlen(s2) + strlen(s3);
char *s = malloc(len + 1);
if (!s)
goto concat3__end;
sprintf(s, "%s%s%s", s1, s2, s3);
concat3__end:
return s;
}
int oplist_append(struct oplist *oplist, char *op) {
if (oplist->count == oplist->capacity) {
char **ops = realloc(oplist->ops, (oplist->capacity + 32) * sizeof(char *));
if (!ops)
return 1;
oplist->ops = ops;
oplist->capacity += 32;
}
oplist->ops[oplist->count++] = op;
return 0;
}
void oplist_clear(struct oplist *oplist) {
if (oplist->count > 0) {
for (size_t i = 0; i < oplist->count; ++i)
free(oplist->ops[i]);
oplist->count = 0;
}
if (oplist->capacity > 0) {
free(oplist->ops);
oplist->capacity = 0;
}
}
void oplist_dump(struct oplist *oplist) {
for (size_t i = 0; i < oplist->count; ++i)
printf("%2zu: '%s'\n", i, oplist->ops[i]);
}
void yyerror(const char *str) {
fprintf(stderr, "error#%d: %s\n", yylineno, str);
}
int main(int argc, char *argv[]) {
yyparse();
oplist_dump(&my_oplist);
oplist_clear(&my_oplist);
}
Output with A # B C X # Y:
0: 'A # B C'
1: 'X # Y'
I want to use flex to get the current line number. It seems that flex has a global variable, yylineno, that keeps track of the current line number while scanning.
I am sure that yylineno is incremented by 1 when \n is matched, but does 'r$', which matches a string at the end of a line, change yylineno too? Are there any other situations in which yylineno is updated?
For example, I have a source file which is 71 lines in total
/*
Author: guanwanxian
date: 2014-12-29
*/
#include "cstdio"
#include "iostream"
#include "cmath"
#include "tchar.h"
using namespace std;
#define MAX 10000
//This is a struct to represent a Point in two-dimension plane
//Take a test
struct Point{
Point(double XPos_N,double YPos_N){
m_XPos=XPos_N;
m_YPos=YPos_N;
}
double CalDistanceWithAnotherPoint(Point& OtherPoint)
{
double Dis=sqrt((m_XPos-OtherPoint.m_XPos)*(m_XPos-OtherPoint.m_XPos)+(m_YPos-OtherPoint.m_YPos)*(m_YPos-OtherPoint.m_YPos));
return Dis;
}
double m_XPos;
double m_YPos;
};
//this is a function to print Hello World
void PrintHelloWorld()
{
for(int i=0;i<10;i++)
{
printf("Hello World\n");
}
}
/*
this is a function to calculate the sun of two integers
balabala
2014-12-31
*/
int CalSum(int x , int y)
{
int sum=x+y;
return sum;
}
/*
this is the Main function
this is the enterance of my program
this is just a test program
*/
int _tmain(int argc, _TCHAR* argv[])
{
int A=23;
int B=34;
int SumOfAB=CalSum(A,B);
_tprintf(_T("The sum of A and B is:%d \n"),SumOfAB);
PrintHelloWorld();
Point AP(0,0);
Point BP(2,3);
double DisBetAP_AND_BP=AP.CalDistanceWithAnotherPoint(BP);
_tprintf(_T("The distance between AP and BP is:%lf\n"),DisBetAP_AND_BP);
return 0;
}
And my flex file is:
%option noyywrap
%option yylineno
%{
#include <cstdlib>
#include <iostream>
#include "tchar.h"
#include "parser.hpp"
extern int SourceFileLength;//The size of input file
// this function will be generated using bison
extern int yyparse();
int DigitNum=0;
int CommentLineNum=0;
int ProgramLineNum=0;
%}
Digits [0-9]+
BinoOP [-+*/]
parenthesis [()]
%s IN_BLOCK_COMMENT
%s IN_SINGLELINE_COMMENT
%s NOT_COMMENT
%s IN_FUNCTION
%%
<INITIAL>{
"//" {
BEGIN(IN_SINGLELINE_COMMENT);
std::cout<< "enter single line comment\n";
}
"/*" {
std::cout<<"block line num: "<<yylineno<<std::endl;
BEGIN(IN_BLOCK_COMMENT);
std::cout<< "enter block comment\n";
}
([^\/\ \n][^\ \n]*)|(\/[^\*\/\ \n][^\ \n]*)|(\/) { std::cout << yytext <<std::endl;}
\n {std::cout << std::endl; ProgramLineNum++; }
<<EOF>> { std::cout<<"TotalLine: "<<yylineno<<std::endl; std::cout<<"current position: "<<ftell(yyin)<<std::endl; ProgramLineNum++; std::cout<<"File Size: "<<SourceFileLength<<std::endl; return(0);}
. {}
}
<IN_BLOCK_COMMENT>{
"*/" { BEGIN(INITIAL); std::cout << "leave block comment\n" << std::endl; CommentLineNum++; }
[^*\n]+ { std::cout << "BlockLine\n"; }//eat comment in chunks
"*" { std::cout << "\"*\" " << std::endl;}//eat the lone star
"\n" { std::cout <<std::endl; CommentLineNum++; ProgramLineNum++;}
}
<IN_SINGLELINE_COMMENT>{
.*$ { std::cout<<"curretn yyline: "<<yylineno<<std::endl; BEGIN(INITIAL); std::cout<< "SingleLine\n"; std::cout<< "leave single line comment\n"<<std::endl; CommentLineNum++; }//single-line comment, including the case where the line contains only //
}
<NOT_COMMENT>{
}
<IN_FUNCTION>{
BEGIN(INITIAL);
}
The result is 75 lines instead of 71. My guess is that the pattern .*$ has been matched three times and the initial value of yylineno seems to be 1, so the result is 1+71+3=75. Am I right?
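Does r$, which matches a string at the end of a line, change yylineno too?
No. r$ is equivalent to r/\n: the newline is only trailing context, so it is not consumed by the match.
Similarly, it is incorrect to increment CommentLineNum in the rule <IN_SINGLELINE_COMMENT>.*$. This rule does not consume a line terminator.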
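A rough model of that counting rule in plain C, assuming the line counter changes only when a newline character is actually consumed. This is an illustration of the idea, not flex's implementation; `count_lines` is a hypothetical name.

```c
/* Returns the line number reached after scanning text, starting from 1.
   A "//" comment is consumed up to, but not including, the newline,
   the way "//" followed by .*$ would match it. */
int count_lines(const char *text)
{
    int lineno = 1;
    const char *p = text;
    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '/') {
            while (*p != '\0' && *p != '\n')
                p++;            /* rest of the comment; newline left unread */
        } else if (*p == '\n') {
            lineno++;           /* the only place the counter moves */
            p++;
        } else {
            p++;
        }
    }
    return lineno;
}
```

Under this model a comment line contributes exactly one increment, from the newline matched afterwards, and none from the comment match itself.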
I am trying to generate a simple scanner using Flex. However, when using the following code, I get multiple "unrecognised rule" errors on lines 23, 24 and 25. After studying some similar examples, I still can't find any formatting mistakes. Can someone please point me in the right direction?
%{
#include <stdio.h>
#include "mylang3.tab.h"
#include <stdlib.h>
#include <string.h>
#define OK 234
#define ILLEGAL 235
%}
digit [0-9]
letter [a-zA-Z]
invalid_id [char|else|if|class|new|return|void|while]
unsigned_int ({digit}+)
INTEGER ((+|-)?{unsigned_int})
all_chars [{letter}{digit}_]
ID ([({all_chars}{-}{digit})({all_chars})*]{-}{invalid_id})
special_char ["\n"|"\""|"\'"|"\0"|"\t"|"\\"]
CHARACTER '([[:print:]]{-}["']{+}{special_char})'
%%
[a-zA-Z]+ printf("I have found a word %s\n", yytext);
{ID} printf("I have found an id %s\n", yytext); //errors
{INTEGER} printf("I have found an integer %s\n",yytext); //errors
{CHARACTER} printf("I have found a char %s\n",yytext); //errors
char|else|if|class|new|return|void|while printf("I have found a reserved word %s\n",yytext);
"+"|"-"|"*"|"/"|"{"|"}"|"("|")"|"["|"]" printf("I have found an operator: %s\n", yytext );
[" " \t\n]+ /* eat up whitespace */
. printf( "Unrecognized character: %s\n", yytext );
%%
/*int main(int argc, char** argv){
int token;
int ok=1;
++argv, --argc;
if ( argc > 0 )
yyin = fopen( argv[0], "r" );
else
yyin = stdin;
yylex();
while((token =yylex())!=0){
if(token==ILLEGAL){ printf("Illegal sequence\n"); ok=0; }
}
if(ok==0) printf("Encountered lexical errors\n");
else printf("No lexical errors found\n");
return 0;
}*/
You can only use square brackets for characters, not for sequences of characters. So instead of e.g.
all_chars [{letter}{digit}_]
you'll have to write
all_chars ({letter}|{digit}|_)
And you shouldn't mix pipe signs and square brackets. [abc] means the same as (a|b|c), but [a|b|c] is wrong: inside brackets, | is a literal character, so that class also matches |.
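The difference between a bracket expression and an alternation can be checked with POSIX extended regular expressions from <regex.h>. They are used here only as a stand-in for flex's pattern syntax, which treats | inside brackets the same way; `full_match` is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches pattern (POSIX extended syntax), 0 if it does not,
   and -1 if the pattern fails to compile. Anchors are up to the caller. */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, "^[a|b|c]$" accepts the one-character string "|", while "^(a|b|c)$" rejects it; both accept "b".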