Error EOF inside action Flex - flex-lexer

I am creating a file that will be compiled with flex, but I don't understand why I am getting this error. I am new to flex.
The error points at line 43 (i.e. the last line) and says end of file inside the action.
Here is what I have so far:
%{
#ifdef PRINT
#define TOKEN(t) printf("Token: " #t "/n");
#else
#define TOKEN(t) return(t);
#endif
%}
%%
"," TOKEN(COMMA")
";" TOKEN(SEMICOLON)
"->" TOKEN(ARROW)
"(" TOKEN(BRA)
")" TOKEN(KET)
"=" TOKEN(EQUALS)
"<>" TOKEN(LESMORE)
"<" TOKEN(LESS)_THAN)
">" TOKEN(MORE_THAN)
"<=" TOKEN(LESS_EQUAL)
">=" TOKEN(MORE_EQUAL)
"*" TOKEN(MULTIPLY)
"/" TOKEN(DIVIDE)
"'" TOKEN(CHAR_SHOW)
ENDP TOKEN(ENDP)
DECLARATIONS TOKEN(DECLARATIONS)
CHARACTER TOKEN(CHARACTER)
INTEGER TOKEN(INTEGER)
REAL TOKEN(REAL)
ENDIF TOKEN(ENDIF)
ELSE TOKEN(ELSE)
ENDDO TOKEN(ENDDO)
WHILE TOKEN(WHILE)
DO TOKEN(DO)
ENDWHILE TOKEN(ENDWHILE)
FOR TOKEN(FOR)
IS TOKEN(IS)
BY TOKEN(BY)
TO TOKEN(TO)
ENDFOR TOKEN(ENDFOR)
WRITE TOKEN(WRITE)
NEWLINE TOKEN(NEWLINE)
READ TOKEN(READ)
%%
Any help is appreciated

The first action is:
"," TOKEN(COMMA")
which has a mismatched quote.
Also, there is a problem with
"<" TOKEN(LESS)_THAN)
It is also not clear to me whether all the lines from that one down are indented by one space. If they are, that is a problem too: flex only recognizes a pattern that starts at the beginning of a line, and treats indented lines in the rules section as code.
Finally, there is very little point in that TOKEN macro (which was probably copied from somewhere else, where it is just as unnecessary). You can use the --debug command-line option to flex to produce very accurate scanner traces, and bison has a similar tracing facility that also reveals the result of the scanner (including the name of the token, which the flex trace unfortunately does not provide).

You have two typos. Change them as follows:
Line 10: "," TOKEN(COMMA") --> "," TOKEN(COMMA)
Line 17: "<" TOKEN(LESS)_THAN) --> "<" TOKEN(LESS_THAN)
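Typos like these two are easy to miss by eye. As a rough aid, here is a minimal Python sketch (not a flex feature) that flags rule lines with a leftover double quote or unbalanced parentheses. The function name check_rule_line is hypothetical, and the check is only a heuristic: it assumes one rule per line and no escaped quotes inside patterns.

```python
import re

def check_rule_line(line):
    """Return a list of problems found in one flex rule line (heuristic)."""
    # Drop complete double-quoted strings first, so patterns such as "(" or ")"
    # do not count as parentheses.
    stripped = re.sub(r'"[^"\n]*"', "", line)
    problems = []
    if '"' in stripped:
        problems.append("unmatched double quote")
    depth = 0
    for ch in stripped:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no opener
                break
    if depth != 0:
        problems.append("unbalanced parentheses")
    return problems

rules = [
    '"," TOKEN(COMMA")',
    '"(" TOKEN(BRA)',
    '")" TOKEN(KET)',
    '"<" TOKEN(LESS)_THAN)',
    '"<" TOKEN(LESS_THAN)',
]
for rule in rules:
    print(rule, "->", check_rule_line(rule) or "ok")
```

Run on the sample above, it reports only the COMMA line (unmatched double quote) and the first LESS_THAN line (unbalanced parentheses).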

How would I put "\" in a string without the quotation mark after it being cancelled out?

For example, if I were to do this:
print("\")
It would say: unfinished string near: '""'
instead of my expected output of: '\'
How would I print this? I have searched Google but have not found an answer.
The backslash (\) escapes the character that follows it, here the closing double quote ("), so the string is never terminated.
To include an actual backslash in your string, you escape it with another backslash:
print("\\")
From Lua 5.4 Reference Manual, §3.1 (emphasis mine):
A short literal string can be delimited by matching single or double quotes, and can contain the following C-like escape sequences: '\a' (bell), '\b' (backspace), '\f' (form feed), '\n' (newline), '\r' (carriage return), '\t' (horizontal tab), '\v' (vertical tab), '\\' (backslash), '\"' (quotation mark [double quote]), and '\'' (apostrophe [single quote]). [...]
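The same idea can be checked quickly in Python, whose handling of the '\\' escape in ordinary string literals matches the Lua behaviour described above. This is only an analogy in another language, not Lua code: the two characters \\ in the source produce a single backslash in the string.

```python
s = "\\"        # two characters in the source, one backslash in the string
print(s)        # prints a single backslash
print(len(s))   # prints 1

path = "C:\\temp"   # the doubled backslash is stored as one character
print(path)         # prints C:\temp
```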

F(Lex) WARNING , rule cannot be matched

EOL \n
WS(" "|\t|\n)
WSS {WS}*
NEWSS {WSS}+
NAME [a-zA-z_][a-zA-z0-9_-]*
WORD [^;]+
IMPORT {NEWSS}'{NAME}'{WSS};
VAL [a-zA-z0-9]+
CONTENT [^}]+
MIX {NEWSS}{NAME}{WSS}[(]
INCLUDE {WSS}{NAME}{WSS}[{]
%s DOTAIM
%s NAMESTATE
%s NAMER
%s CONTENT
%s VALUE
%s INC
%%
${NAME} {key=yytext;BEGIN(NAMESTATE);}
. {output+=yytext;}
\n {output+=yytext;}
45) <NAMESTATE>; {if(var.find(key)==var.end()){output="Unknown variable";return 1;};output+=(var[key]+yytext);BEGIN(INITIAL);}
<NAMESTATE>{WSS}:{WSS} {BEGIN(DOTAIM);}
<DOTAIM>{WORD}{WSS} {val=trim(yytext); var[key]=val;}
48) <DOTAIM>; {BEGIN(INITIAL);}
This is my code and I keep getting this warning:
hello.lex:45: warning, rule cannot be matched
hello.lex:48: warning, rule cannot be matched
Does anyone know why? These rules are inside start conditions, so I would not expect the rule at line 43 to prevent them from matching.
You declare your start conditions as inclusive (%s): as the manual indicates, "If the start condition is inclusive, then rules with no start conditions at all will also be active."
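So the . rule at line 43 is also active in those states. For a lone ; both rules match exactly one character, and on a tie flex picks the rule that appears first in the file, so the ; rules can never match.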
Moving the fallback rule to the end of the rules would fix the problem, and it is generally best style even if you have start conditions.
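The selection policy behind this warning (longest match wins, and the earlier rule wins a tie) can be illustrated with a small Python sketch. It is a simulation built on Python's re module, not flex itself; the function pick_rule and the rule names are hypothetical, and the patterns are simplified stand-ins for the rules that are active inside NAMESTATE.

```python
import re

def pick_rule(rules, text, pos=0):
    """Return (name, lexeme) for the rule that wins at `pos`.

    Mimics the flex policy: the longest match wins, and on a tie
    the rule listed first wins. (Python's regex engine is not a true
    longest-match engine, but that does not matter for these patterns.)
    """
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        # Strictly longer only, so an earlier rule keeps a tie.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

# Fallback first, as in the question: the ";" rule can never win.
original_order = [("fallback", r"."), ("semicolon", r";")]
# Fallback last, as the answer suggests.
fixed_order = [("semicolon", r";"), ("fallback", r".")]
# A rule that matches more than one character still beats an earlier fallback.
with_colon = [("fallback", r"."), ("colon", r"[ \t\n]*:[ \t\n]*")]

print(pick_rule(original_order, ";"))   # ('fallback', ';')
print(pick_rule(fixed_order, ";"))      # ('semicolon', ';')
print(pick_rule(with_colon, " : x"))    # ('colon', ' : ')
```

The last call suggests why only the two single-character ; rules are reported: a rule that can match more than one character can still win by length.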

Ambiguous ANTLR parser rule

I have a very simple example text which I want to parse with ANTLR, and yet I'm getting wrong results due to ambiguous definition of the rule.
Here is the grammar:
grammar SimpleExampleGrammar;
prog : event EOF;
event : DEFINE EVT_HEADER eventName=eventNameRule;
eventNameRule : DIGIT+;
DEFINE : '#define';
EVT_HEADER : 'EVT_';
DIGIT : [0-9a-zA-Z_];
WS : ('' | ' ' | '\r' | '\n' | '\t') -> channel(HIDDEN);
First text example:
#define EVT_EX1
Second text example:
#define EVT_EX1
#define EVT_EX2
So, the first example is parsed correctly.
However, the second example doesn't work, as the eventNameRule matches the next "#define ..." and the parse tree is incorrect
Appreciate any help to change the grammar to parse this correctly.
Thanks,
Busi
Besides the missing loop specifier, you also have a problem in your WS rule. The first alternative is empty, so it can match anywhere. Remove it. And, by the way, give your DIGIT rule a different name: it matches more than just digits.
As Adrian pointed out, my main mistake was that the initial rule (prog) used "event" rather than "event+". Changing that solves the issue.
Thanks Adrian.

Antlr4 ignores tokens

In ANTLR 4 I try to parse a text file, but some of my defined tokens are constantly ignored in favor of others. I produced a small example to show what I mean:
File to parse:
hello world
hello world
Grammar:
grammar TestLexer;
file : line line;
line : 'hello' ' ' 'world' '\n';
LINE : ~[\n]+? '\n';
The ANTLR book explains that 'hello' would become an implicit token, which is placed before the LINE token, and that token order matters. So I'd expect that the parser would NOT match the LINE token, but it does, as the resulting tree shows:
How can I fix this, so that I get the actual implicit tokens?
Btw. I also tried to write explicit tokens before LINE, but that didn't change anything.
Found it myself:
It seems that the ANTLR lexer always prefers the longest match; rule order only matters when two rules match the same length.
Since LINE always matches a whole line, it is always preferred.
To still include a catch-all ("joker") token in a grammar, it should match only a single symbol.
In my case
grammar TestLexer;
file : line line;
line : 'hello' ' ' 'world' '\n';
LINE : ~[\n];
would work.
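To see why the one-character LINE rule helps, here is a minimal Python sketch of a longest-match tokenizer. It only imitates the lexer's choice between rules and is not ANTLR: the token names HELLO, SPACE, WORLD and NL are hypothetical stand-ins for the implicit literal tokens, and the regular expressions are approximations of the grammar's lexer rules.

```python
import re

def tokenize(rules, text):
    """Split `text` into (rule_name, lexeme) pairs.

    At each position the longest match wins; on a tie the rule
    listed first wins.
    """
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens

# Literal tokens come first, as implicit tokens would.
literals = [("HELLO", r"hello"), ("SPACE", r" "), ("WORLD", r"world"), ("NL", r"\n")]
original = literals + [("LINE", r"[^\n]+?\n")]  # whole-line rule from the question
fixed = literals + [("LINE", r"[^\n]")]         # single-symbol rule from the answer

print(tokenize(original, "hello world\n"))
print(tokenize(fixed, "hello world\n"))
```

With the original rules the whole line comes out as one LINE token. With the fixed rules the result is HELLO, SPACE, WORLD, NL, because LINE can no longer produce a longer match than the literals.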

Token in JavaCC: make sure that a symbol is single on a line

I need "{" to be alone on its line, so I have to use a token that recognizes this. These are correct examples:
program
{
or
program
{
And these are incorrect examples:
program {
or
program
{ sentence;
Then I have a token like this:
TOKEN: { < openKey: "{" > {System.out.print(image +"\n");}}
SKIP: { < ( " " | "\r" | "\t" | "\n" )+ > }
But I cannot work out how to require that the symbol "{" sits between one or more "\n" on each side. And after recognizing it I have to write exactly:
program
{
If I try:
TOKEN: { < openKey: ( " " | "\r" | "\t" | "\n" )+ "{" ( " " | "\r" | "\t" | "\n" )+ > {System.out.print(image +"\n");}}
This runs, but it writes as many "\n" as there were in the input.
The basic problem is that you're printing the input without any interpretation. In other words, what goes in is what comes out, as you've discovered.
To make it easier to read --- and in order to not be in some respects misusing the lexical analyzer by forcing it to do the entire task --- I recommend moving your print statement down into the parser (e.g., in the Start() function). (I actually tend to move all of my output out of the parser entirely unless I'm doing something really tiny that I'm never going to reuse, but that's for another question.)
Next, to address the actual problem, you have to do some interpretation to get from a bunch of newlines to just one. The simplest way to do that is a basic replaceAll. Here's my Start() function, where openKey is defined just as you've done, and WORD is simply a concatenation of letters.
void Start() :
{
    Token t;
}
{
    (
        t = <WORD>
        {System.out.print((t.image).replaceAll("(\n)+","\n"));}
    )*
    (
        t = <openKey>
        {System.out.print((t.image).replaceAll("(\n)+","\n"));}
        (
            t = <WORD>
            {System.out.print((t.image).replaceAll("(\n)+","\n"));}
        )*
    )*
    <EOF>
}
So basically, this takes zero or more words, followed by the unit that consists of 1 or more newlines followed by the left curly brace followed by 1 or more newlines, followed by zero or more words, and outputs the words, the curly brace, and just 1 newline per 1-or-more-newline set.
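The normalization step on its own can be tried out in a few lines of Python. This is a sketch of the same regular-expression replacement rather than JavaCC code; the helper name collapse_newlines is hypothetical, and re.sub plays the role of Java's replaceAll.

```python
import re

def collapse_newlines(image):
    """Replace every run of one or more newlines with a single newline."""
    return re.sub(r"\n+", "\n", image)

print(repr(collapse_newlines("\n\n\n{\n\n")))   # '\n{\n'
print(repr(collapse_newlines("program")))       # 'program'
```

Note that the pattern only collapses consecutive "\n" characters. If the input uses "\r\n" line endings, each "\n" is separated by a "\r" and nothing is collapsed, so carriage returns would need their own handling.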
If you can start a file with a curly brace, instead of requiring a word, then it outputs an empty line, a curly brace, and a newline. I don't know if that's what you want (being able to begin the output with an empty line), so you will need to play with the output code to get the exact formatting you're going for. Also, as you can see, there is some repeated code in there that could be extracted into a function; I leave that as an exercise for the reader.
Anyway, the basic premise of this answer is (and I believe this is really something of a maxim for the ages, suitable for use in all areas of life, not just coding): "Unless you change what you take in before outputting it, it's going to be exactly what you took in!"
I did it differently:
TOKEN: { < openKey: "\n" (" " | "\t")* "{" (" " | "\t")* ("\r" | "\n") >{System.out.print("{\r\n");}}
SKIP: { " " | "\r" | "\t" | "\n" }
There were some problems with the carriage return, but this way works well.
