Errors in definitions in Flex and Lex - flex-lexer

I am writing a lexical analyser for a toy programming language with toy keywords. I wish to print "keyword" for every keyword the analyser bumps into. To make my code cleaner, I defined the term "keyword" for all keywords above the rule section.
%{
#include <stdio.h>
%}
keyword program | begin | ... | end
where the ... implies rest of the keywords.
In the rules section, I wrote the following rule:
{keyword} {
printf("keyword\n");
}
Then finally I wrote the main function and yywrap function.
However, when I compile the generated lex.yy.c file, I get the following error.
use of undeclared identifier 'keyword'
{keyword} {
^
Please help me with this error, I am new to this scanner-generating language.

You will get better answers here if you copy and paste the precise text of your program into your question. Otherwise, you force anyone answering to guess what the original text is. This is my guess:
Probably the line that is being complained about was indented in your flex input file. Make sure that all rules start exactly at the left margin. (Any indented text is copied verbatim into the output file, as though it were C code. The most common use for this feature is to add comments to your Flex rules.)
Also, you cannot use unquoted spaces in a macro definition; you would need:
keyword program|begin|...|end
Otherwise, flex will throw an error when it expands the macro. (It didn't expand the macro in this case, presumably because of the first problem.)

Related

How to use context free grammars?

Could someone help me with using context free grammars. Up until now I've used regular expressions to remove comments, block comments and empty lines from a string so that it can be used to count the PLOC. This seems to be extremely slow so I was looking for a different more efficient method.
I saw the following post: What is the best way to ignore comments in a java file with Rascal?
I have no idea how to use this, the help doesn't get me far as well. When I try to define the line used in the post I immediately get an error.
lexical SingleLineComment = "//" ~[\n] "\n";
Could someone help me out with this and also explain a bit about how to setup such a context free grammar and then to actually extract the wanted data?
Kind regards,
Bob
First this will help: the ~ in Rascal CFG notation is not in the language, the negation of a character class is written like so: ![\n].
To use a context-free grammar in Rascal goes in three steps:
write it, like for example the syntax definition of the Func language here: http://docs.rascal-mpl.org/unstable/Recipes/#Languages-Func
Use it to parse input, like so:
// This is the basic parse command, but be careful it will not accept spaces and newlines before and after the TopNonTerminal text:
Prog myParseTree = parse(#Prog, "example string");
// you can do the same directly to an input file:
Prog myParseTree = parse(#TopNonTerminal, |home:///myProgram.func|);
// if you need to accept layout before and after the program, use a "start nonterminal":
start[Prog] myParseTree = parse(#start[TopNonTerminal], |home:///myProgram.func|);
Prog myProgram = myParseTree.top;
// shorthand for parsing stuff:
myProgram = [Prog] "example";
myProgram = [Prog] |home:///myLocation.txt|;
Once you have the tree you can start using visit and / deepmatch to extract information from the tree, or write recursive functions if you like. Examples can be found here: http://docs.rascal-mpl.org/unstable/Recipes/#Languages-Func , but here are some common idioms as well to extract information from a parse tree:
// produces the source location of each node in the tree:
myParseTree#\loc
// produces a set of all nodes of type Stat
{ s | /Stat s := myParseTree }
// pattern match an if-then-else and bind the three expressions and collect them in a set:
{ e1, e2, e3 | (Stat) `if <Exp e1> then <Exp e2> else <Exp e3> end` <- myExpressionList }
// collect all locations of all sub-trees (every parse tree is of a non-terminal type, which is a sub-type of Tree. It uses |unknown:///| for small sub-trees which have not been annotated for efficiency's sake, like literals and character classes:
[ t#\loc?|unknown:///| | /Tree t := myParseTree ]
That should give you a start. I'd go try out some stuff and look at more examples. Writing a grammar is a nice thing to do, but it does require some trial and error methods like writing a regex, but even more so.
For the grammar you might be writing, which finds source code comments but leaves the rest as "any character" you will need to use the longest match disambiguation a lot:
lexical Identifier = [a-z]+ !>> [a-z]; // means do not accept an Identifier if there is still [a-z] to add to it; so only the longest possible Identifier will match.
This kind of context-free grammar is called an "Island Grammar" metaphorically, because you will write precise rules for the parts you want to recognize (the comments are "Islands") while leaving the rest as everything else (the rest is "Water"). See https://dl.acm.org/citation.cfm?id=837160

whitespace in flex patterns leads to "unrecognized rule"

The flex info manual provides allows whitespace in regular expressions using the "x" modifier in the (?r-s:pattern) form. It specifically offers a simple example (without whitespace)
(?:foo) same as (foo)
but the following program fails to compile with the error "unrecognized rule":
BAD (?:foo)
%%
{BAD} {}
I cannot find any form of (? that is acceptable as a rule pattern. Is the manual in error, or do I misunderstand?
The example in your question does not seem to reflect the question itself, since it shows neither the use of whitespace nor a x flag. So I'm going to assume that the pattern which is failing for you is something like
BAD (?x:two | lines |
of | words)
%%
{BAD} { }
And, indeed, that will not work. Although you can use extended format in a pattern, you can only use it in a definition if it doesn't contain a newline. The definition terminates at the last non-whitespace character on the definition line.
Anyway, definitions are overused. You could write the above as
%%
(?x:two | lines |
of | words ) { }
Which saves anyone reading your code from having to search for a definition.
I do understand that you might want to use a very long pattern in a rule, which is awkward, particularly if you want to use it twice. Regardless of the issue with newlines, this tends to run into problems with Flex's definition length limit (2047 characters). My approach has been to break the very long pattern into a series of definitions, and then define another symbol which concatenates the pieces.
Before v2.6, Flex did not chop whitespace off the end of the definition line, which also leads to mysterious "unrecognized rule" errors. The manual seems to still reflect the v2.5 behaviour:
The definition is taken to begin at the first non-whitespace character following the name and continuing to the end of the line.

How do you declare this macro in lex?

I'm new to Lex and I'm confused on how to declare the following macro, keyword. I want keyword to consist of either "if", "then", "else", or "while."
I typed this in lex:
keyword "if" | "then" | "else" | "while"
but the compiler is giving me an "unrecognized rule error". When I instead do
keyword "if"
It compiles ok.
Is this just a limitation of Lex? I know in jflex you can do what I did above and it'll work fine. Or am I doing it incorrectly?
Thanks
I can't test this right now, but off the top of my head:
Try putting the values in parentheses (before the first %%)
keyword ("if"|"then"|"else"|"while")
And then use it in rules like this (between %% and %%):
{keyword} {//action}
This is how you make a class in lex, so in the rest of the code you can use {keyword} and it will be recognized as the regex you've assigned in the definition section (before the first %%).
Also, you can use a class as a part of other regexs:
{keyword}\{[^\}]\} {//action}
This recognizes a whole block of code. (but it doesn't check the syntax inside the block, I leave that to you :) )

Building Latex/Tex arguments in lua

I use lua to make some complex job to prepare arguments for macros in Tex/LaTex.
Part I
Here is a stupid minimal example :
\newcommand{\test}{\luaexec{tex.print("11,12")}}% aim to create 11,12
\def\compare#1,#2.{\ifthenelse{#1<#2}{less}{more}}
\string\compare11,12. : \compare11,12.\\ %answer is less
\string\test : \test\\ % answer is 11,12
\string\compare : \compare\test. % generate an error
The last line creates an error. Obviously, Tex did not detect the "," included in \test.
How can I do so that \test is understood as 11 followed by , followed by 12 and not the string 11,12 and finally used as a correctly formed argument for \compare ?
There are several misunderstandings of how TeX works.
Your \compare macro wants to find something followed by a comma, then something followed by a period. However when you call
\compare\test
no comma is found, so TeX keeps looking for it until finding either the end of file or a \par (or a blank line as well). Note that TeX never expands macros when looking for the arguments to a macro.
You might do
\expandafter\compare\test.
provided that \test immediately expands to tokens in the required format, which however don't, because the expansion of \test is
\luaexec{tex.print("11,12")}
and the comma is hidden by the braces, so it doesn't count. But it wouldn't help nonetheless.
The problem is the same: when you do
\newcommand{\test}{\luaexec{tex.print("11,12")}}
the argument is not expanded. You might use “expanded definition” with \edef, but the problem is that \luaexec is not fully expandable.
If you do
\edef\test{\directlua{tex.sprint("11,12")}}
then
\expandafter\compare\test.
would work.

Create a Print Function

I'm learning Bison and at this time the only thing that I do was the rpcalc example, but now I want to implement a print function(like printf of C), but I don't know how to do this and I'm planning to have a syntax like this print ("Something here");, but I don't know how to build the print function and I don't know how to create that ; as a end of line. Thanks for your help.
You first need to ask yourself:
What are the [sub-]parts of my 'print ("something");' syntax ?
Once you identify these parts, "simply" describe them in the form of grammar syntax rules, along with applicable production rules. And then let Bison generate the parser for you; that's about it.
To put you on your way:
The semi-column is probably a element you will use to separate statemements (such a one "call" to print from another).
'print' itself is probably a keyword, or preferably a native function name of your language.
The print statement appears to take a literal string as [one of] its arguments. a literal string starts and ends with a double quote (and probably allow for escaped quotes within itself)
etc.
The bolded and italic expressions above are some of the entities (the 'symbols' in parser lingo) you'll likely need to define in the syntax for your language. For that you'll use Bison grammar rules, such as
stmt : print_stmt ';' | input_stmt ';'| some_other_stmt ';' ;
prnt_stmt : print '(' args ')'
{ printf( $3 ); }
;
args : arg ',' args;
...
Since the question asked about the semi-column, maybe some confusion was from the different uses thereof; see for example above how the ';' belong to your language's syntax whereby the ; (no quotes) at the end of each grammar rule are part of Bison's language.
Note: this is of course a simplistic implementation, aimed at showing the essential. Also the Bison syntax may be a tat off (been there / done it, but a long while back ;-) I then "met" ANTLR never to return to Bison, although I do see how its lightweight and fully self contained nature can make it appropriate in some cases)

Resources