DParser warning: trying to write code to binary file

I have a large grammar written for DParser, and I am using the Python binding. When I first run the parser and DParser generates its internal tables, I get a number of warnings like these:
warning: trying to write code to binary file
warning: trying to write code to binary file
warning: trying to write code to binary file
I am not sure what the cause or source of these warnings is. The only thing I could find was in the DParser source file write_tables.c:
write_code(FILE *fp, Grammar *g, Rule *r, char *code,
           char *fname, int line, char *pathname)
{
    char *c;
    if ( !fp ) {
        d_warn("trying to write code to binary file");
        return;
    }
    ...
}
Any hints or ideas would be appreciated.

The warnings turned out to be caused by errors in my grammar: in some places I had forgotten to put quotes around [ and ]. For example:
[ example_non_terminal ]
DParser was taking example_non_terminal as a character set. Several rules like this were causing the problem. The grammar should have been:
'[' example_non_terminal ']'

Related

Errors in definitions in Flex and Lex

I am writing a lexical analyser for a toy programming language with toy keywords. I wish to print "keyword" for every keyword the analyser bumps into. To make my code cleaner, I defined the term "keyword" for all keywords above the rule section.
%{
#include <stdio.h>
%}
keyword program | begin | ... | end
where the ... implies rest of the keywords.
In the rules section, I wrote the following rule:
{keyword} {
printf("keyword\n");
}
Then finally I wrote the main function and yywrap function.
However, when I compile the generated lex.yy.c file, I get the following error.
use of undeclared identifier 'keyword'
{keyword} {
^
Please help me with this error, I am new to this scanner-generating language.
You will get better answers here if you copy and paste the precise text of your program into your question. Otherwise, you force anyone answering to guess what the original text is. This is my guess:
Probably the line that is being complained about was indented in your flex input file. Make sure that all rules start exactly at the left margin. (Any indented text is copied verbatim into the output file, as though it were C code. The most common use for this feature is to add comments to your Flex rules.)
Also, you cannot use unquoted spaces in a macro definition; you would need:
keyword program|begin|...|end
Otherwise, flex will throw an error when it expands the macro. (It didn't expand the macro in this case, presumably because of the first problem.)

Why is my parser showing errors for valid programs?

I can't figure out what's wrong with my parser. Here are the associated files:
parse.y
declarations: INTEGER_SIZE IDENTIFIER TERMINATOR {declare($1,$2);}
void yyerror(char *err){
printf("\n\nYYError on line %d: Error = %s\n", yylineno, err);
}
scan.l
[Xx]+ {yylval.size = strlen(yytext);
When running it against the valid program below it shows an error at line 3; when running any of the lines individually it shows an error on line 1 via the yyerror() function.
BEGINING.
XXX XY-1.
XXXX Y.
XXXX Z.
BODY.
PRINT “Please enter a number? ”.
INPUT Y.
MOVE 15 TO Z.
ADD Y TO Z.
PRINT XY-1;” + “;Y;”=”;Z.
END.
To run the files run the following commands:
yacc -d parser.y
lex lexer.l
gcc -o parser lex.yy.c y.tab.c -ll
This non-terminal is called declarations, from which one might think that it matches one or more declarations, or perhaps zero or more declarations:
declarations: INTEGER_SIZE IDENTIFIER TERMINATOR {declare($1,$2);}
But the rule matches exactly three tokens, which is to say one declaration. So when you give it an input with two declarations, it fails on the second one.
Similarly, your non-terminal called statements only matches a single statement, not several as might be expected from its name.
Grammars need to be explicit. If you want to match several declarations, you have to write that:
declarations: declaration
| declarations declaration
By the way, I have seen grammars before that were written in the belief that you have to put {;} at the end of a production. I'm curious where this idea comes from. Yacc and bison do not require that productions have an action, and anyway an empty action is {}, just as it is in C.

Parser - Segmentation fault when calling yytext

My parser is recognizing the grammar and indicating the correct error line using yylineno. I want to print the symbol which caused the error.
int yyerror(string s)
{
extern int yylineno; // defined and maintained in lex.yy.c
extern char *yytext; // defined and maintained in lex.yy.c
cerr << "error: " << s << " -> " << yytext << " # line " << yylineno << endl;
//exit(1);
}
I get this error when I write something not acceptable by the grammar:
error: syntax error -> Segmentation fault
Am I not supposed to use yytext? If not, what variable contains the symbol that caused the syntax error?
Thanks
Depending on the version of lex you are using, yytext may be an array or may be a pointer. Since it is defined in a different compilation unit, if it is an array and you declare it as a pointer, you won't see any error messages from the compiler or linker (linkers generally don't do type checking). Instead, the generated code will treat the first several bytes of the array as a pointer, try to dereference it, and probably crash.
If you are using flex, you can add a %pointer declaration to the first section of your .l file to ensure that it is a pointer and not an array.
Are you using lex or flex? If you're using lex, yytext is a char[], not a char*.
EDIT: If you aren't using flex, you should be. It is superior in every way and has been from the moment of its appearance nearly 30 years ago; lex was obsoleted on that day.

How to make lex/flex recognize tokens not separated by whitespace?

I'm taking a course in compiler construction, and my current assignment is to write the lexer for the language we're implementing. I can't figure out how to satisfy the requirement that the lexer must recognize concatenated tokens. That is, tokens not separated by whitespace. E.g.: the string 39if is supposed to be recognized as the number 39 and the keyword if. Simultaneously, the lexer must also exit(1) when it encounters invalid input.
A simplified version of the code I have:
%{
#include <stdio.h>
%}
%option main warn debug
%%
if |
then |
else printf("keyword: %s\n", yytext);
[[:digit:]]+ printf("number: %s\n", yytext);
[[:alpha:]][[:alnum:]]* printf("identifier: %s\n", yytext);
[[:space:]]+ // skip whitespace
[[:^space:]]+ { printf("ERROR: %s\n", yytext); exit(1); }
%%
When I run this (or my complete version), and pass it the input 39if, the error rule is matched and the output is ERROR: 39if, when I'd like it to be:
number: 39
keyword: if
(I.e. the same as if I entered 39 if as the input.)
Going by the manual, I have a hunch that the cause is that the error rule matches a longer possible input than the number and keyword rules, and flex will prefer it. That said, I have no idea how to resolve this situation. It seems unfeasible to write an explicit regexp that will reject all non-error input, and I don't know how else to write a "catch-all" rule for the sake of handling lexer errors.
UPDATE: I suppose I could just make the catch-all rule be . { exit(1); } but I'd like to get some nicer debug output than "I got confused on line 1".
You're quite right that you should just match a single "any" character as a fallback. The "standard" way of getting information about where in the line the parsing is at is to use the --bison-bridge option, but that can be a bit of a pain, particularly if you're not using bison. There are a bunch of other ways (look in the manual for how to specify your own I/O functions, for example), but the all-around simplest, IMHO, is to use a start condition:
%x LEXING_ERROR
%%
  /* all your rules; the following *must* be at the end */
. { BEGIN(LEXING_ERROR); yyless(0); }
<LEXING_ERROR>.+ { fprintf(stderr,
                           "Invalid character '%c' found at line %d,"
                           " just before '%s'\n",
                           *yytext, yylineno, yytext+1);
                   exit(1);
                 }
Note: Make sure that you've ignored whitespace in your rules. The pattern .+ matches one or more non-newline characters, or in other words everything up to the end of the current line (it will force flex to read that far, which shouldn't be a problem). yyless(n) keeps the first n characters of the current match and pushes the rest back onto the input, so calling yyless(0) after the . rule matches makes flex rescan that character in the LEXING_ERROR state, producing (hopefully) a semi-reasonable error message. Flex only keeps yylineno up to date if you request it with %option yylineno. (The message won't really be reasonable if your input is multibyte, or has weird control characters, so you could write more careful code. Up to you. It also might not be reasonable if the error is at the end of a line, so you might also want to write a more careful regex which gets more context, and maybe even limits the number of forward characters read. Lots of options here.)
Look up start conditions in the flex manual for more info about %x and BEGIN.

Bison grammar warnings

I am writing a parser with Bison and I am getting the following warnings.
fol.y:42 parser name defined to default :"parse"
fol.y:61: warning: type clash ('' 'pred') on default action
I have been using Google to search for a way to get rid of them, but have pretty much come up empty-handed on what they mean (much less how to fix them), since every post I found that mentions them is about a compilation error and the warnings themselves aren't addressed. Could someone tell me what they mean and how to fix them? The relevant code is below. Line 61 is the last semicolon. I cut out the rest of the grammar since it is incredibly verbose.
%union {
char* var;
char* name;
char* pred;
}
%token <var> VARIABLE
%token <name> NAME
%token <pred> PRED
%%
fol:
declines clauses {cout << "Done parsing with file" << endl;}
;
declines:
declines decline
|decline
;
decline:
PRED decs
;
The first message is likely just a warning that you didn't include %start parse in the grammar specification.
The second means that somewhere you have a rule that is supposed to return a value, but you haven't properly specified which type of value it returns. PRED returns the pred element of your union; the problem might be that you've not created %type entries for decline and declines. If you have a union, you have to specify the type for most, if not all, rules, or maybe just for rules that don't have an explicit action (so as to override the default $$ = $1; action).
I'm not convinced that the problem is in the line you specify, and because we don't have a complete, minimal reproduction of your problem, we can't investigate for you to validate it. The specification for decs may be relevant (I'm not convinced it is, but it might be).
You may get more information from the output of bison -v, which is the y.output file (or something similar).
Finally found it.
To fix this:
fol.y:42 parser name defined to default :"parse"
Add %name parse before %token
Eg:
%name parse
%token NUM
(From: https://bdhacker.wordpress.com/2012/05/05/flex-bison-in-ubuntu/#comment-2669)
