LPEG - "Rule may be left-recursive" error, despite being a terminable grammar - preprocessor

I'm trying to use LPEG to build a preprocessor for GLSL. I've managed to get #define and #undef statements working without any problem, but my issue comes when trying to work with #ifdef statements.
My thought was that I could build a rule that would encapsulate a shader, bordered by #ifdef and #endif statements, like so:
S -> include | define | ifdef | code
ifdef -> "#ifdef" + var + S + "#endif"
Clearly this grammar is terminable, as the ifdef rule requires capturing #ifdef and the macro name before the recursive call. However, LPEG disagrees, claiming this rule "may be left recursive".
Does anyone know what I can do about this?
Thanks.
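The termination argument in the question (the recursive call happens only after "#ifdef" and the macro name have been consumed) can be illustrated with a hand-written recursive-descent matcher in C. This is a sketch under stated assumptions, not LPEG and not a diagnosis of LPEG's own left-recursion check: the token-array input, the function names, and the rule that "code" is any single token not starting with '#' are all hypothetical, and the include and define alternatives are omitted.

```c
#include <string.h>

/* Hypothetical token stream: each element is one already-lexed token. */
typedef struct { const char **tok; int n; } Tokens;

static int parse_S(const Tokens *t, int pos);

/* True if the token at pos exists and equals s. */
static int is(const Tokens *t, int pos, const char *s) {
    return pos < t->n && strcmp(t->tok[pos], s) == 0;
}

/* ifdef -> "#ifdef" var S "#endif" */
static int parse_ifdef(const Tokens *t, int pos) {
    if (!is(t, pos, "#ifdef")) return -1;
    pos++;                                    /* consumed "#ifdef" */
    if (pos >= t->n || t->tok[pos][0] == '#') return -1;
    pos++;                                    /* consumed the macro name */
    pos = parse_S(t, pos);                    /* recursion starts 2 tokens further on */
    if (pos < 0 || !is(t, pos, "#endif")) return -1;
    return pos + 1;                           /* consumed "#endif" */
}

/* S -> ifdef | code  (hypothetical: code = one token not starting with '#') */
static int parse_S(const Tokens *t, int pos) {
    int r = parse_ifdef(t, pos);
    if (r >= 0) return r;
    if (pos < t->n && t->tok[pos][0] != '#') return pos + 1;
    return -1;
}

/* Returns the number of tokens matched by S from the start, or -1. */
int match_S(const char **tok, int n) {
    Tokens t = { tok, n };
    return parse_S(&t, 0);
}
```

Because every path into parse_S from parse_ifdef has already advanced the position, the recursion depth is bounded by the input length.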

Related

flex scanner push-back overflow with automata

I am having a hard time with this problem.
"Write flex code that recognizes a string over the alphabet {0,1} with at least 5 characters, in which every 5 consecutive characters contain at least three 1s."
I thought I had solved it, but I am new to flex, and I am getting this "flex scanner push-back overflow" error.
Here's my code:
%{
#include <stdio.h>
#define ACCEPT 1
#define DONT 2
%}
delim [ \t\n\r]
ws {delim}+
comb01 00111|{comb06}1
comb02 01011|{comb07}1
comb03 01101|{comb08}1
comb04 01110|({comb01}|{comb09})0
comb05 01111|({comb01}|{comb09})1
comb06 10011|{comb10}1
comb07 10101|{comb11}1
comb08 10110|({comb02}|{comb12})0
comb09 10111|({comb02}|{comb12})1
comb10 11001|{comb13}1
comb11 11010|({comb03}|{comb14})0
comb12 11011|({comb03}|{comb14})1
comb13 11100|({comb04}|{comb15})0
comb14 11101|({comb04}|{comb15})1
comb15 11110|({comb05}|{comb16})0
comb16 11111|({comb05}|{comb16})1
accept {comb01}|{comb02}|{comb03}|{comb04}|{comb05}|{comb06}|{comb07}|{comb08}|{comb09}|{comb10}|{comb11}|{comb12}|{comb13}|{comb14}|{comb15}|{comb16}
string [^ \t\n\r]+
%%
{ws} { ;}
{accept} {return ACCEPT;}
{string} {return DONT;}
%%
int main(void) {
int i;
while ((i = yylex()) != 0)
switch (i) {
case ACCEPT:
printf("%-20s: ACCEPT\n", yytext);
break;
case DONT:
printf("%-20s: Reject\n", yytext);
break;
}
return 0;
}
Flex definitions are macros, and flex implements them that way: when it sees {defn} in a pattern, it replaces it with whatever defn was defined as (in parentheses, usually, to avoid operator precedence issues). It doesn't expand the macros inside a definition when the definition is read, so the substituted text may contain further definition references, which in turn need to be substituted.
Since macro substitution is unconditional, it is not possible to use recursive macros, including indirectly recursive ones, and yours are recursive: comb16 refers to itself directly, and comb01 refers to comb06, which leads through comb10, comb13 and comb04 back to comb01. Flex doesn't check for this condition, unlike the C preprocessor; it just keeps substituting in an endless loop until it runs out of space.
(Flex is implemented using itself; it does the macro substitution using unput. unput will not resize the input buffer, so "runs out of space" here means that flex's internal input buffer became full of macro substitutions.)
The strategy you are using would work fine as a context-free grammar. But that's not flex. Flex is about regular expressions. The pattern you want to match can be described by a regular expression -- the "grammar" you wrote with flex macros is a regular grammar -- but it is not a regular expression and flex won't make one out of it for you, unfortunately. That's your job.
I don't think it's going to be a very pretty regular expression; in fact, I think it's likely to be enormous. But I didn't try to work it out.
There are flex tricks you could use to avoid constructing the regular expression. For example, you could build your state machine out of flex start conditions and then scan one character at a time, where each character scanned does a state transition or throws an error. (Use more() if you want to return the entire string scanned at the end.)
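The one-character-at-a-time state machine described above can be sketched as a plain C function rather than with flex start conditions (the name accepts is hypothetical). The state is the number of characters seen, capped at 5, plus the last four characters packed into 4 bits, so there are only finitely many states, which is also why the language is regular.

```c
#include <stdbool.h>

/* Count the 1 bits in a 5-bit window. */
static int ones_in(unsigned w) {
    int c = 0;
    for (; w; w >>= 1) c += (int)(w & 1u);
    return c;
}

/* Finite-state recognizer: true if s is over {0,1}, has at least five
   characters, and every 5 consecutive characters contain at least three 1s. */
bool accepts(const char *s) {
    unsigned last4 = 0;  /* last four characters, newest in bit 0 */
    int seen = 0;        /* characters read so far, capped at 5 */
    for (; *s; s++) {
        if (*s != '0' && *s != '1') return false;  /* outside the alphabet */
        unsigned window = ((last4 << 1) | (unsigned)(*s == '1')) & 0x1Fu;
        if (seen < 5) seen++;
        /* once five characters are in the window, check it */
        if (seen == 5 && ones_in(window) < 3) return false;
        last4 = window & 0xFu;  /* state transition: keep the newest four */
    }
    return seen == 5;  /* reject strings shorter than five characters */
}
```

Each character causes exactly one state transition or an immediate rejection, which mirrors the start-condition design without ever building the regular expression.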

Macro in Objective-C calling isEqualToString: produces error about invalid token

I'm trying to define a macro like this:
#define SOME_DEF [[TTys getString] isEqualToString:ANOTHER_STRING]
and then doing the following:
#if SOME_DEF
...
#endif
[TTys getString] returns an NSString
ANOTHER_STRING is defined earlier as #define ANOTHER_STRING "hello"
I get the following error on the #if SOME_DEF line:
Invalid token at start of a preprocessor expression
Based on this SO question, this might be caused by something that can't be resolved at compile time, but I have everything defined. I suspect the isEqualToString: method, but I don't know of another way to do this.
When you write #if SOME_DEF the preprocessor resolves it into:
#if [[TTys getString] isEqualToString:ANOTHER_STRING]
This is not a valid condition for the #if preprocessor directive:
The ‘#if’ directive allows you to test the value of an arithmetic expression, rather than the mere existence of one macro. Its syntax is
#if expression
controlled text
#endif /* expression */
expression is a C expression of integer type, subject to stringent restrictions. It may contain:
Integer constants.
Character constants, which are interpreted as they would be in normal code.
Arithmetic operators for addition, subtraction, multiplication, division, bitwise operations, shifts, comparisons, and logical operations (&& and ||). The latter two obey the usual short-circuiting rules of standard C.
Macros. All macros in the expression are expanded before actual computation of the expression's value begins.
Uses of the defined operator, which lets you check whether macros are defined in the middle of an ‘#if’.
Identifiers that are not macros, which are all considered to be the number zero. This allows you to write #if MACRO instead of #ifdef MACRO, if you know that MACRO, when defined, will always have a nonzero value. Function-like macros used without their function call parentheses are also treated as zero.
From the GCC documentation.
What you can do instead is use a runtime-evaluated regular if statement:
if(SOME_DEF) {
...
}
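The same split can be sketched in plain C as an analogue (the question is Objective-C, so this only illustrates the preprocessor side). get_string and current_string are hypothetical stand-ins for [TTys getString], and USE_GREETING is a hypothetical integer flag: the flag works in #if because it is an integer constant, while the string comparison only works as an ordinary runtime expression.

```c
#include <string.h>

#define ANOTHER_STRING "hello"

/* Hypothetical integer flag: integer constants are valid in #if. */
#define USE_GREETING 1

/* Hypothetical stand-in for [TTys getString]; its value is only known at run time. */
static const char *current_string = "hello";
static const char *get_string(void) { return current_string; }

/* A string comparison cannot be evaluated by #if, but the macro can
   still expand to an ordinary expression used in a runtime if. */
#define SOME_DEF (strcmp(get_string(), ANOTHER_STRING) == 0)

int some_def_holds(void) {
    if (SOME_DEF) {
        return 1;
    }
    return 0;
}

int greeting_enabled(void) {
#if USE_GREETING  /* decided by the preprocessor, before compilation */
    return 1;
#else
    return 0;
#endif
}
```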

Antlr mismatched '>' for include macro

I started working with ANTLR a few days ago. I'd like to use it to parse #include directives in C. Only the includes interest me; all other parts are irrelevant. Here is a simple grammar file I wrote:
... parser part omitted...
INCLUDE : '#include';
INCLUDE_FILE_QUOTE: '"'FILE_NAME'"';
INCLUDE_FILE_ANGLE: '<'FILE_NAME'>';
fragment
FILE_NAME: ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.'|' ')+;
MACROS: '#'('if' | 'ifdef' | 'define' | 'endif' | 'undef' | 'elif' | 'else' );
//MACROS: '#'('a'..'z'|'A'..'Z')+;
OPERATORS: ('+'|'-'|'*'|'/'|'='|'=='|'!='|'>'|'>='|'<'|'<='|'>>'|'<<'|'<<<'|'|'|'&'|','|';'|'.'|'->'|'#');
... other supporting tokens like ID, WS and COMMENT ...
This grammar produces an ambiguity when statements like this are encountered:
(;i<listLength;i++)
output: mismatched character ';' expecting '>'
It seems the lexer is trying to match INCLUDE_FILE_ANGLE instead of treating the ';' as OPERATORS.
I've heard of a construct called a syntactic predicate, but I'm not sure how to use it properly in this case.
How can I solve this problem in the way ANTLR encourages?
There doesn't seem to be much ANTLR activity here. Anyway, I figured this out:
INCLUDE_MACRO: ('#include')=>'#include';
VERSION_MACRO: ('#version')=>'#version';
OTHER_MACRO:
(
// longer keywords first, so ('#if')=> does not claim #ifdef or #ifndef
('#ifndef')=>'#ifndef'
|('#ifdef')=>'#ifdef'
|('#if')=>'#if'
|('#else')=>'#else'
|('#elif')=>'#elif'
|('#endif')=>'#endif'
);
This only solves the first half of the problem. The second half is that INCLUDE_FILE_ANGLE cannot be used to match the desired string in the #include directive.
The '<'FILE_NAME'>' rule creates ambiguity, so it must either be broken down into basic lexer tokens or replaced with more advanced context-aware checks. I'm not familiar with the latter technique, so I wrote this parser rule:
include_statement :
INCLUDE_MACRO include_file
-> ^(INCLUDE_MACRO include_file);
include_file
: STRING
| LEFT_ANGLE(INT|ID|OPERATORS)+RIGHT_ANGLE
;
This works, but admittedly it looks ugly.
I hope experienced users can comment with a better solution.
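The "context-aware check" alternative mentioned above can be sketched outside ANTLR as a small C function (include_target is a hypothetical name, and this is not ANTLR code): '<' is treated as a header-name delimiter only immediately after #include, so an expression such as i<listLength elsewhere is never read as a file name. As a simplification, it does not handle "# include" with a space after the '#'.

```c
#include <ctype.h>
#include <string.h>

/* Copies the header name of an #include line into out (capacity cap).
   Returns 1 on success, 0 if the line is not a well-formed #include. */
int include_target(const char *line, char *out, size_t cap) {
    const char *p = line;
    while (isspace((unsigned char)*p)) p++;
    if (strncmp(p, "#include", 8) != 0) return 0;
    p += 8;
    while (isspace((unsigned char)*p)) p++;

    /* Context decides the meaning of '<': only here is it a delimiter. */
    char closer;
    if (*p == '<') closer = '>';
    else if (*p == '"') closer = '"';
    else return 0;
    p++;

    const char *end = strchr(p, closer);
    if (end == NULL) return 0;            /* unterminated header name */
    size_t len = (size_t)(end - p);
    if (len + 1 > cap) return 0;          /* does not fit in out */
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

In ANTLR terms, the same idea would correspond to recognizing the angle-bracket form only in the parser rule that follows INCLUDE_MACRO, which is roughly what the include_file rule above does.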

indirectly quoting macro in traditional mode

With standard preprocessing, I can perform an indirect quote like this:
#define foo bar
#define quoteme_(x) #x
#define quoteme(x) quoteme_(x)
and then use quoteme(foo) to obtain "bar".
I want to do the same using the preprocessor in traditional mode. I tried replacing #x with 'x', but quoteme(foo) just produces 'foo'.
Any help is much appreciated.
Using the cpp that comes with GCC (4.8.1 tested), and the code (example.c):
#define foo bar
#define quoteme_(x) "x"
#define quoteme(x) quoteme_(x)
quoteme(foo)
The relevant part of the output from cpp -traditional example.c is:
"foo"
(and you can use single quotes in the replacement for quoteme_(x) similarly to obtain 'foo'). This is what you observed in the question.
AFAIK, there isn't a way to get 'bar' or "bar" out of the traditional preprocessor system. The pre-standard (traditional) systems were not standardized, and there were details where different systems behaved differently. However, macro arguments were expanded after replacement, rather than before as in C89 and later, which is what leads to the result you're seeing.
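For contrast, here is a minimal C check of the two-level stringification from the question, compiled in standard (C89 or later, non-traditional) mode; the function names are hypothetical. The # operator suppresses expansion of its own operand, while the extra quoteme layer lets foo expand to bar before quoteme_ stringifies it.

```c
#include <string.h>

#define foo bar
#define quoteme_(x) #x
#define quoteme(x) quoteme_(x)

/* Argument is expanded first (foo -> bar), then stringified. */
const char *quoted_foo(void) { return quoteme(foo); }

/* # applies directly to the unexpanded argument. */
const char *direct_foo(void) { return quoteme_(foo); }
```

Under cpp -traditional this approach has nothing to build on, since # is a standard-C operator, which matches the answer's conclusion.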

ANTLR Rules to Accept Whatever Not Previously Matched

How do I create a parser rule that accepts whatever the previous rules don't accept?
I am trying to rewrite C++ source files with ANTLR. My grammar only needs to understand a subset of C++ and ignore the rest. By ignoring the rest, I mean I must still output those input lines as they are; I cannot simply drop the input. For example, I may need to locate #if, #ifdef, #ifndef, #else, #elif and #endif, but send any other valid C++ syntax back to the output as it is.
Part of my solution looks like:
inputLines : ( preprocessorLineSet | oneNormalInputLine ) ;
preprocessorLineSet : ....;// pattern to match #if #else etc
oneNormalInputLine : (any_token_except_crlf)* CRLF {System.out.println($text);};
// catch-all rule for anything, including #if #else #endif; it must send any unrecognised input back to the output
I am assuming the parser would try the alternatives in the order listed in the grammar. So my preprocessorLineSet rule is listed before oneNormalInputLine in the inputLines rule. But it seems ANTLR still prefers oneNormalInputLine even when the input matches the #if pattern, which I assume should be matched by the previous rule.
Is my assumption correct? Is it a correct way to implement this kind of ignore-the-rest logic?
JavaMan wrote:
I am assuming the parser would try the alternatives in the order listed in the grammar. So my preprocessorLineSet rule is listed before oneNormalInputLine in the inputLines rule.
Correct, the rules are tried from left to right (preprocessorLineSet before oneNormalInputLine).
JavaMan wrote:
But it seems ANTLR still prefers oneNormalInputLine even when the input matches the #if pattern, which I assume should be matched by the previous rule.
Wouldn't you need to exclude stuff like #if and #elif from any_token_except_crlf? Could you post a working example including a driver class that shows the unexpected behavior?
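The ordered choice discussed above (try the conditional-directive pattern first, otherwise pass the line through unchanged) can be sketched outside ANTLR as a plain C line classifier. classify_line and LineKind are hypothetical names. Requiring that the keyword not be followed by an identifier character keeps, for example, "#ifdefined" or "#include" from being treated as conditional directives.

```c
#include <ctype.h>
#include <string.h>

typedef enum { LINE_DIRECTIVE, LINE_NORMAL } LineKind;

/* Ordered choice: directive pattern first, pass-through as the fallback. */
LineKind classify_line(const char *line) {
    static const char *const directives[] = {
        "#ifndef", "#ifdef", "#if", "#elif", "#else", "#endif"
    };
    const char *p = line;
    while (isspace((unsigned char)*p)) p++;
    for (size_t i = 0; i < sizeof directives / sizeof directives[0]; i++) {
        size_t len = strlen(directives[i]);
        /* keyword must end at a non-identifier character (or end of line) */
        if (strncmp(p, directives[i], len) == 0 &&
            !isalnum((unsigned char)p[len]) && p[len] != '_')
            return LINE_DIRECTIVE;
    }
    return LINE_NORMAL;  /* catch-all: caller echoes the line unchanged */
}
```

The important part is that the catch-all only runs after the directive test has failed, which corresponds to excluding the directive tokens from any_token_except_crlf.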
