Haskell Happy parser not going further - parsing

I'm implementing a parser for a language similar to Oberon.
I've written the lexer using Alex, and it seems to work: the list of tokens it returns is correct.
When I give the token list to the parser, though, it stops at the first token.
This is my parser:
...
%name myParse
%error { parseError }
%token
KW_PROCEDURE { KW_TokenProcedure }
KW_END { KW_TokenEnd }
';' { KW_TokenSemiColon }
identifier { TokenVariableIdentifier $$ }
%%
ProcedureDeclaration : ProcedureHeading ';' ProcedureBody identifier { putStrLn("C") }
ProcedureHeading : KW_PROCEDURE identifier { putStrLn("D") }
ProcedureBody : KW_END { putStrLn("E") }
| DeclarationSequence KW_END { putStrLn("F") }
DeclarationSequence : ProcedureDeclaration { putStrLn("G") }
{
parseError :: [Token] -> a
parseError _ = error "Parse error"
main = do
  inStr <- getContents
  print (alexScanTokens inStr)
  myParse (alexScanTokens inStr)
  putStrLn("DONE")
}
This is the test code I give to the parser:
PROCEDURE proc;
END proc
This is the token list returned by the lexer:
[KW_TokenProcedure,TokenVariableIdentifier "proc",KW_TokenSemiColon,KW_TokenEnd,TokenVariableIdentifier "proc"]
The parser doesn't give any error, but it seems to stop at my ProcedureDeclaration rule, printing only C.
This is what the output looks like:
C
DONE
Any idea why?
UPDATE:
I've made a first step forward and I was able to parse the test input given before. Now I've changed my parser to recognize the declaration of multiple procedures at the same level. To do this, this is what my new parser looks like:
...
%name myParse
%error { parseError }
%token
KW_PROCEDURE { KW_TokenProcedure }
KW_END { KW_TokenEnd }
';' { KW_TokenSemiColon }
identifier { TokenVariableIdentifier $$ }
%%
ProcedureDeclarationList : ProcedureDeclaration { $1 }
| ProcedureDeclaration ';' ProcedureDeclarationList { $3:[$1] }
ProcedureDeclaration : ProcedureHeading ';' ProcedureBody identifier { addProcedureToProcedure $1 $3 }
ProcedureHeading : KW_PROCEDURE identifier { defaultProcedure { procedureName = $2 } }
ProcedureBody : KW_END { Nothing }
| DeclarationSequence KW_END { Just $1 }
DeclarationSequence : ProcedureDeclarationList { $1 }
{
parseError :: [Token] -> a
parseError _ = error "Parse error"
main = do
  inStr <- getContents
  let result = myParse (alexScanTokens inStr)
  putStrLn ("result: " ++ show(result))
}
The thing is, it fails to compile giving me this error:
Occurs check: cannot construct the infinite type: t5 ~ [t5]
Expected type: HappyAbsSyn t5 t5 t6 t7 t8 t9
-> HappyAbsSyn t5 t5 t6 t7 t8 t9
-> HappyAbsSyn t5 t5 t6 t7 t8 t9
-> HappyAbsSyn t5 t5 t6 t7 t8 t9
Actual type: HappyAbsSyn t5 t5 t6 t7 t8 t9
-> HappyAbsSyn t5 t5 t6 t7 t8 t9
-> HappyAbsSyn t5 t5 t6 t7 t8 t9
-> HappyAbsSyn [t5] t5 t6 t7 t8 t9
...
I know for sure that it's caused by the second alternative of my ProcedureDeclarationList rule, but I don't understand why.

There are two things to note here.
happy uses the first production rule as the top-level production for myParse.
Your first production rule is ProcedureDeclaration, so that's all it's going to try to parse. You probably want to make DeclarationSequence the first rule.
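For what it's worth, Happy also lets you name the start symbol explicitly in the %name directive, so you don't have to rely on rule order:

%name myParse DeclarationSequence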
The return types of your productions are IO actions, and in Haskell IO actions are values: they are not "executed" until they become part of main. That means you need to write your productions like this:
DeclarationSequence : ProcedureDeclaration
{ do $1; putStrLn("G") }
ProcedureDeclaration : ProcedureHeading ';' ProcedureBody identifier
{ do $1; $3; putStrLn("C") }
That is, the return value of the DeclarationSequence rule is the IO action returned by ProcedureDeclaration followed by putStrLn "G".
And the return value of the ProcedureDeclaration rule is the action returned by ProcedureHeading, followed by the action returned by ProcedureBody, followed by putStrLn "C".
You could also write the RHS of the rules using the >> operator:
{ $1 >> putStrLn "G" }
{ $1 >> $3 >> putStrLn "C" }
Note that you have to decide the order in which to sequence the actions, i.e. pre-, post-, or in-order.
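For example, a pre-order variant of the ProcedureDeclaration action would print "C" before the output of its sub-rules:

{ putStrLn "C" >> $1 >> $3 }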
Working example: http://lpaste.net/162432
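As for the occurs check in the update: the first alternative of ProcedureDeclarationList returns a bare $1 while the second conses the whole list $3 onto the singleton [$1], so the rule's result type t would have to satisfy t ~ [t]. A sketch of a consistently typed version (the rules consuming it would need matching types):

ProcedureDeclarationList : ProcedureDeclaration { [$1] }
| ProcedureDeclaration ';' ProcedureDeclarationList { $1 : $3 }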

It seems your expression has been parsed just fine. Check the return type of myParse: I guess it will be IO (), and the actual action will be the putStrLn("C") you wrote in ProcedureDeclaration. Next, you put the call to myParse in the do block, so it is interpreted as print .. >> myParse (..) >> putStrLn .., i.e. a chain of monadic actions. myParse returns an action which prints "C", so the output is exactly what one would expect.
You have other actions defined in ProcedureBody and DeclarationSequence, but you never use them in any way; it's as if you wrote:
do
  let a = putStrLn "E"
  putStrLn("C")
This will output only "C"; a is never used. The same happens with your parser. If you want to invoke those actions, try writing $1 >> putStrLn("C") >> $3 in the code associated with ProcedureDeclaration.
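Here is that snippet as a complete runnable program (a minimal sketch, not from the original post):

main :: IO ()
main = do
  let a = putStrLn "E" -- bound to a name but never sequenced, so it never runs
  putStrLn "C"         -- only this action is executed; the output is just C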

Related

How do I write a grammar for a select sql statement?

I am writing a grammar for an SQL parser and I've been stuck on this for a while now:
F: FETCH fields FROM tables Conditions
;
fields: ALL
| ids
;
ids: ID ids_
;
ids_: ',' ID ids_
| { /*empty*/ }
;
tables: ID
;
Conditions: WHERE ConditionList
| { /*empty*/ }
;
ConditionList: Condition ConditionList_
;
ConditionList_: BoolOp Condition ConditionList_
| { /*empty*/ }
;
Condition: Operand RELOP Operand
| NOT Operand RELOP Operand
;
Operand: ID
| NUM
;
BoolOp: AND
| OR
;
For some reason, when the lexer reads a FROM token, the parser terminates with an error. Here's the lex code:
FETCH{ printf("fetch "); return FETCH;}
FROM { printf("from "); return UNIQUE; }
ALL { printf("all "); return ALL; }
WHERE { printf("where "); return WHERE; }
AND { printf("and "); return AND; }
OR { printf("or "); return OR; }
NOT { printf("not "); return NOT; }
RelOp { printf("%s", yytext); yylval.string = strdup(yytext); return RELOP; }
[0-9]* {printf("num "); return NUM; }
[_a-zA-Z][_a-zA-Z0-9]* { printf("id "); return ID; }
{symbol} { printf("%c ", yytext[0]); return yytext[0]; }
. { }
RelOp is a pattern: RelOp ("<"|"<="|">"|">="|"=")
and symbol is a pattern: symbol ("("|")"|",")
Your grammar starts with
F: FETCH fields FROM tables Conditions
However, your lexer rules includes
FROM { printf("from "); return UNIQUE; }
Since UNIQUE is different from FROM, the grammar rule won't apply.
If those printf calls in your lexer are some kind of debugging attempt, they are not very useful since they won't tell you whether you are actually returning the correct token type (and value, in the cases where that is necessary). I strongly recommend using bison's trace feature to get an accurate view of what is going on. (Bison's trace will tell you which token type is being received by the parser, for example.)

Happy Parse Error

I'm currently using the Alex and Happy lexer/parser generators to implement a parser for the Ethereum smart-contract language Solidity. For now I'm using a reduced grammar in order to simplify the initial development.
I'm running into an error parsing the 'contract' section of my test contract file.
The following is the code for the grammar:
ProgSource :: { ProgSource }
ProgSource : SourceUnit { ProgSource $1 }
SourceUnit : PragmaDirective { SourceUnit $1}
PragmaDirective : "pragma" ident ";" {Pragma $2 }
| {- empty -} { [] }
ImportDirective :
"import" stringLiteral ";" { ImportDir $2 }
ContractDefinition : contract ident "{" ContractPart "}" { Contract $2 $3 }
ContractPart : StateVarDecl { ContractPart $1 }
StateVarDecl : TypeName "public" ident ";" { StateVar $1 $3 }
| TypeName "public" ident "=" Expression ";" { StateV $1 $3 $5 }
The following file is my test 'contract':
pragma solidity;
contract identifier12 {
public variable = 1;
}
The following is the result of passing my test contract to the main function of my parser:
$ cat test.txt | ./main
main: Parse error at TContract (AlexPn 17 2 1)2:1
CallStack (from HasCallStack):
error, called at ./Parser.hs:232:3 in main:Parser
The error suggests that the issue is at the first letter of the 'contract' token, on line 2, column 1. But from my understanding this should parse properly?
You defined ProgSource to be a single SourceUnit, so the parser fails when the second one is encountered. I guess you wanted it to be a list of SourceUnits.
The same applies to ContractPart.
Also, didn't you mean to quote "contract" in ContractDefinition? And in the same production, $3 should be $4.
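For illustration, a sketch of list-shaped productions, assuming the ProgSource constructor is changed to take a [SourceUnit] (the SourceUnitList name is made up here); ContractPart would be treated the same way:

ProgSource : SourceUnitList { ProgSource $1 }
SourceUnitList : SourceUnit { [$1] }
| SourceUnit SourceUnitList { $1 : $2 }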

What causes Happy to throw a parse error?

I've written a lexer in Alex and I'm trying to hook it up to a parser written in Happy. I'll try my best to summarize my problem without pasting huge chunks of code.
I know from my unit tests of my lexer that the string "\x7" is lexed to:
[TokenNonPrint '\x7', TokenEOF]
My token type (spit out by the lexer) is Token. I've defined lexWrap and alexEOF as described here, which gives me the following header and token declarations:
%name parseTokens
%tokentype { Token }
%lexer { lexWrap } { alexEOF }
%monad { Alex }
%error { parseError }
%token
NONPRINT {TokenNonPrint $$}
PLAIN { TokenPlain $$ }
I invoke the parser+lexer combo with the following:
parseExpr :: String -> Either String [Expr]
parseExpr s = runAlex s parseTokens
And here are my first few productions:
exprs :: { [Expr] }
exprs
: {- empty -} { trace "exprs 30" [] }
| exprs expr { trace "exprs 31" $ $2 : $1 }
nonprint :: { Cmd }
: NONPRINT { NonPrint $ parseNonPrint $1}
expr :: { Expr }
expr
: nonprint {trace "expr 44" $ Cmd $ $1}
| PLAIN { trace "expr 37" $ Plain $1 }
I'll leave out the datatype declarations of Expr and NonPrint since they're long and only the constructors Cmd and NonPrint matter here. The function parseNonPrint is defined at the bottom of Parse.y as:
parseNonPrint :: Char -> NonPrint
parseNonPrint '\x7' = Bell
Also, my error handling function looks like:
parseError :: Token -> Alex a
parseError tokens = error ("Error processing token: " ++ show tokens)
Written like this, I expect the following hspec test to pass:
parseExpr "\x7" `shouldBe` Right [Cmd (NonPrint Bell)]
But instead, I see "exprs 30" print once (even though I'm running 5 different unit tests) and all of my tests of parseExpr return Right []. I don't understand why that would be the case, but I changed the exprs production to prevent it:
exprs :: { [Expr] }
exprs
: expr { trace "exprs 30" [$1] }
| exprs expr { trace "exprs 31" $ $2 : $1 }
Now all of my tests fail on the first token they hit; parseExpr "\x7" fails with:
uncaught exception: ErrorCall (Error processing token: TokenNonPrint '\a')
And I'm thoroughly confused, since I would expect the parser to take the path exprs -> expr -> nonprint -> NONPRINT and succeed. I don't see why this input would put the parser in an error state. None of the trace statements are hit (optimized away?).
What am I doing wrong?
It turns out the cause of this error was the innocuous line
%lexer { lexWrap } { alexEOF }
which was recommended by the linked question about using Alex with Happy (unfortunately, one of the top Google results for queries like "using Alex as a monadic lexer with Happy"). The fix is to change it to the following:
%lexer { lexWrap } { TokenEOF }
I had to dig into the generated code to uncover the issue. It is caused by the code derived from the %token directive, which looks as follows (I commented out all of my token declarations except for TokenNonPrint while trying to track down the error):
happyNewToken action sts stk
  = lexWrap(\tk ->
      let cont i = happyDoAction i tk action sts stk in
      case tk of {
        alexEOF -> happyDoAction 2# tk action sts stk; -- !!!!
        TokenNonPrint happy_dollar_dollar -> cont 1#;
        _ -> happyError' tk
      })
Evidently, Happy transforms each line of the %token directive into one branch of a pattern match. It also inserts a branch for whatever was identified to it as the EOF token in the %lexer directive.
By inserting the name of a value, alexEOF, rather than a data constructor, TokenEOF, this branch of the case statement has the effect of re-binding the name alexEOF to whatever token was passed in to lexWrap, shadowing the original binding and short-circuiting the case statement so that it hits the EOF rule every time, which somehow results in Happy entering an error state.
The mistake isn't caught by the type system, since the identifier alexEOF (or TokenEOF) doesn't appear anywhere else in the generated code. Misusing the %lexer directive like this will cause GHC to emit a warning, but, since the warning appears in generated code, it's impossible to distinguish it from all of the other harmless warnings the code throws out.
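The shadowing behaviour is easy to reproduce outside of Happy. A minimal standalone sketch (all names here are made up):

-- A lowercase name in a case branch is a fresh binder, not an
-- equality test against an existing top-level value.
alexEOF :: Int
alexEOF = -1

classify :: Int -> String
classify tk = case tk of
  alexEOF -> "EOF"   -- binds alexEOF to tk, so it matches every token
  _       -> "other" -- unreachable; GHC warns about the overlap

main :: IO ()
main = putStrLn (classify 42) -- prints "EOF", not "other"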

Debug parser by printing useful information

I would like to parse a set of expressions, for instance: X[3], X[-3], XY[-2], X[4]Y[2], etc.
In my parser.mly, index (which is inside []) is defined as follows:
index:
| INTEGER { $1 }
| MINUS INTEGER { 0 - $2 }
The token INTEGER, MINUS etc. are defined in lexer as normal.
When I try to parse an example, it fails. However, if I comment out | MINUS INTEGER { 0 - $2 }, it works well. So the problem is certainly related to that. To debug, I want to get more information; in other words, I want to know what is considered to be MINUS INTEGER. I tried to add a print:
index:
| INTEGER { $1 }
| MINUS INTEGER { Printf.printf "%n" $2; 0 - $2 }
But nothing is printed while parsing.
Could anyone tell me how to print information or debug that?
I tried coming up with an example of what you describe and was able to get output of 8 with what I show below. [This example is completely stripped down so that it only works for [1] and [- 1 ], but I believe it's equivalent logically to what you said you did.]
However, I also notice that the debug string in your example does not have an explicit flush with %! at the end, so the debugging output might not be flushed to the terminal until later than you expect.
Here's what I used:
Test.mll:
{
open Ytest
open Lexing
}
rule test =
parse
"-" { MINUS }
| "1" { ONE 1 }
| "[" { LB }
| "]" { RB }
| [ ' ' '\t' '\r' '\n' ] { test lexbuf }
| eof { EOFTOKEN }
Ytest.mly:
%{
%}
%token <int> ONE
%token MINUS LB RB EOFTOKEN
%start item
%type <int> index item
%%
index:
ONE { 2 }
| MINUS ONE { Printf.printf "%n" 8; $2 }
item : LB index RB EOFTOKEN { $2 }
Parse.ml:
open Test;;
open Ytest;;
open Lexing;;
let lexbuf = Lexing.from_channel stdin in
ignore (Ytest.item Test.test lexbuf)

Scala parsing mutually recursive functions for SML

I'm trying to write a parser in Scala for SML, working from tokens. It almost works the way I want, except that it currently parses
let fun f x = r and fun g y in r end;
instead of
let fun f x = r and g y in r end;
How do I change my code so that it recognizes that it doesn't need a FunToken for the second function?
def parseDef:Def = {
  currentToken match {
    case ValToken => {
      eat(ValToken);
      val nme:String = currentToken match {
        case IdToken(x) => {advance; x}
        case _ => error("Expected a name after VAL.")
      }
      eat(EqualToken);
      VAL(nme,parseExp)
    }
    case FunToken => {
      eat(FunToken);
      val fnme:String = currentToken match {
        case IdToken(x) => {advance; x}
        case _ => error("Expected a name after VAL.")
      }
      val xnme:String = currentToken match {
        case IdToken(x) => {advance; x}
        case _ => error("Expected a name after VAL.")
      }
      def parseAnd:Def = currentToken match {
        case AndToken => {eat(AndToken); FUN(fnme,xnme,parseExp,parseAnd)}
        case _ => NOFUN
      }
      FUN(fnme,xnme,parseExp,parseAnd)
    }
    case _ => error("Expected VAL or FUN.");
  }
}
Just implement the right grammar. Instead of
def ::= "val" id "=" exp | fun
fun ::= "fun" id id "=" exp ["and" fun]
SML's grammar actually is
def ::= "val" id "=" exp | "fun" fun
fun ::= id id "=" exp ["and" fun]
Btw, I think there are other problems with your parsing of fun. AFAICS, you are not parsing any "=" in the fun case. Moreover, after an "and", you are not even parsing any identifiers, just the function body.
You could inject the FunToken back into your input stream with an "uneat" function. This is not the most elegant solution, but it's the one that requires the least modification of your current code.
def parseAnd:Def = currentToken match {
  case AndToken => { eat(AndToken);
    uneat(FunToken);
    FUN(fnme,xnme,parseExp,parseAnd) }
  case _ => NOFUN
}
