What is wrong with my "token type" in Happy?

I am writing a simple arithmetic expression parser in the Haskell platform's Happy. The Happy tutorial (labeled "Documentation") from the Haskell site implements a similar grammar to what I need. The difference is that I want to include floating point numbers in my expressions and I do not need to define variables (i.e. expressions will not contain "let", "in", "=", or "x" or "y").
When I compile my grammar file with Happy it outputs:
unused terminals: 1
happy: no token type given
CallStack (from HasCallStack):
error, called at src/AbsSyn.lhs:93:24 in main:AbsSyn
I've searched Stack Overflow for questions mentioning "no token type given" and found nothing about this error. I also can't figure out what the "CallStack" trace means. (I'm quite new to Haskell.)
I've defined a helper function to tell whether a token is a float:
isNum :: [Token] -> a -> Bool
isNum x = typeOf (read x) == typeOf 1.1
I've copied the documentation page's grammar file almost exactly, except where I've removed any production rules for variables, "=", or other alphabetic input, and where I've added rules for floating point numbers, i.e.
%token
int { TokenInt $$ }
float { TokenNum $$ }
...
Exp : Expl { Expl $1 }
Exp : Expl '+' Term { Plus $1 $3}
...
Factor : int { Int $1 }
| float { Float $1 }
...
data Exp
  = Let String Exp Exp
  | Expl Expl
  deriving Show

data Expl
  = Plus Expl Term
  | Minus Expl Term
  | Term Term
  deriving Show
...
data Token
  = TokenInt Int
  | TokenFloat Float
  | TokenNum Float
  | TokenPlus
...
lexer (c:cs)
  | isSpace c = lexer cs
  | isDigit c = lexNum (c:cs)
lexer ('=':cs) = TokenEq : lexer cs
...
lexNum cs = TokenInt (read num) : lexer rest
  where (num, rest) = span isDigit cs
lexFloat cs = TokenFloat (read num) : lexer rest
  where (num, rest) = span isNum cs
That's about it, so far.

Related

How does a parser solve shift/reduce conflicts?

I have a grammar for arithmetic expressions which evaluates a number of expressions (one per line) in a text file. When I compile it with yacc I get the message "2 shift reduce conflicts", but my calculations come out correct. If the parser gives the proper output, how does it resolve the shift/reduce conflicts? And in my case, is there any way to eliminate them in the yacc grammar?
YACC GRAMMAR
Calc : Expr {printf(" = %d\n",$1);}
| Calc Expr {printf(" = %d\n",$2);}
| error {yyerror("\nBad Expression\n ");}
;
Expr : Term { $$ = $1; }
| Expr '+' Term { $$ = $1 + $3; }
| Expr '-' Term { $$ = $1 - $3; }
;
Term : Fact { $$ = $1; }
| Term '*' Fact { $$ = $1 * $3; }
| Term '/' Fact { if($3==0){
yyerror("Divide by Zero Encountered.");
break;}
else
$$ = $1 / $3;
}
;
Fact : Prim { $$ = $1; }
| '-' Prim { $$ = -$2; }
;
Prim : '(' Expr ')' { $$ = $2; }
| Id { $$ = $1; }
;
Id :NUM { $$ = yylval; }
;
What change should I make to remove these conflicts from my grammar?
Bison/yacc resolves shift-reduce conflicts by choosing to shift. This is explained in the bison manual in the section on Shift-Reduce conflicts.
Your problem is that your input is just a series of Exprs, run together without any delimiter between them. That means that:
4 - 2
could be one expression (4-2) or it could be two expressions (4, -2). Since bison-generated parsers always prefer to shift, the parser will choose to parse it as one expression, even if it were typed on two lines:
4
-2
If you want to allow users to type their expressions like that, without any separator, then you could either live with the conflict (since it is relatively benign) or you could codify it into your grammar, but that's quite a bit more work. To put it into the grammar, you need to define two different types of Expr: one (which is the one you use at the top level) cannot start with a unary minus, and the other one (which you can use anywhere else) is allowed to start with a unary minus.
I suspect that what you really want to do is use newlines or some other kind of expression separator. That's as simple as passing the newline through to your parser and changing Calc to:
Calc : /* empty */
     | Calc '\n'
     | Calc Expr '\n'
     ;
I'm sure that this appears somewhere else on SO, but I can't find it. So here is how you disallow the use of unary minus at the beginning of an expression, so that you can run expressions together without delimiters. The non-terminals starting n_ cannot start with a unary minus:
input: %empty | input n_expr { /* print $2 */ }
expr: term | expr '+' term | expr '-' term
n_expr: n_term | n_expr '+' term | n_expr '-' term
term: factor | term '*' factor | term '/' factor
n_term: value | n_term '*' factor | n_term '/' factor
factor: value | '-' factor
value: NUM | '(' expr ')'
That parses the same language as your grammar, but without generating the shift-reduce conflict. Since it parses the same language, the input
4
-2
will still be parsed as a single expression; to get the expected result you would need to type
4
(-2)

Managing position information with Alex and Happy

I'm learning to use Alex and Happy to write a small compiler. I want to maintain line and column information for my AST nodes so that I can provide meaningful error messages to the user. To illustrate how I plan to do it, I wrote a small example (see the code below). I'd like to know whether the way I approached the problem (attaching AlexPosn to the tokens, giving AST nodes a polymorphic attribute field, and using tkPos and astAttr) is good style, or whether there are better ways to handle position information.
Lexer.x:
{
module Lexer where
}
%wrapper "posn"
$white = [\ \t\n]
tokens :-
$white+ ;
[xX] { \pos s -> MkToken pos X }
"+" { \pos s -> MkToken pos Plus }
"*" { \pos s -> MkToken pos Times }
"(" { \pos s -> MkToken pos LParen }
")" { \pos s -> MkToken pos RParen }
{
data Token = MkToken AlexPosn TokenClass
  deriving (Show, Eq)

data TokenClass = X
                | Plus
                | Times
                | LParen
                | RParen
  deriving (Show, Eq)
tkPos :: Token -> (Int, Int)
tkPos (MkToken (AlexPn _ line col) _) = (line, col)
}
Parser.y:
{
module Parser where
import Lexer
}
%name simple
%tokentype { Token }
%token
'(' { MkToken _ LParen }
')' { MkToken _ RParen }
'+' { MkToken _ Plus }
'*' { MkToken _ Times }
x { MkToken _ X }
%%
Expr : Term '+' Expr { NAdd $1 $3 (astAttr $1) }
| Term { $1 }
Term : Factor '*' Term { NMul $1 $3 (astAttr $1) }
| Factor { $1 }
Factor : x { NX (tkPos $1) }
| '(' Expr ')' { $2 }
{
data AST a = NX a
           | NMul (AST a) (AST a) a
           | NAdd (AST a) (AST a) a
           deriving (Show, Eq)
astAttr :: AST a -> a
astAttr (NX a) = a
astAttr (NMul _ _ a) = a
astAttr (NAdd _ _ a) = a
happyError :: [Token] -> a
happyError _ = error "parse error"
}
Main.hs:
module Main where
import Lexer
import Parser
main :: IO ()
main = do
  s <- getContents
  let toks = alexScanTokens s
  print $ simple toks
I personally would be pretty OK with the style you've described. However, it is very manual, so I'd like to offer at least one alternative that might be easier to manage.
If you look a little further down the documentation for Alex wrappers, you'll notice that the monad and monadUserState wrappers both carry position information. The downside is that the whole thing is now wrapped in a monad, which complicates the parser slightly. However, by wrapping it in a monad, the result of the parse is an Alex a, which means you have full access to the line and column information when you create your AST nodes. This mainly removes some of the boilerplate from the lexer and doesn't do much more.
By doing this, you could also carry around the AlexState with your token, but that might be unnecessary.
If you need help actually fixing the parser to handle a monad/monadUserState wrapper, I wrote a response on how I managed to get it working here: How to use an Alex monadic lexer with Happy?
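To make that concrete, here is a rough sketch (not the code from the answer linked above) of what the monadic setup can look like. It assumes the Token and TokenClass types from the question, extended with an extra EOF constructor that is not in the original code, and shows only a couple of the token rules:
Lexer.x (monad wrapper):
%wrapper "monad"
tokens :-
  $white+  ;
  [xX]     { tok X }
  "+"      { tok Plus }
{
-- The monad wrapper hands every action the current AlexInput, whose first
-- component is the AlexPosn, so the position ends up in the token for free.
tok :: TokenClass -> AlexInput -> Int -> Alex Token
tok cls (pos, _, _, _) _len = return (MkToken pos cls)

-- An explicit end-of-input token (EOF is assumed to be added to TokenClass).
alexEOF :: Alex Token
alexEOF = do
  (pos, _, _, _) <- alexGetInput
  return (MkToken pos EOF)
}
Parser.y additions:
%monad { Alex }
%lexer { lexwrap } { MkToken _ EOF }
%error { parseError }
{
-- Happy's %lexer directive expects a continuation-style lexer.
lexwrap :: (Token -> Alex a) -> Alex a
lexwrap cont = alexMonadScan >>= cont

parseError :: Token -> Alex a
parseError (MkToken (AlexPn _ line col) _) =
  alexError ("parse error at line " ++ show line ++ ", column " ++ show col)
}
The parser entry point then has type Alex (AST (Int, Int)) in this example, and you run it with runAlex input simple, getting back an Either String value so lexing and parsing errors are reported uniformly.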

Shift/Reduce conflicts in a propositional logic parser in Happy

I'm making a simple propositional logic parser in Happy, based on this BNF definition of the propositional logic grammar. This is my code:
{
module FNC where
import Data.Char
import System.IO
}
-- Parser name, token types and error function name:
--
%name parse Prop
%tokentype { Token }
%error { parseError }
-- Token list:
%token
var { TokenVar $$ } -- alphabetic identifier
or { TokenOr }
and { TokenAnd }
'¬' { TokenNot }
"=>" { TokenImp } -- Implication
"<=>" { TokenDImp } --double implication
'(' { TokenOB } --open bracket
')' { TokenCB } --closing bracket
'.' {TokenEnd}
%left "<=>"
%left "=>"
%left or
%left and
%left '¬'
%left '(' ')'
%%
--Grammar
Prop :: {Sentence}
Prop : Sentence '.' {$1}
Sentence :: {Sentence}
Sentence : AtomSent {Atom $1}
| CompSent {Comp $1}
AtomSent :: {AtomSent}
AtomSent : var { Variable $1 }
CompSent :: {CompSent}
CompSent : '(' Sentence ')' { Bracket $2 }
| Sentence Connective Sentence {Bin $2 $1 $3}
| '¬' Sentence {Not $2}
Connective :: {Connective}
Connective : and {And}
| or {Or}
| "=>" {Imp}
| "<=>" {DImp}
{
--Error function
parseError :: [Token] -> a
parseError _ = error ("parseError: Syntax analysis error.\n")
--Data types to represent the grammar
data Sentence
  = Atom AtomSent
  | Comp CompSent
  deriving Show

data AtomSent = Variable String deriving Show

data CompSent
  = Bin Connective Sentence Sentence
  | Not Sentence
  | Bracket Sentence
  deriving Show

data Connective
  = And
  | Or
  | Imp
  | DImp
  deriving Show

--Data types for the tokens
data Token
  = TokenVar String
  | TokenOr
  | TokenAnd
  | TokenNot
  | TokenImp
  | TokenDImp
  | TokenOB
  | TokenCB
  | TokenEnd
  deriving Show
--Lexer
lexer :: String -> [Token]
lexer [] = []                  -- empty string
lexer (c:cs)                   -- the string is a character c followed by characters cs
  | isSpace c  = lexer cs
  | isAlpha c  = lexVar (c:cs)
  | isSymbol c = lexSym (c:cs)
  | c == '('   = TokenOB : lexer cs
  | c == ')'   = TokenCB : lexer cs
  | c == '¬'   = TokenNot : lexer cs --solved
  | c == '.'   = [TokenEnd]
  | otherwise  = error "lexer: invalid token"
lexVar cs =
  case span isAlpha cs of
    ("or",rest)  -> TokenOr : lexer rest
    ("and",rest) -> TokenAnd : lexer rest
    (var,rest)   -> TokenVar var : lexer rest
lexSym cs =
  case span isSymbol cs of
    ("=>",rest)  -> TokenImp : lexer rest
    ("<=>",rest) -> TokenDImp : lexer rest
}
Now, I have two problems here:
1. For some reason I get 4 shift/reduce conflicts. I don't really know where they might be, since I thought the precedence declarations would resolve them (and I think I followed the BNF grammar correctly)...
2. (This is rather a Haskell problem.) In my lexer function, for some reason I get parsing errors on the line where I say what to do with '¬'; if I remove that line, it works. Why could that be? (This issue is solved.)
Any help would be great.
If you use happy with -i it will generate an info file. The file lists all the states that your parser has. It will also list all the possible transitions for each state. You can use this information to determine if the shift/reduce conflict is one you care about.
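For example, assuming the grammar file is named FNC.y (as in the info-file header quoted below), the invocation is just the normal happy command with -i added:
happy -i FNC.y
That should leave the state listing in FNC.info next to the generated parser.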
Information about invoking happy and conflicts:
http://www.haskell.org/happy/doc/html/sec-invoking.html
http://www.haskell.org/happy/doc/html/sec-conflict-tips.html
Below is some of the output of -i. I've removed all but State 17. You'll want to get a copy of this file so that you can properly debug the problem. What you see here is just to help talk about it:
-----------------------------------------------------------------------------
Info file generated by Happy Version 1.18.10 from FNC.y
-----------------------------------------------------------------------------
state 17 contains 4 shift/reduce conflicts.
-----------------------------------------------------------------------------
Grammar
-----------------------------------------------------------------------------
%start_parse -> Prop (0)
Prop -> Sentence '.' (1)
Sentence -> AtomSent (2)
Sentence -> CompSent (3)
AtomSent -> var (4)
CompSent -> '(' Sentence ')' (5)
CompSent -> Sentence Connective Sentence (6)
CompSent -> '¬' Sentence (7)
Connective -> and (8)
Connective -> or (9)
Connective -> "=>" (10)
Connective -> "<=>" (11)
-----------------------------------------------------------------------------
Terminals
-----------------------------------------------------------------------------
var { TokenVar $$ }
or { TokenOr }
and { TokenAnd }
'¬' { TokenNot }
"=>" { TokenImp }
"<=>" { TokenDImp }
'(' { TokenOB }
')' { TokenCB }
'.' { TokenEnd }
-----------------------------------------------------------------------------
Non-terminals
-----------------------------------------------------------------------------
%start_parse rule 0
Prop rule 1
Sentence rules 2, 3
AtomSent rule 4
CompSent rules 5, 6, 7
Connective rules 8, 9, 10, 11
-----------------------------------------------------------------------------
States
-----------------------------------------------------------------------------
State 17
CompSent -> Sentence . Connective Sentence (rule 6)
CompSent -> Sentence Connective Sentence . (rule 6)
or shift, and enter state 12
(reduce using rule 6)
and shift, and enter state 13
(reduce using rule 6)
"=>" shift, and enter state 14
(reduce using rule 6)
"<=>" shift, and enter state 15
(reduce using rule 6)
')' reduce using rule 6
'.' reduce using rule 6
Connective goto state 11
-----------------------------------------------------------------------------
Grammar Totals
-----------------------------------------------------------------------------
Number of rules: 12
Number of terminals: 9
Number of non-terminals: 6
Number of states: 19
That output basically says that it runs into a bit of ambiguity when it's looking at connectives. It turns out, the slides you linked mention this (Slide 11), "ambiguities are resolved through precedence ¬∧∨⇒⇔ or parentheses".
At this point, I would recommend looking at the shift/reduce conflicts and your desired precedences to see if the parser you have will do the right thing. If so, then you can safely ignore the warnings. If not, you have more work for yourself.
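One direction that extra work could take (a sketch, not part of the original answer): Happy, like yacc, takes a rule's precedence from its last terminal, and the rule Sentence Connective Sentence contains no terminals at all, so the %left declarations never get a chance to resolve conflicts involving it. Folding the connectives directly into CompSent gives the precedence declarations something to attach to, at the cost of dropping the Connective non-terminal from the grammar (the Connective data type can stay):
CompSent :: {CompSent}
CompSent : '(' Sentence ')'        { Bracket $2 }
         | Sentence and Sentence   { Bin And $1 $3 }
         | Sentence or Sentence    { Bin Or $1 $3 }
         | Sentence "=>" Sentence  { Bin Imp $1 $3 }
         | Sentence "<=>" Sentence { Bin DImp $1 $3 }
         | '¬' Sentence            { Not $2 }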
I can answer No. 2:
| c== '¬' == TokenNot : lexer cs --problem here
-- ^^
You have a == there where you should have a =.

Parsing function in haskell

I'm new to Haskell and I am trying to parse expressions. I found out about Parsec and I also found some articles, but I don't seem to understand what I have to do. My problem is that I want to give it an expression like "x^2+2*x+3" and have the result be a function that takes an argument x and returns a value. I am very sorry if this is an easy question, but I really need some help. Thanks! The code I inserted is from the article that you can find at this link.
import Control.Monad(liftM)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Token
import Text.ParserCombinators.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr
          | Sub Expr Expr | Mul Expr Expr | Div Expr Expr
          | Pow Expr Expr
          deriving Show

expr :: Parser Expr
expr = buildExpressionParser table factor
       <?> "expression"

table = [[op "^" Pow AssocRight],
         [op "*" Mul AssocLeft, op "/" Div AssocLeft],
         [op "+" Add AssocLeft, op "-" Sub AssocLeft]]
  where
    op s f assoc
      = Infix (do{ string s; return f }) assoc

factor = do{ char '('
           ; x <- expr
           ; char ')'
           ; return x }
     <|> number
     <|> variable
     <?> "simple expression"

number :: Parser Expr
number = do{ ds <- many1 digit
           ; return (Num (read ds)) }
         <?> "number"

variable :: Parser Expr
variable = do{ ds <- many1 letter
             ; return (Var ds) }
           <?> "variable"
This is just a parser for expressions with variables. Actually interpreting the expression is an entirely separate matter.
You should create a function that takes an already parsed expression and values for variables, and returns the result of evaluating the expression. Pseudocode:
evaluate :: Expr -> Map String Int -> Int
evaluate (Num n) _ = n
evaluate (Var x) vars = {- Look up the value of x in vars -}
evaluate (Add e f) vars = {- Evaluate e and f, and return their sum -}
...
I've deliberately omitted some details; hopefully by exploring the missing parts, you learn more about Haskell.
As a next step, you should probably look at the Reader monad for a convenient way to pass the variable map vars around, and using Maybe or Error to signal errors, e.g. referencing a variable that is not bound in vars, or division by zero.
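To make that last suggestion a little more concrete, here is one possible shape for the evaluator. This is only a sketch: the Env and eval names are mine, it uses ReaderT from the mtl package and Data.Map from containers, and it signals both unbound variables and division by zero with Nothing. The constructors are the ones from the Expr type above.
import qualified Data.Map as Map
import Control.Monad.Reader

type Env = Map.Map String Int

-- Evaluate an expression in an environment of variable bindings.
eval :: Expr -> ReaderT Env Maybe Int
eval (Num n) = return n
eval (Var x) = do
  env <- ask
  lift (Map.lookup x env)          -- Nothing if x is unbound
eval (Add e f) = (+) <$> eval e <*> eval f
eval (Sub e f) = (-) <$> eval e <*> eval f
eval (Mul e f) = (*) <$> eval e <*> eval f
eval (Pow e f) = (^) <$> eval e <*> eval f
eval (Div e f) = do
  d <- eval f
  if d == 0
    then lift Nothing              -- division by zero
    else (`div` d) <$> eval e
From there, the function the question asks for is roughly \v -> runReaderT (eval parsedExpr) (Map.fromList [("x", v)]), which returns a Maybe Int.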

Why does ANTLR not parse the entire input?

I am quite new to ANTLR, so this is likely a simple question.
I have defined a simple grammar which is supposed to include arithmetic expressions with numbers and identifiers (strings that start with a letter and continue with one or more letters or numbers).
The grammar looks as follows:
grammar while;
@lexer::header {
package ConFreeG;
}
@header {
package ConFreeG;
import ConFreeG.IR.*;
}
@parser::members {
}
arith:
term
| '(' arith ( '-' | '+' | '*' ) arith ')'
;
term returns [AExpr a]:
NUM
{
int n = Integer.parseInt($NUM.text);
a = new Num(n);
}
| IDENT
{
a = new Var($IDENT.text);
}
;
fragment LOWER : ('a'..'z');
fragment UPPER : ('A'..'Z');
fragment NONNULL : ('1'..'9');
fragment NUMBER : ('0' | NONNULL);
IDENT : ( LOWER | UPPER ) ( LOWER | UPPER | NUMBER )*;
NUM : '0' | NONNULL NUMBER*;
fragment NEWLINE:'\r'? '\n';
WHITESPACE : ( ' ' | '\t' | NEWLINE )+ { $channel=HIDDEN; };
I am using ANTLR v3 with the ANTLR IDE Eclipse plugin. When I parse the expression (8 + a45) using the interpreter, only part of the parse tree is generated.
Why does the second term (a45) not get parsed? The same happens if both terms are numbers.
You'll want to create a parser rule that has an EOF (end of file) token in it so that the parser will be forced to go through the entire token stream.
Add this rule to your grammar:
parse
: arith EOF
;
and let the interpreter start at that rule instead of the arith rule.
