Parsec: Parsing expression between slashes

I'm trying to parse simple expressions between slashes. Example: / 1+2*3 / should evaluate to 7.
Here is what I tried:
module Test where
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.Combinator (between)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    ops = ["+","*","-","/",";"]
    names = ["def","extern"]
    style = emptyDef {
               Tok.commentLine = "#"
             , Tok.reservedOpNames = ops
             , Tok.reservedNames = names
             }
integer :: Parser Int
integer = fromIntegral <$> Tok.integer lexer
parens :: Parser a -> Parser a
parens = Tok.parens lexer
braces :: Parser a -> Parser a
braces = Tok.braces lexer
slashes :: Parser a -> Parser a
slashes = between (reserved "/") (reserved "/")
reserved :: String -> Parser ()
reserved = Tok.reserved lexer
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
binary s f assoc = Ex.Infix (reservedOp s >> return f) assoc
table = [[binary "*" (*) Ex.AssocLeft,
          binary "/" div Ex.AssocLeft]
        ,[binary "+" (+) Ex.AssocLeft,
          binary "-" (-) Ex.AssocLeft]]
factor :: Parser Int
factor = try integer
     <|> parens expr
expr :: Parser Int
expr = Ex.buildExpressionParser table factor
programInSlashes :: Parser Int
programInSlashes = slashes expr
programInBraces :: Parser Int
programInBraces = braces expr
This works for programInBraces:
*Test> parse programInBraces "" "{ 1+2*3/4 }"
Right 2
However, programInSlashes fails:
*Test> parse programInSlashes "" "/ 1+2*3/4 /"
Left (line 1, column 12):
unexpected end of input
expecting end of "/", integer or "("
Clearly the problem is that / is both an operator and the delimiter for the program itself. But as the language isn't ambiguous we should be able to parse that, no?

I think you can use Text.Parsec.Expr to parse the interior expression; then you can embed backtracking for the / case, for example:
Infix (try $ do { reserved "/"; notFollowedBy eof; return div }) AssocLeft
You can also parse the exterior language and the interior expression in separate passes. I’ve done this in a compiler for a language with custom operators: first parse the program without touching infix expressions, then run another pass to parse infix expressions according to the operators in scope.
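The backtracking idea for the / operator can be sketched as a complete module. This is a minimal sketch, not the original poster's final code: the names divide and runSlashes are made up for illustration, the delimiters use reservedOp instead of reserved, and the whole approach assumes the closing slash is the last thing in the input.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef { Tok.reservedOpNames = ["+", "*", "-", "/"] }

integer :: Parser Int
integer = fromIntegral <$> Tok.integer lexer

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

-- Ordinary left-associative binary operator.
binary :: String -> (Int -> Int -> Int) -> Ex.Operator String () Identity Int
binary s f = Ex.Infix (f <$ reservedOp s) Ex.AssocLeft

-- Hypothetical helper: "/" counts as division only when more input follows.
-- If the slash is the last token, try backtracks and leaves it for the delimiter.
divide :: Ex.Operator String () Identity Int
divide = Ex.Infix (try (div <$ (reservedOp "/" >> notFollowedBy eof))) Ex.AssocLeft

expr :: Parser Int
expr = Ex.buildExpressionParser table factor
  where
    table  = [ [binary "*" (*), divide]
             , [binary "+" (+), binary "-" (-)] ]
    factor = try integer <|> Tok.parens lexer expr

programInSlashes :: Parser Int
programInSlashes = between (reservedOp "/") (reservedOp "/") expr <* eof

-- Hypothetical entry point for trying the parser out.
runSlashes :: String -> Either ParseError Int
runSlashes = parse programInSlashes ""
```

With this sketch, runSlashes "/ 1+2*3/4 /" gives Right 2, the same value programInBraces produces for the braces version. If anything can follow the closing slash, the notFollowedBy eof check no longer distinguishes the two uses of /, and the two-pass approach is the safer choice.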

Related

I am trying to parse a simple expression but it has an infinite loop

I want to parse a language like this
foo = (bar, bar1 = (bar2 = bar4), bar5)
I wrote a simple parser
module SimpleParser where
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import Text.Parsec
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Char
import Prelude
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    style = emptyDef {
               Tok.identLetter = alphaNum
             }
parens :: Parser a -> Parser a
parens = Tok.parens lexer
commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer
identifier :: Parser String
identifier = Tok.identifier lexer
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
data Expr = Ident String | Label String Expr | ExprList [Expr] deriving (Eq, Ord, Show)
parseExpr :: String -> Either ParseError Expr
parseExpr s = parse expr "" s
expr :: Parser Expr
expr = parens expr
   <|> try exprList
   <|> ident
ident :: Parser Expr
ident = do
  var <- identifier
  return $ Ident var
exprLabel :: Parser Expr
exprLabel = do
  var <- identifier
  reservedOp "="
  body <- expr
  return $ Label var body
exprList :: Parser Expr
exprList = do
  list <- commaSep (try exprLabel <|> expr)
  return $ ExprList list
But even with the following simple input it has an infinite loop:
test = parseExpr "foo = bar"
Could someone explain why it doesn't work and how I can fix it?
The problem is that exprList loops as soon as it tries to parse an identifier: parse exprList "" "foo" never terminates. exprList tries to parse its input as a list of labels or expressions, and an expression can itself be a list. Once the input fails to parse as an exprLabel, exprList tries expr, which calls exprList again without having consumed any input.
To fix it, expr has to try exprLabel and ident before it tries exprList. That alone is not enough, though: if both of those fail, the parser still loops, because it cannot tell whether it is looking at the start of a list (or a list of lists of lists of lists...).
To fix that as well, make expr call exprList only after it has matched an opening parenthesis, and use exprList as the starting parser.
expr :: Parser Expr
expr = parens exprList
   <|> try exprLabel
   <|> ident
exprList :: Parser Expr
exprList = do
  list <- commaSep expr
  return $ ExprList list
And it works like this:
>parse exprList "" "(foo=bar),foo=bar"
Right (ExprList [ExprList [Label "foo" (Ident "bar")],Label "foo" (Ident "bar")])

Expression parser for unary operator

I was trying to make an expression parser for two operators out of which only ^ is postfix unary, so with an operand it would look like R^.
The problem is that whenever the operator ^ is encountered somewhere other than the end, it just stops there and returns whatever parsed successfully. It means "R;S;T^" parses successfully, but "R^;S;T^" stops at R. However, "(R^);S;T^" just works fine.
I based this on an online post that handles unary minus, but that is a prefix operator (for example -X). The original solution gave errors at reservedOp2, so I changed it to reservedOp2 name = try (string name >> notFollowedBy (oneOf opChar)), which produces the output described above. I need it to work with or without parentheses.
import Control.Applicative
import Text.ParserCombinators.Parsec hiding (many,optional,(<|>))
import Text.ParserCombinators.Parsec.Expr
import Text.Parsec.Language (haskell)
import qualified Text.Parsec.Token as P
import Text.Parsec.String (Parser)
data RT = Comp RT RT
        | Conv RT
        | Var String
        deriving (Show)
whiteSpace = P.whiteSpace haskell
word = P.identifier haskell
parens = P.parens haskell
opChar = "^;"
reservedOp2 :: String -> CharParser st ()
reservedOp2 name = try (string name >> notFollowedBy (oneOf opChar) >> whiteSpace)
term = parens relexpr
   <|> Var <$> word
   <?> "term"
table = [ [postfix "^" Conv]
        , [binary ";" Comp AssocLeft]
        ]
prefix name fun = Prefix $ reservedOp2 name >> return fun
binary name fun = Infix $ reservedOp2 name >> return fun
postfix name fun = Postfix $ reservedOp2 name >> return fun
relexpr :: Parser RT
relexpr = buildExpressionParser table term <?> "expression"
It fails to parse e.g. "R^;S" because postfix "^" Conv fails on notFollowedBy (oneOf opChar) (since '^' is followed by ';'). The fix is to remove ';' from opChar:
opChar = "^"
Or, even easier, just use reservedOp from haskell:
reservedOp2 = P.reservedOp haskell
Either of these changes fixes the parsing of your examples.
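As a minimal sketch of the second fix, here is a self-contained version of the parser that takes reservedOp from the haskell token parser. The helper name parseRT is an assumption added for testing, and the imports use the Text.Parsec module names rather than the Text.ParserCombinators.Parsec ones from the question.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.Language (haskell)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

data RT = Comp RT RT
        | Conv RT
        | Var String
        deriving (Show, Eq)

-- ';' is not an operator letter in the haskell language definition,
-- so "^" followed by ";" is still recognised as the operator "^".
reservedOp2 :: String -> Parser ()
reservedOp2 = P.reservedOp haskell

term :: Parser RT
term = P.parens haskell relexpr
   <|> Var <$> P.identifier haskell
   <?> "term"

table :: [[Operator String () Identity RT]]
table = [ [Postfix (Conv <$ reservedOp2 "^")]
        , [Infix (Comp <$ reservedOp2 ";") AssocLeft]
        ]

relexpr :: Parser RT
relexpr = buildExpressionParser table term <?> "expression"

-- Hypothetical entry point: skip leading whitespace and require end of input.
parseRT :: String -> Either ParseError RT
parseRT = parse (P.whiteSpace haskell *> relexpr <* eof) ""
```

Requiring eof in parseRT also turns the "stops at R" behaviour into a visible parse error, which makes this kind of problem easier to spot.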

Is there any trick about translating BNF to Parsec program?

This BNF matches a chain of function calls (like x(y)(z)...):
expr = term T
T    = (expr) T
     | EMPTY
term = (expr)
     | VAR
Translating it into a Parsec program looks tricky:
term :: Parser Term
term = parens expr <|> var
expr :: Parser Term
expr = do whiteSpace
          e <- term
          maybeAddSuffix e
  where addSuffix e0 = do e1 <- parens expr
                          maybeAddSuffix $ TermApp e0 e1
        maybeAddSuffix e = addSuffix e
                           <|> return e
Could you list the design patterns for translating BNF into a Parsec program?
The simplest thing you could do if your grammar is sizeable is to just use the Alex/Happy combo. It is fairly straightforward to use, accepts the BNF format directly - no human translation needed - and perhaps most importantly, produces blazingly fast parsers/lexers.
If you are dead set on doing it with parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages; first lexing, then parsing. Parsec will do both!
First write the appropriate types:
{-# LANGUAGE LambdaCase #-}
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char
import Control.Applicative hiding ((<|>))
import Control.Monad
data Term = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)
type Lexer = Parsec [Char] () -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token
Parsing a single token is simple. For simplicity, a variable is 1 or more letters. You can of course change this however you like.
oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|>
           (char ')' >> return RParen) <|>
           (Str <$> many1 letter)
Parsing the entire token stream is just parsing a single token many times, possibly separated by whitespace:
lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)
Note the placement of spaces: this way, white space is accepted at the beginning and end of the string.
Since Parser uses a custom token type, you have to use a custom satisfy function. Fortunately, this is almost identical to the existing satisfy.
satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show
                       (\src _ _ -> incSourceColumn src 1)
                       (\x -> if f x then Just x else Nothing)
Then we can write parsers for each of the primitive tokens.
lparen = satisfy' $ \case { LParen -> True ; _ -> False }
rparen = satisfy' $ \case { RParen -> True ; _ -> False }
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })
As you may imagine, parens would be useful for our purposes. It is very straightforward to write.
parens :: Parser a -> Parser a
parens = between lparen rparen
Now the interesting parts.
term, expr, var :: Parser Term
term = parens expr <|> var
var = Var <$> strTok
These two should be fairly obvious to you.
Parsec contains the combinators option and optionMaybe, which are useful when you need to "maybe do something".
expr = do
  e0 <- term
  option e0 (parens expr >>= \e1 -> return (App e0 e1))
The last line means - try to apply the parser given to option - if it fails, instead return e0.
For testing you can do:
tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""
The eof attached to each parser makes sure that the entire input is consumed; the string cannot be a member of the grammar if there are extra trailing characters. Note that this parser rejects your example x(y)(z): option applies at most one parenthesised argument, so the second argument list is left unconsumed.
>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input
But the following is accepted:
>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))
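The recursive T rule in the BNF amounts to "zero or more parenthesised arguments", which maps onto many followed by a left fold. The following is a minimal sketch of that pattern under some simplifying assumptions: it works directly on characters instead of the token stream used above, and the helper names lexeme and parseChain are made up for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Term = App Term Term | Var String deriving (Show, Eq)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

parens :: Parser a -> Parser a
parens = between (lexeme (char '(')) (lexeme (char ')'))

-- term = (expr) | VAR
term :: Parser Term
term = parens expr <|> (Var <$> lexeme (many1 letter))

-- expr = term T, where T is zero or more "(expr)" suffixes.
-- foldl builds left-nested applications: x(y)(z) becomes App (App x y) z.
expr :: Parser Term
expr = do
  e0   <- term
  args <- many (parens expr)
  return (foldl App e0 args)

-- Hypothetical entry point: allow leading whitespace, require end of input.
parseChain :: String -> Either ParseError Term
parseChain = parse (spaces *> expr <* eof) ""
```

Here parseChain "x(y)(z)" yields Right (App (App (Var "x") (Var "y")) (Var "z")). The general pattern is that a right-recursive "suffix" rule with an EMPTY alternative becomes many, and a rule with alternatives becomes a chain of <|>.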

How do I create a Parser data?

I am trying to learn how to write a parser for expressions in Haskell. I found the code below, but I don't know how to use it.
I tried with: expr (Add (Num 5) (Num 2)) , but it needs a "Parser" data type.
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Expr
import Text.Parsec.Token
import Text.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr | Sub Expr Expr | Mul Expr Expr | Div Expr Expr deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
    <?> "expression"
table = [[op "*" Mul AssocLeft, op "/" Div AssocLeft],
         [op "+" Add AssocLeft, op "-" Sub AssocLeft]]
  where
    op s f assoc = Infix (do{ string s; return f}) assoc
factor = do{ char '('
           ; x <- expr
           ; char ')'
           ; return x}
     <|> number
     <|> variable
     <?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
           ; return (Num (read ds))}
     <?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
             ; return (Var ds)}
       <?> "variable"
Solution: readExpr input = parse expr "name for error messages" input
and use readExpr.
You can use the function parse, which runs a Parser on an input string and returns an Either ParseError Expr. Below is a simple usage example that turns the ParseError into a string and passes it along:
readExpr :: String -> Either String Expr
readExpr input = case parse expr "name for error messages" input of
  Left err -> Left $ "Oh noes parsers are failing: " ++ show err -- Handle error
  Right a -> Right a -- Handle success
There are a few other functions, such as parseFromFile, that provide shorthand for common patterns. To find them, check out the parsec Haddock documentation.

Parsing function in haskell

I'm new to Haskell and I am trying to parse expressions. I found out about Parsec and I also found some articles but I don't seem to understand what I have to do. My problem is that I want to give an expression like "x^2+2*x+3" and the result to be a function that takes an argument x and returns a value. I am very sorry if this is an easy question but I really need some help. Thanks! The code I inserted is from the article that you can find on this link.
import Control.Monad(liftM)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Token
import Text.ParserCombinators.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr
          | Sub Expr Expr | Mul Expr Expr | Div Expr Expr
          | Pow Expr Expr
          deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
    <?> "expression"
table = [[op "^" Pow AssocRight],
         [op "*" Mul AssocLeft, op "/" Div AssocLeft],
         [op "+" Add AssocLeft, op "-" Sub AssocLeft]]
  where
    op s f assoc
      = Infix (do{ string s; return f}) assoc
factor = do{ char '('
           ; x <- expr
           ; char ')'
           ; return x}
     <|> number
     <|> variable
     <?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
           ; return (Num (read ds))}
     <?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
             ; return (Var ds)}
       <?> "variable"
This is just a parser for expressions with variables. Actually interpreting the expression is an entirely separate matter.
You should create a function that takes an already parsed expression and values for variables, and returns the result of evaluating the expression. Pseudocode:
evaluate :: Expr -> Map String Int -> Int
evaluate (Num n) _ = n
evaluate (Var x) vars = {- Look up the value of x in vars -}
evaluate (Add e f) vars = {- Evaluate e and f, and return their sum -}
...
I've deliberately omitted some details; hopefully by exploring the missing parts, you learn more about Haskell.
As a next step, you should probably look at the Reader monad for a convenient way to pass the variable map vars around, and using Maybe or Error to signal errors, e.g. referencing a variable that is not bound in vars, or division by zero.
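One possible way to fill in the evaluator, using Maybe to signal errors as suggested, is sketched below. It is only a sketch: the argument order differs from the pseudocode, asFunction is a made-up helper, the variable name "x" is hard-coded, and treating negative exponents as an error is an assumption.

```haskell
import qualified Data.Map as Map

data Expr = Num Int | Var String | Add Expr Expr
          | Sub Expr Expr | Mul Expr Expr | Div Expr Expr
          | Pow Expr Expr
          deriving Show

-- Evaluate an expression; Nothing signals an unbound variable,
-- a division by zero, or a negative exponent.
evaluate :: Map.Map String Int -> Expr -> Maybe Int
evaluate _    (Num n)   = Just n
evaluate vars (Var x)   = Map.lookup x vars
evaluate vars (Add a b) = (+) <$> evaluate vars a <*> evaluate vars b
evaluate vars (Sub a b) = (-) <$> evaluate vars a <*> evaluate vars b
evaluate vars (Mul a b) = (*) <$> evaluate vars a <*> evaluate vars b
evaluate vars (Div a b) = do
  x <- evaluate vars a
  y <- evaluate vars b
  if y == 0 then Nothing else Just (x `div` y)
evaluate vars (Pow a b) = do
  x <- evaluate vars a
  y <- evaluate vars b
  if y < 0 then Nothing else Just (x ^ y)

-- Hypothetical helper: view an expression in the single variable "x"
-- as a function of that variable.
asFunction :: Expr -> Int -> Maybe Int
asFunction e x = evaluate (Map.singleton "x" x) e
```

For the syntax tree of x^2+2*x+3, asFunction applied to 2 returns Just 11. Combining this with the parser is then a matter of running parse expr on the input string and passing the resulting Expr to asFunction.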
