Parsing issue with parens. Parsec - Haskell - parsing

This is my code:
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
table :: [[ Operator Char st Integer ]]
table = [
[ op "*" (*) AssocLeft],
[ op "+" (+) AssocLeft]
]
where
op s f assoc = Infix (do { string s ; return f }) assoc
factor = do { char '(' ; x <- expr ; char ')' ; return x }
<|> number
<?> "simple expression"
number :: Parser Integer
number = do { ds <- many1 digit; return read(ds))) } <?> "number"
This works perfectly with expressions like this: (10+10) * 10.
But I have a problem with the following: 10 +10)
This has to return a parsing error (just 1 parenthesis at the end), but instead it returns 20.
How to fix this?
Thanks!

You'll need to use the eof parser to make sure you've read the whole input:
myParser = do
parsed <- expr
eof
return parsed
If you're using Control.Applicative this can be simplified to:
myParser = expr <* eof

Related

Is it possible to force backtrack all options?

I need to parse this syntax for function declaration
foo x = 1
Func "foo" (Ident "x") = 1
foo (x = 1) = 1
Func "foo" (Label "x" 1) = 1
foo x = y = 1
Func "foo" (Ident "x") = (Label "y" 1)
I have written this parser
module SimpleParser where
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import Text.Parsec
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Char
import Prelude
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
where
style = emptyDef {
Tok.identLetter = alphaNum
}
parens :: Parser a -> Parser a
parens = Tok.parens lexer
commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep1 lexer
commaSep1 :: Parser a -> Parser [a]
commaSep1 = Tok.commaSep1 lexer
identifier :: Parser String
identifier = Tok.identifier lexer
reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer
data Expr = IntLit Int | Ident String | Label String Expr | Func String Expr Expr | ExprList [Expr] deriving (Eq, Ord, Show)
integer :: Parser Integer
integer = Tok.integer lexer
litInt :: Parser Expr
litInt = do
n <- integer
return $ IntLit (fromInteger n)
ident :: Parser Expr
ident = Ident <$> identifier
paramLabelItem = litInt <|> paramLabel
paramLabel :: Parser Expr
paramLabel = do
lbl <- try (identifier <* reservedOp "=")
body <- paramLabelItem
return $ Label lbl body
paramItem :: Parser Expr
paramItem = parens paramRecord <|> litInt <|> try paramLabel <|> ident
paramRecord :: Parser Expr
paramRecord = ExprList <$> commaSep1 paramItem
func :: Parser Expr
func = do
name <- identifier
params <- paramRecord
reservedOp "="
body <- paramRecord
return $ (Func name params body)
parseExpr :: String -> Either ParseError Expr
parseExpr s = parse func "" s
I can parse foo (x) = 1 but I cannot parse foo x = 1
parseExpr "foo x = 1"
Left (line 1, column 10):
unexpected end of input
expecting digit, "," or "="
I understand it tries to parse this code like Func "foo" (Label "x" 1) and fails. But after fail why it cannot try to parse it like Func "foo" (Ident "x") = 1
Is there any way to do it?
Also I have tried to swap ident and paramLabel
paramItem = parens paramRecord <|> litInt <|> try paramLabel <|> ident
paramItem = parens paramRecord <|> litInt <|> try ident <|> paramLabel
In this case I can parse foo x = 1 but I cannot parse foo (x = 1) = 2
parseExpr "foo (x = 1) = 2"
Left (line 1, column 8):
unexpected "="
expecting "," or ")"
Here is how I understand how Parsec backtracking works (and doesn't work):
In:
(try a <|> try b <|> c)
if a fails, b will be tried, and if b subsequently fails c will be tried.
However, in:
(try a <|> try b <|> c) >> d
if a succeeds but d fails, Parsec does not go back and try b. Once a succeeded Parsec considers the whole choice as parsed and it moves on to d. It will never go back to try b or c.
This doesn't work either for the same reason:
(try (try a <|> try b <|> c)) >> d
Once a or b succeeds, the whole choice succeeds, and therefor the outer try succeeds. Parsing then moves on to d.
A solution is to distribute d to within the choice:
try (a >> d) <|> try (b >> d) <|> (c >> d)
Now if a succeeds but d fails, b >> d will be tried.

How do I create a Parser data?

I am trying to learn how can I do a parser for expressions in Haskell and I found this code (below), but I don't even know how to use it.
I tried with: expr (Add (Num 5) (Num 2)) , but it needs a "Parser" data type.
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Expr
import Text.Parsec.Token
import Text.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr | Sub Expr Expr | Mul Expr Expr | Div Expr Expr deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
<?> "expression"
table = [[op "*" Mul AssocLeft, op "/" Div AssocLeft],
[op "+" Add AssocLeft, op "-" Sub AssocLeft]]
where
op s f assoc = Infix (do{ string s; return f}) assoc
factor = do{ char '('
; x <- expr
; char ')'
; return x}
<|> number
<|> variable
<?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
; return (Num (read ds))}
<?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
; return (Var ds)}
<?> "variable"
Solution: readExpr input = parse expr "name for error messages" input
and use readExpr.
You can use the function parse which will run a Parser on an input string and return an Either ParseError Expr. I put a simple usage below where I turn that ParseError into a string and pass it along
readExpr :: String -> Either String Expr
readExpr input = case parse expr "name for error messages" input of
Left err -> Left $ "Oh noes parsers are failing: " ++ show err -- Handle error
Right a -> Right a -- Handle success
There are a few other functions, such as parseFromFile, which let you shorthand a few common patterns, to find them, check out the parsec haddock

Why doesn't Parsec allow white spaces before operators in the table for buildExpressionParser

In the code below I can correctly parse white spaces after each of the tokens using Parsec:
whitespace = skipMany (space <?> "")
number :: Parser Integer
number = result <?> "number"
where
result = do {
ds <- many1 digit;
whitespace;
return (read ds)
}
table = result
where
result = [
[Infix (genParser '*' (*)) AssocLeft,
Infix (genParser '/' div) AssocLeft],
[Infix (genParser '+' (+)) AssocLeft,
Infix (genParser '-' (-)) AssocLeft]]
genParser s f = char s >> whitespace >> return f
factor = parenExpr <|> number <?> "parens or number"
where
parenExpr = do {
char '(';
x <- expr;
char ')';
whitespace;
return x
}
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
However I get a parse error when trying to only parse white spaces before, and after the operators:
whitespace = skipMany (space <?> "")
number :: Parser Integer
number = result <?> "number"
where
result = do {
ds <- many1 digit;
return (read ds)
}
table = result
where
result = [
[Infix (genParser '*' (*)) AssocLeft,
Infix (genParser '/' div) AssocLeft],
[Infix (genParser '+' (+)) AssocLeft,
Infix (genParser '-' (-)) AssocLeft]]
genParser s f = whitespace >> char s >> whitespace >> return f
factor = parenExpr <|> number <?> "parens or number"
where
parenExpr = do {
char '(';
x <- expr;
char ')';
return x
}
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
The parse error is:
$ ./parsec_example < <(echo "2 * 2 * 3")
"(stdin)" (line 2, column 1):
unexpected end of input
expecting "*"
Why does this happen? Is there some other way to parse white space around just the operators?
When I test your code, 2 * 2 * 3 parses correctly, but 2 + 2 does not. Parsing fails because the parser for * consumes some input and backtracking isn't enabled at that position, so other parsers cannot be tried.
An expression parser created by buildExpressionParser tries to parse each operator in turn until one succeeds. When parsing 2 + 2, the following occurs:
The first 2 is matched by number. The rest of the input is  + 2 (note the space at the beginning).
The parser genParser '*' (*) is applied to the input. It consumes the space, but does not match the + character.
The other infix operator parsers automatically fail because some input was consumed by genParser '*' (*).
You can fix this by wrapping the critical part of the parser in try. This saves the input until after char s succeeds. If char s fails, then buildExpressionParser can backtrack and try another infix operator.
genParser s f = try (whitespace >> char s) >> whitespace >> return f
The drawback of this parser is that, because it backtracks to before the leading whitespace before an infix operator, it repeatedly scans whitespace. It is usually better to parse whitespace after a successful match, like the OP's first parser example.

Using Parsec to parse regular expressions

I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:
EXP : EXP *
| LIT EXP
| LIT
I've tried to implement this in Haskell as:
expr = try star
<|> try litE
<|> lit
litE = do c <- noneOf "*"
rest <- expr
return (c : rest)
lit = do c <- noneOf "*"
return [c]
star = do content <- expr
char '*'
return (content ++ "*")
There are some infinite loops here though (e.g. expr -> star -> expr without consuming any tokens) which makes the parser loop forever. I'm not really sure how to fix it though, because the very nature of star is that it consumes its mandatory token at the end.
Any thoughts?
You should use Parsec.Expr.buildExprParser; it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!
You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.
Here's my attempt (I threw in |, +, and ? for good measure):
import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
data Term = Literal Char
| Sequence [Term]
| Repeat (Int, Maybe Int) Term
| Choice [Term]
deriving ( Show )
term :: Parser Term
term = buildExpressionParser ops atom where
ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
, Postfix (Repeat (1, Nothing) <$ char '+')
, Postfix (Repeat (0, Just 1) <$ char '?')
]
, [ Infix (return sequence) AssocRight
]
, [ Infix (choice <$ char '|') AssocRight
]
]
atom = msum [ Literal <$> lit
, parens term
]
lit = noneOf "*+?|()"
sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)
parens = between (char '(') (char ')')
seqTerms (Sequence ts) = ts
seqTerms t = [t]
choiceTerms (Choice ts) = ts
choiceTerms t = [t]
main = parseTest term "he(llo)*|wor+ld?"
Your grammar is left-recursive, which doesn’t play nice with try, as Parsec will repeatedly backtrack. There are a few ways around this. Probably the simplest is just making the * optional in another rule:
lit :: Parser (Char, Maybe Char)
lit = do
c <- noneOf "*"
s <- optionMaybe $ char '*'
return (c, s)
Of course, you’ll probably end up wrapping things in a data type anyway, and there are a lot of ways to go about it. Here’s one, off the top of my head:
import Control.Applicative ((<$>))
data Term = Literal Char
| Sequence [Term]
| Star Term
expr :: Parser Term
expr = Sequence <$> many term
term :: Parser Term
term = do
c <- lit
s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
return $ if isNothing s
then Literal c
else Star $ Literal c
Maybe a more experienced Haskeller will come along with a better solution.

Parsing function in haskell

I'm new to Haskell and I am trying to parse expressions. I found out about Parsec and I also found some articles but I don't seem to understand what I have to do. My problem is that I want to give an expression like "x^2+2*x+3" and the result to be a function that takes an argument x and returns a value. I am very sorry if this is an easy question but I really need some help. Thanks! The code I inserted is from the article that you can find on this link.
import Control.Monad(liftM)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Token
import Text.ParserCombinators.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr
| Sub Expr Expr | Mul Expr Expr | Div Expr Expr
| Pow Expr Expr
deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
<?> "expression"
table = [[op "^" Pow AssocRight],
[op "*" Mul AssocLeft, op "/" Div AssocLeft],
[op "+" Add AssocLeft, op "-" Sub AssocLeft]]
where
op s f assoc
= Infix (do{ string s; return f}) assoc
factor = do{ char '('
; x <- expr
; char ')'
; return x}
<|> number
<|> variable
<?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
; return (Num (read ds))}
<?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
; return (Var ds)}
<?> "variable"
This is just a parser for expressions with variables. Actually interpreting the expression is an entirely separate matter.
You should create a function that takes an already parsed expression and values for variables, and returns the result of evaluating the expression. Pseudocode:
evaluate :: Expr -> Map String Int -> Int
evaluate (Num n) _ = n
evaluate (Var x) vars = {- Look up the value of x in vars -}
evaluate (Plus e f) vars = {- Evaluate e and f, and return their sum -}
...
I've deliberately omitted some details; hopefully by exploring the missing parts, you learn more about Haskell.
As a next step, you should probably look at the Reader monad for a convenient way to pass the variable map vars around, and using Maybe or Error to signal errors, e.g. referencing a variable that is not bound in vars, or division by zero.

Resources