F# FParsec parsing multiplication - parsing

I am trying to tackle the scariest part of programming for me and that is parsing and ASTs. I am working on a trivial example using F# and FParsec. I am wanting to parse a simple series of multiplications. I am only getting the first term back though. Here is what I have so far:
open FParsec
let test p str =
    match run p str with
    | Success(result, _, _) -> printfn "Success: %A" result
    | Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
type Expr =
    | Float of float
    | Multiply of Expr * Expr
let parseExpr, impl = createParserForwardedToRef ()
let pNumber = pfloat .>> spaces |>> (Float)
let pMultiply = parseExpr .>> pstring "*" >>. parseExpr
impl := pNumber <|> pMultiply
test parseExpr "2.0 * 3.0 * 4.0 * 5.0"
When I run this I get the following:
> test parseExpr "2.0 * 3.0 * 4.0 * 5.0";;
Success: Float 2.0
val it : unit = ()
My hope was that I get a nested set of multiplications. I feel like I am missing something tremendously obvious.

Parser combinators like FParsec are not equivalent to BNF grammars. The big difference is that when you have an alternative (<|> in FParsec), the cases are tried in order. If the left parser is successful, then it is returned and the right parser isn't tried. If the left parser fails after consuming some input, then the failure is returned and the right parser isn't tried either. It's only if the left parser fails without consuming any input that the right parser is tried. [1]
In your pNumber <|> pMultiply, pNumber is successful and returned immediately without trying to do pMultiply. You might think to fix that by writing pMultiply <|> pNumber instead, but that's not good either: when parsing the last number, pMultiply will fail to find a * after having consumed some input for its parseExpr, so the whole parsing will be marked as failed.
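The rule described above can be made concrete with a toy model. What follows is a minimal Haskell sketch, not FParsec itself: `Parser`, `charP`, `andThen`, `orElse` and `attempt` are hypothetical names invented for this illustration, with `orElse` standing in for `<|>`. Each parser reports whether it consumed input alongside its result.

```haskell
-- Toy model of ordered, committed choice. All names are illustrative,
-- not FParsec's API. A parser returns (consumedInput, result);
-- a result of Nothing means failure.
type Parser a = String -> (Bool, Maybe (a, String))

-- Match one specific character.
charP :: Char -> Parser Char
charP c (x:xs) | x == c = (True, Just (c, xs))
charP _ _               = (False, Nothing)

-- Run p, then q on the remaining input.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q s = case p s of
  (c1, Nothing)        -> (c1, Nothing)
  (c1, Just (a, rest)) -> case q rest of
    (c2, Nothing)         -> (c1 || c2, Nothing)
    (c2, Just (b, rest')) -> (c1 || c2, Just ((a, b), rest'))

-- Ordered choice: q is tried only if p failed WITHOUT consuming input.
orElse :: Parser a -> Parser a -> Parser a
orElse p q s = case p s of
  (False, Nothing) -> q s
  other            -> other

-- Turn a consuming failure into a non-consuming one (backtracking).
attempt :: Parser a -> Parser a
attempt p s = case p s of
  (_, Nothing) -> (False, Nothing)
  ok           -> ok

ab, ac :: Parser (Char, Char)
ab = charP 'a' `andThen` charP 'b'
ac = charP 'a' `andThen` charP 'c'

main :: IO ()
main = do
  print (snd ((ab `orElse` ac) "ac"))          -- Nothing
  print (snd ((attempt ab `orElse` ac) "ac"))  -- Just (('a','c'),"")
```

In the first call `ab` consumes the `a` before failing on `c`, so `ac` is never tried. That is the same situation as `pMultiply <|> pNumber` on the last number.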
You generally want to use FParsec's combinator functions as much as possible, and in this case the best solution is probably to use chainl1.
let pNumber = pfloat .>> spaces |>> Float
let pTimes = pstring "*" .>> spaces >>% (fun x y -> Multiply (x, y))
let pMultiply = chainl1 pNumber pTimes
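For readers who want to experiment without installing FParsec, here is a rough Haskell analogue of the same idea using `chainl1` from `Text.ParserCombinators.ReadP` in base. It is a sketch under assumptions: ReadP's choice semantics differ from FParsec's, the number parser is a simplified stand-in for `pfloat`, and `Num` and `parseAll` are names introduced here.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Expr = Num Double | Multiply Expr Expr
  deriving (Eq, Show)

-- Simplified stand-in for pfloat: digits with an optional fractional part,
-- followed by optional whitespace.
pNumber :: ReadP Expr
pNumber = do
  whole <- munch1 isDigit
  frac  <- option "" ((:) <$> char '.' <*> munch1 isDigit)
  skipSpaces
  return (Num (read (whole ++ frac)))

-- The operator parser returns the function that combines two operands.
pTimes :: ReadP (Expr -> Expr -> Expr)
pTimes = Multiply <$ (char '*' *> skipSpaces)

-- One or more numbers separated by '*', combined left to right.
pMultiply :: ReadP Expr
pMultiply = chainl1 pNumber pTimes

-- Accept only parses that consume the whole input.
parseAll :: String -> Maybe Expr
parseAll s = case readP_to_S (pMultiply <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing

main :: IO ()
main = print (parseAll "2.0 * 3.0 * 4.0")
-- Just (Multiply (Multiply (Num 2.0) (Num 3.0)) (Num 4.0))
```

The nesting leans to the left because `chainl1` folds the operands left-associatively.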
If your goal was to learn how to use BNF grammars, you probably want to look at FsLex and FsYacc rather than FParsec.
[1] There's a function attempt that turns a consuming failure into a non-consuming failure, but it should be used as sparingly as possible.

Related

Using makeExprParser with ambiguity

I'm currently encountering a problem while translating a parser from a CFG-based tool (antlr) to Megaparsec.
The grammar contains lists of expressions (handled with makeExprParser) that are enclosed in brackets (<, >) and separated by ,.
Stuff like <>, <23>, <23,87> etc.
The problem now is that the expressions may themselves contain the > operator (meaning "greater than"), which causes my parser to fail.
<1223>234> should, for example, be parsed into [BinaryExpression ">" (IntExpr 1223) (IntExpr 234)].
I presume that I have to strategically place try somewhere, but the places I tried (to the first argument of sepBy and the first argument of makeExprParser) did unfortunately not work.
Can I use makeExprParser in such a situation or do I have to manually write the expression parser?
This is the relevant part of my parser:
-- uses megaparsec, text, and parser-combinators
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Monad.Combinators.Expr
import Data.Text
import Data.Void
import System.Environment
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
type BinaryOperator = Text
type Name = Text
data Expr
  = IntExpr Integer
  | BinaryExpression BinaryOperator Expr Expr
  deriving (Eq, Show)
type Parser = Parsec Void Text
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "//") (L.skipBlockCommentNested "/*" "*/")
parseInteger :: Parser Expr
parseInteger = do
  number <- some digitChar
  _ <- sc
  return $ IntExpr $ read number
parseExpr :: Parser Expr
parseExpr = makeExprParser parseInteger [[InfixL (BinaryExpression ">" <$ symbol ">")]]
parseBracketList :: Parser [Expr]
parseBracketList = do
  _ <- symbol "<"
  exprs <- sepBy parseExpr (symbol ",")
  _ <- symbol ">"
  return exprs
main :: IO ()
main = do
  text : _ <- getArgs
  let res = runParser parseBracketList "stdin" (pack text)
  case res of
    (Right suc) -> do
      print suc
    (Left err) ->
      putStrLn $ errorBundlePretty err
You've (probably) misdiagnosed the problem. Your parser fails on <1233>234> because it's trying to parse > as a left associative operator, like +. In other words, the same way:
1+2+
would fail, because the second + has no right-hand operand, your parser is failing because:
1233>234>
has no digit following the second >. Assuming you don't want your > operator to chain (i.e., 1>2>3 is not a valid Expr), you should first replace InfixL with InfixN (non-associative) in your makeExprParser table. Then, it will parse this example fine.
Unfortunately, with or without this change your parser will still fail on the simpler test case:
<1233>
because the > is interpreted as an operator within a continuing expression.
In other words, the problem isn't that your parser can't handle expressions with > characters, it's that it's overly aggressive in treating > characters as part of an expression, preventing them from being recognized as the closing angle bracket.
To fix this, you need to figure out exactly what you're parsing. Specifically, you need to resolve the ambiguity in your parser by precisely characterizing the situations where > can be part of a continuing expression and where it can't.
One rule that will probably work is to only consider a > as an operator if it is followed by a valid "term" (i.e., a parseInteger). You can do this with lookAhead. The parser:
symbol ">" <* lookAhead term
will parse a > operator only if it is followed by a valid term. If it fails to find a term, it will consume some input (at least the > symbol itself), so you must surround it with a try:
try (symbol ">" <* lookAhead term)
With the above two fixes applied to parseExpr:
parseExpr :: Parser Expr
parseExpr = makeExprParser term
  [[InfixN (BinaryExpression ">" <$ try (symbol ">" <* lookAhead term))]]
  where term = parseInteger
you'll get the following parses:
λ> parseTest parseBracketList "<23>"
[IntExpr 23]
λ> parseTest parseBracketList "<23,87>"
[IntExpr 23,IntExpr 87]
λ> parseTest parseBracketList "<23,87>18>"
[IntExpr 23,BinaryExpression ">" (IntExpr 87) (IntExpr 18)]
However, the following will fail:
λ> parseTest parseBracketList "<23,87>18"
1:10:
  |
1 | <23,87>18
  |          ^
unexpected end of input
expecting ',', '>', or digit
λ>
because the fact that the > is followed by 18 means that it is a valid operator, and it is a parse failure that the valid expression 87>18 is followed by neither a comma nor a closing > angle bracket.
If you need to parse something like <23,87>18, you have bigger problems. Consider the following two test cases:
<1,2>3,4,5,6,7,...,100000000000,100000000001>
<1,2>3,4,5,6,7,...,100000000000,100000000001
It's a challenge to write an efficient parser that will parse the first one as a list of 100000000000 expressions but the second one as a list of two expressions:
[IntExpr 1, IntExpr 2]
followed by some "extra" text. Hopefully, the underlying "language" you're trying to parse isn't so hopelessly broken that this will be an issue.

Using `opt` combinator in uu-parsinglib

I am writing a parser for a simple text template language for my project, and I am completely stuck on opt combinator in uu-parsinglib (version 2.7.3.2 in case that matters). Any ideas on how to use it properly?
Here is a very simplified example that shows my predicament.
{-# LANGUAGE FlexibleContexts #-}
import Text.ParserCombinators.UU hiding (pEnd)
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
pIdentifier :: Parser String
pIdentifier = pMany pLetter
pIfClause :: Parser ((String, String), String, Maybe (String, String), String)
pIfClause = (,,,) <$> pIf <*> pIdentifier <*> pOptionalElse <*> pEnd
pIf :: Parser (String, String)
pIf = pBraces ((,) <$> pToken "if " <*> pIdentifier)
pOptionalElse :: Parser (Maybe (String, String))
pOptionalElse = (((\x y -> Just (x, y)) <$> pElse <*> pIdentifier) `opt` Nothing)
pElse :: Parser String
pElse = pBraces (pToken "else")
pEnd :: Parser String
pEnd = pBraces (pToken "end")
main :: IO ()
main = do
  putStrLn $ show $ runParser "works" pIfClause "{if abc}def{else}ghi{end}"
  putStrLn $ show $ runParser "doesn't work" pIfClause "{if abc}def{end}"
The first string parses properly but the second fails with error:
main: Failed parsing 'doesn't work' :
Expected at position LineColPos 0 12 12 expecting one of [Whitespace, "else"] at LineColPos 0 12 12 :
            v
{if abc}def{end}
            ^
The documentation for opt says:
If p can be recognized, the return value of p is used. Otherwise, the value v is used. Note that opt by default is greedy.
What greedy means is explained in the documentation for <<|>:
<<|> is the greedy version of <|>. If its left hand side parser can make any progress then it commits to that alternative.
In your case, the first argument to opt does recognize part of the input, because else and end both start with e. Thus, it commits to pElse, which fails and makes the whole parse fail.
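An easy way to solve this is to use ... <|> pure Nothing in place of ... `opt` Nothing, as the documentation suggests. Unlike opt, <|> is not greedy, so the parser does not commit to pElse just because it has matched the first few characters.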

Can someone give an example of using chainl1 in FParsec?

This is the most puzzling combinator in all of FParsec...
http://www.quanttec.com/fparsec/reference/primitives.html#members.chainl1
...but there is no example on how to use it in the documentation or, AFAIK, on any web pages on the internet. I have a left-recursive parse that seems to require it, but for the life of me I can't figure out how to call it or what to pass to it.
Please help :)
I have some pretty diagrams involving chainl1 (from my own C# code) here:
http://lorgonblog.wordpress.com/2007/12/04/monadic-parser-combinators-part-three/
I put together a simple expression parser in FParsec at the end of this unrelated post. Here's an excerpt using chainl1 to make a parser for a chained operator expression from parsers for the operand and operator.
(* fop : (double -> double -> double) -> (env -> double) -> (env -> double) -> env -> double *)
let fop op fa fb env = fa env |> op <| fb env
(* Parse single operators - return function taking two operands and giving the result *)
let (addop : Parser<_,unit>) =
    sym "+" >>% fop (+)
    <|> ( sym "-" >>% fop (-) )
(* term, expr - chain of operators of a given precedence *)
let term = chainl1 atom mulop
let expr = chainl1 term addop
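The excerpt above leaves `sym`, `atom` and `mulop` undefined. As a self-contained illustration of the same layering, here is a minimal Haskell sketch using `chainl1` from `Text.ParserCombinators.ReadP` in base rather than FParsec. The definitions of `sym`, `atom`, `mulop` and `eval` are assumptions made for this example, and the parser evaluates integers directly instead of building environment functions.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- A single-character symbol followed by optional whitespace.
sym :: Char -> ReadP Char
sym c = char c <* skipSpaces

-- Operand: a non-negative integer literal.
atom :: ReadP Int
atom = read <$> munch1 isDigit <* skipSpaces

-- Operator parsers return the function that combines two operands.
mulop, addop :: ReadP (Int -> Int -> Int)
mulop = (*) <$ sym '*'
addop = ((+) <$ sym '+') +++ ((-) <$ sym '-')

-- chainl1 operand operator: one or more operands separated by operators,
-- folded left to right. Stacking two chains gives '*' higher precedence.
term, expr :: ReadP Int
term = chainl1 atom mulop
expr = chainl1 term addop

-- Accept only parses that consume the whole input.
eval :: String -> Maybe Int
eval s = case readP_to_S (skipSpaces *> expr <* eof) s of
  [(n, "")] -> Just n
  _         -> Nothing

main :: IO ()
main = do
  print (eval "2 + 3 * 4")   -- Just 14
  print (eval "10 - 4 - 3")  -- Just 3
```

The second result shows the left associativity: the input is read as (10 - 4) - 3, not 10 - (4 - 3).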

Making attoparsec parsers recursive

I've been coding up an attoparsec parser and have been hitting a pattern where I want to turn parsers into recursive parsers (recursively combining them with the monad bind >>= operator).
So I created a function to turn a parser into a recursive parser as follows:
recursiveParser :: (a -> A.Parser a) -> a -> A.Parser a
recursiveParser parser a = (parser a >>= recursiveParser parser) <|> return a
Which is useful if you have a recursive data type like
data Expression = ConsExpr Expression Expression | EmptyExpr
parseRHS :: Expression -> Parser Expression
parseRHS e = ConsExpr e <$> parseFoo
parseExpression :: Parser Expression
parseExpression = parseLHS >>= recursiveParser parseRHS
  where parseLHS = parseRHS EmptyExpr
Is there a more idiomatic solution? It almost seems like recursiveParser should be some kind of fold... I also saw sepBy in the docs, but this method seems to suit me better for my application.
EDIT: Oh, actually now that I think about it should actually be something similar to fix... Don't know how I forgot about that.
EDIT2: Rotsor makes a good point with his alternative for my example, but I'm afraid my AST is actually a bit more complicated than that. It actually looks something more like this (although this is still simplified)
data Segment = Choice1 Expression
             | Choice2 Expression
data Expression = ConsExpr Segment Expression
                | Token String
                | EmptyExpr
where the string a -> b brackets to the right and c:d brackets to the left, with : binding more tightly than ->.
I.e. a -> b evaluates to
(ConsExpr (Choice1 (Token "a")) (Token "b"))
and c:d evaluates to
(ConsExpr (Choice2 (Token "d")) (Token "c"))
I suppose I could use foldl for the one and foldr for the other but there's still more complexity in there. Note that it's recursive in a slightly strange way, so "a:b:c -> e:f -> :g:h ->" is actually a valid string, but "-> a" and "b:" are not. In the end fix seemed simpler to me. I've renamed the recursive method like so:
fixParser :: (a -> A.Parser a) -> a -> A.Parser a
fixParser parser a = (parser a >>= fixParser parser) <|> pure a
Thanks.
Why not just parse a list and fold it into whatever you want later?
Maybe I am missing something, but this looks more natural to me:
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
parseExpression :: Parser Expression
parseExpression = consChain <$> many1 parseFoo
And it's shorter too.
As you can see, consChain is now independent from parsing and can be useful somewhere else. Also, if you separate out the result folding, the somewhat unintuitive recursive parsing simplifies down to many or many1 in this case.
You may want to take a look at how many is implemented too:
many :: (Alternative f) => f a -> f [a]
many v = many_v
  where many_v = some_v <|> pure []
        some_v = (:) <$> v <*> many_v
It has a lot in common with your recursiveParser:
some_v is similar to parser a >>= recursiveParser parser
many_v is similar to recursiveParser parser
You may ask why I called your recursive parser function unintuitive. This is because the pattern allows the parser's argument (the value parsed so far) to affect the parsing behaviour (a -> A.Parser a, remember?), which may be useful, but not obviously so (I don't see a use case for this yet). The fact that your example does not use this feature makes it look redundant.
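The claim that the recursive helper and "parse a list, then fold" give the same result for the simple example can be checked with a small sketch. It uses `Text.ParserCombinators.ReadP` from base as a stand-in for attoparsec, with `<++` as the left-biased choice. The `parseFoo` below (one alphabetic word), the `Token` leaf added to the first `Expression` type, and the name `parseFull` are all assumptions made for the illustration.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP

data Expression = ConsExpr Expression Expression | Token String | EmptyExpr
  deriving (Eq, Show)

-- Hypothetical stand-in for parseFoo: one word plus trailing spaces.
parseFoo :: ReadP Expression
parseFoo = Token <$> munch1 isAlpha <* skipSpaces

-- The question's helper, with ReadP's left-biased (<++) as the choice.
fixParser :: (a -> ReadP a) -> a -> ReadP a
fixParser parser a = (parser a >>= fixParser parser) <++ pure a

parseRHS :: Expression -> ReadP Expression
parseRHS e = ConsExpr e <$> parseFoo

viaFix :: ReadP Expression
viaFix = parseRHS EmptyExpr >>= fixParser parseRHS

-- The answer's approach: collect a list, then fold it.
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr

viaFold :: ReadP Expression
viaFold = consChain <$> many1 parseFoo

-- Accept only parses that consume the whole input.
parseFull :: ReadP a -> String -> Maybe a
parseFull p s = case readP_to_S (p <* eof) s of
  [(x, "")] -> Just x
  _         -> Nothing

main :: IO ()
main = print (parseFull viaFix "a b c" == parseFull viaFold "a b c")  -- True
```

Both versions build the left-nested chain ConsExpr (ConsExpr (ConsExpr EmptyExpr a) b) c. They only diverge once the step function actually inspects its argument.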

Parsing in Haskell for a simple interpreter

I'm relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. So far I have the interpreter at a state which I am reasonably happy with, but am struggling slightly with the parser.
Here is the piece of code which I am having problems with
data IntExp
  = IVar Var
  | ICon Int
  | Add IntExp IntExp
  deriving (Read, Show)
whitespace = many1 (char ' ')
parseICon :: Parser IntExp
parseICon =
  do x <- many (digit)
     return (ICon (read x :: Int))
parseIVar :: Parser IntExp
parseIVar =
  do x <- many (letter)
     prime <- string "'" <|> string ""
     return (IVar (x ++ prime))
parseIntExp :: Parser IntExp
parseIntExp =
  do x <- try(parseICon)<|>try(parseIVar)<|>parseAdd
     return x
parseAdd :: Parser IntExp
parseAdd =
  do x <- parseIntExp
     whitespace
     string "+"
     whitespace
     y <- parseIntExp
     return (Add x y)
runP :: Show a => Parser a -> String -> IO ()
runP p input
  = case parse p "" input of
      Left err ->
        do putStr "parse error at "
           print err
      Right x -> print x
The language is slightly more complex, but this is enough to show my problem.
So in the type IntExp ICon is a constant and IVar is a variable, but now onto the problem. This for example runs successfully
runP parseAdd "5 + 5"
which gives (Add (ICon 5) (ICon 5)), which is the expected result. The problem arises when using IVars rather than ICons eg
runP parseAdd "n + m"
This causes the program to error out saying there was an unexpected "n" where a digit was expected. This leads me to believe that parseIntExp isn't working as I intended. My intention was that it will try to parse an ICon, if that fails then try to parse an IVar and so on.
So I either think the problem exists in parseIntExp, or that I am missing something in parseIVar and parseICon.
I hope I've given enough info about my problem and I was clear enough.
Thanks for any help you can give me!
Your problem is actually in parseICon:
parseICon =
  do x <- many (digit)
     return (ICon (read x :: Int))
The many combinator matches zero or more occurrences, so it's succeeding on "m" by matching zero digits, then probably dying when read fails.
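The difference between "zero or more" and "one or more" is easy to see in isolation. The sketch below uses `munch` and `munch1` from `Text.ParserCombinators.ReadP` in base as rough stand-ins for Parsec's `many digit` and `many1 digit`, an assumption made so the example runs without Parsec. It also uses `readMaybe` to show safely why `read` on an empty string is a problem.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, munch, munch1, readP_to_S)
import Text.Read (readMaybe)

-- Zero or more digits: succeeds on any input, possibly with an empty match.
zeroOrMore :: ReadP String
zeroOrMore = munch isDigit

-- One or more digits: fails unless at least one digit is present.
oneOrMore :: ReadP String
oneOrMore = munch1 isDigit

main :: IO ()
main = do
  print (readP_to_S zeroOrMore "m")  -- [("","m")]  success with nothing matched
  print (readP_to_S oneOrMore "m")   -- []          failure, so an alternative can run
  print (readMaybe "" :: Maybe Int)  -- Nothing     plain read would throw here
```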
And while I'm at it, since you're new to Haskell, here's some unsolicited advice:
Don't use spurious parentheses. many (digit) should just be many digit. Parentheses here just group things, they're not necessary for function application.
You don't need to do ICon (read x :: Int). The data constructor ICon can only take an Int, so the compiler can figure out what you meant on its own.
You don't need try around the first two options in parseIntExp as it stands--there's no input that would result in either one consuming some input before failing. They'll either fail immediately (which doesn't need try) or they'll succeed after matching a single character.
It's usually a better idea to tokenize first before parsing. Dealing with whitespace at the same time as syntax is a headache.
It's common in Haskell to use the ($) operator to avoid parentheses. It's just function application, but with very low precedence, so that something like many1 (char ' ') can be written many1 $ char ' '.
Also, doing this sort of thing is redundant and unnecessary:
parseICon :: Parser IntExp
parseICon =
  do x <- many digit
     return (ICon (read x))
When all you're doing is applying a regular function to the result of a parser, you can just use fmap:
parseICon :: Parser IntExp
parseICon = fmap (ICon . read) (many digit)
They're the exact same thing. You can make things look even nicer if you import the Control.Applicative module, which gives you an operator version of fmap, called (<$>), as well as another operator (<*>) that lets you do the same thing with functions of multiple arguments. There are also the operators (<*) and (*>), which discard the right or left value, respectively; in this case they let you parse something such as whitespace while discarding its result.
Here's a lightly modified version of your code with some of the above suggestions applied and some other minor stylistic tweaks:
whitespace = many1 $ char ' '
parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit
parseIVar :: Parser IntExp
parseIVar = IVar <$> parseVarName
parseVarName :: Parser String
parseVarName = (++) <$> many1 letter <*> parsePrime
parsePrime :: Parser String
parsePrime = option "" $ string "'"
parseIntExp :: Parser IntExp
parseIntExp = parseICon <|> parseIVar <|> parseAdd
parsePlusWithSpaces :: Parser ()
parsePlusWithSpaces = whitespace *> string "+" *> whitespace *> pure ()
parseAdd :: Parser IntExp
parseAdd = Add <$> parseIntExp <* parsePlusWithSpaces <*> parseIntExp
I'm also new to Haskell, just wondering:
will parseIntExp ever make it to parseAdd?
It seems like ICon or IVar will always get parsed before reaching 'parseAdd'.
e.g. runP parseIntExp "3 + m"
would try parseICon, and succeed, giving
(ICon 3) instead of (Add (ICon 3) (IVar m))
Sorry if I'm being stupid here, I'm just unsure.
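The concern raised here looks justified: with ordered choice, `parseAdd` is only reached after both atoms fail, and it then calls `parseIntExp` again without consuming anything. One possible way around it, following the `chainl1` approach from the FParsec answer earlier on this page, is sketched below. It uses `Text.ParserCombinators.ReadP` from base instead of Parsec, which is an assumption of this example. `Var` is taken to be `String`, `parseAtom` and `parseFull` are names introduced here, and whitespace around `+` is optional rather than required.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

data IntExp = IVar String | ICon Int | Add IntExp IntExp
  deriving (Eq, Show)

-- An atom is a constant or a variable. Neither rule calls parseIntExp,
-- so there is no left recursion.
parseAtom :: ReadP IntExp
parseAtom = (parseICon <++ parseIVar) <* skipSpaces
  where
    parseICon = ICon . read <$> munch1 isDigit
    parseIVar = IVar <$> ((++) <$> munch1 isAlpha <*> option "" (string "'"))

-- One or more atoms separated by '+', combined left to right.
parseIntExp :: ReadP IntExp
parseIntExp = chainl1 parseAtom (Add <$ (char '+' *> skipSpaces))

-- Accept only parses that consume the whole input.
parseFull :: String -> Maybe IntExp
parseFull s = case readP_to_S (parseIntExp <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing

main :: IO ()
main = print (parseFull "3 + m")  -- Just (Add (ICon 3) (IVar "m"))
```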
