Parsing: Lazy initialization and mutually recursive monads in F# - parsing

I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and now tried to implement a small parser for a programming language.
I first implemented the code in Haskell (with Parsec) which ran perfectly well.
The parsers for infix expressions are designed mutually recursive.
parseInfixOp :: Parser String -> Parser Expression -> Parser Expression
parseInfixOp operatorParser subParser = ignoreSpaces $ do
x <- ignoreSpaces $ subParser
do
op <- ignoreSpaces $ operatorParser
y <- parseInfixOp operatorParser subParser
return $ BinaryOp op x y
<|> return x
parseInfix :: [String] -> Parser Expression -> Parser Expression
parseInfix list = parseInfixOp (foldl1 (<|>) $ map string list)
parseExpr :: Parser Expression
parseExpr = parseInfix0
parseInfix0 = parseInfix ["==", "<>", "And", "Or", "Xor", "<", ">", "<=", ">="] parseInfix1
parseInfix1 = parseInfix ["+", "-", "Mod"] parseInfix2
parseInfix2 = parseInfix ["*", "/", "\\"] parseInfix3
parseInfix3 = parseInfix ["^"] parseInfix4
parseInfix4 = parseFactor
parseFactor :: Parser Expression
parseFactor = parseFactor' <|> (betweenChars '(' ')' parseExpr)
parseFactor' :: Parser Expression
parseFactor' = parseString
<|> parseBool
<|> parseNumber
<|> parseVariable
<|> (try parseFunCall) <|> parseIdentifier
Since the order of functions doesn't matter and Haskell is evaluating in a non-strict way, this is OK, but F# is evaluating strictly.
let rec parseExpr = parseInfix0
and parseFactor = (parseFactor') <|> (betweenChars '(' ')' parseExpr)
and parseInfix2 = parseInfix ["^"] parseFactor BinaryOp
and parseInfix1 = parseInfix ["*"; "/"] parseInfix2 BinaryOp
and parseInfix0 = parseInfix ["+"; "-"] parseInfix1 BinaryOp
and parseFunCall = parser {
let! first = letter
let! rest = many (letter <|> digit)
let funcName = toStr $ first::rest
do! ignoreSpace
let! args = betweenChars '(' ')' $ sepBy (parseExpr) ","
return FunCall(funcName, args)
}
and parseFactor' =
parseNumber
<|> parseString
<|> parseBool
<|> parseFunCall
<|> parseIdentifier
F# now either complains about recursive objects and just throws a StackOverflowException at runtime due to an infinite loop or it even doesn't compile the source because "a value would be part of its own definion".
What's the best way to prevent this errors. The debugger advices me to make use functions or lazys instead but what should I make lazy here?

What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.
If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than
...
and parseInfix2 = body
...
use
...
and parseInfix2 x = (body) x
...
even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).
I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').
Regarding StackOverflowExceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.

η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.
You want something like this:
let rec p1 = lazy (...)
and p2 = ... p1.Value ..
A better way, if you have a workflow builder, is to define the Delay member to do this for you:
type Builder() =
member this.Delay(f: unit -> Parser<_>) : Parser<_> =
let promise = Lazy.Create f
Return () |> Bind (fun () -> promise.Value)
let parse = new Builder()
let rec p1 =
parse {
...
}
and p2 =
parse {
...
}

Neither the eta rewrite nor the lazy delaying is a sure thing. The F# compiler seems to have a hard time dealing with deep recursions. What worked for me was to collapse the recursion into a single top level function (by passing the function to be called recursively as an argument). This top level is written eta style.
Top level, I have:
let rec myParser s = (parseExpr myParser) s
Note wikipedia says: "In a strict language like OCaml, we can avoid the infinite recursion problem by forcing the use of a closure. ". That is what worked for me.

Related

How to parse recusrive grammar in FParsec

Previous questions which I could not use to get this to work
Recursive grammars in FParsec
Seems to be an old question which was asked before createParserForwardedToRef was added to FParsec
AST doesn't seem to be as horribly recursive as mine.
Parsing in to a recursive data structure
Grammar relies on a special character '[' to indicate another nesting level. I don't have this luxury
I want to build a sort of Lexer and project system for a language I have found myself writing lately. The language is called q. It is a fairly simple language and has no operator precedence. For example 1*2+3 is the same as (1*(2+3)). It works a bit like a reverse polish notation calculator, evaluation is right to left.
I am having trouble expressing this in FParsec. I have put together the following simplified demo
open FParsec
type BinaryOperator = BinaryOperator of string
type Number = Number of string
type Element =
|Number of Number
and Expression =
|Element of Element
|BinaryExpression of Element * BinaryOperator * Expression
let number = regex "\d+\.?\d*" |>> Number.Number
let element = [ number ] |> choice |>> Element.Number
let binaryOperator = ["+"; "-"; "*"; "%"] |> Seq.map pstring |> choice |>> BinaryOperator
let binaryExpression expression = pipe3 element binaryOperator expression (fun l o r -> (l,o,r))
let expression =
let exprDummy, expRef = createParserForwardedToRef()
let elemExpr = element |>> Element
let binExpr = binaryExpression exprDummy |>> BinaryExpression
expRef.Value <- [binExpr; elemExpr; ] |> choice
expRef
let statement = expression.Value .>> eof
let parseString s =
printfn "Parsing input: '%s'" s
match run statement s with
| Success(result, _, _) -> printfn "Ok: %A" result
| Failure(errorMsg, _, _) -> printfn "Error: %A" errorMsg
//tests
parseString "1.23"
parseString "1+1"
parseString "1*2+3" // equivalent to (1*(2+3))
So far, I haven't been able to come up with a way to satisfy all 3 tests cases. In the above, it tries to parse binExpr first, realises it can't, but then must be consuming the input because it doesn't try to evaluate elemExpr next. Not sure what to do. How do I satisfy the 3 tests?
Meditating on Tomas' answer, I have come up with the following that works
let expr, expRef = createParserForwardedToRef()
let binRightExpr = binaryOperator .>>. expr
expRef.Value <- parse{
let! first = element
return! choice [
binRightExpr |>> (fun (o, r) -> (first, o, r) |> BinaryExpression)
preturn (first |> Element)
]
}
let statement = expRef.Value .>> eof
The reason the first parser failed is given in the FParsec docs
The behaviour of the <|> combinator has two important characteristics:
<|> only tries the parser on the right side if the parser on the left
side fails. It does not implement a longest match rule.
However, it only tries the right parser if the left parser fails without consuming input.
Probably need to clean up a few things like the structure of the AST but I think I am good to go.

Left-recursive grammars in operator precedence parsing

I have a left recursive grammar. My AST looks something like:
...
and Expr = BinaryExpr of BinaryExpr
and BinaryExpr = Expr * BinaryOperator * Expr
and BinaryOperator = Plus
...
I was planning on using the operator precedence parser for this, for example:
let exprOpp = new OperatorPrecedenceParser<Expr,unit,unit>()
let pBinaryExpr = exprOpp.ExpressionParser
exprOpp.TermParser <- pExpr
let consBinExpr op x y = (x, op, y) |> BinaryExpr
exprOpp.AddOperator(InfixOperator("+", ws, 1, Associativity.Left, (consBinExpr Plus)))
where pExpr is a parser for Expr.
The left recursion causes a stack overflow. I was wondering whether there was a particular FParsec-way of dealing with this while still using the OperatorPrecedenceParser?
Thanks!
EDIT:
The issue may be coming from the fact that pExpr forwards all calls to another parser pExprRef.
let pExpr, pExprRef = createParserForwardedToRef()
and after the operator precedence parser, the reference is defined:
do pExprRef :=
choice [pBinaryExpr, <other-parsers...>];

Parsing in to a recursive data structure

I wish to parse a string in to a recursive data structure using F#. In this question I'm going to present a simplified example that cuts to the core of what I want to do.
I want to parse a string of nested square brackets in to the record type:
type Bracket = | Bracket of Bracket option
So:
"[]" -> Bracket None
"[[]]" -> Bracket ( Some ( Bracket None) )
"[[[]]]" -> Bracket ( Some ( Bracket ( Some ( Bracket None) ) ) )
I would like to do this using the parser combinators in the FParsec library. Here is what I have so far:
let tryP parser =
parser |>> Some
<|>
preturn None
/// Parses up to nesting level of 3
let parseBrakets : Parser<_> =
let mostInnerLevelBracket =
pchar '['
.>> pchar ']'
|>> fun _ -> Bracket None
let secondLevelBracket =
pchar '['
>>. tryP mostInnerLevelBracket
.>> pchar ']'
|>> Bracket
let firstLevelBracket =
pchar '['
>>. tryP secondLevelBracket
.>> pchar ']'
|>> Bracket
firstLevelBracket
I even have some Expecto tests:
open Expecto
[<Tests>]
let parserTests =
[ "[]", Bracket None
"[[]]", Bracket (Some (Bracket None))
"[[[]]]", Bracket ( Some (Bracket (Some (Bracket None)))) ]
|> List.map(fun (str, expected) ->
str
|> sprintf "Trying to parse %s"
|> testCase
<| fun _ ->
match run parseBrakets str with
| Success (x, _,_) -> Expect.equal x expected "These should have been equal"
| Failure (m, _,_) -> failwithf "Expected a match: %s" m
)
|> testList "Bracket tests"
let tests =
[ parserTests ]
|> testList "Tests"
runTests defaultConfig tests
The problem is of course how to handle and arbitrary level of nesting - the code above only works for up to 3 levels. The code I would like to write is:
let rec pNestedBracket =
pchar '['
>>. tryP pNestedBracket
.>> pchar ']'
|>> Bracket
But F# doesn't allow this.
Am I barking up the wrong tree completely with how to solve this (I understand that there are easier ways to solve this particular problem)?
You are looking for FParsecs createParserForwardedToRef method. Because parsers are values and not functions it is impossible to make mutually recursive or self recursive parsers in order to do this you have to in a sense declare a parser before you define it.
Your final code will end up looking something like this
let bracketParser, bracketParserRef = createParserForwardedToRef<Bracket>()
bracketParserRef := ... //here you can finally declare your parser
//you can reference bracketParser which is a parser that uses the bracketParserRef
Also I would recommend this article for basic understanding of parser combinators. https://fsharpforfunandprofit.com/posts/understanding-parser-combinators/. The final section on a JSON parser talks about the createParserForwardedToRef method.
As an example of how to use createParserForwardedToRef, here's a snippet from a small parser I wrote recently. It parses lists of space-separated integers between brackets (and the lists can be nested), and the "integers" can be small arithmetic expressions like 1+2 or 3*5.
type ListItem =
| Int of int
| List of ListItem list
let pexpr = // ... omitted for brevity
let plist,plistImpl = createParserForwardedToRef()
let pListContents = (many1 (plist |>> List .>> spaces)) <|>
(many (pexpr |>> Int .>> spaces))
plistImpl := pchar '[' >>. spaces
>>. pListContents
.>> pchar ']'
P.S. I would have put this as a comment to Thomas Devries's answer, but a comment can't contain nicely-formatted code. Go ahead and accept his answer; mine is just intended to flesh his out.

Use FParsec to parse float or int*float

I've just started out playing around with FParsec, and I'm now trying to parse strings on the following format
10*0.5 0.25 0.75 3*0.1 0.9
I want 3*0.1, for example, to be expanded into 0.1 0.1 0.1
What I have so far is the following
type UserState = unit
type Parser<'t> = Parser<'t, UserState>
let str s : Parser<_> = pstring s
let float_ws : Parser<_> = pfloat .>> spaces
let product = pipe2 pint32 (str "*" >>. float_ws) (fun x y -> List.init x (fun i -> y))
The product parser correctly parsers entries on the format int*float and expands it into a list of floats. However, I'm having trouble coming up with a solution that allows me to parse either int*float or just a float. I would like to do something like
many (product <|> float_ws)
This will of course not work since the return types of the parsers differ. Any ideas on how to make this work? Is it possible to wrap of modify float_ws such that it returns a list with only one float?
You can make float_ws return a float list by simply adding a |>> List.singleton
let float_ws : Parser<_> = pfloat .>> spaces |>> List.singleton
|>> is just the map function, where you apply some function to the result of one parser and receive a new parser of some new type:
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
See: http://www.quanttec.com/fparsec/reference/primitives.html#members.:124::62::62:
Also, since product parser includes an int parser, it will successfully parse a character from the wrong case, this means the parser state will be changed. That means you cannot use the <|> operator on the first parser directly, you must also add attempt so FParsec can return to the original parser state.
let combined = many (attempt product <|> float_ws)

Parsec parsing and separating different structures

Let's say I have different parsers p1, ..., pk. I want to define a function pk :: Parser ([t1], ..., [tk]) where pi :: Parser ti.
That will parse a collection of strings (one after the other) that match any of p1...pk and separate them in the corresponding lists. Assume for simplicity that none of the strings matches two parsers.
I managed to do it, but I am really struggling to find an elegant way (preferably using many or any other builtin parsec parser).
The first step is to turn each of your parsers into parser of the big type:
p1 :: Parser t1
p2 :: Parser t2
p3 :: Parser t3
p1 = undefined
p2 = undefined
p3 = undefined
p1', p2', p3' :: Parser ([t1], [t2], [t3])
p1' = fmap (\x -> ([x], [], [])) p1
p2' = fmap (\x -> ([], [x], [])) p2
p3' = fmap (\x -> ([], [], [x])) p3
Now, we repeatedly choose from these last parsers, and concatenate the results at the end:
parser :: Parser ([t1], [t2], [t3])
parser = fmap mconcat . many . choice $ [p1', p2', p3']
There are Monoid instances for tuples up to size five; beyond that, you can either use nested tuples or a more appropriate data structure.
Representing the parsers as a list makes this easy. Using:
choice :: [Parser a] -> Parser a
many :: Parser a -> Parser [a]
We can write:
combineParsers :: [Parser a] -> Parser [a]
combineParsers = many . choice
This isn't quite right, as it bundles them all into a single list. Let's associate each parser with a unique identifier:
combineParsers' :: [(k, Parser a)] -> Parser [(k, a)]
combineParsers' = many . choice . combine
where combine = map (\(k,p) -> (,) k <$> p)
We can then convert this back to the list form:
combineParsers :: [Parser a] -> Parser [[a]]
combineParsers ps = map snd . sortBy fst <$> combineParsers' (zip [0..] ps)
You could perhaps make this more efficient by writing combineParsers' :: [(k, Parser a)] -> Parser (Map k [a]) instead, using e.g. combine = map $ \(k,p) -> fmap (\a -> Map.insertWith (++) k [a]) p.
This requires that all the parsers have the same result type, so you'll need to wrap each of their results in a data-type with Cons <$> p for each p. You can then unwrap the constructor from each list. Admittedly, this is fairly ugly, but to do it heterogeneously with tuples would require even uglier type-class hacks.
There might be a simpler solution for your specific use-case, but this is the easiest way to do it generically that I can think of.
You can't do this completely generically without type-level trickery, unfortunately. However, an approach along the lines of Daniel Wagner's suggestion can be constructed in DIY style, using polymorphic combinators and some pre-existing recursive instances. I'll present a simple example to demonstrate.
First, a few simpleminded parsers to test with:
type Parser = Parsec String ()
parseNumber :: Parser Int
parseNumber = read <$> many1 digit
parseBool :: Parser Bool
parseBool = string "yes" *> return True
<|> string "no" *> return False
parseName :: Parser String
parseName = many1 letter
Next we create a "base case" type to mark the end of the possibilities, and give it a parser that always succeeds on no input.
data Nil = Nil deriving (Eq, Ord, Read, Show, Enum, Bounded)
instance Monoid Nil where
mempty = Nil
mappend _ _ = Nil
parseNil :: Parser Nil
parseNil = return Nil
This is equivalent to (), of course--I'm only creating a new type to disambiguate in case we actually want to parse (). Next, the combinator that chains parsers together:
infixr 3 ?|>
(?|>) :: (Monoid b) => Parser a -> Parser b -> Parser ([a], b)
p1 ?|> p2 = ((,) <$> ((:[]) <$> p1) <*> pure mempty)
<|> ((,) <$> pure [] <*> p2)
What's going on here is that p1 ?|> p2 tries p1, and if it succeeds wraps that in a singleton list and fills in mempty for the second element of the tuple. If p1 fails, if fills in an empty list and uses p2 for the second element.
parseOne :: Parser ([Int], ([Bool], ([String], Nil)))
parseOne = parseNumber ?|> parseBool ?|> parseName ?|> parseNil
Combining parsers with the new combinator is simple, and the result type is pretty self-explanatory.
parseMulti :: Parser ([Int], ([Bool], ([String], Nil)))
parseMulti = mconcat <$> many1 (parseOne <* newline)
We're relying on the recursive Monoid instance for tuples and the usual instance for lists to combine multiple lines. Now, to show that it works, define a quick test:
runTest = parseTest parseMulti testInput
testInput = unlines [ "yes", "123", "acdf", "8", "no", "qq" ]
Which parses successfully as Right ([123,8],([True,False],(["acdf","qq"],Nil))).

Resources