Left-recursive grammars in operator precedence parsing - f#

I have a left recursive grammar. My AST looks something like:
...
and Expr = BinaryExpr of BinaryExpr
and BinaryExpr = Expr * BinaryOperator * Expr
and BinaryOperator = Plus
...
I was planning on using the operator precedence parser for this, for example:
let exprOpp = new OperatorPrecedenceParser<Expr,unit,unit>()
let pBinaryExpr = exprOpp.ExpressionParser
exprOpp.TermParser <- pExpr
let consBinExpr op x y = (x, op, y) |> BinaryExpr
exprOpp.AddOperator(InfixOperator("+", ws, 1, Associativity.Left, (consBinExpr Plus)))
where pExpr is a parser for Expr.
The left recursion causes a stack overflow. I was wondering whether there was a particular FParsec-way of dealing with this while still using the OperatorPrecedenceParser?
Thanks!
EDIT:
The issue may be coming from the fact that pExpr forwards all calls to another parser pExprRef.
let pExpr, pExprRef = createParserForwardedToRef()
and after the operator precedence parser, the reference is defined:
do pExprRef :=
choice [pBinaryExpr, <other-parsers...>];

Related

How to parse recusrive grammar in FParsec

Previous questions which I could not use to get this to work
Recursive grammars in FParsec
Seems to be an old question which was asked before createParserForwardedToRef was added to FParsec
AST doesn't seem to be as horribly recursive as mine.
Parsing in to a recursive data structure
Grammar relies on a special character '[' to indicate another nesting level. I don't have this luxury
I want to build a sort of Lexer and project system for a language I have found myself writing lately. The language is called q. It is a fairly simple language and has no operator precedence. For example 1*2+3 is the same as (1*(2+3)). It works a bit like a reverse polish notation calculator, evaluation is right to left.
I am having trouble expressing this in FParsec. I have put together the following simplified demo
open FParsec
type BinaryOperator = BinaryOperator of string
type Number = Number of string
type Element =
|Number of Number
and Expression =
|Element of Element
|BinaryExpression of Element * BinaryOperator * Expression
let number = regex "\d+\.?\d*" |>> Number.Number
let element = [ number ] |> choice |>> Element.Number
let binaryOperator = ["+"; "-"; "*"; "%"] |> Seq.map pstring |> choice |>> BinaryOperator
let binaryExpression expression = pipe3 element binaryOperator expression (fun l o r -> (l,o,r))
let expression =
let exprDummy, expRef = createParserForwardedToRef()
let elemExpr = element |>> Element
let binExpr = binaryExpression exprDummy |>> BinaryExpression
expRef.Value <- [binExpr; elemExpr; ] |> choice
expRef
let statement = expression.Value .>> eof
let parseString s =
printfn "Parsing input: '%s'" s
match run statement s with
| Success(result, _, _) -> printfn "Ok: %A" result
| Failure(errorMsg, _, _) -> printfn "Error: %A" errorMsg
//tests
parseString "1.23"
parseString "1+1"
parseString "1*2+3" // equivalent to (1*(2+3))
So far, I haven't been able to come up with a way to satisfy all 3 tests cases. In the above, it tries to parse binExpr first, realises it can't, but then must be consuming the input because it doesn't try to evaluate elemExpr next. Not sure what to do. How do I satisfy the 3 tests?
Meditating on Tomas' answer, I have come up with the following that works
let expr, expRef = createParserForwardedToRef()
let binRightExpr = binaryOperator .>>. expr
expRef.Value <- parse{
let! first = element
return! choice [
binRightExpr |>> (fun (o, r) -> (first, o, r) |> BinaryExpression)
preturn (first |> Element)
]
}
let statement = expRef.Value .>> eof
The reason the first parser failed is given in the FParsec docs
The behaviour of the <|> combinator has two important characteristics:
<|> only tries the parser on the right side if the parser on the left
side fails. It does not implement a longest match rule.
However, it only tries the right parser if the left parser fails without consuming input.
Probably need to clean up a few things like the structure of the AST but I think I am good to go.

How to insert an indentation check into the operator precedence parser?

I am working on the parsing stage for the language I am making and am having difficulty with the following.
let test2 = // I'd like this to be an error.
"""
2
+ 2
"""
let result = run (spaces >>. expr) test2
val result : ParserResult<CudaExpr,unit> =
Success: Add (LitInt32 2,LitInt32 2)
I already managed to make the following example when the terms are indented incorrectly
2 +
2
give me an error, but not when the operator is on the wrong indentation level. I need something like a before-parse check.
let operators expr i =
let f expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f expr
f opp.ExpressionParser
The above function is how the operators phase is structured and as you can see, the term parsers get wrapped in a function that does the indentation check, but the last line is faulty.
Here is a simplified example of the full parser.
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsecCS.dll"
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsec.dll"
open FParsec
type Expr =
| V of string
| Add of Expr * Expr
let identifier = many1Satisfy2L isAsciiLetter (fun x -> isAsciiLetter x || isDigit x || x = ''') "identifier" .>> spaces |>> V
let indentations expressions (s: CharStream<_>) =
let i = s.Column
let expr_indent expr (s: CharStream<_>) =
let expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
many1 expr s
expr_indent (expressions i) s
let expr =
let opp = new OperatorPrecedenceParser<_,_,_>()
opp.AddOperator(InfixOperator("+", spaces, 6, Associativity.Left, fun x y -> Add(x,y)))
let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
let rec expr s = indentations (operators identifier) s
expr
let test2 = // I'd like this to be an error.
"""
a
+
b
"""
let result = run (spaces >>. expr) test2
The full parser so far can be found here.
let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
I did not realize it 2.5 weeks ago, but what happens when a new block gets opened and expr s gets called is that the term parser gets overwritten with the new indentation and there is no way to back it up and restore it on exit. I did a bit of looking around and managed to adapt the Pratt top down parsing method for my purposes.
Here is a talk by Douglas Crockford on the method.
let poperator: Parser<_,_> =
let f c = (isAsciiIdContinue c || isAnyOf [|' ';'\t';'\n';'\"';'(';')';'{';'}';'[';']'|] c) = false
(many1Satisfy f .>> spaces)
>>= fun token ->
match dict_operator.TryGetValue token with
| true, x -> preturn x
| false, _ -> fail "unknown operator"
let rec led poperator term left (prec,asoc,m) =
match asoc with
| Associativity.Left | Associativity.None -> tdop poperator term prec |>> m left
| Associativity.Right -> tdop poperator term (prec-1) |>> m left
| _ -> failwith "impossible"
and tdop poperator term rbp =
let rec f left =
poperator >>= fun (prec,asoc,m as v) ->
if rbp < prec then led poperator term left v >>= loop
else pzero
and loop left = attempt (f left) <|>% left
term >>= loop
let operators expr i (s: CharStream<_>) =
let expr_indent expr (s: CharStream<_>) = expr_indent i (<=) expr s
let op s = expr_indent poperator s
let term s = expr_indent expr s
tdop op term 0 s
The led and tdop functions which do the actual precedence parsing are 10 lines long. The above is just a snippet of the full parser for the language I am making - in terms of syntax it is similar to F# and is indentation sensitive. Here is a more straightforward F# translation of Douglas Crockford's Javascript example.

Using Parsec to parse regular expressions

I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:
EXP : EXP *
| LIT EXP
| LIT
I've tried to implement this in Haskell as:
expr = try star
<|> try litE
<|> lit
litE = do c <- noneOf "*"
rest <- expr
return (c : rest)
lit = do c <- noneOf "*"
return [c]
star = do content <- expr
char '*'
return (content ++ "*")
There are some infinite loops here though (e.g. expr -> star -> expr without consuming any tokens) which makes the parser loop forever. I'm not really sure how to fix it though, because the very nature of star is that it consumes its mandatory token at the end.
Any thoughts?
You should use Parsec.Expr.buildExprParser; it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!
You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.
Here's my attempt (I threw in |, +, and ? for good measure):
import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
data Term = Literal Char
| Sequence [Term]
| Repeat (Int, Maybe Int) Term
| Choice [Term]
deriving ( Show )
term :: Parser Term
term = buildExpressionParser ops atom where
ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
, Postfix (Repeat (1, Nothing) <$ char '+')
, Postfix (Repeat (0, Just 1) <$ char '?')
]
, [ Infix (return sequence) AssocRight
]
, [ Infix (choice <$ char '|') AssocRight
]
]
atom = msum [ Literal <$> lit
, parens term
]
lit = noneOf "*+?|()"
sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)
parens = between (char '(') (char ')')
seqTerms (Sequence ts) = ts
seqTerms t = [t]
choiceTerms (Choice ts) = ts
choiceTerms t = [t]
main = parseTest term "he(llo)*|wor+ld?"
Your grammar is left-recursive, which doesn’t play nice with try, as Parsec will repeatedly backtrack. There are a few ways around this. Probably the simplest is just making the * optional in another rule:
lit :: Parser (Char, Maybe Char)
lit = do
c <- noneOf "*"
s <- optionMaybe $ char '*'
return (c, s)
Of course, you’ll probably end up wrapping things in a data type anyway, and there are a lot of ways to go about it. Here’s one, off the top of my head:
import Control.Applicative ((<$>))
data Term = Literal Char
| Sequence [Term]
| Star Term
expr :: Parser Term
expr = Sequence <$> many term
term :: Parser Term
term = do
c <- lit
s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
return $ if isNothing s
then Literal c
else Star $ Literal c
Maybe a more experienced Haskeller will come along with a better solution.

Parsing function in haskell

I'm new to Haskell and I am trying to parse expressions. I found out about Parsec and I also found some articles but I don't seem to understand what I have to do. My problem is that I want to give an expression like "x^2+2*x+3" and the result to be a function that takes an argument x and returns a value. I am very sorry if this is an easy question but I really need some help. Thanks! The code I inserted is from the article that you can find on this link.
import Control.Monad(liftM)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Token
import Text.ParserCombinators.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr
| Sub Expr Expr | Mul Expr Expr | Div Expr Expr
| Pow Expr Expr
deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
<?> "expression"
table = [[op "^" Pow AssocRight],
[op "*" Mul AssocLeft, op "/" Div AssocLeft],
[op "+" Add AssocLeft, op "-" Sub AssocLeft]]
where
op s f assoc
= Infix (do{ string s; return f}) assoc
factor = do{ char '('
; x <- expr
; char ')'
; return x}
<|> number
<|> variable
<?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
; return (Num (read ds))}
<?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
; return (Var ds)}
<?> "variable"
This is just a parser for expressions with variables. Actually interpreting the expression is an entirely separate matter.
You should create a function that takes an already parsed expression and values for variables, and returns the result of evaluating the expression. Pseudocode:
evaluate :: Expr -> Map String Int -> Int
evaluate (Num n) _ = n
evaluate (Var x) vars = {- Look up the value of x in vars -}
evaluate (Plus e f) vars = {- Evaluate e and f, and return their sum -}
...
I've deliberately omitted some details; hopefully by exploring the missing parts, you learn more about Haskell.
As a next step, you should probably look at the Reader monad for a convenient way to pass the variable map vars around, and using Maybe or Error to signal errors, e.g. referencing a variable that is not bound in vars, or division by zero.

Parsing: Lazy initialization and mutually recursive monads in F#

I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and now tried to implement a small parser for a programming language.
I first implemented the code in Haskell (with Parsec) which ran perfectly well.
The parsers for infix expressions are designed mutually recursive.
parseInfixOp :: Parser String -> Parser Expression -> Parser Expression
parseInfixOp operatorParser subParser = ignoreSpaces $ do
x <- ignoreSpaces $ subParser
do
op <- ignoreSpaces $ operatorParser
y <- parseInfixOp operatorParser subParser
return $ BinaryOp op x y
<|> return x
parseInfix :: [String] -> Parser Expression -> Parser Expression
parseInfix list = parseInfixOp (foldl1 (<|>) $ map string list)
parseExpr :: Parser Expression
parseExpr = parseInfix0
parseInfix0 = parseInfix ["==", "<>", "And", "Or", "Xor", "<", ">", "<=", ">="] parseInfix1
parseInfix1 = parseInfix ["+", "-", "Mod"] parseInfix2
parseInfix2 = parseInfix ["*", "/", "\\"] parseInfix3
parseInfix3 = parseInfix ["^"] parseInfix4
parseInfix4 = parseFactor
parseFactor :: Parser Expression
parseFactor = parseFactor' <|> (betweenChars '(' ')' parseExpr)
parseFactor' :: Parser Expression
parseFactor' = parseString
<|> parseBool
<|> parseNumber
<|> parseVariable
<|> (try parseFunCall) <|> parseIdentifier
Since the order of functions doesn't matter and Haskell is evaluating in a non-strict way, this is OK, but F# is evaluating strictly.
let rec parseExpr = parseInfix0
and parseFactor = (parseFactor') <|> (betweenChars '(' ')' parseExpr)
and parseInfix2 = parseInfix ["^"] parseFactor BinaryOp
and parseInfix1 = parseInfix ["*"; "/"] parseInfix2 BinaryOp
and parseInfix0 = parseInfix ["+"; "-"] parseInfix1 BinaryOp
and parseFunCall = parser {
let! first = letter
let! rest = many (letter <|> digit)
let funcName = toStr $ first::rest
do! ignoreSpace
let! args = betweenChars '(' ')' $ sepBy (parseExpr) ","
return FunCall(funcName, args)
}
and parseFactor' =
parseNumber
<|> parseString
<|> parseBool
<|> parseFunCall
<|> parseIdentifier
F# now either complains about recursive objects and just throws a StackOverflowException at runtime due to an infinite loop or it even doesn't compile the source because "a value would be part of its own definion".
What's the best way to prevent this errors. The debugger advices me to make use functions or lazys instead but what should I make lazy here?
What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.
If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than
...
and parseInfix2 = body
...
use
...
and parseInfix2 x = (body) x
...
even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).
I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').
Regarding StackOverflowExceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.
η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.
You want something like this:
let rec p1 = lazy (...)
and p2 = ... p1.Value ..
A better way, if you have a workflow builder, is to define the Delay member to do this for you:
type Builder() =
member this.Delay(f: unit -> Parser<_>) : Parser<_> =
let promise = Lazy.Create f
Return () |> Bind (fun () -> promise.Value)
let parse = new Builder()
let rec p1 =
parse {
...
}
and p2 =
parse {
...
}
Neither the eta rewrite nor the lazy delaying is a sure thing. The F# compiler seems to have a hard time dealing with deep recursions. What worked for me was to collapse the recursion into a single top level function (by passing the function to be called recursively as an argument). This top level is written eta style.
Top level, I have:
let rec myParser s = (parseExpr myParser) s
Note wikipedia says: "In a strict language like OCaml, we can avoid the infinite recursion problem by forcing the use of a closure. ". That is what worked for me.

Resources