I'm trying to parse the arrow type with FParsec.
That is, this:
Int -> Int -> Int -> Float -> Char
For example.
I tried with this code, but it only works for one type of arrow (Int -> Int) and no more. I also want to avoid parentheses, because I already have a tuple type that uses them, and I don't want it to be too heavy in terms of syntax either.
let ws = pspaces >>. many pspaces |>> (fun _ -> ())
let str_ws s = pstring s .>> ws
type Type = ArrowType of Type * Type
let arrowtype' =
pipe2
(ws >>. ty')
(ws >>. str_ws "->" >>. ws >>. ty')
(fun t1 t2 -> ArrowType(t1, t2))
let arrowtype =
pipe2
(ws >>. ty' <|> arrowtype')
(ws >>. str_ws "->" >>. ws >>. ty' <|> arrowtype')
(fun t1 t2 -> ArrowType(t1, t2)) <?> "arrow type"
ty' is just another types, like tuple or identifier.
Do you have a solution?
Before I get into the arrow syntax, I want to comment on your ws parser. Using |>> (fun _ -> ()) is a little inefficient since FParsec has to construct a result object then immediately throw it away. The built-in spaces and spaces1 parsers are probably better for your needs, since they don't need to construct a result object.
Now as for the issue you're struggling with, it looks to me like you want to consider the arrow parser slightly differently. What about treating it as a series of types separated by ->, and using the sepBy family of parser combinators? Something like this:
let arrow = spaces1 >>. pstring "->" .>> spaces1
let arrowlist = sepBy1 ty' arrow
let arrowtype = arrowlist |>> (fun types ->
types |> List.reduce (fun ty1 ty2 -> ArrowType(ty1, ty2))
Note that the arrowlist parser would also match against just plain Int, because the definition of sepBy1 is not "there must be at least one list separator", but rather "there must be at least one item in the list". So to distinguish between a type of Int and an arrow type, you'd want to do something like:
let typeAlone = ty' .>> notFollowedBy arrow
let typeOrArrow = attempt typeAlone <|> arrowtype
The use of attempt is necessary here so that the characters consumed by ty' will be backtracked if an arrow was present.
There's a complicating factor I haven't addressed at all since you mentioned not wanting parentheses. But if you decide that you want to be able to have arrow types of arrow types (that is, functions that take functions as input), you'd want to parse types like (Int -> Int) -> (Int -> Float) -> Char. This would complicate the use of sepBy, and I haven't addressed it at all. If you end up needing more complex parsing including parentheses, then it's possible you might want to use OperatorPrecedenceParser. But for your simple needs where parentheses aren't involved, sepBy1 looks like your best bet.
Finally, I should give a WARNING: I haven't tested this at all, just typed this into the Stack Overflow box. The code example I gave you is not intended to be working as-is, but rather to give you an idea of how to proceed. If you need a working-as-is example, I'll be happy to try to give you one, but I don't have the time to do so right now.
Related
Previous questions which I could not use to get this to work
Recursive grammars in FParsec
Seems to be an old question which was asked before createParserForwardedToRef was added to FParsec
AST doesn't seem to be as horribly recursive as mine.
Parsing in to a recursive data structure
Grammar relies on a special character '[' to indicate another nesting level. I don't have this luxury
I want to build a sort of Lexer and project system for a language I have found myself writing lately. The language is called q. It is a fairly simple language and has no operator precedence. For example 1*2+3 is the same as (1*(2+3)). It works a bit like a reverse polish notation calculator, evaluation is right to left.
I am having trouble expressing this in FParsec. I have put together the following simplified demo
open FParsec
type BinaryOperator = BinaryOperator of string
type Number = Number of string
type Element =
|Number of Number
and Expression =
|Element of Element
|BinaryExpression of Element * BinaryOperator * Expression
let number = regex "\d+\.?\d*" |>> Number.Number
let element = [ number ] |> choice |>> Element.Number
let binaryOperator = ["+"; "-"; "*"; "%"] |> Seq.map pstring |> choice |>> BinaryOperator
let binaryExpression expression = pipe3 element binaryOperator expression (fun l o r -> (l,o,r))
let expression =
let exprDummy, expRef = createParserForwardedToRef()
let elemExpr = element |>> Element
let binExpr = binaryExpression exprDummy |>> BinaryExpression
expRef.Value <- [binExpr; elemExpr; ] |> choice
expRef
let statement = expression.Value .>> eof
let parseString s =
printfn "Parsing input: '%s'" s
match run statement s with
| Success(result, _, _) -> printfn "Ok: %A" result
| Failure(errorMsg, _, _) -> printfn "Error: %A" errorMsg
//tests
parseString "1.23"
parseString "1+1"
parseString "1*2+3" // equivalent to (1*(2+3))
So far, I haven't been able to come up with a way to satisfy all 3 tests cases. In the above, it tries to parse binExpr first, realises it can't, but then must be consuming the input because it doesn't try to evaluate elemExpr next. Not sure what to do. How do I satisfy the 3 tests?
Meditating on Tomas' answer, I have come up with the following that works
let expr, expRef = createParserForwardedToRef()
let binRightExpr = binaryOperator .>>. expr
expRef.Value <- parse{
let! first = element
return! choice [
binRightExpr |>> (fun (o, r) -> (first, o, r) |> BinaryExpression)
preturn (first |> Element)
]
}
let statement = expRef.Value .>> eof
The reason the first parser failed is given in the FParsec docs
The behaviour of the <|> combinator has two important characteristics:
<|> only tries the parser on the right side if the parser on the left
side fails. It does not implement a longest match rule.
However, it only tries the right parser if the left parser fails without consuming input.
Probably need to clean up a few things like the structure of the AST but I think I am good to go.
I wish to parse a string in to a recursive data structure using F#. In this question I'm going to present a simplified example that cuts to the core of what I want to do.
I want to parse a string of nested square brackets in to the record type:
type Bracket = | Bracket of Bracket option
So:
"[]" -> Bracket None
"[[]]" -> Bracket ( Some ( Bracket None) )
"[[[]]]" -> Bracket ( Some ( Bracket ( Some ( Bracket None) ) ) )
I would like to do this using the parser combinators in the FParsec library. Here is what I have so far:
let tryP parser =
parser |>> Some
<|>
preturn None
/// Parses up to nesting level of 3
let parseBrakets : Parser<_> =
let mostInnerLevelBracket =
pchar '['
.>> pchar ']'
|>> fun _ -> Bracket None
let secondLevelBracket =
pchar '['
>>. tryP mostInnerLevelBracket
.>> pchar ']'
|>> Bracket
let firstLevelBracket =
pchar '['
>>. tryP secondLevelBracket
.>> pchar ']'
|>> Bracket
firstLevelBracket
I even have some Expecto tests:
open Expecto
[<Tests>]
let parserTests =
[ "[]", Bracket None
"[[]]", Bracket (Some (Bracket None))
"[[[]]]", Bracket ( Some (Bracket (Some (Bracket None)))) ]
|> List.map(fun (str, expected) ->
str
|> sprintf "Trying to parse %s"
|> testCase
<| fun _ ->
match run parseBrakets str with
| Success (x, _,_) -> Expect.equal x expected "These should have been equal"
| Failure (m, _,_) -> failwithf "Expected a match: %s" m
)
|> testList "Bracket tests"
let tests =
[ parserTests ]
|> testList "Tests"
runTests defaultConfig tests
The problem is of course how to handle and arbitrary level of nesting - the code above only works for up to 3 levels. The code I would like to write is:
let rec pNestedBracket =
pchar '['
>>. tryP pNestedBracket
.>> pchar ']'
|>> Bracket
But F# doesn't allow this.
Am I barking up the wrong tree completely with how to solve this (I understand that there are easier ways to solve this particular problem)?
You are looking for FParsecs createParserForwardedToRef method. Because parsers are values and not functions it is impossible to make mutually recursive or self recursive parsers in order to do this you have to in a sense declare a parser before you define it.
Your final code will end up looking something like this
let bracketParser, bracketParserRef = createParserForwardedToRef<Bracket>()
bracketParserRef := ... //here you can finally declare your parser
//you can reference bracketParser which is a parser that uses the bracketParserRef
Also I would recommend this article for basic understanding of parser combinators. https://fsharpforfunandprofit.com/posts/understanding-parser-combinators/. The final section on a JSON parser talks about the createParserForwardedToRef method.
As an example of how to use createParserForwardedToRef, here's a snippet from a small parser I wrote recently. It parses lists of space-separated integers between brackets (and the lists can be nested), and the "integers" can be small arithmetic expressions like 1+2 or 3*5.
type ListItem =
| Int of int
| List of ListItem list
let pexpr = // ... omitted for brevity
let plist,plistImpl = createParserForwardedToRef()
let pListContents = (many1 (plist |>> List .>> spaces)) <|>
(many (pexpr |>> Int .>> spaces))
plistImpl := pchar '[' >>. spaces
>>. pListContents
.>> pchar ']'
P.S. I would have put this as a comment to Thomas Devries's answer, but a comment can't contain nicely-formatted code. Go ahead and accept his answer; mine is just intended to flesh his out.
I've just started out playing around with FParsec, and I'm now trying to parse strings on the following format
10*0.5 0.25 0.75 3*0.1 0.9
I want 3*0.1, for example, to be expanded into 0.1 0.1 0.1
What I have so far is the following
type UserState = unit
type Parser<'t> = Parser<'t, UserState>
let str s : Parser<_> = pstring s
let float_ws : Parser<_> = pfloat .>> spaces
let product = pipe2 pint32 (str "*" >>. float_ws) (fun x y -> List.init x (fun i -> y))
The product parser correctly parsers entries on the format int*float and expands it into a list of floats. However, I'm having trouble coming up with a solution that allows me to parse either int*float or just a float. I would like to do something like
many (product <|> float_ws)
This will of course not work since the return types of the parsers differ. Any ideas on how to make this work? Is it possible to wrap of modify float_ws such that it returns a list with only one float?
You can make float_ws return a float list by simply adding a |>> List.singleton
let float_ws : Parser<_> = pfloat .>> spaces |>> List.singleton
|>> is just the map function, where you apply some function to the result of one parser and receive a new parser of some new type:
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
See: http://www.quanttec.com/fparsec/reference/primitives.html#members.:124::62::62:
Also, since product parser includes an int parser, it will successfully parse a character from the wrong case, this means the parser state will be changed. That means you cannot use the <|> operator on the first parser directly, you must also add attempt so FParsec can return to the original parser state.
let combined = many (attempt product <|> float_ws)
I'm so new at F# and FParsec, I don't even want to embarrass myself by showing what I've got so far.
In the FParsec examples, every type in the ASTs (that I see) are type abbreviations for single values, lists, or tuples.
What if I have a complex type which is supposed to hold, say, a parsed function name and its parameters?
So, f(a, b, c) would be parsed to an object of type PFunction which has a string member Name and a PParameter list member Parameters. How can I go from a parser which can match f(a, b, c) and |>> it into a PFunction?
All I seem to be able to do so far is create the composite parser, but not turn it into anything. The Calculator example would be similar if it made an AST including a type like Term but instead it seems to me to be an interpreter rather than a parser, so there is no AST. Besides, Term would probably just be a tuple of other type abbreviated components.
Thanks!
I think this is what you're looking for:
let pIdentifier o =
let isIdentifierFirstChar c = isLetter c || c = '_'
let isIdentifierChar c = isLetter c || isDigit c || c = '_'
many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier" <| o
let pParameterList p =
spaces >>.
pchar '(' >>. spaces >>. sepBy (spaces >>. p .>> spaces) (pchar ',')
.>> spaces .>> pchar ')'
type FunctionCall(Name: string, Parameters: string list) =
member this.Name = Name
member this.Parameters = Parameters
let pFunctionCall o=
pipe2 (pIdentifier) (pParameterList pIdentifier) (fun name parameters -> FunctionCall(name, parameters)) <|o
This is a completely contrived but here's what I think it would look like the following, using pipe2 instead of |>>
type FunctionCall(Name: string, Parameters: string list) =
member this.Name = Name
member this.Parameters = Parameters
let pFunctionCall =
pipe2 (pIdentifier) (pstring "(" >>. pParameterList .>> pstring ")") (fun name parameters -> FunctionCall(name, parameters))
The functional answer would be to use a discriminated union, as Daniel mentioned. FParsec also has a UserState that can be used like a state monad, so if you really want to parse directly into a complex type, you can use that. [1]
[1] http://cs.hubfs.net/topic/None/60071
I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and now tried to implement a small parser for a programming language.
I first implemented the code in Haskell (with Parsec) which ran perfectly well.
The parsers for infix expressions are designed mutually recursive.
parseInfixOp :: Parser String -> Parser Expression -> Parser Expression
parseInfixOp operatorParser subParser = ignoreSpaces $ do
x <- ignoreSpaces $ subParser
do
op <- ignoreSpaces $ operatorParser
y <- parseInfixOp operatorParser subParser
return $ BinaryOp op x y
<|> return x
parseInfix :: [String] -> Parser Expression -> Parser Expression
parseInfix list = parseInfixOp (foldl1 (<|>) $ map string list)
parseExpr :: Parser Expression
parseExpr = parseInfix0
parseInfix0 = parseInfix ["==", "<>", "And", "Or", "Xor", "<", ">", "<=", ">="] parseInfix1
parseInfix1 = parseInfix ["+", "-", "Mod"] parseInfix2
parseInfix2 = parseInfix ["*", "/", "\\"] parseInfix3
parseInfix3 = parseInfix ["^"] parseInfix4
parseInfix4 = parseFactor
parseFactor :: Parser Expression
parseFactor = parseFactor' <|> (betweenChars '(' ')' parseExpr)
parseFactor' :: Parser Expression
parseFactor' = parseString
<|> parseBool
<|> parseNumber
<|> parseVariable
<|> (try parseFunCall) <|> parseIdentifier
Since the order of functions doesn't matter and Haskell is evaluating in a non-strict way, this is OK, but F# is evaluating strictly.
let rec parseExpr = parseInfix0
and parseFactor = (parseFactor') <|> (betweenChars '(' ')' parseExpr)
and parseInfix2 = parseInfix ["^"] parseFactor BinaryOp
and parseInfix1 = parseInfix ["*"; "/"] parseInfix2 BinaryOp
and parseInfix0 = parseInfix ["+"; "-"] parseInfix1 BinaryOp
and parseFunCall = parser {
let! first = letter
let! rest = many (letter <|> digit)
let funcName = toStr $ first::rest
do! ignoreSpace
let! args = betweenChars '(' ')' $ sepBy (parseExpr) ","
return FunCall(funcName, args)
}
and parseFactor' =
parseNumber
<|> parseString
<|> parseBool
<|> parseFunCall
<|> parseIdentifier
F# now either complains about recursive objects and just throws a StackOverflowException at runtime due to an infinite loop or it even doesn't compile the source because "a value would be part of its own definion".
What's the best way to prevent this errors. The debugger advices me to make use functions or lazys instead but what should I make lazy here?
What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.
If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than
...
and parseInfix2 = body
...
use
...
and parseInfix2 x = (body) x
...
even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).
I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').
Regarding StackOverflowExceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.
η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.
You want something like this:
let rec p1 = lazy (...)
and p2 = ... p1.Value ..
A better way, if you have a workflow builder, is to define the Delay member to do this for you:
type Builder() =
member this.Delay(f: unit -> Parser<_>) : Parser<_> =
let promise = Lazy.Create f
Return () |> Bind (fun () -> promise.Value)
let parse = new Builder()
let rec p1 =
parse {
...
}
and p2 =
parse {
...
}
Neither the eta rewrite nor the lazy delaying is a sure thing. The F# compiler seems to have a hard time dealing with deep recursions. What worked for me was to collapse the recursion into a single top level function (by passing the function to be called recursively as an argument). This top level is written eta style.
Top level, I have:
let rec myParser s = (parseExpr myParser) s
Note wikipedia says: "In a strict language like OCaml, we can avoid the infinite recursion problem by forcing the use of a closure. ". That is what worked for me.