Parsing into a complex type - f#

I'm so new at F# and FParsec, I don't even want to embarrass myself by showing what I've got so far.
In the FParsec examples, every type in the ASTs (that I see) are type abbreviations for single values, lists, or tuples.
What if I have a complex type which is supposed to hold, say, a parsed function name and its parameters?
So, f(a, b, c) would be parsed to an object of type PFunction which has a string member Name and a PParameter list member Parameters. How can I go from a parser which can match f(a, b, c) and |>> it into a PFunction?
All I seem to be able to do so far is create the composite parser, but not turn it into anything. The Calculator example would be similar if it made an AST including a type like Term but instead it seems to me to be an interpreter rather than a parser, so there is no AST. Besides, Term would probably just be a tuple of other type abbreviated components.
Thanks!

I think this is what you're looking for:
let pIdentifier o =
let isIdentifierFirstChar c = isLetter c || c = '_'
let isIdentifierChar c = isLetter c || isDigit c || c = '_'
many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier" <| o
let pParameterList p =
spaces >>.
pchar '(' >>. spaces >>. sepBy (spaces >>. p .>> spaces) (pchar ',')
.>> spaces .>> pchar ')'
type FunctionCall(Name: string, Parameters: string list) =
member this.Name = Name
member this.Parameters = Parameters
let pFunctionCall o=
pipe2 (pIdentifier) (pParameterList pIdentifier) (fun name parameters -> FunctionCall(name, parameters)) <|o

This is a completely contrived but here's what I think it would look like the following, using pipe2 instead of |>>
type FunctionCall(Name: string, Parameters: string list) =
member this.Name = Name
member this.Parameters = Parameters
let pFunctionCall =
pipe2 (pIdentifier) (pstring "(" >>. pParameterList .>> pstring ")") (fun name parameters -> FunctionCall(name, parameters))

The functional answer would be to use a discriminated union, as Daniel mentioned. FParsec also has a UserState that can be used like a state monad, so if you really want to parse directly into a complex type, you can use that. [1]
[1] http://cs.hubfs.net/topic/None/60071

Related

FParsec Parsing list with optional content, fails on continuing with next parser

I'm trying to parse a list of asset types where each asset type potentially has a name. After the list is done I'd like to continue parsing a list of attributes for the asset types, one list for all asset types.
The string I'm trying to parse looks like:
converter named a023, signaltower, powerunit named 23 attributes power, temperature
The parser signature looks like
Parser<((Asset * AssetName option) list * Attribute liste),unit>
I got parsing the assets name, and the attributes separately, the problem arise when I combine the two and list is done, it then fails on the attributes string stating Expecting: 'named'.
To me it seems that it is trying the opt assetname parser which fails on the attributes string, but I am not sure how to ignore that and move on when the list is "done" (after all the asset name part is optional)..
type AssetName = AssetName of string
let named = str "named" >>. spaces1 >>. word
let assetName = spaces1 >>. (named |>> AssetName)
type Asset = | Converter | Signaltower | Powerunit
let assetType = ["converter"; "signaltower"; "powerunit";] |> Seq.map pstring |> choice
let findAsset = function
| "converter" -> Converter
| "signaltower" -> Signaltower
| "powerunit" -> Powerunit
| _ -> raise <| Exception "Invalid asset type"
let asset = (assetType |>> findAsset) .>>. opt assetName
type Attribute = Attribute of string
let attribute = word |>> Attribute
let attributes = spaces1 >>. str "attributes" >>. spaces1 >>. sepBy attribute commaMaybeSpace
let p = sepBy asset (pchar ',' >>. spaces) .>>. attributes
let r input = run p3 input
r "converter named a023, signaltower, powerunit named 23 attributes power, temperature"
Edit As discussed in a similar answer and also hinted at in the documentation, an optional parser opt p fails if p fails after changing its state. Instead, backtracking (restoring the state) should be used, either with the attempt parser, or with one of the backtracking operators.
Keeping in mind that opt (attempt (p >>. q)) is equivalent to opt (p >>? q), try
let assetName = spaces1 >>? (named |>> AssetName)
Not sure if it has some bearing on the issue, but the type of the parser in the example does not match the grammar you are describing. I think you want:
asset-type :
| Converter | Signaltower | Powerunit
named-asset :
| asset-type | asset-type named ident
asset-list :
named-asset,...,named-asset
attributed-asset-list :
asset-list attributes ident-list
The following line does not repeat the parser for named-asset.
let p = asset .>>. attributes
// val p : Parser<((Asset * AssetName option) * Attribute list),unit>
You can replace it with
let p = sepBy asset (pchar ',' >>. spaces) .>>. attributes
// val p : Parser<((Asset * AssetName option) list * Attribute list),unit>
"converter named a023, signaltower, powerunit named 23 attributes power, temperature"
|> run p |> printfn "%A"
// Success: ([(Converter, Some (AssetName "a023")); (Signaltower, null);
// (Powerunit, Some (AssetName "23"))],
// [Attribute "power"; Attribute "temperature"])

Parsing the arrow type with FParsec

I'm trying to parse the arrow type with FParsec.
That is, this:
Int -> Int -> Int -> Float -> Char
For example.
I tried with this code, but it only works for one type of arrow (Int -> Int) and no more. I also want to avoid parentheses, because I already have a tuple type that uses them, and I don't want it to be too heavy in terms of syntax either.
let ws = pspaces >>. many pspaces |>> (fun _ -> ())
let str_ws s = pstring s .>> ws
type Type = ArrowType of Type * Type
let arrowtype' =
pipe2
(ws >>. ty')
(ws >>. str_ws "->" >>. ws >>. ty')
(fun t1 t2 -> ArrowType(t1, t2))
let arrowtype =
pipe2
(ws >>. ty' <|> arrowtype')
(ws >>. str_ws "->" >>. ws >>. ty' <|> arrowtype')
(fun t1 t2 -> ArrowType(t1, t2)) <?> "arrow type"
ty' is just another types, like tuple or identifier.
Do you have a solution?
Before I get into the arrow syntax, I want to comment on your ws parser. Using |>> (fun _ -> ()) is a little inefficient since FParsec has to construct a result object then immediately throw it away. The built-in spaces and spaces1 parsers are probably better for your needs, since they don't need to construct a result object.
Now as for the issue you're struggling with, it looks to me like you want to consider the arrow parser slightly differently. What about treating it as a series of types separated by ->, and using the sepBy family of parser combinators? Something like this:
let arrow = spaces1 >>. pstring "->" .>> spaces1
let arrowlist = sepBy1 ty' arrow
let arrowtype = arrowlist |>> (fun types ->
types |> List.reduce (fun ty1 ty2 -> ArrowType(ty1, ty2))
Note that the arrowlist parser would also match against just plain Int, because the definition of sepBy1 is not "there must be at least one list separator", but rather "there must be at least one item in the list". So to distinguish between a type of Int and an arrow type, you'd want to do something like:
let typeAlone = ty' .>> notFollowedBy arrow
let typeOrArrow = attempt typeAlone <|> arrowtype
The use of attempt is necessary here so that the characters consumed by ty' will be backtracked if an arrow was present.
There's a complicating factor I haven't addressed at all since you mentioned not wanting parentheses. But if you decide that you want to be able to have arrow types of arrow types (that is, functions that take functions as input), you'd want to parse types like (Int -> Int) -> (Int -> Float) -> Char. This would complicate the use of sepBy, and I haven't addressed it at all. If you end up needing more complex parsing including parentheses, then it's possible you might want to use OperatorPrecedenceParser. But for your simple needs where parentheses aren't involved, sepBy1 looks like your best bet.
Finally, I should give a WARNING: I haven't tested this at all, just typed this into the Stack Overflow box. The code example I gave you is not intended to be working as-is, but rather to give you an idea of how to proceed. If you need a working-as-is example, I'll be happy to try to give you one, but I don't have the time to do so right now.

Parsing in to a recursive data structure

I wish to parse a string in to a recursive data structure using F#. In this question I'm going to present a simplified example that cuts to the core of what I want to do.
I want to parse a string of nested square brackets in to the record type:
type Bracket = | Bracket of Bracket option
So:
"[]" -> Bracket None
"[[]]" -> Bracket ( Some ( Bracket None) )
"[[[]]]" -> Bracket ( Some ( Bracket ( Some ( Bracket None) ) ) )
I would like to do this using the parser combinators in the FParsec library. Here is what I have so far:
let tryP parser =
parser |>> Some
<|>
preturn None
/// Parses up to nesting level of 3
let parseBrakets : Parser<_> =
let mostInnerLevelBracket =
pchar '['
.>> pchar ']'
|>> fun _ -> Bracket None
let secondLevelBracket =
pchar '['
>>. tryP mostInnerLevelBracket
.>> pchar ']'
|>> Bracket
let firstLevelBracket =
pchar '['
>>. tryP secondLevelBracket
.>> pchar ']'
|>> Bracket
firstLevelBracket
I even have some Expecto tests:
open Expecto
[<Tests>]
let parserTests =
[ "[]", Bracket None
"[[]]", Bracket (Some (Bracket None))
"[[[]]]", Bracket ( Some (Bracket (Some (Bracket None)))) ]
|> List.map(fun (str, expected) ->
str
|> sprintf "Trying to parse %s"
|> testCase
<| fun _ ->
match run parseBrakets str with
| Success (x, _,_) -> Expect.equal x expected "These should have been equal"
| Failure (m, _,_) -> failwithf "Expected a match: %s" m
)
|> testList "Bracket tests"
let tests =
[ parserTests ]
|> testList "Tests"
runTests defaultConfig tests
The problem is of course how to handle and arbitrary level of nesting - the code above only works for up to 3 levels. The code I would like to write is:
let rec pNestedBracket =
pchar '['
>>. tryP pNestedBracket
.>> pchar ']'
|>> Bracket
But F# doesn't allow this.
Am I barking up the wrong tree completely with how to solve this (I understand that there are easier ways to solve this particular problem)?
You are looking for FParsecs createParserForwardedToRef method. Because parsers are values and not functions it is impossible to make mutually recursive or self recursive parsers in order to do this you have to in a sense declare a parser before you define it.
Your final code will end up looking something like this
let bracketParser, bracketParserRef = createParserForwardedToRef<Bracket>()
bracketParserRef := ... //here you can finally declare your parser
//you can reference bracketParser which is a parser that uses the bracketParserRef
Also I would recommend this article for basic understanding of parser combinators. https://fsharpforfunandprofit.com/posts/understanding-parser-combinators/. The final section on a JSON parser talks about the createParserForwardedToRef method.
As an example of how to use createParserForwardedToRef, here's a snippet from a small parser I wrote recently. It parses lists of space-separated integers between brackets (and the lists can be nested), and the "integers" can be small arithmetic expressions like 1+2 or 3*5.
type ListItem =
| Int of int
| List of ListItem list
let pexpr = // ... omitted for brevity
let plist,plistImpl = createParserForwardedToRef()
let pListContents = (many1 (plist |>> List .>> spaces)) <|>
(many (pexpr |>> Int .>> spaces))
plistImpl := pchar '[' >>. spaces
>>. pListContents
.>> pchar ']'
P.S. I would have put this as a comment to Thomas Devries's answer, but a comment can't contain nicely-formatted code. Go ahead and accept his answer; mine is just intended to flesh his out.

Use FParsec to parse float or int*float

I've just started out playing around with FParsec, and I'm now trying to parse strings on the following format
10*0.5 0.25 0.75 3*0.1 0.9
I want 3*0.1, for example, to be expanded into 0.1 0.1 0.1
What I have so far is the following
type UserState = unit
type Parser<'t> = Parser<'t, UserState>
let str s : Parser<_> = pstring s
let float_ws : Parser<_> = pfloat .>> spaces
let product = pipe2 pint32 (str "*" >>. float_ws) (fun x y -> List.init x (fun i -> y))
The product parser correctly parsers entries on the format int*float and expands it into a list of floats. However, I'm having trouble coming up with a solution that allows me to parse either int*float or just a float. I would like to do something like
many (product <|> float_ws)
This will of course not work since the return types of the parsers differ. Any ideas on how to make this work? Is it possible to wrap of modify float_ws such that it returns a list with only one float?
You can make float_ws return a float list by simply adding a |>> List.singleton
let float_ws : Parser<_> = pfloat .>> spaces |>> List.singleton
|>> is just the map function, where you apply some function to the result of one parser and receive a new parser of some new type:
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
See: http://www.quanttec.com/fparsec/reference/primitives.html#members.:124::62::62:
Also, since product parser includes an int parser, it will successfully parse a character from the wrong case, this means the parser state will be changed. That means you cannot use the <|> operator on the first parser directly, you must also add attempt so FParsec can return to the original parser state.
let combined = many (attempt product <|> float_ws)

How to parse a list of tokens with FParsec

I am trying to parse a list of tokens with FParsec, where each token is either a block of text or a tag - for example:
This is a {type of test} test, and it {succeeds or fails}
Here is the parser:
type Parser<'t> = Parser<'t, unit>
type Token =
| Text of string
| Tag of string
let escape fromString toString : Parser<_> =
pstring fromString |>> (fun c -> toString)
let content : Parser<_> =
let contentNormal = many1Satisfy (fun c -> c <> '{' && c <> '}')
let openBraceEscaped = escape "{{" "{"
let closeBraceEscaped = escape "}}" "}"
let contentEscaped = openBraceEscaped <|> closeBraceEscaped
stringsSepBy contentNormal contentEscaped
let ident : Parser<_> =
let isIdentifierFirstChar c = isLetter c || c = '_'
let isIdentifierChar c = isLetter c || isDigit c || c = '_'
spaces >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier" .>> spaces
let text = content |>> Text
let tag =
ident |> between (skipString "{") (skipString "}")
|>> Tag
let token = text <|> tag
let tokens = many token .>>. eof
the following tests work:
> run token "abc def" ;;
val it : ParserResult<Token,unit> = Success: Text "abc def"
> run token "{abc def}" ;;
val it : ParserResult<Token,unit> = Success: Tag "abc def"
but trying to run tokens results in an exception:
> run tokens "{abc} def" ;;
System.InvalidOperationException: (Ln: 1, Col: 10): The combinator 'many' was
applied to a parser that succeeds without consuming input and without
changing the parser state in any other way. (If no exception had been raised,
the combinator likely would have entered an infinite loop.)
I've gone over this stackoverflow question but nothing I've tried works. I even added the following, but I get the same exception:
let tokenFwd, tokenRef = createParserForwardedToRef<Token, unit>()
do tokenRef := choice [tag; text]
let readEndOfInput : Parser<unit, unit> = spaces >>. eof
let readExprs = many tokenFwd
let readExprsTillEnd = readExprs .>> readEndOfInput
run readExprsTillEnd "{abc} def" // System.InvalidOperationException ... The combinator 'many' was applied ...
I believe the problem is stringsSepBy in content, but I can't figure out any other way to get a string with the escaped items
Any help would be much appreciated - I have been going through this for a couple days now and can't figure it out.
stringsSepBy accepts zero strings, causing token to accept an empty string, causing many to complain.
I changed it to the following to verify that that was the line you need to work on.
many1 (contentNormal <|> contentEscaped) |>> fun l -> String.concat "" l
Also I got away from stringsSepBy contentNormal contentEscaped, because that says you need to match contentNormals with contentEscapeds in between them. So a{{b}}c is ok, but {{b}}, {{b}}c and a{{b}} will fail.
notEmpty can be used to consume input. If you're not consuming any input but letting the parser succeed then the "current position" of the parser is not moved forward, so when a statement doing that is inside a many it would go into an infinite loop without that exception. stringsSepBy is succeeding and parsing zero elements, you could use notEmpty to fail it if it gets zero elements:
stringsSepBy contentNormal contentEscaped |> notEmpty
Also, I tried to get your full example to parse, the tags can include spaces so you need to allow ident to include spaces to match that:
let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = ' '
Another little adjustment would be to only return a Token list rather than Token list * unit tuple (unit is the result of eof):
let tokens = many token .>> eof

Resources