Parsing separated lists with FParsec

Parsing separated lists with FParsec - f#

I am trying to parse something that may be a list of items, or which may be just one item. I want to put the results into a DU (Thing below).
The way I'm approaching this is as below, but it gives me a list of things even when there is only one thing in the list.
let test p str =
match run p str with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
type Thing =
| OneThing of int
| LotsOfThings of Thing list
let str s = pstringCI s .>> spaces
let one = str "one" |>> fun x -> OneThing 1
let two = str "two" |>> fun x -> OneThing 2
let three = str "three" |>> fun x -> OneThing 3
let oneThing = (one <|> two <|> three)
let lotsOfThings = sepBy1 oneThing (str "or") |>> LotsOfThings
let lotsFirst = (lotsOfThings <|> oneThing)
test lotsFirst "one or two" // Success: LotsOfThings [OneThing 1; OneThing 2]
test lotsFirst "one" // Success: LotsOfThings [OneThing 1]
What is the correct way to return OneThing when there is only one item in the list?
I can do that if I test the list before returning, like the below. But that doesn't really "feel" right.
let lotsOfThings = sepBy1 oneThing (str "or") |>> fun l -> if l.Length = 1 then l.[0] else l |> LotsOfThings
LinqPad of the above is here: http://share.linqpad.net/sd8tpj.linq

If you don't like testing the list length after parsing, then you might try switching your <|> expression to test the single-item case first, and use notFollowedBy to ensure that the single-item case won't match a list:
let oneThing = (one <|> two <|> three)
let separator = str "or"
let lotsOfThings = sepBy1 oneThing separator |>> LotsOfThings
let oneThingOnly = oneThing .>> (notFollowedBy separator)
let lotsSecond = (attempt oneThingOnly) <|> lotsOfThings
test lotsSecond "one or two" // Success: LotsOfThings [OneThing 1; OneThing 2]
test lotsSecond "one" // Success: OneThing 1
Note the use of the attempt parser with oneThingOnly. That's because the documentation for the <|> parser states (emphasis in original):
The parser p1 <|> p2 first applies the parser p1. If p1 succeeds, the result of p1 is returned. If p1 fails with a non‐fatal error and without changing the parser state, the parser p2 is applied.
Without the attempt in there, "one or two" would first try to parse with oneThingOnly, which would consume the "one" and then fail on the "or", but the parser state would have been changed. The attempt combinator basically makes a "bookmark" of the parser state before trying a parser, and if that parser fails, it goes back to the "bookmark". So <|> would see an unchanged parser state after attempt oneThingOnly, and would then try lotsOfThings.

Related

How to parse a recursive left syntax rule with FParsec?

I usually use FParsec for LL grammars, but sometimes it happens that in a whole grammar only one element requires left recursive parsing (so the grammar is no longer LL). Currently I have such a situation, I have a large LL grammar implemented with FParsec, but a small grammar element is bothering me because it obviously cannot be parsed correctly.
The syntax element in question is an access to an array index à la F#, e.g. myArray.[index] where myArray can be any expression and index can be any expression too. It turns out that my function calls use square brackets, not parentheses, and my identifiers can be qualified with dots.
An example of correct syntax for an expression is: std.fold[fn, f[myArray.[0]], std.tail[myArray]].
The .[] syntax element is obviously left recursive, but perhaps there is a trick that allows me to parse it anyway? My minimal code is as follows:
open FParsec
type Name = string list
type Expr =
(* foo, Example.Bar.fizz *)
| Variable of Name
(* 9, 17, -1 *)
| Integer of int
(* foo[3, 2], Std.sqrt[2] *)
| FunCall of Name * Expr list
(* (a + b), (a + (1 - c)) *)
| Parens of Expr
(* myArray.[0], table.[index - 1] *)
| ArrayAccess of Expr * Expr
(* a + b *)
| Addition of Expr * Expr
let opp =
new OperatorPrecedenceParser<Expr, _, _>()
let pExpr = opp.ExpressionParser
let pName =
let id =
identifier (IdentifierOptions(isAsciiIdStart = isAsciiLetter, isAsciiIdContinue = isAsciiLetter))
sepBy1 id (skipChar '.')
let pVariable = pName |>> Variable
let pInt = pint32 |>> Integer
let pFunCall =
pipe4
pName
(spaces >>. skipChar '[')
(sepBy (spaces >>. pExpr) (skipChar ','))
(spaces >>. skipChar ']')
(fun name _ args _ -> FunCall(name, args))
let pArrayAccess =
pipe5
pExpr
(spaces >>. skipChar '.')
(spaces >>. skipChar '[')
(spaces >>. pExpr)
(spaces >>. skipChar ']')
(fun expr _ _ index _ -> ArrayAccess(expr, index))
let pParens =
between (skipChar '(') (skipChar ')') (spaces >>. pExpr)
opp.TermParser <-
choice [ attempt pFunCall
pVariable
pArrayAccess
pInt
pParens ]
.>> spaces
let addInfixOperator str prec assoc mapping =
opp.AddOperator
<| InfixOperator(str, spaces, prec, assoc, (), (fun _ leftTerm rightTerm -> mapping leftTerm rightTerm))
addInfixOperator "+" 6 Associativity.Left (fun a b -> Addition(a, b))
let startParser = runParserOnString (pExpr .>> eof) () ""
printfn "%A" <| startParser "std.fold[fn, f[myArray.[0]], std.tail[myArray]]"
One way to do this is as follows: instead of making a list of parsing choices that also lists pArrayAccess like above, which will at some point cause an infinite loop, one can modify pExpr to parse the grammar element in question as an optional element following an expression:
let pExpr =
parse {
let! exp = opp.ExpressionParser
let pArrayAccess =
between (skipString ".[") (skipString "]") opp.ExpressionParser
match! opt pArrayAccess with
| None -> return exp
| Some index -> return ArrayAccess(exp, index)
}
After testing, it turns out that this works very well if the following two conditions are not met:
The contents of the square brackets must not contain access to another array ;
An array cannot be accessed a second time in succession (my2DArray.[x].[y]).
This restricts usage somewhat. How can I get away with this? Is there a way to do this or do I have to change the grammar?

Finally, a solution to this problem is quite simple: just expect a list of array access. If the list is empty, then return the initial expression, otherwise fold over all the array accesses and return the result. Here is the implementation:
let rec pExpr =
parse {
let! exp = opp.ExpressionParser
let pArrayAccess =
between (skipString ".[") (skipString "]") pExpr
match! many pArrayAccess with
| [] -> return exp
| xs -> return List.fold
(fun acc curr -> ArrayAccess(acc, curr)) exp xs
}
This way of doing things meets my needs, so I'd be happy with it, if anyone passes by and wants something more general and not applicable with the proposed solution, then I refer to #Martin Freedman comment, using createParserForwardedToRef().

How to parse recusrive grammar in FParsec

Previous questions which I could not use to get this to work
Recursive grammars in FParsec
Seems to be an old question which was asked before createParserForwardedToRef was added to FParsec
AST doesn't seem to be as horribly recursive as mine.
Parsing in to a recursive data structure
Grammar relies on a special character '[' to indicate another nesting level. I don't have this luxury
I want to build a sort of Lexer and project system for a language I have found myself writing lately. The language is called q. It is a fairly simple language and has no operator precedence. For example 1*2+3 is the same as (1*(2+3)). It works a bit like a reverse polish notation calculator, evaluation is right to left.
I am having trouble expressing this in FParsec. I have put together the following simplified demo
open FParsec
type BinaryOperator = BinaryOperator of string
type Number = Number of string
type Element =
|Number of Number
and Expression =
|Element of Element
|BinaryExpression of Element * BinaryOperator * Expression
let number = regex "\d+\.?\d*" |>> Number.Number
let element = [ number ] |> choice |>> Element.Number
let binaryOperator = ["+"; "-"; "*"; "%"] |> Seq.map pstring |> choice |>> BinaryOperator
let binaryExpression expression = pipe3 element binaryOperator expression (fun l o r -> (l,o,r))
let expression =
let exprDummy, expRef = createParserForwardedToRef()
let elemExpr = element |>> Element
let binExpr = binaryExpression exprDummy |>> BinaryExpression
expRef.Value <- [binExpr; elemExpr; ] |> choice
expRef
let statement = expression.Value .>> eof
let parseString s =
printfn "Parsing input: '%s'" s
match run statement s with
| Success(result, _, _) -> printfn "Ok: %A" result
| Failure(errorMsg, _, _) -> printfn "Error: %A" errorMsg
//tests
parseString "1.23"
parseString "1+1"
parseString "1*2+3" // equivalent to (1*(2+3))
So far, I haven't been able to come up with a way to satisfy all 3 tests cases. In the above, it tries to parse binExpr first, realises it can't, but then must be consuming the input because it doesn't try to evaluate elemExpr next. Not sure what to do. How do I satisfy the 3 tests?

Meditating on Tomas' answer, I have come up with the following that works
let expr, expRef = createParserForwardedToRef()
let binRightExpr = binaryOperator .>>. expr
expRef.Value <- parse{
let! first = element
return! choice [
binRightExpr |>> (fun (o, r) -> (first, o, r) |> BinaryExpression)
preturn (first |> Element)
]
}
let statement = expRef.Value .>> eof
The reason the first parser failed is given in the FParsec docs
The behaviour of the <|> combinator has two important characteristics:
<|> only tries the parser on the right side if the parser on the left
side fails. It does not implement a longest match rule.
However, it only tries the right parser if the left parser fails without consuming input.
Probably need to clean up a few things like the structure of the AST but I think I am good to go.

With FParsec how would I parse: line ending in newline <|> a line ending with eof

I'm parsing a file and want to throw away certain lines of the file I'm not interested in. I've been able to get this to work for all cases except for when the last line is a throwaway and does not end in newline.
I've tried constructing an endOfInput rule and joining it with a skipLine rule via <|>. This is all wrapped in a many. Tweaking everything I seem to either get a 'many succeeds without consuming input...' error or a fail on the skipLine rule when I don't try some kind of back track.
let skipLine = many (noneOf "\n") .>> newline |>> fun x -> [string x]
let endOfInput = many (noneOf "\n") .>> eof |>> fun x -> [string x]
test (many (skipLine <|> endOfInput)) "And here is the next.\nThen the last."
** this errors out on the skipLine parser at the last line
I've tried
let skipLine = many (noneOf "\n") .>>? newline |>> fun x -> [string x]
... and ...
let skipLine = many (noneOf "\n") .>> newline |>> fun x -> [string x]
test (many (attempt skipLine <|> endOfInput)) "And here is the next.\nThen the last."
** these produce the many error
Note: the output functions are just place holders to get these to work with my other rules. I haven't gotten into figuring out how to format the output.
This is my first time using FParsec and I'm new to F#.

FParsec actually has a built-in parser that does exactly what you're looking for: skipRestOfLine. It terminates on either newlines or eof, just like what you're looking for.
If you want to try to implement it yourself as a learning exercise, let me know and I'll try to help you figure out the problem. But if you just want a parser that skips characters until the end of the line, the built-in skipRestOfLine is exactly what you need.

Here's an approach of parsing such a files with using an Option type,
it'll help you to parse files with newlines in the end or skip blank lines in the middle. I've got the solution from that post - fparsec key-value parser fails to parse . Parsing of a text file with integer values in one column:
module OptionIntParser =
open FParsec
open System
open System.IO
let pCell: Parser<int, unit> = pint32 |>> fun x -> x
let pSome = pCell |>> Some
let pNone = (restOfLine false) >>% None
let pLine = (attempt pSome) <|> pNone
let pAllover = sepBy pLine newline |>> List.choose id
let readFile filePath =
let rr = File.OpenRead(filePath)
use reader = new IO.StreamReader(rr)
reader.ReadToEnd()
let testStr = readFile("./test1.txt")
let runAll s =
let res = run pAllover s in
match res with
| Success (rows, _, _) -> rows
| Failure (s, _, _) -> []
let myTest =
let res = runAll testStr
res |> List.iter (fun (x) -> Console.WriteLine(x.ToString() ))

Applying a function returned from a subparser with fparsec

Noob alert!
Ok, I'm trying to build a simple math expression parser in fparsec. Right now all I want it to do is handle strings like this "1+2-3*4/5" and return a double as the result of the evaluation. No spaces, newlines, or parens, and left to right order of operations is fine.
Here's what I have so far:
let number = many1 digit |>> fun ds -> int <| String.Concat(ds)
let op : Parser<int -> int -> int, unit> =
charReturn '+' (+) <|>
charReturn '-' (-) <|>
charReturn '*' (*) <|>
charReturn '/' (/)
let expression, expressionImpl = createParserForwardedToRef()
do expressionImpl :=
choice[
attempt(number .>> op >>. expression);
number]
let test p str =
match run (p .>> eof) str with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(result, _, _) -> printfn "Failure: %A" result
[<EntryPoint>]
let main argv =
test expression "1+1/2*3-4"
Console.Read() |> ignore
0
In the first choice of the expression parser, I'm not sure how to apply the function returned by the op parser.

As usual, I find the answer right after posting the question (after 3 hours of searching).
I just changed this line:
attempt(number .>> op >>. expression);
to this:
attempt(pipe3 number op expression (fun x y z -> y x z));
However, I just realized that my expressions parse backwards. Back to the drawing board.

Use FParsec to parse a self-describing input

I'm using FParsec to parse an input that describes its own format. For example, consider this input:
int,str,int:4,'hello',3
The first part of the input (before the colon) describes the format of the second part of the input. In this case, the format is int, str, int, which means that the actual data consists of three comma-separated values of the given types, so the result should be 4, "hello", 3.
What is the best way to parse something like this with FParsec?
I've pasted my best effort below, but I'm not happy with it. Is there a better way to do this that is cleaner, less stateful, and less reliant on the parse monad? I think this depends on smarter management of UserState, but I don't know how to do it. Thanks.
open FParsec
type State = { Formats : string[]; Index : int32 }
with static member Default = { Formats = [||]; Index = 0 }
type Value =
| Integer of int
| String of string
let parseFormat : Parser<_, State> =
parse {
let! formats =
sepBy
(pstring "int" <|> pstring "str")
(skipString ",")
|>> Array.ofList
do! updateUserState (fun state -> { state with Formats = formats })
}
let parseValue format =
match format with
| "int" -> pint32 |>> Integer
| "str" ->
between
(skipString "'")
(skipString "'")
(manySatisfy (fun c -> c <> '\''))
|>> String
| _ -> failwith "Unexpected"
let parseValueByState =
parse {
let! state = getUserState
let format = state.Formats.[state.Index]
do! setUserState { state with Index = state.Index + 1}
return! parseValue format
}
let parseData =
sepBy
parseValueByState
(skipString ",")
let parse =
parseFormat
>>. skipString ":"
>>. parseData
[<EntryPoint>]
let main argv =
let result = runParserOnString parse State.Default "" "int,str,int:4,'hello',3"
printfn "%A" result
0

There seem to be several problems with the original code, so I took my liberty to rewrite it from scratch.
First, several library functions that may appear useful in other FParsec-related projects:
/// Simple Map
/// usage: let z = Map ["hello" => 1; "bye" => 2]
let (=>) x y = x,y
let makeMap x = new Map<_,_>(x)
/// A handy construct allowing NOT to write lengthy type definitions
/// and also avoid Value Restriction error
type Parser<'t> = Parser<'t, UserState>
/// A list combinator, inspired by FParsec's (>>=) combinator
let (<<+) (p1: Parser<'T list>) (p2: Parser<'T>) =
p1 >>= fun x -> p2 >>= fun y -> preturn (y::x)
/// Runs all parsers listed in the source list;
/// All but the trailing one are also combined with a separator
let allOfSepBy separator parsers : Parser<'T list> =
let rec fold state =
function
| [] -> pzero
| hd::[] -> state <<+ hd
| hd::tl -> fold (state <<+ (hd .>> separator)) tl
fold (preturn []) parsers
|>> List.rev // reverse the list since we appended to the top
Now, the main code. The basic idea is to run parsing in three steps:
Parse out the keys (which are plain ASCII strings)
Map these keys to actual Value parsers
Run these parsers in order
The rest seems to be commented within the code. :)
/// The resulting type
type Output =
| Integer of int
| String of string
/// tag to parser mappings
let mappings =
[
"int" => (pint32 |>> Integer)
"str" => (
manySatisfy (fun c -> c <> '\'')
|> between (skipChar ''') (skipChar ''')
|>> String
)
]
|> makeMap
let myProcess : Parser<Output list> =
let pKeys = // First, we parse out the keys
many1Satisfy isAsciiLower // Parse one key; keys are always ASCII strings
|> sepBy <| (skipChar ',') // many keys separated by comma
.>> (skipChar ':') // all this with trailing semicolon
let pValues = fun keys ->
keys // take the keys list
|> List.map // find the required Value parser
// (NO ERROR CHECK for bad keys)
(fun p -> Map.find p mappings)
|> allOfSepBy (skipChar ',') // they must run in order, comma-separated
pKeys >>= pValues
Run on string: int,int,str,int,str:4,42,'hello',3,'foobar'
Returned: [Integer 4; Integer 42; String "hello"; Integer 3; String "foobar"]

#bytebuster beat me to it but I still post my solution. The technique is similar to #bytebuster.
Thanks for an interesting question.
In compilers I believe the preferred technique is to parse the text into an AST and on that run a type-checker. For this example a potentially simpler technique would be that parsing the type definitions returns a set of parsers for the values. These parsers are then applied on the rest of the string.
open FParsec
type Value =
| Integer of int
| String of string
type ValueParser = Parser<Value, unit>
let parseIntValue : Parser<Value, unit> =
pint32 |>> Integer
let parseStringValue : Parser<Value, unit> =
between
(skipChar '\'')
(skipChar '\'')
(manySatisfy (fun c -> c <> '\''))
<?> "string"
|>> String
let parseValueParser : Parser<ValueParser, unit> =
choice
[
skipString "int" >>% parseIntValue
skipString "str" >>% parseStringValue
]
let parseValueParsers : Parser<ValueParser list, unit> =
sepBy1
parseValueParser
(skipChar ',')
// Runs a list of parsers 'ps' separated by 'sep' parser
let sepByList (ps : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
let rec loop adjust ps =
match ps with
| [] -> preturn []
| h::t ->
adjust h >>= fun v -> loop (fun pp -> sep >>. pp) t >>= fun vs -> preturn (v::vs)
loop id ps
let parseLine : Parser<Value list, unit> =
parseValueParsers .>> skipChar ':' >>= (fun vps -> sepByList vps (skipChar ',')) .>> eof
[<EntryPoint>]
let main argv =
let s = "int,str,int:4,'hello',3"
let r = run parseLine s
printfn "%A" r
0
Parsing int,str,int:4,'hello',3 yields Success: [Integer 4; String "hello";Integer 3].
Parsing int,str,str:4,'hello',3 (incorrect) yields:
Failure:
Error in Ln: 1 Col: 23
int,str,str:4,'hello',3
^
Expecting: string

I rewrote #FuleSnabel's sepByList as follows to help me understand it better. Does this look right?
let sepByList (parsers : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
let rec loop adjust parsers =
parse {
match parsers with
| [] -> return []
| parser :: tail ->
let! value = adjust parser
let! values = loop (fun parser -> sep >>. parser) tail
return value :: values
}
loop id parsers

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Parsing separated lists with FParsec - f#

Related

How to parse a recursive left syntax rule with FParsec?

How to parse recusrive grammar in FParsec

With FParsec how would I parse: line ending in newline <|> a line ending with eof

Applying a function returned from a subparser with fparsec

Use FParsec to parse a self-describing input

Categories

Resources