Parsing numbers in FParsec

Parsing numbers in FParsec - f#

I've started learning FParsec. It has a very flexible way to parse numbers; I can provide a set of number formats I want to use:
type Number =
| Numeral of int
| Decimal of float
| Hexadecimal of int
| Binary of int
let numberFormat = NumberLiteralOptions.AllowFraction
||| NumberLiteralOptions.AllowHexadecimal
||| NumberLiteralOptions.AllowBinary
let pnumber =
numberLiteral numberFormat "number"
|>> fun num -> if num.IsHexadecimal then Hexadecimal (int num.String)
elif num.IsBinary then Binary (int num.String)
elif num.IsInteger then Numeral (int num.String)
else Decimal (float num.String)
However, the language I'm trying to parse is a bit strange. A number could be numeral (non-negative int), decimal (non-negative float), hexadecimal (with prefix #x) or binary (with prefix #b):
numeral: 0, 2
decimal: 0.2, 2.0
hexadecimal: #xA04, #x611ff
binary: #b100, #b001
Right now I have to do parsing twice by substituting # by 0 (if necessary) to make use of pnumber:
let number: Parser<_, unit> =
let isDotOrDigit c = isDigit c || c = '.'
let numOrDec = many1Satisfy2 isDigit isDotOrDigit
let hexOrBin = skipChar '#' >>. manyChars (letter <|> digit) |>> sprintf "0%s"
let str = spaces >>. numOrDec <|> hexOrBin
str |>> fun s -> match run pnumber s with
| Success(result, _, _) -> result
| Failure(errorMsg, _, _) -> failwith errorMsg
What is a better way of parsing in this case? Or how can I alter FParsec's CharStream to be able to make conditional parsing easier?

Parsing numbers can be pretty messy if you want to generate good error messages and properly check for overflows.
The following is a simple FParsec implementation of your number parser:
let numeralOrDecimal : Parser<_, unit> =
// note: doesn't parse a float exponent suffix
numberLiteral NumberLiteralOptions.AllowFraction "number"
|>> fun num ->
// raises an exception on overflow
if num.IsInteger then Numeral(int num.String)
else Decimal(float num.String)
let hexNumber =
pstring "#x" >>. many1SatisfyL isHex "hex digit"
|>> fun hexStr ->
// raises an exception on overflow
Hexadecimal(System.Convert.ToInt32(hexStr, 16))
let binaryNumber =
pstring "#b" >>. many1SatisfyL (fun c -> c = '0' || c = '1') "binary digit"
|>> fun hexStr ->
// raises an exception on overflow
Binary(System.Convert.ToInt32(hexStr, 2))
let number =
choiceL [numeralOrDecimal
hexNumber
binaryNumber]
"number literal"
Generating good error messages on overflows would complicate this implementation a bit, as you would ideally also need to backtrack after the error, so that the error position ends up at the start of the number literal (see the numberLiteral docs for an example).
A simple way to gracefully handle possible overflow exception is to use a little exception handling combinator like the following:
let mayThrow (p: Parser<'t,'u>) : Parser<'t,'u> =
fun stream ->
let state = stream.State
try
p stream
with e -> // catching all exceptions is somewhat dangerous
stream.BacktrackTo(state)
Reply(FatalError, messageError e.Message)
You could then write
let number = mayThrow (choiceL [...] "number literal")
I'm not sure what you meant to say with "alter FParsec's CharStream to be able to make conditional parsing easier", but the following sample demonstrates how you could write a low-level implementation that only uses the CharStream methods directly.
type NumberStyles = System.Globalization.NumberStyles
let invariantCulture = System.Globalization.CultureInfo.InvariantCulture
let number: Parser<Number, unit> =
let expectedNumber = expected "number"
let inline isBinary c = c = '0' || c = '1'
let inline hex2int c = (int c &&& 15) + (int c >>> 6)*9
let hexStringToInt (str: string) = // does no argument or overflow checking
let mutable n = 0
for c in str do
n <- n*16 + hex2int c
n
let binStringToInt (str: string) = // does no argument or overflow checking
let mutable n = 0
for c in str do
n <- n*2 + (int c - int '0')
n
let findIndexOfFirstNonNull (str: string) =
let mutable i = 0
while i < str.Length && str.[i] = '0' do
i <- i + 1
i
let isHexFun = id isHex // tricks the compiler into caching the function object
let isDigitFun = id isDigit
let isBinaryFun = id isBinary
fun stream ->
let start = stream.IndexToken
let cs = stream.Peek2()
match cs.Char0, cs.Char1 with
| '#', 'x' ->
stream.Skip(2)
let str = stream.ReadCharsOrNewlinesWhile(isHexFun, false)
if str.Length <> 0 then
let i = findIndexOfFirstNonNull str
let length = str.Length - i
if length < 8 || (length = 8 && str.[i] <= '7') then
Reply(Hexadecimal(hexStringToInt str))
else
stream.Seek(start)
Reply(Error, messageError "hex number literal is too large for 32-bit int")
else
Reply(Error, expected "hex digit")
| '#', 'b' ->
stream.Skip(2)
let str = stream.ReadCharsOrNewlinesWhile(isBinaryFun, false)
if str.Length <> 0 then
let i = findIndexOfFirstNonNull str
let length = str.Length - i
if length < 32 then
Reply(Binary(binStringToInt str))
else
stream.Seek(start)
Reply(Error, messageError "binary number literal is too large for 32-bit int")
else
Reply(Error, expected "binary digit")
| c, _ ->
if not (isDigit c) then Reply(Error, expectedNumber)
else
stream.SkipCharsOrNewlinesWhile(isDigitFun) |> ignore
if stream.Skip('.') then
let n2 = stream.SkipCharsOrNewlinesWhile(isDigitFun)
if n2 <> 0 then
// we don't parse any exponent, as in the other example
let mutable result = 0.
if System.Double.TryParse(stream.ReadFrom(start),
NumberStyles.AllowDecimalPoint,
invariantCulture,
&result)
then Reply(Decimal(result))
else
stream.Seek(start)
Reply(Error, messageError "decimal literal is larger than System.Double.MaxValue")
else
Reply(Error, expected "digit")
else
let decimalString = stream.ReadFrom(start)
let mutable result = 0
if System.Int32.TryParse(stream.ReadFrom(start),
NumberStyles.None,
invariantCulture,
&result)
then Reply(Numeral(result))
else
stream.Seek(start)
Reply(Error, messageError "decimal number literal is too large for 32-bit int")
While this implementation parses hex and binary numbers without the help of system methods, it eventually delegates the parsing of decimal numbers to the Int32.TryParse and Double.TryParse methods.
As I said: it's messy.

Related

fparsec - limit number of characters that a parser is applied to

I have a problem where during the parsing of a stream I get to point where the next N characters need to be parsed by applying a specfic parser multiple times (in sequence).
(stripped down toy) Example:
17<tag><anothertag><a42...
^
|- I'm here
Let's say the 17 indicates that the next N=17 characters make up tags, so I need to repetetively apply my "tagParser" but stop after 17 chars and not consume the rest even if it looks like a tag because that has a different meaning and will be parsed by another parser.
I cannot use many or many1 because that would eat the stream beyond those N characters.
Nor can I use parray because I do not know how many successful applications of that parser are there within the N characters.
I was looking into manyMinMaxSatisfy but could not figure out how to make use of it in this case.
Is there a way to cut N chars of a stream and feed them to some parser? Or is there a way to invoke many applications but up to N chars?
Thanks.

You can use getPosition to make sure you don't go past the specified number of characters. I threw this together (using F# 6) and it seems to work, although simpler/faster solutions may be possible:
let manyLimit nChars p =
parse {
let! startPos = getPosition
let rec loop values =
parse {
let! curPos = getPosition
let nRemain = (startPos.Index + nChars) - curPos.Index
if nRemain = 0 then
return values
elif nRemain > 0 then
let! value = p
return! loop (value :: values)
else
return! fail $"limit exceeded by {-nRemain} chars"
}
let! values = loop []
return values |> List.rev
}
Test code:
let ptag =
between
(skipChar '<')
(skipChar '>')
(manySatisfy (fun c -> c <> '>'))
let parser =
parse {
let! nChars = pint64
let! tags = manyLimit nChars ptag
let! rest = restOfLine true
return tags, rest
}
run parser "17<tag><anothertag><a42..."
|> printfn "%A"
Output is:
Success: (["tag"; "anothertag"], "<a42...")

Quite low-level parser, that operates on raw Reply objects. It reads count of chars, creates substring to feed to tags parser and consumes rest. There's should be an easier way, but I don't have much experience with FParsec
open FParsec
type Tag = Tag of string
let pTag = // parses tag string and constructs 'Tag' object
skipChar '<' >>. many1Satisfy isLetter .>> skipChar '>'
|>> Tag
let pCountPrefixedTags stream =
let count = pint32 stream // read chars count
if count.Status = Ok then
let count = count.Result
// take exactly 'count' chars
let tags = manyMinMaxSatisfy count count (fun _ -> true) stream
if tags.Status = Ok then
// parse substring with tags
let res = run (many1 pTag) tags.Result
match res with
| Success (res, _, _) -> Reply(res)
| Failure (_, error, _) -> Reply(ReplyStatus.Error, error.Messages)
else
Reply(tags.Status, tags.Error)
else
Reply(count.Status, count.Error)
let consumeStream =
many1Satisfy (fun _ -> true)
run (pCountPrefixedTags .>>. consumeStream) "17<tag><anothertag><notTag..."
|> printfn "%A" // Success: ([Tag "tag"; Tag "anothertag"], "<notTag...")

You also can do this without going down to stream level.
open FParsec
let ptag =
between
(skipChar '<')
(skipChar '>')
(manySatisfy (fun c -> c <> '>'))
let tagsFromChars (l: char[]) =
let s = new System.String(l)
match run (many ptag) s with
| Success(result, _, _) -> result
| Failure(errorMsg, _, _) -> []
let parser =
parse {
let! nChars = pint32
let! tags = parray nChars anyChar |>> tagsFromChars
let! rest = restOfLine true
return tags, rest
}
run parser "17<tag><anothertag><a42..."
|> printfn "%A"

Why are these two FParsec snippets different?

I want to parse a given char twice, but return a string of that character only once.
For example:
aa -> a
I have some code that works, but also some code that does not work, and I don't understand why.
Why are these snippets different?
// Works
let parseEscapedQuote (c : char) =
let q = string c
pstring (q + q) >>% q
// Does not work
let parseEscapedQuote (c : char) =
let q = string c
pchar c >>. pchar c >>% q

The second one will successfully parse a repeated character the way you want, but it might not fail the way you expect. If only the first pchar c succeeds, it will leave your parser in an invalid state. To fix this, you can use attempt, which restores the prior state if it fails:
attempt (pchar c >>. pchar c) >>% q
Here's a complete example that illustrates the difference:
open FParsec
let parseTwiceBad (c : char) =
pchar c >>. pchar c >>% string c
let parseTwiceGood (c : char) =
attempt (pchar c >>. pchar c) >>% string c
let mkParser parseTwice =
choice [
parseTwice 'x'
anyString 3
]
let run parser str =
let result = runParserOnString parser () "" str
match result with
| Success (value, _, _) -> printfn "Success: %A" value
| Failure (msg, _, _) -> printfn "Failure: %s" msg
let test str =
printfn ""
printfn "Parsing \"%s\" with bad parser:" str
let parser = mkParser parseTwiceBad
run parser str
printfn "Parsing \"%s\" with good parser:" str
let parser = mkParser parseTwiceGood
run parser str
[<EntryPoint>]
let main argv =
test "xx"
test "xAx"
0
Output:
Parsing "xx" with bad parser:
Success: "x"
Parsing "xx" with good parser:
Success: "x"
Parsing "xAx" with bad parser:
Failure: Error in Ln: 1 Col: 2
xAx
^
Expecting: 'x'
Parsing "xAx" with good parser:
Success: "xAx"

Parsing an ML-like syntax based on indentation, and everything considered to be an instruction/expression

NOTE: Not long ago, I had already asked a similar question. This is not a duplication, but the clarifications to be requested did not fall within the scope of the subject itself. I therefore allow myself to open another position dealing with the analysis of an ML-like syntax based on indentation, and considering everything as an instruction / expression.
For example:
"Hello" is an expression,
let foo = 2 + 1 is an instruction using an expression (2 + 1),
print foo is an instruction, ...
In short, a syntax and semantics that is quite modular and dynamic. Like F#, or OCaml.
To do this, I use F#, with the API (available on nuget) FParsec. The FParsec wiki provides an example of a syntax based on indentation, so I have taken it up again. The module in the code below used is IndentationParserWithoutBacktracking.
The example code to be parsed uses an elementary indentation, not mixing "literal" and "instructions/expressions":
loop i 1 10
loop k 1 10
print k
print i
print j
A simple code, and without context (but this is not important at the moment).
My implementation allows codes such as:
let foo = a + b
let foo =
let a = 9
let b = 1
a + b
let foo = 7
let foo =
loop i 1 10
print i
For example. (The loop and print are there just for the tests...)
The problem I have been having for a long week now, and that I can't solve, is the fact that the indentation module asks me every time an instruction is expected in a parser for a new line... Here is a screenshot:
This applies to all the examples mentioned above. I don't really understand the problem, and therefore don't know how to solve it.
Here is the code tested for this question, it meets the minimum and functional code criteria, however, FParsec must be used:
open FParsec
// This module come from 'https://github.com/stephan-tolksdorf/fparsec/wiki/Parsing-indentation-based-syntax-with-FParsec'
// I used the second module: 'IndentationParserWithoutBacktracking'
module IndentationParserWithoutBacktracking =
let tabStopDistance = 8
type LastParsedIndentation() =
[<DefaultValue>]
val mutable Value: int32
[<DefaultValue>]
val mutable EndIndex: int64
type UserState =
{Indentation: int
// We put LastParsedIndentation into the UserState so that we
// can conveniently use a separate instance for each stream.
// The members of the LastParsedIndentation instance will be mutated
// directly and hence won't be affected by any stream backtracking.
LastParsedIndentation: LastParsedIndentation}
with
static member Create() = {Indentation = -1
LastParsedIndentation = LastParsedIndentation(EndIndex = -1L)}
type CharStream = CharStream<UserState>
type Parser<'t> = Parser<'t, UserState>
// If this function is called at the same index in the stream
// where the function previously stopped, then the previously
// returned indentation will be returned again.
// This way we can avoid backtracking at the end of indented blocks.
let skipIndentation (stream: CharStream) =
let lastParsedIndentation = stream.UserState.LastParsedIndentation
if lastParsedIndentation.EndIndex = stream.Index then
lastParsedIndentation.Value
else
let mutable indentation = stream.SkipNewlineThenWhitespace(tabStopDistance, false)
lastParsedIndentation.EndIndex <- stream.Index
lastParsedIndentation.Value <- indentation
indentation
let indentedMany1 (p: Parser<'t>) label : Parser<'t list> =
fun stream ->
let oldIndentation = stream.UserState.Indentation
let indentation = skipIndentation stream
if indentation <= oldIndentation then
Reply(Error, expected (if indentation < 0 then "newline" else "indented " + label))
else
stream.UserState <- {stream.UserState with Indentation = indentation}
let results = ResizeArray()
let mutable stateTag = stream.StateTag
let mutable reply = p stream // parse the first element
let mutable newIndentation = 0
while reply.Status = Ok
&& (results.Add(reply.Result)
newIndentation <- skipIndentation stream
newIndentation = indentation)
do
stateTag <- stream.StateTag
reply <- p stream
if reply.Status = Ok
|| (stream.IsEndOfStream && results.Count > 0 && stream.StateTag = stateTag)
then
if newIndentation < indentation || stream.IsEndOfStream then
stream.UserState <- {stream.UserState with Indentation = oldIndentation}
Reply(List.ofSeq results)
else
Reply(Error, messageError "wrong indentation")
else // p failed
Reply(reply.Status, reply.Error)
open IndentationParserWithoutBacktracking
let isBlank = fun c -> c = ' ' || c = '\t'
let ws = spaces
let ws1 = skipMany1SatisfyL isBlank "whitespace"
let str s = pstring s .>> ws
let keyword str = pstring str >>? nextCharSatisfiesNot (fun c -> isLetter c || isDigit c) <?> str
// AST
type Identifier = Identifier of string
// A value is just a literal or a data name, called here "Variable"
type Value =
| Int of int | Float of float
| Bool of bool | String of string
| Char of char | Variable of Identifier
// All is an instruction, but there are some differences:
type Instr =
// Arithmetic
| Literal of Value | Infix of Instr * InfixOp * Instr
// Statements (instructions needing another instructions)
| Let of Identifier * Instr list
| Loop of Identifier * Instr * Instr * Instr list
// Other - the "print" function, from the link seen above
| Print of Identifier
and InfixOp =
// Arithmetic
| Sum | Sub | Mul | Div
// Logic
| And | Or | Equal | NotEqual | Greater | Smaller | GreaterEqual | SmallerEqual
// Literals
let numberFormat = NumberLiteralOptions.AllowMinusSign ||| NumberLiteralOptions.AllowFraction |||
NumberLiteralOptions.AllowHexadecimal ||| NumberLiteralOptions.AllowOctal |||
NumberLiteralOptions.AllowBinary
let literal_numeric =
numberLiteral numberFormat "number" |>> fun nl ->
if nl.IsInteger then Literal (Int(int nl.String))
else Literal (Float(float nl.String))
let literal_bool =
(choice [
(stringReturn "true" (Literal (Bool true)))
(stringReturn "false" (Literal (Bool false)))
]
.>> ws) <?> "boolean"
let literal_string =
(between (pstring "\"") (pstring "\"") (manyChars (satisfy (fun c -> c <> '"')))
|>> fun s -> Literal (String s)) <?> "string"
let literal_char =
(between (pstring "'") (pstring "'") (satisfy (fun c -> c <> '''))
|>> fun c -> Literal (Char c)) <?> "character"
let identifier =
(many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "identifier"
|>> Identifier) <?> "identifier"
let betweenParentheses p =
(between (str "(") (str ")") p) <?> ""
let variable = identifier |>> fun id -> Literal (Variable id)
let literal = (attempt literal_numeric <|>
attempt literal_bool <|>
attempt literal_char <|>
attempt literal_string <|>
attempt variable)
// Instressions and statements
let pInstrs, pInstrimpl = createParserForwardedToRef()
// `ploop` is located here to force `pInstrs` to be of the type `Instr list`, `ploop` requesting an instression list.
let ploop =
pipe4
(keyword "loop" >>. ws1 >>. identifier)
(ws1 >>. literal)
(ws1 >>. literal)
(pInstrs)
(fun id min max stmts -> Loop(id, min, max, stmts))
// `singlepInstr` allows to use only one Instression, used just after.
let singlepInstr =
pInstrs |>> fun ex -> ex.Head
let term =
(ws >>. singlepInstr .>> ws) <|>
(betweenParentheses (ws >>. singlepInstr)) <|>
(ws >>. literal .>> ws) <|>
(betweenParentheses (ws >>. literal))
let infixOperator (p: OperatorPrecedenceParser<_, _, _>) op prec map =
p.AddOperator(InfixOperator(op, ws, prec, Associativity.Left, map))
let ops =
// Arithmetic
[ "+"; "-"; "*"; "/"; "%" ] #
// Logical
[ "&&"; "||"; "=="; "!="; ">"; "<"; ">="; "<=" ]
let opCorrespondance op =
match op with
// Arithmetic operators
| "+" -> Sum | "-" -> Sub
| "*" -> Mul | "/" -> Div
// Logical operators
| "&&" -> And | "||" -> Or
| "==" -> Equal | "!=" -> NotEqual
| ">" -> Greater | "<" -> Smaller
| ">=" -> GreaterEqual | "<=" -> SmallerEqual
| _ -> failwith ("Unknown operator: " + op)
let opParser = new OperatorPrecedenceParser<Instr, unit, UserState>()
for op in ops do
infixOperator opParser op 1 (fun x y -> Infix(x, opCorrespondance op, y))
opParser.TermParser <- term
// Statements
(*
- let:
let <identifier> = <instruction(s) / value>
- print:
print <identifier>
- loop:
loop <identifier> <literal> <literal> <indented statements>
*)
let plet =
pipe2
(keyword "let" >>. ws1 >>. identifier)
(ws >>. str "=" >>. ws >>. pInstrs)
(fun id exp -> Let(id, exp))
let print =
keyword "print" >>. ws1 >>. identifier
|>> Print
let instruction =
print <|> ploop <|> plet <|>
opParser.ExpressionParser <|>
literal
pInstrimpl := indentedMany1 instruction "instruction"
let document = pInstrs .>> spaces .>> eof
let test str =
match runParserOnString document (UserState.Create()) "" str with
| Success(result, _, _) -> printfn "%A" result
| Failure(errorMsg, _, _) -> printfn "%s" errorMsg
System.Console.Clear()
let code = test #"
let foo = a + b
"
I would like to understand first of all why it doesn't work, but also to be able to find a solution to my problem, and that this solution can be extended to the potential syntax additions of the parser.
Awaiting a salutary answer, thank you.

In order to understand why your parser doesn't work, you need to isolate the issues.
If I understand you correctly, you want your let parser to support either a single instruction on the same line or indented instructions on subsequent lines, e.g:
let x = instruction
let b =
instruction
instruction
If you can't get your existing implementation to work, I'd recommend going back to the implementation on the Wiki and trying to just add support for the let statement.
For example, I made the Wiki parser accept simple let statements with the following modifications:
type Statement = Loop of Identifier * int * int * Statement list
| Print of Identifier
| Let of Identifier * Statement list
let ws = skipManySatisfy isBlank
let str s = pstring s .>> ws
let statement, statementRef = createParserForwardedToRef()
let indentedStatements = indentedMany1 statement "statement"
let plet = keyword "let" >>. pipe2 (ws1 >>. identifier)
(ws >>. str "=" >>. ws
>>. (indentedStatements
<|> (statement |>> fun s -> [s])))
(fun id exp -> Let(id, exp))
statementRef := print <|> loop <|> plet
Note that in the modified version statement is now the parser forwarded to a ref cell, not indentedStatements.
Note also that ws is not implemented with spaces, like in your parser. This is important because spaces also consumes newlines, which would prevent the indentedMany1 from seeing the newline and properly calculating the indentation.
The reason your parser produced an "Expecting: newline" error is that indentedMany1 needs a newline at the beginning of the indented sequence in order to be able to calculate the indentation. You would have to modify the implementation of indentedMany1 if you wanted to support e.g. the following indentation pattern:
let x = instruction
instruction
instruction

Use FParsec to parse a self-describing input

I'm using FParsec to parse an input that describes its own format. For example, consider this input:
int,str,int:4,'hello',3
The first part of the input (before the colon) describes the format of the second part of the input. In this case, the format is int, str, int, which means that the actual data consists of three comma-separated values of the given types, so the result should be 4, "hello", 3.
What is the best way to parse something like this with FParsec?
I've pasted my best effort below, but I'm not happy with it. Is there a better way to do this that is cleaner, less stateful, and less reliant on the parse monad? I think this depends on smarter management of UserState, but I don't know how to do it. Thanks.
open FParsec
type State = { Formats : string[]; Index : int32 }
with static member Default = { Formats = [||]; Index = 0 }
type Value =
| Integer of int
| String of string
let parseFormat : Parser<_, State> =
parse {
let! formats =
sepBy
(pstring "int" <|> pstring "str")
(skipString ",")
|>> Array.ofList
do! updateUserState (fun state -> { state with Formats = formats })
}
let parseValue format =
match format with
| "int" -> pint32 |>> Integer
| "str" ->
between
(skipString "'")
(skipString "'")
(manySatisfy (fun c -> c <> '\''))
|>> String
| _ -> failwith "Unexpected"
let parseValueByState =
parse {
let! state = getUserState
let format = state.Formats.[state.Index]
do! setUserState { state with Index = state.Index + 1}
return! parseValue format
}
let parseData =
sepBy
parseValueByState
(skipString ",")
let parse =
parseFormat
>>. skipString ":"
>>. parseData
[<EntryPoint>]
let main argv =
let result = runParserOnString parse State.Default "" "int,str,int:4,'hello',3"
printfn "%A" result
0

There seem to be several problems with the original code, so I took my liberty to rewrite it from scratch.
First, several library functions that may appear useful in other FParsec-related projects:
/// Simple Map
/// usage: let z = Map ["hello" => 1; "bye" => 2]
let (=>) x y = x,y
let makeMap x = new Map<_,_>(x)
/// A handy construct allowing NOT to write lengthy type definitions
/// and also avoid Value Restriction error
type Parser<'t> = Parser<'t, UserState>
/// A list combinator, inspired by FParsec's (>>=) combinator
let (<<+) (p1: Parser<'T list>) (p2: Parser<'T>) =
p1 >>= fun x -> p2 >>= fun y -> preturn (y::x)
/// Runs all parsers listed in the source list;
/// All but the trailing one are also combined with a separator
let allOfSepBy separator parsers : Parser<'T list> =
let rec fold state =
function
| [] -> pzero
| hd::[] -> state <<+ hd
| hd::tl -> fold (state <<+ (hd .>> separator)) tl
fold (preturn []) parsers
|>> List.rev // reverse the list since we appended to the top
Now, the main code. The basic idea is to run parsing in three steps:
Parse out the keys (which are plain ASCII strings)
Map these keys to actual Value parsers
Run these parsers in order
The rest seems to be commented within the code. :)
/// The resulting type
type Output =
| Integer of int
| String of string
/// tag to parser mappings
let mappings =
[
"int" => (pint32 |>> Integer)
"str" => (
manySatisfy (fun c -> c <> '\'')
|> between (skipChar ''') (skipChar ''')
|>> String
)
]
|> makeMap
let myProcess : Parser<Output list> =
let pKeys = // First, we parse out the keys
many1Satisfy isAsciiLower // Parse one key; keys are always ASCII strings
|> sepBy <| (skipChar ',') // many keys separated by comma
.>> (skipChar ':') // all this with trailing semicolon
let pValues = fun keys ->
keys // take the keys list
|> List.map // find the required Value parser
// (NO ERROR CHECK for bad keys)
(fun p -> Map.find p mappings)
|> allOfSepBy (skipChar ',') // they must run in order, comma-separated
pKeys >>= pValues
Run on string: int,int,str,int,str:4,42,'hello',3,'foobar'
Returned: [Integer 4; Integer 42; String "hello"; Integer 3; String "foobar"]

#bytebuster beat me to it but I still post my solution. The technique is similar to #bytebuster.
Thanks for an interesting question.
In compilers I believe the preferred technique is to parse the text into an AST and on that run a type-checker. For this example a potentially simpler technique would be that parsing the type definitions returns a set of parsers for the values. These parsers are then applied on the rest of the string.
open FParsec
type Value =
| Integer of int
| String of string
type ValueParser = Parser<Value, unit>
let parseIntValue : Parser<Value, unit> =
pint32 |>> Integer
let parseStringValue : Parser<Value, unit> =
between
(skipChar '\'')
(skipChar '\'')
(manySatisfy (fun c -> c <> '\''))
<?> "string"
|>> String
let parseValueParser : Parser<ValueParser, unit> =
choice
[
skipString "int" >>% parseIntValue
skipString "str" >>% parseStringValue
]
let parseValueParsers : Parser<ValueParser list, unit> =
sepBy1
parseValueParser
(skipChar ',')
// Runs a list of parsers 'ps' separated by 'sep' parser
let sepByList (ps : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
let rec loop adjust ps =
match ps with
| [] -> preturn []
| h::t ->
adjust h >>= fun v -> loop (fun pp -> sep >>. pp) t >>= fun vs -> preturn (v::vs)
loop id ps
let parseLine : Parser<Value list, unit> =
parseValueParsers .>> skipChar ':' >>= (fun vps -> sepByList vps (skipChar ',')) .>> eof
[<EntryPoint>]
let main argv =
let s = "int,str,int:4,'hello',3"
let r = run parseLine s
printfn "%A" r
0
Parsing int,str,int:4,'hello',3 yields Success: [Integer 4; String "hello";Integer 3].
Parsing int,str,str:4,'hello',3 (incorrect) yields:
Failure:
Error in Ln: 1 Col: 23
int,str,str:4,'hello',3
^
Expecting: string

I rewrote #FuleSnabel's sepByList as follows to help me understand it better. Does this look right?
let sepByList (parsers : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
let rec loop adjust parsers =
parse {
match parsers with
| [] -> return []
| parser :: tail ->
let! value = adjust parser
let! values = loop (fun parser -> sep >>. parser) tail
return value :: values
}
loop id parsers

How do I test for exactly 2 characters with fparsec?

I have the following program that runs. It takes a line of text and splits it into two parts, the first is an identifier and the second is the remainder of the line. My parser for the identifier (factID) takes any string of characters as the identifier, which is not (quite) what I want. What I want is a parser that only succeeds when it encounters two consecutive upper case letters. So for example "AA" should succeed while "A", "A1" or "AAA" should not.
What I can't figure out is how construct a parser that looks for a fixed length token. I thought perhaps CharParsers.next2CharsSatisfy might be the function I am looking for, but I can't figure out how to properly use it.
open FParsec
let test p str =
match run p str with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
let ws = spaces
let str_ws s = pstring s .>> ws
type StringConstant = StringConstant of string * string
let factID =
let isIdentifierFirstChar c = isLetter c
let isIdentifierChar c = isLetter c
many1Satisfy2L isIdentifierFirstChar isIdentifierChar "factID"
let factText =
let isG c = isLetter c || isDigit c || c = ' ' || c = '.'
manySatisfy isG
let factParse = pipe3 factID (str_ws " ") factText
(fun id _ str -> StringConstant(id, str))
[<EntryPoint>]
let main argv =
test factParse "AA This is some text." // This should pass
test factParse "A1 This is some text." // This should fail
test factParse "AAA This is some text." // This passes but I want it to fail
0 // return an integer exit code

I think this would do it
let pFactID = manyMinMaxSatisfy 2 2 Char.IsUpper

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Parsing numbers in FParsec - f#

Related

fparsec - limit number of characters that a parser is applied to

Why are these two FParsec snippets different?

Parsing an ML-like syntax based on indentation, and everything considered to be an instruction/expression

Use FParsec to parse a self-describing input

How do I test for exactly 2 characters with fparsec?

Categories

Resources