Improving the readability of a FParsec parser

I have a hand-written CSS parser in C# that is becoming unmanageable, and I am trying to rewrite it in FParsec to make it more maintainable. Here's a snippet that parses a CSS selector element using regexes:
var tagRegex = @"(?<Tag>(?:[a-zA-Z][_\-0-9a-zA-Z]*|\*))";
var idRegex = @"(?:#(?<Id>[a-zA-Z][_\-0-9a-zA-Z]*))";
var classesRegex = @"(?<Classes>(?:\.[a-zA-Z][_\-0-9a-zA-Z]*)+)";
var pseudoClassRegex = @"(?::(?<PseudoClass>link|visited|hover|active|before|after|first-line|first-letter))";
var selectorRegex = new Regex("(?:(?:" + tagRegex + "?" + idRegex + ")|" +
"(?:" + tagRegex + "?" + classesRegex + ")|" +
tagRegex + ")" +
pseudoClassRegex + "?");
var m = selectorRegex.Match(str);
if (m.Length != str.Length) {
cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
return null;
}
string tagName = m.Groups["Tag"].Value;
string pseudoClassString = m.Groups["PseudoClass"].Value;
CssPseudoClass pseudoClass;
if (pseudoClassString.IsEmpty()) {
pseudoClass = CssPseudoClass.None;
} else {
switch (pseudoClassString.ToLower()) {
case "link":
pseudoClass = CssPseudoClass.Link;
break;
case "visited":
pseudoClass = CssPseudoClass.Visited;
break;
case "hover":
pseudoClass = CssPseudoClass.Hover;
break;
case "active":
pseudoClass = CssPseudoClass.Active;
break;
case "before":
pseudoClass = CssPseudoClass.Before;
break;
case "after":
pseudoClass = CssPseudoClass.After;
break;
case "first-line":
pseudoClass = CssPseudoClass.FirstLine;
break;
case "first-letter":
pseudoClass = CssPseudoClass.FirstLetter;
break;
default:
cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
return null;
}
}
string cssClassesString = m.Groups["Classes"].Value;
string[] cssClasses = cssClassesString.IsEmpty() ? EmptyArray<string>.Instance : cssClassesString.Substring(1).Split('.');
allCssClasses.AddRange(cssClasses);
return new CssSelectorElement(
tagName.ToLower(),
cssClasses,
m.Groups["Id"].Value,
pseudoClass);
My first attempt yielded this:
type CssPseudoClass =
    | None = 0
    | Link = 1
    | Visited = 2
    | Hover = 3
    | Active = 4
    | Before = 5
    | After = 6
    | FirstLine = 7
    | FirstLetter = 8
type CssSelectorElement =
    { Tag : string
      Id : string
      Classes : string list
      PseudoClass : CssPseudoClass }
    with
        static member Default =
            { Tag = "";
              Id = "";
              Classes = [];
              PseudoClass = CssPseudoClass.None; }
open FParsec
let ws = spaces
let str = skipString
let strWithResult str result = skipString str >>. preturn result
let identifier =
    let isIdentifierFirstChar c = isLetter c || c = '-'
    let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = '-'
    optional (str "-") >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"
let stringFromOptional strOption =
    match strOption with
    | Some(str) -> str
    | None -> ""
let pseudoClassFromOptional pseudoClassOption =
    match pseudoClassOption with
    | Some(pseudoClassOption) -> pseudoClassOption
    | None -> CssPseudoClass.None
let parseCssSelectorElement =
    let tag = identifier <?> "tagName"
    let id = str "#" >>. identifier <?> "#id"
    let classes = many1 (str "." >>. identifier) <?> ".className"
    let parseCssPseudoClass =
        choiceL [ strWithResult "link" CssPseudoClass.Link;
                  strWithResult "visited" CssPseudoClass.Visited;
                  strWithResult "hover" CssPseudoClass.Hover;
                  strWithResult "active" CssPseudoClass.Active;
                  strWithResult "before" CssPseudoClass.Before;
                  strWithResult "after" CssPseudoClass.After;
                  strWithResult "first-line" CssPseudoClass.FirstLine;
                  strWithResult "first-letter" CssPseudoClass.FirstLetter]
                "pseudo-class"
    // (tag?id|tag?classes|tag)pseudoClass?
    pipe2 ((pipe2 (opt tag)
                  id
                  (fun tag id ->
                      { CssSelectorElement.Default with
                          Tag = stringFromOptional tag;
                          Id = id })) |> attempt
           <|>
           (pipe2 (opt tag)
                  classes
                  (fun tag classes ->
                      { CssSelectorElement.Default with
                          Tag = stringFromOptional tag;
                          Classes = classes })) |> attempt
           <|>
           (tag |>> (fun tag -> { CssSelectorElement.Default with Tag = tag })))
          (opt (str ":" >>. parseCssPseudoClass) |> attempt)
          (fun selectorElem pseudoClass -> { selectorElem with PseudoClass = pseudoClassFromOptional pseudoClass })
But I don't like how it's shaping up. I was expecting to end up with something easier to understand, but the part that parses (tag?id|tag?classes|tag)pseudoClass? with several pipe2 and attempt calls is really hard to read.
Can someone with more experience in FParsec show me better ways to accomplish this?
I'm thinking of trying FSLex/Yacc or Boost.Spirit instead of FParsec to see if I can come up with nicer code with them.

You could extract some parts of that complex parser to variables, e.g.:
let tagid =
    pipe2 (opt tag)
          id
          (fun tag id ->
              { CssSelectorElement.Default with
                  Tag = stringFromOptional tag
                  Id = id })
You could also try an applicative interface; personally, I find it easier to use and reason about than pipe2:
let tagid =
    (fun tag id ->
        { CssSelectorElement.Default with
            Tag = stringFromOptional tag
            Id = id })
    <!> opt tag
    <*> id
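As far as I know, FParsec itself does not define the `<!>` and `<*>` operators, so they have to be declared before the snippet above compiles. A minimal sketch of one possible definition follows; the operator bodies, the `TagId` record and the `name` parser are assumptions made for illustration, not code from the question.

```fsharp
open FParsec

// Hypothetical applicative operators built on FParsec's own combinators.
let (<!>) f p = p |>> f                          // map a function over a parser's result
let (<*>) pf px = pipe2 pf px (fun f x -> f x)   // apply a parsed function to a parsed argument

// Simplified stand-in for CssSelectorElement.
type TagId = { Tag : string; Id : string }

// Simplified stand-in for the question's identifier parser.
let name : Parser<string, unit> = many1Satisfy isLetter

let tagId : Parser<TagId, unit> =
    (fun tag id -> { Tag = defaultArg tag ""; Id = id })
    <!> opt name
    <*> (skipChar '#' >>. name)

printfn "%A" (run tagId "div#main")
```

Both operators start with `<`, so F# gives them the same precedence and left associativity, which is what lets `f <!> a <*> b` read as `(f <!> a) <*> b`.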

As Mauricio said, if you find yourself repeating code in an FParsec parser, you can always factor out the common parts into a variable or custom combinator. This is one of the great advantages of combinator libraries.
However, in this case you could also simplify and optimize the parser by reorganizing the grammar a bit. You could, for example, replace the lower half of the parseCssSelectorElement parser with
let defSel = CssSelectorElement.Default
let pIdSelector = id |>> (fun str -> {defSel with Id = str})
let pClassesSelector = classes |>> (fun strs -> {defSel with Classes = strs})
let pSelectorMain =
    choice [pIdSelector
            pClassesSelector
            pipe2 tag (pIdSelector <|> pClassesSelector <|>% defSel)
                  (fun tagStr sel -> {sel with Tag = tagStr})]
pipe2 pSelectorMain (opt (str ":" >>. parseCssPseudoClass))
      (fun sel optPseudo ->
          match optPseudo with
          | None -> sel
          | Some pseudo -> {sel with PseudoClass = pseudo})
By the way, if you want to parse a large number of string constants, it's more efficient to use a dictionary-based parser, like
let pCssPseudoClass : Parser<CssPseudoClass,unit> =
    let pseudoDict = dict ["link", CssPseudoClass.Link
                           "visited", CssPseudoClass.Visited
                           "hover", CssPseudoClass.Hover
                           "active", CssPseudoClass.Active
                           "before", CssPseudoClass.Before
                           "after", CssPseudoClass.After
                           "first-line", CssPseudoClass.FirstLine
                           "first-letter", CssPseudoClass.FirstLetter]
    fun stream ->
        let reply = identifier stream
        if reply.Status <> Ok then Reply(reply.Status, reply.Error)
        else
            let mutable pseudo = CssPseudoClass.None
            if pseudoDict.TryGetValue(reply.Result, &pseudo) then Reply(pseudo)
            else // skip to beginning of invalid pseudo class
                stream.Skip(-reply.Result.Length)
                Reply(Error, messageError "unknown pseudo class")

Related

Mixfix operators with Fparsec

How could we parse mixfix operators with FParsec?
I tried to create a class type that looks like OperatorPrecedenceParser:
let identifier = many1Satisfy (fun c -> isLetter c || isDigit c || c = ''')
let symbol =
[ '!'; '#'; '&'; ','; '%'; '^'; '.';
'§'; '*'; '°'; '$'; '~'; ':'; '-';
'+'; '='; '?'; '/'; '>'; '<'; '|'; ]
module Operator =
type public Fixity =
| Left | Right | Neutral
let defaultPrecedence = 3
type public Mixfix =
{ nameParts: Identifier list; // the components of the mixfix function
fullname: Identifier; // the full operator name
mutable fixity: Fixity; // the fixity of the function
mutable prec: int; // operator precedence
arity: int } // number of arguments (starts to 1)
with member this.update fixity prec =
this.fixity <- fixity
this.prec <- prec
let private set'nameParts ids =
List.filter (fun id -> id <> "_") ids
let private set'fullname (id: Identifier list) =
System.String.Join("", id)
/// by default value
let private set'fixity (id: Identifier list) =
if id.First() = "_"
then Left
elif id.Last() <> "_"
then Neutral
else Right
let private set'arity (id: Identifier list) =
int (List.countWith (fun s -> s = "_") id)
let identifier = attempt identifier <|> operator
let makeApp expr1 expr2 = App(expr1, expr2)
let makeApps expr1 (exprs: SL list) =
List.fold
(fun acc e -> makeApp acc e)
(makeApp expr1 (exprs.First()))
(exprs
|> List.rev
|> List.dropLast
|> List.rev)
type MixFixOp (ptv: Parser<_, _> option) =
member private this.operators : Mixfix list ref = ref []
member private this.termParser : Parser<_, _> list ref =
if ptv.IsNone
then ref []
else ref [ptv.Value]
member private this.termValue = choice !this.termParser
member private this.addMixFix (mixfix: Mixfix) =
this.operators.Value <- this.operators.Value @ [mixfix]
member public this.contains id =
List.exists (fun s -> s.fullname = id) this.operators.Value
member private this.addMixFixOperator id fixity prec =
if this.contains (set'fullname id) = false
then this.addMixFix
{ nameParts = set'nameParts id;
fullname = set'fullname id;
fixity = fixity;
prec = prec;
arity = set'arity id }
else failwithf "Already defined function"
member public this.addOperator (id: Identifier list) (fixity: Fixity option) (prec: int option) =
this.addMixFixOperator id
(if fixity.IsNone then set'fixity id else fixity.Value)
(if prec.IsNone then defaultPrecedence else prec.Value)
member public this.setTermValue ptvalue =
this.termParser := !this.termParser @ [ptvalue]
()
member public this.expressionParser =
parse {
let mutable nameParts : Identifier list = []
let mutable valueParts : Value list = []
let addNamePartRet name =
nameParts <- nameParts @ [name]
preturn name
let addValuePartRet value =
valueParts <- valueParts @ [value]
nameParts <- nameParts @ ["_"]
preturn value
do! (attempt ((attempt identifier <|> symbol) >>= fun id -> addNamePartRet id) <|> (ws >>% "")) >>?
sepEndBy
(bws (this.termValue (* <|> this.expressionParser*)) >>= fun x -> addValuePartRet x)
(bws identifier
>>= fun id -> addNamePartRet id) >>% ()
let fullname = set'fullname nameParts
if this.contains fullname
then return makeApps (Var fullname) valueParts
elif valueParts.Length = 1 && nameParts.First() = "_"
then return valueParts.First()
else fail (sprintf "Unknown mixfix function: `%s`" fullname)
}
With a use such as this:
let opp = new Operator.MixFixOp(Some value)
opp.addOperator ["|"; "_"; "|"] None None // absolute value
opp.addOperator ["_"; "!"] None None // factorial
opp.addOperator ["_"; ","; "_"] None None // tuple
opp.addOperator ["if"; "_"; "then"; "_"; "else"; "_"]
// ...
let test = run (opp.expressionParser .>>? eof) "|32|"
And for example:
type Expr =
| Var of string
| App of Expr * Expr
| Int of int
let pint = pint32 |>> Int
let pvar = identifier |>> Var
let value' = attempt pint <|> pvar
let app = chainl1 value' (spaces1 >>% fun x y -> App(x, y))
let value = app
let opp = new Operator.MixFixOp(Some value)
...
This approach is already unsatisfying, for several reasons. It requires knowing in advance which parser to use for operator terms, so operators cannot be used inside those term parsers. (I added a setTermValue method to the type, but it does not work: after each call, termParser is still empty.) I also do not know how to use expressionParser recursively without getting an infinite loop. The type does not handle fixity or precedence. Finally, there are conflicts with the parsers used as terms, for example when a term is expected to be an identifier, as in the examples above.
In the long run, I would like this parser class to be able to parse expressions such as:
3 + 1 + 4
|32!|
if x > y then 0 else 1
How could I improve this module, and make it operational?
I would be very grateful if you could help me :)

Parsing an ML-like syntax based on indentation, and everything considered to be an instruction/expression

NOTE: I asked a similar question not long ago. This is not a duplicate: the clarifications I need did not fall within the scope of that question. I am therefore opening another question about parsing an ML-like syntax based on indentation, in which everything is considered an instruction / expression.
For example:
"Hello" is an expression,
let foo = 2 + 1 is an instruction using an expression (2 + 1),
print foo is an instruction, ...
In short, a syntax and semantics that is quite modular and dynamic. Like F#, or OCaml.
To do this, I use F# with the FParsec library (available on NuGet). The FParsec wiki provides an example of an indentation-based syntax, which I have reused. The module used in the code below is IndentationParserWithoutBacktracking.
The example code to be parsed uses an elementary indentation, not mixing "literal" and "instructions/expressions":
loop i 1 10
  loop k 1 10
    print k
  print i
print j
A simple code, and without context (but this is not important at the moment).
My implementation allows codes such as:
let foo = a + b
let foo =
    let a = 9
    let b = 1
    a + b
let foo = 7
let foo =
    loop i 1 10
        print i
For example. (The loop and print are there just for the tests...)
The problem I have been stuck on for a long week now is that the indentation module demands a newline every time a parser expects an instruction. Here is a screenshot:
This applies to all the examples mentioned above. I don't really understand the problem, and therefore don't know how to solve it.
Here is the code I tested for this question. It is a minimal, working example, but it requires FParsec:
open FParsec
// This module comes from 'https://github.com/stephan-tolksdorf/fparsec/wiki/Parsing-indentation-based-syntax-with-FParsec'
// I used the second module: 'IndentationParserWithoutBacktracking'
module IndentationParserWithoutBacktracking =
let tabStopDistance = 8
type LastParsedIndentation() =
[<DefaultValue>]
val mutable Value: int32
[<DefaultValue>]
val mutable EndIndex: int64
type UserState =
{Indentation: int
// We put LastParsedIndentation into the UserState so that we
// can conveniently use a separate instance for each stream.
// The members of the LastParsedIndentation instance will be mutated
// directly and hence won't be affected by any stream backtracking.
LastParsedIndentation: LastParsedIndentation}
with
static member Create() = {Indentation = -1
LastParsedIndentation = LastParsedIndentation(EndIndex = -1L)}
type CharStream = CharStream<UserState>
type Parser<'t> = Parser<'t, UserState>
// If this function is called at the same index in the stream
// where the function previously stopped, then the previously
// returned indentation will be returned again.
// This way we can avoid backtracking at the end of indented blocks.
let skipIndentation (stream: CharStream) =
let lastParsedIndentation = stream.UserState.LastParsedIndentation
if lastParsedIndentation.EndIndex = stream.Index then
lastParsedIndentation.Value
else
let mutable indentation = stream.SkipNewlineThenWhitespace(tabStopDistance, false)
lastParsedIndentation.EndIndex <- stream.Index
lastParsedIndentation.Value <- indentation
indentation
let indentedMany1 (p: Parser<'t>) label : Parser<'t list> =
fun stream ->
let oldIndentation = stream.UserState.Indentation
let indentation = skipIndentation stream
if indentation <= oldIndentation then
Reply(Error, expected (if indentation < 0 then "newline" else "indented " + label))
else
stream.UserState <- {stream.UserState with Indentation = indentation}
let results = ResizeArray()
let mutable stateTag = stream.StateTag
let mutable reply = p stream // parse the first element
let mutable newIndentation = 0
while reply.Status = Ok
&& (results.Add(reply.Result)
newIndentation <- skipIndentation stream
newIndentation = indentation)
do
stateTag <- stream.StateTag
reply <- p stream
if reply.Status = Ok
|| (stream.IsEndOfStream && results.Count > 0 && stream.StateTag = stateTag)
then
if newIndentation < indentation || stream.IsEndOfStream then
stream.UserState <- {stream.UserState with Indentation = oldIndentation}
Reply(List.ofSeq results)
else
Reply(Error, messageError "wrong indentation")
else // p failed
Reply(reply.Status, reply.Error)
open IndentationParserWithoutBacktracking
let isBlank = fun c -> c = ' ' || c = '\t'
let ws = spaces
let ws1 = skipMany1SatisfyL isBlank "whitespace"
let str s = pstring s .>> ws
let keyword str = pstring str >>? nextCharSatisfiesNot (fun c -> isLetter c || isDigit c) <?> str
// AST
type Identifier = Identifier of string
// A value is just a literal or a data name, called here "Variable"
type Value =
| Int of int | Float of float
| Bool of bool | String of string
| Char of char | Variable of Identifier
// All is an instruction, but there are some differences:
type Instr =
// Arithmetic
| Literal of Value | Infix of Instr * InfixOp * Instr
// Statements (instructions needing another instructions)
| Let of Identifier * Instr list
| Loop of Identifier * Instr * Instr * Instr list
// Other - the "print" function, from the link seen above
| Print of Identifier
and InfixOp =
// Arithmetic
| Sum | Sub | Mul | Div
// Logic
| And | Or | Equal | NotEqual | Greater | Smaller | GreaterEqual | SmallerEqual
// Literals
let numberFormat = NumberLiteralOptions.AllowMinusSign ||| NumberLiteralOptions.AllowFraction |||
NumberLiteralOptions.AllowHexadecimal ||| NumberLiteralOptions.AllowOctal |||
NumberLiteralOptions.AllowBinary
let literal_numeric =
numberLiteral numberFormat "number" |>> fun nl ->
if nl.IsInteger then Literal (Int(int nl.String))
else Literal (Float(float nl.String))
let literal_bool =
(choice [
(stringReturn "true" (Literal (Bool true)))
(stringReturn "false" (Literal (Bool false)))
]
.>> ws) <?> "boolean"
let literal_string =
(between (pstring "\"") (pstring "\"") (manyChars (satisfy (fun c -> c <> '"')))
|>> fun s -> Literal (String s)) <?> "string"
let literal_char =
(between (pstring "'") (pstring "'") (satisfy (fun c -> c <> '''))
|>> fun c -> Literal (Char c)) <?> "character"
let identifier =
(many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "identifier"
|>> Identifier) <?> "identifier"
let betweenParentheses p =
(between (str "(") (str ")") p) <?> ""
let variable = identifier |>> fun id -> Literal (Variable id)
let literal = (attempt literal_numeric <|>
attempt literal_bool <|>
attempt literal_char <|>
attempt literal_string <|>
attempt variable)
// Instructions and statements
let pInstrs, pInstrimpl = createParserForwardedToRef()
// `ploop` is defined here to force `pInstrs` to have the type `Instr list`, since `ploop` requires an instruction list.
let ploop =
pipe4
(keyword "loop" >>. ws1 >>. identifier)
(ws1 >>. literal)
(ws1 >>. literal)
(pInstrs)
(fun id min max stmts -> Loop(id, min, max, stmts))
// `singlepInstr` yields only the first instruction of the list; it is used just below.
let singlepInstr =
pInstrs |>> fun ex -> ex.Head
let term =
(ws >>. singlepInstr .>> ws) <|>
(betweenParentheses (ws >>. singlepInstr)) <|>
(ws >>. literal .>> ws) <|>
(betweenParentheses (ws >>. literal))
let infixOperator (p: OperatorPrecedenceParser<_, _, _>) op prec map =
p.AddOperator(InfixOperator(op, ws, prec, Associativity.Left, map))
let ops =
// Arithmetic
[ "+"; "-"; "*"; "/"; "%" ] @
// Logical
[ "&&"; "||"; "=="; "!="; ">"; "<"; ">="; "<=" ]
let opCorrespondance op =
match op with
// Arithmetic operators
| "+" -> Sum | "-" -> Sub
| "*" -> Mul | "/" -> Div
// Logical operators
| "&&" -> And | "||" -> Or
| "==" -> Equal | "!=" -> NotEqual
| ">" -> Greater | "<" -> Smaller
| ">=" -> GreaterEqual | "<=" -> SmallerEqual
| _ -> failwith ("Unknown operator: " + op)
let opParser = new OperatorPrecedenceParser<Instr, unit, UserState>()
for op in ops do
infixOperator opParser op 1 (fun x y -> Infix(x, opCorrespondance op, y))
opParser.TermParser <- term
// Statements
(*
- let:
let <identifier> = <instruction(s) / value>
- print:
print <identifier>
- loop:
loop <identifier> <literal> <literal> <indented statements>
*)
let plet =
pipe2
(keyword "let" >>. ws1 >>. identifier)
(ws >>. str "=" >>. ws >>. pInstrs)
(fun id exp -> Let(id, exp))
let print =
keyword "print" >>. ws1 >>. identifier
|>> Print
let instruction =
print <|> ploop <|> plet <|>
opParser.ExpressionParser <|>
literal
pInstrimpl := indentedMany1 instruction "instruction"
let document = pInstrs .>> spaces .>> eof
let test str =
match runParserOnString document (UserState.Create()) "" str with
| Success(result, _, _) -> printfn "%A" result
| Failure(errorMsg, _, _) -> printfn "%s" errorMsg
System.Console.Clear()
let code = test @"
let foo = a + b
"
I would like to understand first of all why it doesn't work, but also to be able to find a solution to my problem, and that this solution can be extended to the potential syntax additions of the parser.
Awaiting a salutary answer, thank you.

In order to understand why your parser doesn't work, you need to isolate the issues.
If I understand you correctly, you want your let parser to support either a single instruction on the same line or indented instructions on subsequent lines, e.g.:
let x = instruction
let b =
  instruction
  instruction
If you can't get your existing implementation to work, I'd recommend going back to the implementation on the Wiki and trying to just add support for the let statement.
For example, I made the Wiki parser accept simple let statements with the following modifications:
type Statement = Loop of Identifier * int * int * Statement list
               | Print of Identifier
               | Let of Identifier * Statement list
let ws = skipManySatisfy isBlank
let str s = pstring s .>> ws
let statement, statementRef = createParserForwardedToRef()
let indentedStatements = indentedMany1 statement "statement"
let plet = keyword "let" >>. pipe2 (ws1 >>. identifier)
                                   (ws >>. str "=" >>. ws
                                    >>. (indentedStatements
                                         <|> (statement |>> fun s -> [s])))
                                   (fun id exp -> Let(id, exp))
statementRef := print <|> loop <|> plet
Note that in the modified version statement is now the parser forwarded to a ref cell, not indentedStatements.
Note also that ws is not implemented with spaces, like in your parser. This is important because spaces also consumes newlines, which would prevent the indentedMany1 from seeing the newline and properly calculating the indentation.
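The difference between the two whitespace parsers can be checked in isolation. The following is a small sketch, not part of the wiki code: `blanks` and `stopIndex` are hypothetical names, and the parsers use `unit` as user state to keep the example self-contained.

```fsharp
open FParsec

let isBlank c = c = ' ' || c = '\t'

// Skips only spaces and tabs, like the modified `ws` above.
let blanks : Parser<unit, unit> = skipManySatisfy isBlank

// Returns the stream index at which a whitespace parser stops.
let stopIndex (p: Parser<unit, unit>) input =
    match run (p >>. getPosition) input with
    | Success (pos, _, _) -> pos.Index
    | Failure (msg, _, _) -> failwith msg

let input = "  \n  x"
printfn "spaces stops at index %d" (stopIndex spaces input)
printfn "blanks stops at index %d" (stopIndex blanks input)
```

With this input, `spaces` should run past the line break and stop in front of `x`, while `blanks` stops in front of the newline, leaving it available to indentedMany1.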
The reason your parser produced an "Expecting: newline" error is that indentedMany1 needs a newline at the beginning of the indented sequence in order to be able to calculate the indentation. You would have to modify the implementation of indentedMany1 if you wanted to support e.g. the following indentation pattern:
let x = instruction
        instruction
        instruction

F#, FParsec, and Updating UserState

Okay, since my last question elicited no responses, I'm forging ahead in a different direction. Lol!
I can't find any examples beyond the official documentation on managing user state, or accessing the results of a prior parser.
N.b. This code does not compile.
namespace MultipartMIMEParser
open FParsec
open System.IO
type Header = { name : string
; value : string
; addl : (string * string) list option }
type Content = Content of string
| Post of Post list
and Post = { headers : Header list
; content : Content }
type private UserState = { Boundary : string }
with static member Default = { Boundary="" }
module internal P =
let ($) f x = f x
let undefined = failwith "Undefined."
let ascii = System.Text.Encoding.ASCII
let str cs = System.String.Concat (cs:char list)
let makeHeader ((n,v),nvps) = { name=n; value=v; addl=nvps}
let runP p s = match runParserOnStream p UserState.Default "" s ascii with
| Success (r,_,_) -> r
| Failure (e,_,_) -> failwith (sprintf "%A" e)
let blankField = parray 2 newline
let delimited d e =
let pEnd = preturn () .>> e
let part = spaces >>. (manyTill $ noneOf d $ (attempt (preturn () .>> pstring d) <|> pEnd)) |>> str
in part .>>. part
let delimited3 firstDelimiter secondDelimiter thirdDelimiter endMarker =
delimited firstDelimiter endMarker
.>>. opt (many (delimited secondDelimiter endMarker
>>. delimited thirdDelimiter endMarker))
// TODO: This is the parser I'm asking about.
let pHeader =
let includesBoundary s = undefined
let setBoundary b = { Boundary=b }
in delimited3 ":" ";" "=" blankField
|>> makeHeader
>>. fun stream -> if includesBoundary // How do I access the output from makeHeader here?
then stream.UserState <- setBoundary b // I need b to be read from the output of makeHeader.
Reply ()
else Reply ()
let pHeaders = manyTill pHeader $ attempt (preturn () .>> blankField)
// N.b. This is the mess I'm currently wrestling with. It does not compile, and is
// not sound yet.
let rec pContent boundary =
match boundary with
| "" -> // Content is text.
let line = restOfLine false
in pipe2 pHeaders (manyTill line $ attempt (preturn () .>> blankField))
$ fun h c -> { headers=h
; content=Content $ System.String.Join (System.Environment.NewLine,c) }
| _ -> // Content contains boundaries.
let b = "--"+boundary
let p = pipe2 pHeaders (pContent b) $ fun h c -> { headers=h; content=c }
in skipString b >>. manyTill p (attempt (preturn () .>> blankField))
let pStream = runP (pipe2 pHeaders pContent $ fun h c -> { headers=h; content=c })
type MParser (s:Stream) =
let r = P.pStream s
let findHeader name =
match r.headers |> List.tryFind (fun h -> h.name.ToLower() = name) with
| Some h -> h.value
| None -> ""
member p.Boundary =
let isBoundary ((s:string),_) = s.ToLower() = "boundary"
let header = r.headers
|> List.tryFind (fun h -> if h.addl.IsSome
then h.addl.Value |> List.exists isBoundary
else false)
in match header with
| Some h -> h.addl.Value |> List.find isBoundary |> snd
| None -> ""
member p.ContentID = findHeader "content-id"
member p.ContentLocation = findHeader "content-location"
member p.ContentSubtype = findHeader "type"
member p.ContentTransferEncoding = findHeader "content-transfer-encoding"
member p.ContentType = findHeader "content-type"
member p.Content = r.content
member p.Headers = r.headers
member p.MessageID = findHeader "message-id"
member p.MimeVersion = findHeader "mime-version"
A truncated example of the POST I am trying to parse follows:
content-type: Multipart/related; boundary="RN-Http-Body-Boundary"; type="multipart/related"
--RN-Http-Body-Boundary
Message-ID: <25845033.1160080657073.JavaMail.webmethods@exshaw>
Mime-Version: 1.0
Content-Type: multipart/related; type="application/xml";
boundary="----=_Part_235_11184805.1160080657052"
------=_Part_235_11184805.1160080657052
Content-Type: Application/XML
Content-Transfer-Encoding: binary
Content-Location: RN-Preamble
Content-ID: <1430586.1160080657050.JavaMail.webmethods@exshaw>
XML document begins here...

So basically, what you want to do in pHeader is to use the parser as a monad, rather than an applicative. Based on your code style you come from Haskell so I'll assume you know these words. Something like this then:
let pHeader =
    let includesBoundary s = undefined
    let setBoundary b = { Boundary=b }
    in delimited3 ":" ";" "=" blankField
       |>> makeHeader
       >>= fun header stream ->
               if includesBoundary header
               then let b = undefined // some expression including header, if I understood correctly
                    stream.UserState <- setBoundary b
                    Reply ()
               else Reply ()
Or you can write it in a computation expression (which would correspond to do-notation in Haskell):
let pHeader =
    let includesBoundary s = undefined
    let setBoundary b = { Boundary=b }
    parse {
        let! header =
            delimited3 ":" ";" "=" blankField
            |>> makeHeader
        return! fun stream ->
            if includesBoundary header
            then let b = undefined // some expression including header, if I understood correctly
                 stream.UserState <- setBoundary b
                 Reply ()
            else Reply ()
    }
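As an alternative to assigning stream.UserState by hand, FParsec also provides the setUserState, updateUserState and getUserState primitives, which compose with `>>=` like any other parser. A minimal sketch, assuming a simplified, hypothetical `pBoundary` parser in place of the question's header parser:

```fsharp
open FParsec

type UserState = { Boundary : string }

// Hypothetical stand-in for the header parser: reads boundary="...".
let pBoundary : Parser<string, UserState> =
    skipString "boundary=\"" >>. manySatisfy (fun c -> c <> '"') .>> skipChar '"'

// Parse the value, then store it in the user state with the built-in primitive.
let pStoreBoundary : Parser<unit, UserState> =
    pBoundary >>= fun b -> setUserState { Boundary = b }

let sample = "boundary=\"RN-Http-Body-Boundary\""
match runParserOnString (pStoreBoundary >>. getUserState) { Boundary = "" } "" sample with
| Success (state, _, _) -> printfn "stored boundary: %s" state.Boundary
| Failure (msg, _, _) -> printfn "%s" msg
```

A state change made this way is part of the parser state, so it is undone if an enclosing `attempt` backtracks.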

'sepEndBy' does not capture if wrapped in 'between'

I want to parse the following text:
WHERE
( AND
ApplicationGroup.REFSTR = 5
BV_1.Year = 2009
BV_1.MonetaryCodeId = 'Commited'
BV_3.Year = 2009
BV_3.MonetaryCodeId = 'Commited'
BV_4.Year = 2009
BV_4.MonetaryCodeId = 'Commited
)
I started with a combinator for the list of conditions:
let multiConditionWhereList : Parser<WhereCondition list, unit> =
sepEndBy1 (ws >>. whereCondition) (newline)
<?> "where condition list"
When I hand the condition list of the where-statement (every line with an =) to this parser, I get back a Reply with seven WhereConditions in its Result. The Status is "Ok", but the Error list contains an "Expected newline" ErrorMessage.
But whenever I try to parse this kind of list wrapped in round braces, with an operator at the beginning, using a combinator of the following shape:
let multiConditionWhereClause : Parser<WhereStatement, unit> =
pstringCI "where"
.>> spaces
>>. between (pchar '(') (pchar ')')
( ws >>. whereChainOperator .>> spaces1
.>>. multiConditionWhereList )
|>> (fun (chainOp, conds) -> { Operator = chainOp;
SearchConditions = conds } )
I get a Reply with Status "Error", but the Error list is empty, as is the result.
So I'm kind of stuck at this point. First, I don't understand why the sepEndBy1 combinator in my multiConditionWhereList produces a non-empty error list and expects a newline at the end. More importantly, I don't understand why the list is not captured when I wrap it in a between parser.
As a reference, I include the whole set of rules as well as an invocation of the rule which causes the problems:
#light
#r "System.Xml.Linq.dll"
#r @"..\packages\FParsec.1.0.1\lib\net40-client\FParsecCS.dll"
#r @"..\packages\FParsec.1.0.1\lib\net40-client\FParsec.dll"
module Ast =
open System
open System.Xml.Linq
type AlfabetParseError (msg: string) =
inherit Exception (msg)
type FindStatement =
{ TableReferences: TableReferences;}
and TableReferences =
{ PrimaryTableReference: TableReferenceWithAlias; JoinTableReferences: JoinTableReference list; }
and TableReferenceWithAlias =
{ Name: string; Alias: string }
and JoinTableReference =
{ JoinType:JoinType; TableReference: TableReferenceWithAlias; JoinCondition: JoinCondition; }
and JoinType =
| InnerJoin
| OuterJoin
| LeftJoin
| RightJoin
and JoinCondition =
{ LeftHandSide: FieldReference; RightHandSide: FieldReference; }
and WhereStatement =
{ Operator: WhereOperator; SearchConditions: WhereCondition list }
and WhereOperator =
| And
| Or
| Equal
| Is
| IsNot
| Contains
| Like
| NoOp
and WhereLeftHandSide =
| FieldReferenceLH of FieldReference
and WhereRightHandSide =
| FieldReferenceRH of FieldReference
| VariableReferenceRH of VariableReference
| LiteralRH of Literal
and WhereCondition =
{ LeftHandSide: WhereLeftHandSide; Operator: WhereOperator; RightHandSide: WhereRightHandSide; }
and FieldReference =
{ FieldName: Identifier; TableName: Identifier }
and VariableReference =
{ VariableName : Identifier; }
and Literal =
| Str of string
| Int of int
| Hex of int
| Bin of int
| Float of float
| Null
and Identifier =
Identifier of string
and QueryXml =
{ Doc : XDocument }
module AlfabetQueryParser =
    open Ast
    open FParsec
    open System
    open System.Xml.Linq

    module Parsers =
        (* Utilities *)
        let toJoinType (str:string) =
            match str.ToLowerInvariant() with
            | "innerjoin" -> InnerJoin
            | "outerjoin" -> OuterJoin
            | "leftjoin" -> LeftJoin
            | "rightjoin" -> RightJoin
            | _ -> raise <| AlfabetParseError "Invalid join type"

        let toWhereOperator (str:string) =
            match str.ToLowerInvariant() with
            | "and" -> And
            | "or" -> Or
            | "=" -> Equal
            | "is" -> Is
            | "is not" -> IsNot
            | "contains" -> Contains
            | "like" -> Like
            | _ -> raise <| AlfabetParseError "Invalid where operator type"

        (* Parsers *)
        let ws : Parser<string, unit> =
            manyChars (satisfy (fun c -> c = ' '))

        let ws1 : Parser<string, unit> =
            many1Chars (satisfy (fun c -> c = ' '))

        let identifier : Parser<string, unit> =
            many1Chars (satisfy (fun(c) -> isDigit(c) || isAsciiLetter(c) || c.Equals('_')))

        let fieldReference : Parser<FieldReference, unit> =
            identifier
            .>> pstring "."
            .>>. identifier
            |>> (fun (tname, fname) ->
                    { FieldName = Identifier(fname);
                      TableName = Identifier(tname) })

        let variableReference : Parser<VariableReference, unit> =
            pstring ":"
            >>. identifier
            |>> (fun vname -> { VariableName = Identifier(vname) })

        let numeralOrDecimal : Parser<Literal, unit> =
            numberLiteral NumberLiteralOptions.AllowFraction "number"
            |>> fun num ->
                    if num.IsInteger then Int(int num.String)
                    else Float(float num.String)

        let hexNumber : Parser<Literal, unit> =
            pstring "#x" >>. many1SatisfyL isHex "hex digit"
            |>> fun hexStr ->
                    Hex(System.Convert.ToInt32(hexStr, 16))

        let binaryNumber : Parser<Literal, unit> =
            pstring "#b" >>. many1SatisfyL (fun c -> c = '0' || c = '1') "binary digit"
            |>> fun hexStr ->
                    Bin(System.Convert.ToInt32(hexStr, 2))

        let numberLiteral : Parser<Literal, unit> =
            choiceL [numeralOrDecimal
                     hexNumber
                     binaryNumber]
                    "number literal"

        let strEscape =
            pchar '\\' >>. pchar '\''

        let strInnard =
            strEscape <|> noneOf "\'"

        let strInnards =
            manyChars strInnard

        let strLiteral =
            between (pchar '\'') (pchar '\'') strInnards
            |>> Str

        let literal : Parser<Literal, unit> =
            (pstringCI "null" |>> (fun str -> Null))
            <|> numberLiteral
            <|> strLiteral

        let joinCondition : Parser<JoinCondition, unit> =
            spaces .>> pstring "ON" .>> spaces
            >>. fieldReference
            .>> spaces .>> pstring "=" .>> spaces
            .>>. fieldReference
            |>> (fun(lhs, rhs) -> { LeftHandSide = lhs; RightHandSide = rhs })

        let tableReferenceWithoutAlias : Parser<TableReferenceWithAlias, unit> =
            identifier
            |>> (fun (name) -> { Name = name; Alias = ""})

        let tableReferenceWithAlias : Parser<TableReferenceWithAlias, unit> =
            identifier
            .>> spaces .>> pstringCI "as" .>> spaces
            .>>. identifier
            |>> (fun (name, alias) -> { Name = name; Alias = alias})

        let primaryTableReference : Parser<TableReferenceWithAlias, unit> =
            attempt tableReferenceWithAlias <|> tableReferenceWithoutAlias

        let joinTableReference : Parser<JoinTableReference, unit> =
            identifier
            .>> spaces
            .>>. (attempt tableReferenceWithAlias <|> tableReferenceWithoutAlias)
            .>> spaces
            .>>. joinCondition
            |>> (fun ((joinTypeStr, tableRef), condition) ->
                    { JoinType = toJoinType(joinTypeStr);
                      TableReference = tableRef;
                      JoinCondition = condition })

        let tableReferences : Parser<TableReferences, unit> =
            primaryTableReference
            .>> spaces
            .>>. many (joinTableReference .>> spaces)
            |>> (fun (pri, joinTables) ->
                    { PrimaryTableReference = pri;
                      JoinTableReferences = joinTables })

        let whereConditionOperator : Parser<WhereOperator, unit> =
            choice [
                pstringCI "="
                pstringCI "is not"
                pstringCI "is"
                pstringCI "contains"
                pstringCI "like"
            ]
            |>> toWhereOperator

        let whereChainOperator : Parser<WhereOperator, unit> =
            choice [
                pstringCI "and"
                pstringCI "or"
            ]
            |>> toWhereOperator

        let whereCondition : Parser<WhereCondition, unit> =
            let leftHandSide : Parser<WhereLeftHandSide, unit> =
                fieldReference |>> FieldReferenceLH
            let rightHandSide : Parser<WhereRightHandSide, unit> =
                (attempt fieldReference |>> FieldReferenceRH)
                <|> (attempt variableReference |>> VariableReferenceRH)
                <|> (literal |>> LiteralRH)
            leftHandSide
            .>> ws1 .>>. whereConditionOperator .>> ws1
            .>>. rightHandSide
            |>> (fun((lhs, op), rhs) ->
                    { LeftHandSide = lhs;
                      Operator = op;
                      RightHandSide = rhs })

        let singleConditionWhereClause : Parser<WhereStatement, unit> =
            pstringCI "where" .>> spaces
            >>. whereCondition
            |>> (fun (cond) ->
                    { Operator = NoOp;
                      SearchConditions = [ cond ] })

        let multiConditionChainOperator : Parser<WhereOperator, unit> =
            pstring "(" .>> spaces >>. whereChainOperator .>> spaces
            <?> "where multi-condition operator"

        let multiConditionWhereList : Parser<WhereCondition list, unit> =
            sepEndBy1 (ws >>. whereCondition) (newline)
            <?> "where condition list"

        let multiConditionWhereClause : Parser<WhereStatement, unit> =
            pstringCI "where"
            .>> spaces
            >>. between (pchar '(') (pchar ')')
                        ( ws >>. whereChainOperator .>> spaces1
                          .>>. multiConditionWhereList )
            |>> (fun (chainOp, conds) ->
                    { Operator = chainOp;
                      SearchConditions = conds })

        let whereClause : Parser<WhereStatement, unit> =
            (attempt multiConditionWhereClause)
            <|> singleConditionWhereClause

        let findStatement : Parser<FindStatement, unit> =
            spaces .>> pstringCI "find" .>> spaces
            >>. tableReferences
            |>> (fun (tableRef) -> { TableReferences = tableRef; } )

        let queryXml : Parser<QueryXml, unit> =
            pstringCI "QUERY_XML" .>> newline
            >>. manyCharsTill anyChar eof
            |>> (fun (xmlStr) -> { Doc = XDocument.Parse(xmlStr) } )

    let parse input =
        match run Parsers.findStatement input with
        | Success (x, _, _) -> x
        | Failure (x, _, _) -> raise <| AlfabetParseError x

open FParsec
let input = @"WHERE
( AND
ApplicationGroup.REFSTR CONTAINS :BASE
BV_1.Year = 2009
BV_1.MonetaryCodeId = 'Commited'
BV_3.Year = 2009
BV_3.MonetaryCodeId = 'Commited'
BV_4.Year = 2009
BV_4.MonetaryCodeId = 'Commited'
)"
let r = run AlfabetQueryParser.Parsers.multiConditionWhereClause input

The reason FParsec can't generate more useful error messages for your example is that you've defined the ws and identifier parsers using the satisfy primitive. Since you only specified a predicate function, FParsec doesn't know how to describe the expected input. The User's Guide explains this issue and how to avoid it. In your code, you could e.g. use satisfyL or many1SatisfyL for the definitions.
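As a rough sketch of that suggestion, the whitespace and identifier parsers from the question could be rewritten with the labeled variants. The label strings ("space", "identifier") are illustrative choices, not taken from the original post:

```fsharp
open FParsec

// Zero or more spaces; this parser cannot fail, so it needs no label.
let ws : Parser<string, unit> =
    manySatisfy (fun c -> c = ' ')

// One or more spaces; the label is reported when no space is found.
let ws1 : Parser<string, unit> =
    many1SatisfyL (fun c -> c = ' ') "space"

// Same character set as the question's identifier parser, with a label
// so that FParsec can describe the expected input on failure.
let identifier : Parser<string, unit> =
    many1SatisfyL (fun c -> isDigit c || isAsciiLetter c || c = '_') "identifier"

// Print the error message produced for an input that is not an identifier.
match run identifier "!" with
| Success (v, _, _) -> printfn "parsed %s" v
| Failure (msg, _, _) -> printfn "%s" msg
```

With the labels in place, the failure message should mention the expected "identifier" rather than leaving the expected input undescribed.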
After fixing the ws and identifier parsers you'll quickly discover that your code doesn't properly parse the list because the whitespace parsing is messed up. Where possible, you should always parse whitespace as trailing whitespace, not as leading whitespace, because this avoids the need for backtracking. To fix your parser for the input you gave above, you could e.g. replace
sepEndBy1 (ws >>. whereCondition) (newline)
with
sepEndBy1 (whereCondition .>> ws) (newline >>. ws)
in the definition of multiConditionWhereList.
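To see the trailing-whitespace pattern in isolation, here is a minimal sketch that uses a simplified item parser (a run of ASCII letters) in place of whereCondition. The names item, items and listInParens are hypothetical and exist only for this illustration:

```fsharp
open FParsec

let ws : Parser<string, unit> =
    manySatisfy (fun c -> c = ' ')

// Stand-in for whereCondition: a run of ASCII letters.
let item : Parser<string, unit> =
    many1SatisfyL isAsciiLetter "item"

// Each item consumes its own trailing spaces; the separator consumes the
// newline plus the indentation of the following line.
let items : Parser<string list, unit> =
    sepEndBy1 (item .>> ws) (newline >>. ws)

let listInParens : Parser<string list, unit> =
    between (pchar '(' >>. ws) (pchar ')') items

// The closing parenthesis is indented, so the last separator consumes
// spaces before the item parser is tried one final time.
let result = run listInParens "(a\n  b\n  c\n  )"
printfn "%A" result
```

Because the indentation before the closing parenthesis is consumed by the separator, the item parser fails there without consuming input, and sepEndBy1 can end the list cleanly. With leading whitespace inside the element parser, the same indentation would be consumed before the element fails, which turns the end of the list into an error.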
Note that a non-empty error message list doesn't necessarily imply an error, as FParsec will generally collect the error messages of all parsers that were applied at the current position in the stream, even if the parser is "optional". This is probably the reason you were seeing the "expected newline", since a newline would have been accepted at that position.

How do I test for exactly 2 characters with fparsec?

I have the following program that runs. It takes a line of text and splits it into two parts, the first is an identifier and the second is the remainder of the line. My parser for the identifier (factID) takes any string of characters as the identifier, which is not (quite) what I want. What I want is a parser that only succeeds when it encounters two consecutive upper case letters. So for example "AA" should succeed while "A", "A1" or "AAA" should not.
What I can't figure out is how to construct a parser that looks for a fixed-length token. I thought perhaps CharParsers.next2CharsSatisfy might be the function I am looking for, but I can't figure out how to properly use it.
open FParsec

let test p str =
    match run p str with
    | Success(result, _, _) -> printfn "Success: %A" result
    | Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg

let ws = spaces
let str_ws s = pstring s .>> ws

type StringConstant = StringConstant of string * string

let factID =
    let isIdentifierFirstChar c = isLetter c
    let isIdentifierChar c = isLetter c
    many1Satisfy2L isIdentifierFirstChar isIdentifierChar "factID"

let factText =
    let isG c = isLetter c || isDigit c || c = ' ' || c = '.'
    manySatisfy isG

let factParse = pipe3 factID (str_ws " ") factText
                      (fun id _ str -> StringConstant(id, str))

[<EntryPoint>]
let main argv =
    test factParse "AA This is some text." // This should pass
    test factParse "A1 This is some text." // This should fail
    test factParse "AAA This is some text." // This passes but I want it to fail
    0 // return an integer exit code

I think this would do it (Char.IsUpper needs open System):
let pFactID = manyMinMaxSatisfy 2 2 Char.IsUpper
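
On its own, a parser that reads exactly two uppercase letters stops after "AA" and leaves a third "A" unconsumed, so "AAA" is only rejected if whatever follows refuses the leftover letter. A sketch of a stricter variant, assuming the identifier should fail by itself when a third uppercase letter follows, adds a notFollowedBy check. The label text is an illustrative choice, and FParsec's own isUpper is used in place of Char.IsUpper:

```fsharp
open FParsec

// Exactly two uppercase letters, and no uppercase letter directly after them.
let pFactID : Parser<string, unit> =
    manyMinMaxSatisfyL 2 2 isUpper "two uppercase letters"
    .>> notFollowedBy (satisfy isUpper)

// Try the three inputs from the question.
for s in [ "AA text"; "A1 text"; "AAA text" ] do
    match run pFactID s with
    | Success (v, _, _) -> printfn "%s -> %s" s v
    | Failure _ -> printfn "%s -> rejected" s
```

Under these assumptions only the first input should be accepted; the other two should be rejected.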
