fparsec parsing alternatives with discriminated unions for DSL

fparsec parsing alternatives with discriminated unions for DSL - f#

I am trying to achieve below with fparsec and unions
(1 + (2 * 3)) //DSL sample input(recursive)
type AirthmeticExpression =
| Constant of float
| AddNumber of AirthmeticExpression * AirthmeticExpression
| Mul of AirthmeticExpression * AirthmeticExpression
in fparsec I have createParserForwardedToRef for Add and Mul as
let parseExpression, implementation = createParserForwardedToRef<AirthmeticExpression, unit>();;
let parseAdd = between pstring"(" pstring ")" (tuple2 (parseExpression .>> pstring " + ") parseExpression) |>> AddNumber
let parseMul = between pstring"(" pstring ")" (tuple2 (parseExpression .>> pstring " * ") parseExpression) |>> Mul
implementation := parseConstant <|> parseAdd <|> parseMull
but fparsec doc says for alternatives if parser p1 consumes input and fails it will not try p2.
in my case both Add and Mul has same pattern before operator, so only p1 is working. how can I refactor it so I can parse my input? in fparsec doc solution example, it worked as it was just parsing and not constructing Discriminated union instance. in my case I have to know which pattern matched so that I can create either Add or Mul

Edit: my original comment was just as flawed, as pointed out by #FyodorSoikin.
You are on the right track in your comment from yesterday by making a common parser for the operators and then having a single parser for operations that uses it. To make this more functional, you can have the operator parser return the union case to apply. This way, when parsing the full operation, you can just call it as a function.
let parseExpression, implementation = createParserForwardedToRef<AirthmeticExpression, unit>();;
let parseOperator = // : Parser<AirthmeticExpression * AirthmeticExpression -> AirthmeticExpression>
(pstring " + " |>> AddNumber)
<|> (pstring " * " |>> Mul)
let parseOperation =
pipe3 parseConstant parseOperator parseConstant
(fun x op y -> op (x, y)) // Here, op is either AddNumber or Mul
|> between (pstring "(") (pstring ")")
implementation := parseConstant <|> parseOperation
Original comment:
One possibility is to use attempt as said in the comments, but that function should generally be used as a last resort. A better solution is to factor out the wrapping:
let parseExpression, implementation = createParserForwardedToRef<AirthmeticExpression, unit>();;
let parseAdd = tuple2 (parseExpression .>> pstring " + ") parseExpression |>> AddNumber
let parseMul = tuple2 (parseExpression .>> pstring " * ") parseExpression |>> Mul
let parseOp = between (pstring "(") (pstring ")") (parseAdd <|> parseMul)
implementation := parseConstant <|> parseOp

Related

Parsing an ML-like syntax based on indentation, and everything considered to be an instruction/expression

NOTE: Not long ago, I had already asked a similar question. This is not a duplication, but the clarifications to be requested did not fall within the scope of the subject itself. I therefore allow myself to open another position dealing with the analysis of an ML-like syntax based on indentation, and considering everything as an instruction / expression.
For example:
"Hello" is an expression,
let foo = 2 + 1 is an instruction using an expression (2 + 1),
print foo is an instruction, ...
In short, a syntax and semantics that is quite modular and dynamic. Like F#, or OCaml.
To do this, I use F#, with the API (available on nuget) FParsec. The FParsec wiki provides an example of a syntax based on indentation, so I have taken it up again. The module in the code below used is IndentationParserWithoutBacktracking.
The example code to be parsed uses an elementary indentation, not mixing "literal" and "instructions/expressions":
loop i 1 10
loop k 1 10
print k
print i
print j
A simple code, and without context (but this is not important at the moment).
My implementation allows codes such as:
let foo = a + b
let foo =
let a = 9
let b = 1
a + b
let foo = 7
let foo =
loop i 1 10
print i
For example. (The loop and print are there just for the tests...)
The problem I have been having for a long week now, and that I can't solve, is the fact that the indentation module asks me every time an instruction is expected in a parser for a new line... Here is a screenshot:
This applies to all the examples mentioned above. I don't really understand the problem, and therefore don't know how to solve it.
Here is the code tested for this question, it meets the minimum and functional code criteria, however, FParsec must be used:
open FParsec
// This module come from 'https://github.com/stephan-tolksdorf/fparsec/wiki/Parsing-indentation-based-syntax-with-FParsec'
// I used the second module: 'IndentationParserWithoutBacktracking'
module IndentationParserWithoutBacktracking =
let tabStopDistance = 8
type LastParsedIndentation() =
[<DefaultValue>]
val mutable Value: int32
[<DefaultValue>]
val mutable EndIndex: int64
type UserState =
{Indentation: int
// We put LastParsedIndentation into the UserState so that we
// can conveniently use a separate instance for each stream.
// The members of the LastParsedIndentation instance will be mutated
// directly and hence won't be affected by any stream backtracking.
LastParsedIndentation: LastParsedIndentation}
with
static member Create() = {Indentation = -1
LastParsedIndentation = LastParsedIndentation(EndIndex = -1L)}
type CharStream = CharStream<UserState>
type Parser<'t> = Parser<'t, UserState>
// If this function is called at the same index in the stream
// where the function previously stopped, then the previously
// returned indentation will be returned again.
// This way we can avoid backtracking at the end of indented blocks.
let skipIndentation (stream: CharStream) =
let lastParsedIndentation = stream.UserState.LastParsedIndentation
if lastParsedIndentation.EndIndex = stream.Index then
lastParsedIndentation.Value
else
let mutable indentation = stream.SkipNewlineThenWhitespace(tabStopDistance, false)
lastParsedIndentation.EndIndex <- stream.Index
lastParsedIndentation.Value <- indentation
indentation
let indentedMany1 (p: Parser<'t>) label : Parser<'t list> =
fun stream ->
let oldIndentation = stream.UserState.Indentation
let indentation = skipIndentation stream
if indentation <= oldIndentation then
Reply(Error, expected (if indentation < 0 then "newline" else "indented " + label))
else
stream.UserState <- {stream.UserState with Indentation = indentation}
let results = ResizeArray()
let mutable stateTag = stream.StateTag
let mutable reply = p stream // parse the first element
let mutable newIndentation = 0
while reply.Status = Ok
&& (results.Add(reply.Result)
newIndentation <- skipIndentation stream
newIndentation = indentation)
do
stateTag <- stream.StateTag
reply <- p stream
if reply.Status = Ok
|| (stream.IsEndOfStream && results.Count > 0 && stream.StateTag = stateTag)
then
if newIndentation < indentation || stream.IsEndOfStream then
stream.UserState <- {stream.UserState with Indentation = oldIndentation}
Reply(List.ofSeq results)
else
Reply(Error, messageError "wrong indentation")
else // p failed
Reply(reply.Status, reply.Error)
open IndentationParserWithoutBacktracking
let isBlank = fun c -> c = ' ' || c = '\t'
let ws = spaces
let ws1 = skipMany1SatisfyL isBlank "whitespace"
let str s = pstring s .>> ws
let keyword str = pstring str >>? nextCharSatisfiesNot (fun c -> isLetter c || isDigit c) <?> str
// AST
type Identifier = Identifier of string
// A value is just a literal or a data name, called here "Variable"
type Value =
| Int of int | Float of float
| Bool of bool | String of string
| Char of char | Variable of Identifier
// All is an instruction, but there are some differences:
type Instr =
// Arithmetic
| Literal of Value | Infix of Instr * InfixOp * Instr
// Statements (instructions needing another instructions)
| Let of Identifier * Instr list
| Loop of Identifier * Instr * Instr * Instr list
// Other - the "print" function, from the link seen above
| Print of Identifier
and InfixOp =
// Arithmetic
| Sum | Sub | Mul | Div
// Logic
| And | Or | Equal | NotEqual | Greater | Smaller | GreaterEqual | SmallerEqual
// Literals
let numberFormat = NumberLiteralOptions.AllowMinusSign ||| NumberLiteralOptions.AllowFraction |||
NumberLiteralOptions.AllowHexadecimal ||| NumberLiteralOptions.AllowOctal |||
NumberLiteralOptions.AllowBinary
let literal_numeric =
numberLiteral numberFormat "number" |>> fun nl ->
if nl.IsInteger then Literal (Int(int nl.String))
else Literal (Float(float nl.String))
let literal_bool =
(choice [
(stringReturn "true" (Literal (Bool true)))
(stringReturn "false" (Literal (Bool false)))
]
.>> ws) <?> "boolean"
let literal_string =
(between (pstring "\"") (pstring "\"") (manyChars (satisfy (fun c -> c <> '"')))
|>> fun s -> Literal (String s)) <?> "string"
let literal_char =
(between (pstring "'") (pstring "'") (satisfy (fun c -> c <> '''))
|>> fun c -> Literal (Char c)) <?> "character"
let identifier =
(many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "identifier"
|>> Identifier) <?> "identifier"
let betweenParentheses p =
(between (str "(") (str ")") p) <?> ""
let variable = identifier |>> fun id -> Literal (Variable id)
let literal = (attempt literal_numeric <|>
attempt literal_bool <|>
attempt literal_char <|>
attempt literal_string <|>
attempt variable)
// Instressions and statements
let pInstrs, pInstrimpl = createParserForwardedToRef()
// `ploop` is located here to force `pInstrs` to be of the type `Instr list`, `ploop` requesting an instression list.
let ploop =
pipe4
(keyword "loop" >>. ws1 >>. identifier)
(ws1 >>. literal)
(ws1 >>. literal)
(pInstrs)
(fun id min max stmts -> Loop(id, min, max, stmts))
// `singlepInstr` allows to use only one Instression, used just after.
let singlepInstr =
pInstrs |>> fun ex -> ex.Head
let term =
(ws >>. singlepInstr .>> ws) <|>
(betweenParentheses (ws >>. singlepInstr)) <|>
(ws >>. literal .>> ws) <|>
(betweenParentheses (ws >>. literal))
let infixOperator (p: OperatorPrecedenceParser<_, _, _>) op prec map =
p.AddOperator(InfixOperator(op, ws, prec, Associativity.Left, map))
let ops =
// Arithmetic
[ "+"; "-"; "*"; "/"; "%" ] #
// Logical
[ "&&"; "||"; "=="; "!="; ">"; "<"; ">="; "<=" ]
let opCorrespondance op =
match op with
// Arithmetic operators
| "+" -> Sum | "-" -> Sub
| "*" -> Mul | "/" -> Div
// Logical operators
| "&&" -> And | "||" -> Or
| "==" -> Equal | "!=" -> NotEqual
| ">" -> Greater | "<" -> Smaller
| ">=" -> GreaterEqual | "<=" -> SmallerEqual
| _ -> failwith ("Unknown operator: " + op)
let opParser = new OperatorPrecedenceParser<Instr, unit, UserState>()
for op in ops do
infixOperator opParser op 1 (fun x y -> Infix(x, opCorrespondance op, y))
opParser.TermParser <- term
// Statements
(*
- let:
let <identifier> = <instruction(s) / value>
- print:
print <identifier>
- loop:
loop <identifier> <literal> <literal> <indented statements>
*)
let plet =
pipe2
(keyword "let" >>. ws1 >>. identifier)
(ws >>. str "=" >>. ws >>. pInstrs)
(fun id exp -> Let(id, exp))
let print =
keyword "print" >>. ws1 >>. identifier
|>> Print
let instruction =
print <|> ploop <|> plet <|>
opParser.ExpressionParser <|>
literal
pInstrimpl := indentedMany1 instruction "instruction"
let document = pInstrs .>> spaces .>> eof
let test str =
match runParserOnString document (UserState.Create()) "" str with
| Success(result, _, _) -> printfn "%A" result
| Failure(errorMsg, _, _) -> printfn "%s" errorMsg
System.Console.Clear()
let code = test #"
let foo = a + b
"
I would like to understand first of all why it doesn't work, but also to be able to find a solution to my problem, and that this solution can be extended to the potential syntax additions of the parser.
Awaiting a salutary answer, thank you.

In order to understand why your parser doesn't work, you need to isolate the issues.
If I understand you correctly, you want your let parser to support either a single instruction on the same line or indented instructions on subsequent lines, e.g:
let x = instruction
let b =
instruction
instruction
If you can't get your existing implementation to work, I'd recommend going back to the implementation on the Wiki and trying to just add support for the let statement.
For example, I made the Wiki parser accept simple let statements with the following modifications:
type Statement = Loop of Identifier * int * int * Statement list
| Print of Identifier
| Let of Identifier * Statement list
let ws = skipManySatisfy isBlank
let str s = pstring s .>> ws
let statement, statementRef = createParserForwardedToRef()
let indentedStatements = indentedMany1 statement "statement"
let plet = keyword "let" >>. pipe2 (ws1 >>. identifier)
(ws >>. str "=" >>. ws
>>. (indentedStatements
<|> (statement |>> fun s -> [s])))
(fun id exp -> Let(id, exp))
statementRef := print <|> loop <|> plet
Note that in the modified version statement is now the parser forwarded to a ref cell, not indentedStatements.
Note also that ws is not implemented with spaces, like in your parser. This is important because spaces also consumes newlines, which would prevent the indentedMany1 from seeing the newline and properly calculating the indentation.
The reason your parser produced an "Expecting: newline" error is that indentedMany1 needs a newline at the beginning of the indented sequence in order to be able to calculate the indentation. You would have to modify the implementation of indentedMany1 if you wanted to support e.g. the following indentation pattern:
let x = instruction
instruction
instruction

Indentation, expressions, statements and StackOverflowException with FParsec - Errors

I test indentation with FParsec, according to this implementation, but when I make it a little more complex by adding expressions (literals, lists, tuples and arithmetic operations), allowing expressions to top-level, and adding a variable creation statement; I first get a StackOverflowException error . In my opinion, this is because the expression parser is solicited in such a way as to make an infinite loop in the program. I see no other reason, however, I don't know how to fix this problem.
If I remove the attempt pexpression from my parser data statement, there is no more StackOverflowException, nevertheless the module IndentationParserWithoutBacktracking (therefore managing indentation) tells me that the code to be parsed is missing a "newline":
Failure: Error in Ln: 2 Col: 1
loop i 0 10
^
Expecting: let or print
The parser backtracked after:
Error in Ln: 3 Col: 5
let myVar = 2 + 1
^
Expecting: loop or print
The parser backtracked after:
Error in Ln: 3 Col: 17
let myVar = 2 + 1
^
Expecting: newline
All this according to the following text to be parsed:
loop i 0 10
let myVar = 2 + 1
print myVar
Here is my code:
open FParsec
// module IndentationParserWithoutBacktracking // see the link
// Utils
open IndentationParserWithoutBacktracking
let isBlank = fun c -> c = ' ' || c = '\t'
let ws = spaces
let ws1 = skipMany1SatisfyL isBlank "whitespace"
let str s = pstring s .>> ws
let keyword str = pstring str >>? nextCharSatisfiesNot (fun c -> isLetter c || isDigit c) <?> str
// AST
type Identifier = Identifier of string
type InfixOp =
| Sum | Sub | Mul | Div | Pow | Mod
| And | Or | Equal | NotEqual | Greater | Smaller | GreaterEqual | SmallerEqual
type Value =
| Int of int
| Float of float
| Bool of bool
| String of string
| Char of char
| Variable of Identifier
type Expr =
| Literal of Value
| Infix of Expr * InfixOp * Expr
| List of Expr list
| Tuple of Expr list
type Statement =
| Expression of Expr
| Let of Identifier * Statement list
| Loop of Identifier * Expr * Expr * Statement list
| Print of Identifier
// Literals
let numberFormat = NumberLiteralOptions.AllowMinusSign ||| NumberLiteralOptions.AllowFraction |||
NumberLiteralOptions.AllowHexadecimal ||| NumberLiteralOptions.AllowOctal |||
NumberLiteralOptions.AllowBinary ||| NumberLiteralOptions.AllowPlusSign
let literal_numeric =
numberLiteral numberFormat "number" |>> fun nl ->
if nl.IsInteger then Literal (Int(int nl.String))
else Literal (Float(float nl.String))
let literal_bool =
(choice [
(stringReturn "true" (Literal (Bool true)))
(stringReturn "false" (Literal (Bool false)))
]
.>> ws) <?> "boolean"
let literal_string =
(between (pstring "\"") (pstring "\"") (manyChars (satisfy (fun c -> c <> '"')))
|>> fun s -> Literal (String s)) <?> "string"
let literal_char =
(between (pstring "'") (pstring "'") (satisfy (fun c -> c <> '''))
|>> fun c -> Literal (Char c)) <?> "character"
let identifier =
(many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "identifier"
|>> fun i -> Identifier i) <?> "valid identifier"
let betweenParentheses p =
(between (str "(") (str ")") p)
let variable = identifier |>> fun id -> Literal (Variable id)
let literal = (attempt literal_numeric <|>
attempt literal_bool <|>
attempt literal_char <|>
attempt literal_string <|>
attempt variable) <?> "literal"
// Expressions
let pexpr, pexprimpl = createParserForwardedToRef()
let term =
(ws >>. literal .>> ws) <|>
(betweenParentheses (ws >>. pexpr)) <|>
(ws >>. pexpr .>> ws)
let infixOperator (p: OperatorPrecedenceParser<_, _, _>) op prec map =
p.AddOperator(InfixOperator(op, ws, prec, Associativity.Left, map))
let ops =
// Arithmetic
[ "+"; "-"; "*"; "/"; "%" ] #
// Logical
[ "&&"; "||"; "=="; "!="; ">"; "<"; ">="; "<=" ]
let opCorrespondance op =
match op with
// Arithmetic operators
| "+" -> Sum
| "-" -> Sub
| "*" -> Mul
| "/" -> Div
| "%" -> Mod
// Logical operators
| "&&" -> And
| "||" -> Or
| "==" -> Equal
| "!=" -> NotEqual
| ">" -> Greater
| "<" -> Smaller
| ">=" -> GreaterEqual
| "<=" -> SmallerEqual
let opParser = new OperatorPrecedenceParser<_, _, _>()
for op in ops do
infixOperator opParser op 1 (fun x y -> Infix(x, opCorrespondance op, y))
opParser.TermParser <- term
let list = between (str "[") (str "]") (sepBy pexpr (str ",")) |>> List
let tuple = between (str "(") (str ")") (sepBy pexpr (str ",")) |>> Tuple
let expression =
opParser.ExpressionParser <|> // I removed this line to don't have the mistake again.
list <|>
tuple <|>
literal
pexprimpl := attempt expression
// Statements
let statements, statementsRef = createParserForwardedToRef()
let pexpression = expression |>> Expression
let plet =
pipe2
(keyword "let" >>. ws1 >>. identifier)
(ws >>. str "=" >>. ws >>. statements)
(fun id gtt exp -> Let(id, gtt, exp))
// From the link, but "revisited"
let ploop =
pipe4
(keyword "loop" >>. ws1 >>. identifier)
(ws1 >>. literal) // If I put 'pexpr', it doesn't work too...
(ws1 >>. literal)
(statements)
(fun id min max stmts -> Loop(id, min, max, stmts))
let print = keyword "print" >>. (ws1 >>. identifier |>> Print)
let statement =
attempt plet <|>
attempt print <|>
attempt ploop <|>
attempt pexpression
statementsRef := indentedMany1 statement "statement"
let document = statements .>> spaces .>> eof
let test str =
match runParserOnString document (UserState.Create()) "" str with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
System.Console.Clear()
test #"
loop i 0 10
let myVar = 2 + 1
print myVar
"
I know I ask several questions at the same time, and the site doesn't really allow it, but they're all a little linked together, so I might as well kill two birds with one stone...
I would really like to understand my mistakes, in order to design a parser for a very small ML-like language.
Thank you.
Edit
Here is my current code, which has been modified to respond to the first problems encountered with indentation:
open IndentationParserWithoutBacktracking // So from the link
let isBlank = fun c -> c = ' ' || c = '\t'
let ws = spaces
let ws1 = skipMany1SatisfyL isBlank "whitespace"
let str s = pstring s .>> ws
let keyword str = pstring str >>? nextCharSatisfiesNot (fun c -> isLetter c || isDigit c) <?> str
// AST
type Identifier = Identifier of string
type Value =
| Int of int
| Float of float
| Bool of bool
| String of string
| Char of char
| Variable of Identifier
// In FP, "all" is an expression, so:
type Expr =
// Arithmetic + lists and tuple
| Literal of Value
| Infix of Expr * InfixOp * Expr
| List of Expr list
| Tuple of Expr list
// Statements
| Return of Expr
| Loop of Identifier * Expr * Expr * Expr list
| Print of Identifier
and InfixOp =
| Sum | Sub | Mul | Div | Pow | Mod
| And | Or | Equal | NotEqual | Greater | Smaller | GreaterEqual | SmallerEqual
// Literals
let numberFormat = NumberLiteralOptions.AllowMinusSign ||| NumberLiteralOptions.AllowFraction |||
NumberLiteralOptions.AllowHexadecimal ||| NumberLiteralOptions.AllowOctal |||
NumberLiteralOptions.AllowBinary
let literal_numeric =
numberLiteral numberFormat "number" |>> fun nl ->
if nl.IsInteger then Literal (Int(int nl.String))
else Literal (Float(float nl.String))
let literal_bool =
(choice [
(stringReturn "true" (Literal (Bool true)))
(stringReturn "false" (Literal (Bool false)))
]
.>> ws) <?> "boolean"
let literal_string =
(between (pstring "\"") (pstring "\"") (manyChars (satisfy (fun c -> c <> '"')))
|>> fun s -> Literal (String s)) <?> "string"
let literal_char =
(between (pstring "'") (pstring "'") (satisfy (fun c -> c <> '''))
|>> fun c -> Literal (Char c)) <?> "character"
let identifier =
(many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "identifier"
|>> fun i -> Identifier i) <?> "identifier"
let betweenParentheses p =
(between (str "(") (str ")") p) <?> ""
let variable = identifier |>> fun id -> Literal (Variable id)
let literal = (attempt literal_numeric <|>
attempt literal_bool <|>
attempt literal_char <|>
attempt literal_string <|>
attempt variable) <?> "literal"
// Expressions and statements
let pexprs, pexprimpl = createParserForwardedToRef()
// `ploop` is located here to force `pexprs` to be of the type `Expr list`, `ploop` requesting an expression list.
let ploop =
pipe4
(keyword "loop" >>. ws1 >>. identifier)
(ws1 >>. literal)
(ws1 >>. literal)
(pexprs)
(fun id min max stmts -> Loop(id, min, max, stmts))
// `singlepexpr` allows to use only one expression.
let singlepexpr =
pexprs |>> fun ex -> ex.Head
let term =
(ws >>. singlepexpr .>> ws) <|>
(betweenParentheses (ws >>. singlepexpr)) <|>
(ws >>. literal .>> ws) <|>
(betweenParentheses (ws >>. literal))
let infixOperator (p: OperatorPrecedenceParser<_, _, _>) op prec map =
p.AddOperator(InfixOperator(op, ws, prec, Associativity.Left, map))
let ops =
// Arithmetic
[ "+"; "-"; "*"; "/"; "%" ] #
// Logical
[ "&&"; "||"; "=="; "!="; ">"; "<"; ">="; "<=" ]
let opCorrespondance op =
match op with
// Arithmetic operators
| "+" -> Sum
| "-" -> Sub
| "*" -> Mul
| "/" -> Div
| "%" -> Mod
// Logical operators
| "&&" -> And
| "||" -> Or
| "==" -> Equal
| "!=" -> NotEqual
| ">" -> Greater
| "<" -> Smaller
| ">=" -> GreaterEqual
| "<=" -> SmallerEqual
let opParser = new OperatorPrecedenceParser<Expr, unit, UserState>()
for op in ops do
infixOperator opParser op 1 (fun x y -> Infix(x, opCorrespondance op, y))
opParser.TermParser <- term
let list = (between (str "[") (str "]") (sepBy singlepexpr (str ",")) |>> List) <?> "list"
let tuple = (between (str "(") (str ")") (sepBy singlepexpr (str ",")) |>> Tuple) <?> "tuple"
// Statements
// A commented `let` expression, commented for tests with the `return` instruction.
//let plet =
// pipe3
// (keyword "let" >>. ws1 >>. identifier)
// (ws >>. gtt ":")
// (ws >>. str "=" >>. ws >>. pexprs)
// (fun id gtt exp -> Let(id, gtt, exp))
let preturn =
keyword "return" >>. ws >>. singlepexpr
|>> fun ex -> Return ex
let print = keyword "print" >>. (ws1 >>. identifier |>> Print)
let instruction =
print <|>
ploop <|>
preturn <|>
opParser.ExpressionParser <|> // So we add the arithmetic, like x + y or 21 * 32 - 12 for example
list <|>
tuple <|>
literal
pexprimpl := indentedMany1 instruction "instruction"
let document = pexprs .>> spaces .>> eof
let test str =
match runParserOnString document (UserState.Create()) "" str with
| Success(result, _, _) -> printfn "%A" result
| Failure(errorMsg, _, _) -> printfn "%s" errorMsg
System.Console.Clear()
// The test code that give an error of "newline" expecting
let code = test #"
return 2 + 1
"
And here some screenshots about error:

The reason why indentedMany1 tells you it's expecting a newline in your example code is because that's what it's looking for: an indented block. Not an expression on one line. So your let myVar = 2 + 1 line is confusing it. If you wrote it as:
let myVar =
2 + 1
then I bet it would work.
What you need, I believe, is to change your let parser to allow one of two things: either an expression on a single line, or a block of statements (your statements parser). I.e., something like:
let pLetValue = expression <|> statements
let plet =
pipe2
(keyword "let" >>. ws1 >>. identifier)
(ws >>. str "=" >>. ws >>. pLetValue)
(fun id gtt exp -> Let(id, gtt, exp))
Note that I haven't tested this, as I don't have much time today. It's possible that instead of expression above, you'd want attempt expression (or pexpr, which is the same thing). Experiment a little and see what happens; and if you're completely lost as you try to figure out how FParsec is handling a given expression, remember the advice given in http://www.quanttec.com/fparsec/users-guide/debugging-a-parser.html.

Get the last error message thrown for an instruction

I noticed that the error messages sent by FParsec were quite "ambiguous", except for the last message sent for an instruction.
Here is an example:
Code to parse:
if (2 + 2 == 4)
Here, normally, there should be an instruction block (so in brackets).
And what I get:
Failure: Error in Ln: 1 Col: 1 if (2 + 2 == 4) ^ Expecting: [some instructions]
The parser backtracked after: Error in Ln: 1 Col: 3 if (2 + 2 ==
4)
^ Bad identifier: 'if' is a reserved keyword
The parser backtracked after: Error in Ln: 1 Col: 16 if (2 + 2 ==
4)
^ Note: The error occurred at the end of the input stream. Expecting: start block
As you can see, only the last error message is relevant. So I would like to know if there is not a way to display only this one, and therefore the last one without going through the others.
I guess it's not easy, since it's a feature of FParsec, but you never know...
I don't think I need to post F# code, since it's usually in the use of the library.
Edit
Here is the code of my analyzer to parse the example above:
type Statement =
| If of Expr * Block option
// And others...
and Block = Block of Statement list
let ws = pspaces >>. many pspaces |>> (fun _ -> ())
let str_ws s = pstring s .>> ws
let pexpr, pexprimpl = createParserForwardedToRef ()
// With their implementations (addition, subtraction, ...)
let pstatement, pstatementimpl = createParserForwardedToRef ()
// With their implementations, like "pif" below
let psinglestatement = pstatement |>> fun statement -> [statement]
let pstatementblock =
psinglestatement <|>
between (ws >>. str_ws "{") (ws >>. str_ws "}") (many pstatement)
let pif =
pipe2
(str_ws "if" >>. pexpr)
(pstatementblock)
(fun cnd block -> If(cnd, Some (Block(block))))
pstatementimpl :=
attempt (pif) <|>
// And others...
Edit II:
Here is the code of identifier analyse:
let reserved = [
"if"; "else" // And other...
]
let pidentifierraw =
let inline isIdentifierFirstChar c = isLetter c
let inline isIdentifierChar c = isLetter c || isDigit c
many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"
let pidentifier =
pidentifierraw
>>= fun s ->
if reserved |> List.exists ((=) s) then fail ("Bad identifier: '" + s + "' is a reserved keyword")
else preturn s
type Literal =
| Identifier of string
// And other...
let pid = pidentifier |>> Literal.Identifier
pexpr is a set of values, including identifiers, literals, and their operations:
let pexpr, pexprimpl = createParserForwardedToRef ()
type Assoc = Associativity
let opp = OperatorPrecedenceParser<Expr, unit, unit> ()
pexprimpl := opp.ExpressionParser <?> "expression"
let term = pvalue .>> ws <|> between (str_ws "(") (str_ws ")") pexpr
opp.TermParser <- term
let inops = [ "+"; "-"; "*"; "/"; "=="; "!="; "<="; ">="; "<"; ">" ]
for op in inops do opp.AddOperator(InfixOperator(op, ws, 1, Assoc.Left, fun x y -> InfixOp(x, op, y)))
pvalue defines literals, including identifiers with pidentifier. I don't think I need to put their definitions, since they all follow this pattern (for example):
let ptrue = str_ws "true" |>> fun _ -> Bool(true)
let pfalse = str_ws "false" |>> fun _ -> Bool(false)
let pbool = ptrue <|> pfalse

With FParsec, how does one use the manyCharsTill and between parsers and not fail on the closing string?

I'm trying to use FParsec to parse a TOML multi-line string, and I'm having trouble with the closing delimiter ("""). I have the following parsers:
let controlChars =
['\u0000'; '\u0001'; '\u0002'; '\u0003'; '\u0004'; '\u0005'; '\u0006'; '\u0007';
'\u0008'; '\u0009'; '\u000a'; '\u000b'; '\u000c'; '\u000d'; '\u000e'; '\u000f';
'\u0010'; '\u0011'; '\u0012'; '\u0013'; '\u0014'; '\u0015'; '\u0016'; '\u0017';
'\u0018'; '\u0019'; '\u001a'; '\u001b'; '\u001c'; '\u001d'; '\u001e'; '\u001f';
'\u007f']
let nonSpaceCtrlChars =
Set.difference (Set.ofList controlChars) (Set.ofList ['\n';'\r';'\t'])
let multiLineStringContents : Parser<char,unit> =
satisfy (isNoneOf nonSpaceCtrlChars)
let multiLineString : Parser<string,unit> =
optional newline >>. manyCharsTill multiLineStringContents (pstring "\"\"\"")
|> between (pstring "\"\"\"") (pstring "\"\"\"")
let test parser str =
match run parser str with
| Success (s1, s2, s3) -> printfn "Ok: %A %A %A" s1 s2 s3
| Failure (f1, f2, f3) -> printfn "Fail: %A %A %A" f1 f2 f3
When I test multiLineString against an input like this:
test multiLineString "\"\"\"x\"\"\""
The parser fails with this error:
Fail: "Error in Ln: 1 Col: 8 """x"""
^ Note: The error occurred at the end of the input stream. Expecting: '"""'
I'm confused by this. Wouldn't the manyCharsTill multiLineStringContents (pstring "\"\"\"") parser stop at the """ for the between parser to find it? Why is the parser eating all the input and then failing the between parser?
This seems like a relevant post: How to parse comments with FParsec
But I don't see how the solution to that one differs from what I'm doing here, really.

The manyCharsTill documentation says (emphasis mine):
manyCharsTill cp endp parses chars with the char parser cp until the parser endp succeeds. It stops after endp and returns the parsed chars as a string.
So you don't want to use between in combination with manyCharsTill; you want to do something like pstring "\"\"\"" >>. manyCharsTill (pstring "\"\"\"").
But as it happens, I can save you a lot of work. I've been working on a TOML parser with FParsec myself in my spare time. It's far from complete, but the string part works and handles backslash escapes correctly (as far as I can tell: I've tested thoroughly but not exhaustively). The only thing I'm missing is the "strip first newline if it appears right after the opening delimiter" rule, which you've handled with optional newline. So just add that bit into my code below and you should have a working TOML string parser.
BTW, I am planning to license my code (if I finish it) under the MIT license. So I hereby release the following code block under the MIT license. Feel free to use it in your project if it's useful to you.
let pShortCodepointInHex = // Anything from 0000 to FFFF, *except* the range D800-DFFF
(anyOf "dD" >>. (anyOf "01234567" <?> "a Unicode scalar value (range D800-DFFF not allowed)") .>>. exactly 2 isHex |>> fun (c,s) -> sprintf "d%c%s" c s)
<|> (exactly 4 isHex <?> "a Unicode scalar value")
let pLongCodepointInHex = // Anything from 00000000 to 0010FFFF, *except* the range D800-DFFF
(pstring "0000" >>. pShortCodepointInHex)
<|> (pstring "000" >>. exactly 5 isHex)
<|> (pstring "0010" >>. exactly 4 isHex |>> fun s -> "0010" + s)
<?> "a Unicode scalar value (i.e., in range 00000000 to 0010FFFF)"
let toCharOrSurrogatePair p =
p |> withSkippedString (fun codePoint _ -> System.Int32.Parse(codePoint, System.Globalization.NumberStyles.HexNumber) |> System.Char.ConvertFromUtf32)
let pStandardBackslashEscape =
anyOf "\\\"bfnrt"
|>> function
| 'b' -> "\b" // U+0008 BACKSPACE
| 'f' -> "\u000c" // U+000C FORM FEED
| 'n' -> "\n" // U+000A LINE FEED
| 'r' -> "\r" // U+000D CARRIAGE RETURN
| 't' -> "\t" // U+0009 CHARACTER TABULATION a.k.a. Tab or Horizonal Tab
| c -> string c
let pUnicodeEscape = (pchar 'u' >>. (pShortCodepointInHex |> toCharOrSurrogatePair))
<|> (pchar 'U' >>. ( pLongCodepointInHex |> toCharOrSurrogatePair))
let pEscapedChar = pstring "\\" >>. (pStandardBackslashEscape <|> pUnicodeEscape)
let quote = pchar '"'
let isBasicStrChar c = c <> '\\' && c <> '"' && c > '\u001f' && c <> '\u007f'
let pBasicStrChars = manySatisfy isBasicStrChar
let pBasicStr = stringsSepBy pBasicStrChars pEscapedChar |> between quote quote
let pEscapedNewline = skipChar '\\' .>> skipNewline .>> spaces
let isMultilineStrChar c = c = '\n' || isBasicStrChar c
let pMultilineStrChars = manySatisfy isMultilineStrChar
let pTripleQuote = pstring "\"\"\""
let pMultilineStr = stringsSepBy pMultilineStrChars (pEscapedChar <|> (notFollowedByString "\"\"\"" >>. pstring "\"")) |> between pTripleQuote pTripleQuote

#rmunn provided a correct answer, thanks! I also solved this in a slightly different way after playing with the FParsec API a bit more. As explained in the other answer, The endp argument to manyCharTill was eating the closing """, so I needed to switch to something that wouldn't do that. A simple modification using lookAhead did the trick:
let multiLineString : Parser<string,unit> =
optional newline >>. manyCharsTill multiLineStringContents (lookAhead (pstring "\"\"\""))
|> between (pstring "\"\"\"") (pstring "\"\"\"")

How to parse prefix function such as Pow() with multiple parameters using FParsec

I tried to parse a prefix function such as Pow(3+2,2) using FParsec. I read the calculator tutorial in the example files as follows. The examples are all unary prefix function. I wonder how can I achieve prefix functions with more than one inputs using FParsec.OperatorPrecedenceParser.
http://www.quanttec.com/fparsec/reference/operatorprecedenceparser.html#members.PrefixOperator
let number = pfloat .>> ws
let opp = new OperatorPrecedenceParser<float,unit,unit>()
let expr = opp.ExpressionParser
opp.TermParser <- number <|> between (str_ws "(") (str_ws ")") expr
opp.AddOperator(InfixOperator("+", ws, 1, Associativity.Left, (+)))
opp.AddOperator(InfixOperator("-", ws, 1, Associativity.Left, (-)))
opp.AddOperator(InfixOperator("*", ws, 2, Associativity.Left, (*)))
opp.AddOperator(InfixOperator("/", ws, 2, Associativity.Left, (/)))
opp.AddOperator(InfixOperator("^", ws, 3, Associativity.Right, fun x y -> System.Math.Pow(x, y)))
opp.AddOperator(PrefixOperator("-", ws, 4, true, fun x -> -x))
let ws1 = nextCharSatisfiesNot isLetter >>. ws
opp.AddOperator(PrefixOperator("log", ws1, 4, true, System.Math.Log))
opp.AddOperator(PrefixOperator("exp", ws1, 4, true, System.Math.Exp))
Update 1
I've written a quick script following after-string parser example as I need after-string parser for the actual application
http://www.quanttec.com/fparsec/users-guide/tips-and-tricks.html#parsing-f-infix-operators
abs(pow(1,2)) can be parsed but pow(abs(1),2) cannot be done. I'm puzzled about how to use prefix function as part of the input for identWithArgs.
#I #"..\packages\FParsec.1.0.2\lib\net40-client"
#r "FParsecCS.dll"
#r "FParsec.dll"
open FParsec
type PrefixFunc = POW
type Expr =
| InfixOpExpr of string * Expr * Expr
| PrefixOpExpr of string * Expr
| PrefixFuncExpr of PrefixFunc * Expr list
| Number of int
let ws = spaces
let ws1 = spaces1
let str s = pstring s
let str_ws s = ws >>. str s .>> ws
let strci s = pstringCI s
let strci_ws s = ws >>. strci s .>> ws
let strciret_ws s x = ws >>. strci s .>> ws >>% x
let isSymbolicOperatorChar = isAnyOf "!%&*+-./<=>#^|~?"
let remainingOpChars_ws = manySatisfy isSymbolicOperatorChar .>> ws
let primitive = pint32 .>> ws |>> Number
let argList = sepBy primitive (str_ws ",")
let argListInParens = between (str_ws "(") (str_ws ")") argList
let prefixFunc = strciret_ws "pow" POW
let identWithArgs =
pipe2 prefixFunc argListInParens (fun funcId args -> PrefixFuncExpr(funcId, args))
let opp = new OperatorPrecedenceParser<Expr, string, unit>()
opp.TermParser <-
primitive <|>
identWithArgs <|>
between (pstring "(") (pstring ")") opp.ExpressionParser
// a helper function for adding infix operators to opp
let addSymbolicInfixOperators prefix precedence associativity =
let op = InfixOperator(prefix, remainingOpChars_ws,
precedence, associativity, (),
fun remOpChars expr1 expr2 ->
InfixOpExpr(prefix + remOpChars, expr1, expr2))
opp.AddOperator(op)
// the operator definitions:
addSymbolicInfixOperators "*" 10 Associativity.Left
addSymbolicInfixOperators "**" 20 Associativity.Right
opp.AddOperator(PrefixOperator("abs",remainingOpChars_ws,3,true,(),fun remOpChars expr -> PrefixOpExpr("abs", expr)))
opp.AddOperator(PrefixOperator("log",remainingOpChars_ws,3,true,(),fun remOpChars expr -> PrefixOpExpr("log", expr)))
run opp.ExpressionParser "abs(pow(1,2))"
run opp.ExpressionParser "pow(abs(1),2)"

I started to review the problem after one year and finally realized the problem.
I've changed the following code
let argList = sepBy primitive (str_ws ",")
to the following
let opp = new OperatorPrecedenceParser<Expr, string, unit>()
let argList = sepBy opp.ExpressionParser (str_ws ",")
I bring OperatorPrecedenceParser to the beginning of the code. And then I achieve recursively calling opp.ExpressionParser by putting it directly into argList.
I just realized that OperatorPrecedenceParser is very similar to createParserForwardedToRef. It creates a parser first without writing down implementation until later. FParsec has to achieve recursiveness in this way. Similar to its JSON sample parser.
After this change, both abs(pow(1,2)) and pow(abs(1),2) can be parsed. Hope this helps others who ever got this problem.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

fparsec parsing alternatives with discriminated unions for DSL - f#

Related

Parsing an ML-like syntax based on indentation, and everything considered to be an instruction/expression

Indentation, expressions, statements and StackOverflowException with FParsec - Errors

Get the last error message thrown for an instruction

With FParsec, how does one use the manyCharsTill and between parsers and not fail on the closing string?

How to parse prefix function such as Pow() with multiple parameters using FParsec

Categories

Resources