I'm writing my first parser. It's in F#, and I'm using FParsec.
My parser correctly handles inputs like true and false, (true and false or true), true, (((true and false or true))), etc.
But it fails to parse inputs like (true and false) or true. It fails when the parentheses enclose only part of the text.
How can I solve it?
Sample code:
let private infixOperator (opp: OperatorPrecedenceParser<_,_,_>) op prec map =
    opp.AddOperator(InfixOperator (op, ws, prec, Associativity.Left, map))
let private oppLogic = new OperatorPrecedenceParser<_,_,_>()
infixOperator oppLogic "is" 1 (fun x y -> Comparison (x, Equal, y))
infixOperator oppLogic "isnt" 1 (fun x y -> Comparison (x, NotEqual, y))
infixOperator oppLogic "or" 2 (fun x y -> Logic (x, Or, y))
infixOperator oppLogic "and" 3 (fun x y -> Logic (x, And, y))
let private exprParserLogic = oppLogic.ExpressionParser
let private betweenParentheses p =
    between (str "(") (str ")") p
oppLogic.TermParser <- choice [
    betweenParentheses exprParserLogic
    pboolean
]
let pexpression =
    choice [
        attempt <| betweenParentheses exprParserLogic
        exprParserLogic
    ]
let private pline =
    ws
    >>. pexpression
    .>> eof
What happens for an input like "(true and false) or true" is that pline applies pexpression, which first tries betweenParentheses exprParserLogic. This succeeds and parses "(true and false)". Since that parse was successful, choice never tries the second option, exprParserLogic, and simply returns to pline. pline then applies eof, which fails because "or true" is still left in the input.
Since betweenParentheses exprParserLogic is already part of the operator parser's term parser, there's no reason for you to try to parse it in its own rule. You can just have pline invoke exprParserLogic and remove pexpression altogether (or define let pexpression = oppLogic.ExpressionParser and remove exprParserLogic). This will correctly parse "(true and false) or true".
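As a rough, self-contained sketch of that fix: the Expr type, the ws, str and pboolean helpers, and the parseLine wrapper below are stand-ins invented for illustration, since the question does not show its own definitions. The point is only that pline calls the operator parser's expression parser directly, and parentheses are handled solely inside the term parser.

```fsharp
#r "nuget: FParsec" // only needed when running as a script with dotnet fsi
open FParsec

// Hypothetical AST; the question's real types are not shown.
type Expr =
    | Bool of bool
    | Logic of Expr * string * Expr

// Assumed helpers: skip whitespace, and parse a literal followed by whitespace.
let ws : Parser<unit, unit> = spaces
let str s : Parser<string, unit> = pstring s .>> ws

let opp = OperatorPrecedenceParser<Expr, unit, unit>()
let pexpression = opp.ExpressionParser

let pboolean : Parser<Expr, unit> =
    (stringReturn "true" (Bool true) <|> stringReturn "false" (Bool false)) .>> ws

// Parentheses live only here, in the term parser.
opp.TermParser <- choice [ between (str "(") (str ")") pexpression; pboolean ]

opp.AddOperator(InfixOperator("or", ws, 2, Associativity.Left, (fun x y -> Logic(x, "or", y))))
opp.AddOperator(InfixOperator("and", ws, 3, Associativity.Left, (fun x y -> Logic(x, "and", y))))

// pline uses the expression parser directly; no separate parenthesized rule.
let pline = ws >>. pexpression .>> eof

let parseLine s =
    match run pline s with
    | Success(result, _, _) -> Some result
    | Failure(_, _, _) -> None
```

With this layout, parseLine "(true and false) or true" should yield an "or" node whose left operand is the parenthesized "and" node.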
I usually use FParsec for LL grammars, but sometimes it happens that in a whole grammar only one element requires left recursive parsing (so the grammar is no longer LL). Currently I'm in exactly that situation: I have a large LL grammar implemented with FParsec, but one small grammar element is bothering me because it obviously cannot be parsed correctly.
The syntax element in question is an access to an array index à la F#, e.g. myArray.[index] where myArray can be any expression and index can be any expression too. It turns out that my function calls use square brackets, not parentheses, and my identifiers can be qualified with dots.
An example of correct syntax for an expression is: std.fold[fn, f[myArray.[0]], std.tail[myArray]].
The .[] syntax element is obviously left recursive, but perhaps there is a trick that allows me to parse it anyway? My minimal code is as follows:
open FParsec
type Name = string list
type Expr =
    (* foo, Example.Bar.fizz *)
    | Variable of Name
    (* 9, 17, -1 *)
    | Integer of int
    (* foo[3, 2], Std.sqrt[2] *)
    | FunCall of Name * Expr list
    (* (a + b), (a + (1 - c)) *)
    | Parens of Expr
    (* myArray.[0], table.[index - 1] *)
    | ArrayAccess of Expr * Expr
    (* a + b *)
    | Addition of Expr * Expr
let opp =
    new OperatorPrecedenceParser<Expr, _, _>()
let pExpr = opp.ExpressionParser
let pName =
    let id =
        identifier (IdentifierOptions(isAsciiIdStart = isAsciiLetter, isAsciiIdContinue = isAsciiLetter))
    sepBy1 id (skipChar '.')
let pVariable = pName |>> Variable
let pInt = pint32 |>> Integer
let pFunCall =
    pipe4
        pName
        (spaces >>. skipChar '[')
        (sepBy (spaces >>. pExpr) (skipChar ','))
        (spaces >>. skipChar ']')
        (fun name _ args _ -> FunCall(name, args))
let pArrayAccess =
    pipe5
        pExpr
        (spaces >>. skipChar '.')
        (spaces >>. skipChar '[')
        (spaces >>. pExpr)
        (spaces >>. skipChar ']')
        (fun expr _ _ index _ -> ArrayAccess(expr, index))
let pParens =
    between (skipChar '(') (skipChar ')') (spaces >>. pExpr)
opp.TermParser <-
    choice [ attempt pFunCall
             pVariable
             pArrayAccess
             pInt
             pParens ]
    .>> spaces
let addInfixOperator str prec assoc mapping =
    opp.AddOperator
    <| InfixOperator(str, spaces, prec, assoc, (), (fun _ leftTerm rightTerm -> mapping leftTerm rightTerm))
addInfixOperator "+" 6 Associativity.Left (fun a b -> Addition(a, b))
let startParser = runParserOnString (pExpr .>> eof) () ""
printfn "%A" <| startParser "std.fold[fn, f[myArray.[0]], std.tail[myArray]]"
One way to do this is as follows: instead of making a list of parsing choices that also lists pArrayAccess like above, which will at some point cause an infinite loop, one can modify pExpr to parse the grammar element in question as an optional element following an expression:
let pExpr =
    parse {
        let! exp = opp.ExpressionParser
        let pArrayAccess =
            between (skipString ".[") (skipString "]") opp.ExpressionParser
        match! opt pArrayAccess with
        | None -> return exp
        | Some index -> return ArrayAccess(exp, index)
    }
After testing, it turns out that this works well, but only under the following two restrictions:
The contents of the square brackets must not contain access to another array;
An array cannot be accessed a second time in succession (my2DArray.[x].[y]).
This restricts usage somewhat. How can I lift these restrictions? Is there a way to do this, or do I have to change the grammar?
In the end, the solution to this problem is quite simple: just expect a list of array accesses. If the list is empty, then return the initial expression, otherwise fold over all the array accesses and return the result. Here is the implementation:
let rec pExpr =
    parse {
        let! exp = opp.ExpressionParser
        let pArrayAccess =
            between (skipString ".[") (skipString "]") pExpr
        match! many pArrayAccess with
        | [] -> return exp
        | xs -> return List.fold
                    (fun acc curr -> ArrayAccess(acc, curr)) exp xs
    }
This approach meets my needs, so I'm happy with it. If anyone passing by needs something more general that the proposed solution doesn't cover, I refer them to @Martin Freedman's comment, which suggests using createParserForwardedToRef().
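For readers who want to see the createParserForwardedToRef() route, here is a minimal sketch under simplifying assumptions: the Expr type is cut down to three cases, names are plain ASCII letters without dots, and there is no operator precedence parser. pPrimary, pIndex and parseExpr are names made up for this illustration. The idea is the same fold over trailing .[index] parts, with the recursion going through a forwarded reference.

```fsharp
#r "nuget: FParsec" // only needed when running as a script with dotnet fsi
open FParsec

// Reduced AST for illustration; not the full type from the question.
type Expr =
    | Variable of string
    | Integer of int
    | ArrayAccess of Expr * Expr

// pExpr can be used before its implementation is assigned below.
let pExpr, pExprRef = createParserForwardedToRef<Expr, unit>()

// A non-left-recursive "head" of an expression.
let pPrimary : Parser<Expr, unit> =
    (pint32 |>> Integer)
    <|> (many1Satisfy isAsciiLetter |>> Variable)

// One trailing index; the index itself is a full expression.
let pIndex = between (skipString ".[") (skipChar ']') pExpr

// Head followed by zero or more indices, folded left to right.
pExprRef.Value <-
    pipe2 pPrimary (many pIndex)
        (fun head indices -> List.fold (fun acc i -> ArrayAccess(acc, i)) head indices)

let parseExpr s =
    match run (pExpr .>> eof) s with
    | Success(result, _, _) -> Some result
    | Failure(_, _, _) -> None
```

Under these assumptions both m.[x].[y] and a.[b.[0]] parse, which covers the two cases that the opt-based version could not handle.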
I'm trying to parse function calls. Here are the variants:
add 8 2
add x y
add (inc x) (dec y)
funcWithoutArgs
Depending on how I arrange my parsers in the code, and perhaps also on how they are written, I get errors, as well as successful but unwanted parses.
For example, this:
add 4 7
returns the following AST:
[Call ("foo",[Number 4]);
Number 7]
So it only takes the first parameter.
When I parse this:
foo x y
it returns this AST:
[Call ("foo",[Call ("x",[Call ("y",[])])])]
And that's not what I want, since here, each parameter calls the next one as a parameter.
Another example, when I do this:
foo x y
inc x
I get:
[Call ("foo",[Call ("x",[Call ("y",[Call ("inc",[Call ("x",[])])])])])]
It does the same as above, but also consumes the code on the following line. When I make my parser require a newline (see code), it returns this:
[Call ("foo",[]); Call ("x",[]); Call ("y",[]); Call ("inc",[]); Call ("x",[])]
Even with parentheses it doesn't work:
foo (x) (y)
gives:
[Call ("foo",[]); Call ("x",[]); Call ("y",[])]
And:
add (inc x) (dec y)
gives:
Error in Ln: 1 Col: 1
Note: The error occurred on an empty line.
The parser backtracked after:
Error in Ln: 2 Col: 5
add (inc x) (dec y)
^
Expecting: end of input or integer number (32-bit, signed)
The parser backtracked after:
Error in Ln: 2 Col: 10
add (inc x) (dec y)
^
Expecting: ')'
[]
In short, my function call parser does not work properly. Every time I change something, like a newline, an attempt, or a different hierarchy, something else breaks...
Do you have any idea how to solve this very annoying problem?
Here is the minimal working code that was used:
open FParsec
// Ast
type Expression =
    | Number of int
    | Call of string * Expression list
type Program = Expression list
// Tools
let private bws p =
    spaces >>? p .>>? spaces
let private suiteOf p =
    sepEndBy p spaces1
let inline private betweenParentheses p label =
    between (pstring "(") (pstring ")") p
    <?> (label + " between parentheses")
let private identifier =
    many1Satisfy2 isLetter (fun c -> isLetter c)
// Expressions
let rec private call = parse {
    let! call = pipe2 (spaces >>? identifier) (spaces >>? parameters)
                      (fun id parameters -> Call(id, parameters)) // .>>? newline
    return call
    }
and private parameters = suiteOf expression
and private callFuncWithoutArgs =
    identifier |>> fun id -> Call(id, [])
and private number = pint32 |>> Number
and private betweenParenthesesExpression =
    parse { let! ex = betweenParentheses expression "expression"
            return ex }
and private expression =
    bws (attempt betweenParenthesesExpression <|>
         attempt number <|>
         attempt call <|>
         callFuncWithoutArgs)
// -------------------------------
let parse code =
    let parser = many expression .>>? eof
    match run parser code with
    | Success(result, _, _) -> result
    | Failure(msg, _, _) ->
        printfn "%s" msg
        []
System.Console.Clear()
parse @"
add 4 7
foo x y
inc x
foo (x) (y)
add (inc x) (dec y)
" |> printfn "%A"
Your main problem is that you have the wrong high-level design for your parser.
Your current design is that an expression can be:
1. An expression (a "sub-expression", so to speak) between parentheses (no problem here)
2. A number (no problem here)
3. A call with parameters, which is an identifier followed by a space-separated list of expressions (this is the main part of the problem)
4. A call without parameters, which is a single identifier (this contributes to the problem)
Looking at the expression foo x y, let's apply those rules in order as a parser would. There are no parentheses and foo isn't a number, so it's either 3 or 4. First we try 3. foo is followed by x y: does x y parse as an expression? Why, yes, it does: it parses as a call with parameters, where x is the function and y is the parameter. Since x y matches 3, it parses according to rule 3 without checking rule 4, and so foo x y matches like foo (x y) would: a call to foo with a single parameter, which is a call to x with parameter y.
How to fix this? Well, you could try swapping the order of 3 and 4, so that a function call without parameters is checked before a call with parameters (which would make x y parse as just x). But that would fail, because foo x y would match as just foo. So putting rule 4 before rule 3 doesn't work here.
The real solution is to split the rules for an expression apart into two levels. The "inner" level, which I'll call a "value", could be:
1. An expression between parentheses
2. A number
3. A function call without parameters
And the "outer" level, the parse rules for expressions, would be:
1. A function call with parameters, all of which are values, not expressions
2. A value
Note that these parsing levels are mutually recursive, so you'll need to use createParserForwardedToRef in your implementation. Let's look at how foo x y would be parsed with this design:
First, foo parses as an identifier, so check if it could be a function call with parameters. Does x parse as a value? Yes, under rule 3 of values. And does y parse as a value? Yes, under rule 3 of values. So foo x y parses as a function call.
Now what about funcWithoutParameters? It would fail rule 1 of expressions because it's not followed by a list of parameters. So it would be checked for rule 2 of expressions, and then it would match under rule 3 of values.
Okay, a basic sanity check of the pseudocode works, so let's turn this into code. But first, I'll mention the other problem in your parser which I haven't mentioned yet, which is that you don't realize that the FParsec spaces parser also matches newlines. So when you wrap your expression parser in bws ("between whitespace"), it will also consume newlines after the text it parses. So when you're parsing something like:
foo a b
inc c
The suiteOf expression sees the list a b inc c and turns all of those into parameters for foo. In my code below I've distinguished between FParsec's spaces parser (which includes newlines) and a parser that parses only horizontal whitespace (space and tab but not newline), using each in the appropriate place. The following code implements the design I mentioned in this answer and its output looks right to me for all the test expressions you wrote:
open FParsec
// Ast
type Expression =
    | Number of int
    | Call of string * Expression list
type Program = Expression list
// Tools
let private justSpaces = skipMany (pchar ' ' <|> pchar '\t')
let private justSpaces1 = skipMany1 (pchar ' ' <|> pchar '\t')
let private bws p =
    spaces >>? p .>>? spaces
let private suiteOf p =
    sepEndBy1 p (justSpaces1)
let inline private betweenParentheses p label =
    between (pstring "(") (pstring ")") p
    <?> (label + " between parentheses")
let private identifier =
    many1Satisfy2 isLetter (fun c -> isLetter c)
// Expressions
let private expression, expressionImpl = createParserForwardedToRef()
let private betweenParenthesesExpression =
    parse { let! ex = betweenParentheses expression "expression"
            return ex }
let private callFuncWithoutArgs =
    (identifier |>> fun id -> Call(id, []))
let private number = pint32 |>> Number
let private value =
    justSpaces >>? (attempt betweenParenthesesExpression <|>
                    attempt number <|>
                    callFuncWithoutArgs)
let private parameters = suiteOf value
let rec private callImpl = parse {
    let! call = pipe2 (justSpaces >>? identifier) (justSpaces >>? parameters)
                      (fun id parameters -> Call(id, parameters))
    return call }
let call = callImpl
expressionImpl.Value <-
    bws (attempt call <|>
         value)
// -------------------------------
let parse code =
    let parser = many expression .>>? (spaces >>. eof)
    match run parser code with
    | Success(result, _, _) -> result
    | Failure(msg, _, _) ->
        printfn "%s" msg
        []
System.Console.Clear()
parse @"
add 4 7
foo x y
inc x
foo (x) (y)
add (inc x) (dec y)
" |> printfn "%A"
P.S. I used the following operator suggested by http://www.quanttec.com/fparsec/users-guide/debugging-a-parser.html to greatly help me in tracing the problem:
let (<!>) (p: Parser<_,_>) label : Parser<_,_> =
    fun stream ->
        printfn "%A: Entering %s" stream.Position label
        let reply = p stream
        printfn "%A: Leaving %s (%A)" stream.Position label reply.Status
        reply
Usage: turn let parseFoo = ... into let parseFoo = ... <!> "foo". Then you'll get a stream of debugging output in your console that looks something like this:
(Ln: 2, Col: 20): Entering expression
(Ln: 3, Col: 1): Entering call
(Ln: 3, Col: 5): Entering parameters
(Ln: 3, Col: 5): Entering bwParens
(Ln: 3, Col: 5): Leaving bwParens (Error)
(Ln: 3, Col: 5): Entering number
(Ln: 3, Col: 6): Leaving number (Ok)
(Ln: 3, Col: 7): Entering bwParens
(Ln: 3, Col: 7): Leaving bwParens (Error)
(Ln: 3, Col: 7): Entering number
(Ln: 3, Col: 8): Leaving number (Ok)
(Ln: 3, Col: 8): Leaving parameters (Ok)
(Ln: 3, Col: 8): Leaving call (Ok)
(Ln: 3, Col: 8): Leaving expression (Ok)
That helps greatly when you're trying to figure out why your parser isn't doing what you expect.
Noob alert!
Ok, I'm trying to build a simple math expression parser in FParsec. Right now all I want it to do is handle strings like this "1+2-3*4/5" and return a double as the result of the evaluation. No spaces, newlines, or parens, and left to right order of operations is fine.
Here's what I have so far:
open System
open FParsec
let number = many1 digit |>> fun ds -> int <| String.Concat(ds)
let op : Parser<int -> int -> int, unit> =
    charReturn '+' (+) <|>
    charReturn '-' (-) <|>
    charReturn '*' ( * ) <|>
    charReturn '/' (/)
let expression, expressionImpl = createParserForwardedToRef()
do expressionImpl :=
    choice[
        attempt(number .>> op >>. expression);
        number]
let test p str =
    match run (p .>> eof) str with
    | Success(result, _, _) -> printfn "Success: %A" result
    | Failure(result, _, _) -> printfn "Failure: %A" result
[<EntryPoint>]
let main argv =
    test expression "1+1/2*3-4"
    Console.Read() |> ignore
    0
In the first choice of the expression parser, I'm not sure how to apply the function returned by the op parser.
As usual, I find the answer right after posting the question (after 3 hours of searching).
I just changed this line:
attempt(number .>> op >>. expression);
to this:
attempt(pipe3 number op expression (fun x y z -> y x z));
However, I just realized that my expressions parse backwards. Back to the drawing board.
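One possible way out of the backwards evaluation, sketched here as an assumption rather than as what the poster ended up doing: parse the first number, then zero or more (operator, number) pairs, and fold the pairs from the left. The evaluate helper is invented for this example, and number is simplified to a run of digits converted with int.

```fsharp
#r "nuget: FParsec" // only needed when running as a script with dotnet fsi
open FParsec

let number : Parser<int, unit> = many1Satisfy isDigit |>> int

let op : Parser<int -> int -> int, unit> =
    choice [ charReturn '+' (+)
             charReturn '-' (-)
             charReturn '*' ( * )
             charReturn '/' (/) ]

// first number, then (operator, number) pairs applied strictly left to right
let expression : Parser<int, unit> =
    pipe2 number (many (op .>>. number))
        (fun first rest -> List.fold (fun acc (f, n) -> f acc n) first rest)

// Hypothetical wrapper for trying it out.
let evaluate s =
    match run (expression .>> eof) s with
    | Success(result, _, _) -> Some result
    | Failure(_, _, _) -> None
```

Because the fold starts from the leftmost number, 8-2-1 is computed as (8-2)-1, whereas the recursive pipe3 version groups it as 8-(2-1).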
Within a simple query language I'd like to recognize date and time literals, preferably without using delimiters. For example,
CreationDate = 2013-05-13 5:30 PM
I could use a combinator to detect the basic syntax (e.g., yyyy-MM-dd hh:mm tt), but then it needs to be passed to DateTime.TryParse for full validation.
A few questions:
Is there a combinator for "post processing" a parser result, e.g., pstring "1/2/2000" |> (fun s -> try OK(DateTime.Parse s) with _ -> Fail("not a date"))
Is it possible to apply a predicate to a string (as satisfy does to char)?
Is there a better approach for parsing date/time?
UPDATE
Using Guvante's and Stephan's examples, I came up with this:
let dateTimeLiteral =
    let date sep = pipe5 pint32 sep pint32 sep pint32 (fun a _ b _ c -> a, b, c)
    let time =
        (pint32 .>>. (skipChar ':' >>. pint32)) .>>.
            (opt (stringCIReturn " am" false <|> stringCIReturn " pm" true))
    (date (pstring "/") <|> date (pstring "-")) .>>.
        (opt (skipChar ' ' >>. time)) .>> ws
    >>=? (fun ((a, b, c), tt) ->
        let y, m, d = if a > 12 then a, b, c else c, a, b
        let h, n =
            match tt with
            | Some((h, n), tt) ->
                match tt with
                | Some true -> (match h with 12 -> h | _ -> h + 12), n
                | Some false -> (match h with 12 -> h - 12 | _ -> h), n
                | None -> h, n
            | None -> 0, 0
        try preturn (System.DateTime(y, m, d, h, n, 0)) |>> DateTime
        with _ -> fail "Invalid date/time format")
You can easily build a custom combinator or parser that validates parsed input.
If you only want to use combinators ("Haskell-style"), you could use
let pDateString = pstring "1/2/2000"
let pDate1 =
    pDateString
    >>= fun str ->
        try preturn (System.DateTime.Parse(str))
        with _ -> fail "Date format error"
as Guvante just proposed.
If you want to avoid constructing temporary parsers (see preturn ... and fail ... above), you can just let the function accept a second parameter and directly return Reply values:
let pDate2 =
    pDateString
    >>= fun str stream ->
        try Reply(System.DateTime.Parse(str))
        with _ -> Reply(Error, messageError "Date format error")
If you want the error location to be at the beginning of the malformed date string, you could replace >>= with >>=?. Note that this also has consequences for error recovery.
If you want to have full control, you can write the parser only using the lower level API, starting with a basic version like the following:
let pDate3 =
    fun stream ->
        let reply = pDateString stream
        if reply.Status = Ok then
            try Reply(System.DateTime.Parse(reply.Result))
            with _ -> Reply(Error, messageError "Date format error")
        else
            Reply(reply.Status, reply.Error)
This last version would also allow you to replace the pDateString parser with code that directly accesses the CharStream interface, which could give you some additional flexibility or performance.
Is there a combinator for "post processing" a parser result
It depends on what you want to do if you fail. You can always use |>> to get your DateTime out. Handling failure is equally interesting. I think your example could be written as follows (given a parser sp that produces the correct string; note it would be of type Parser<string,'u>):
sp >>= (fun s -> match DateTime.TryParse s with
                 | true,result -> preturn result
                 | false,_ -> fail "not a date")
Here I am taking in the resultant string and calling the TryParse method, and returning either a preturn or a fail depending on whether it succeeds. I couldn't find any of the methods that worked exactly like that.
Note that >>=? would cause a backtrack if it failed.
Is it possible to apply a predicate to a string (as satisfy does for char)?
You would have to call the predicate for every character (2, 20, 201) which is usually not ideal. I am pretty sure you could whip up something like this if you wanted, but I don't think it is ideal for that reason, not to mention handling partial matches becomes harder.
Is there a better approach for parsing date/time?
The biggest factor is "What do you know about the date/time?" If you know it is in that syntax exactly then you should be able to use a post process and be fine (since hopefully the error case will be rare)
If you aren't sure, for instance if PM is optional, but would be unambiguously detailed, then you will probably want to break up the definition and combine it at the end. Note that here I have relaxed the character counts a little, you could add some opt to get even more relaxed, or replace the pint32 with digit and manual conversions.
let pipe6 = //Implementation left as an exercise
let dash = skipChar '-'
let space = skipChar ' '
let colon = skipChar ':'
pipe6 (pint32 .>> dash) //Year
      (pint32 .>> dash) //Month
      (pint32 .>> space) //Day
      (pint32 .>> colon) //Hour
      (pint32 .>> space) //Minute
      (anyString 2) //AM/PM (anyString takes the number of chars to read)
      (fun year month day hour minute amPm ->
          DateTime(year, month, day,
                   // hour % 12 keeps 12 AM at 0 and 12 PM at 12
                   hour % 12 + (if amPm.Equals("PM", StringComparison.InvariantCultureIgnoreCase)
                                then 12 else 0),
                   minute, 0)) //No seconds
Writing all that out I am not sure if you are better or worse off...
I've used the following code to parse a date string like the one below into a DateTime object.
2000-01-01 12:34:56,789
let pipe7 p1 p2 p3 p4 p5 p6 p7 f =
    p1 >>= fun x1 ->
    p2 >>= fun x2 ->
    p3 >>= fun x3 ->
    p4 >>= fun x4 ->
    p5 >>= fun x5 ->
    p6 >>= fun x6 ->
    p7 >>= fun x7 -> preturn (f x1 x2 x3 x4 x5 x6 x7)
let int_ac = pint32 .>> anyChar
let pDateStr : Parser<DateTime, unit> =
    pipe7 int_ac int_ac int_ac int_ac int_ac int_ac int_ac
        (fun y m d h mi s mil -> new DateTime(y,m,d,h,mi,s,mil))
I'm certain there's a really simple answer to this, but I've been staring at this all day and I can't figure it out.
As per the tutorial, I'm implementing a JSON parser. To challenge myself, I'm implementing the number parser myself.
This is what I got so far:
let jnumber =
    let neg = stringReturn "-" -1 <|> preturn 1
    let digit = satisfy (isDigit)
    let digit19 = satisfy (fun c -> isDigit c && c <> '0')
    let digits = many1 digit
    let ``int`` =
        digit
        <|> (many1Satisfy2 (fun c -> isDigit c && c <> '0') isDigit)
The trouble is that digit is a Parser<char,_>, whereas the second option for int is a Parser<string,_>. Would I normally just use a combinator to turn digit into a Parser<string,_>, or is there something else I should do?
The |>> operator is what you're looking for. I quote the FParsec reference:
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
The parser p |>> f applies the parser p and returns the result of the function application f x, where x is the result returned by p.
p |>> f is an optimized implementation of p >>= fun x -> preturn (f x).
For example:
let jnumber =
    let neg = stringReturn "-" -1 <|> preturn 1
    let digit = satisfy (isDigit)
    let digit19 = satisfy (fun c -> isDigit c && c <> '0')
    let digits = many1 digit
    (digit |>> string) (* The operator is used here *)
    <|> (many1Satisfy2 (fun c -> isDigit c && c <> '0') isDigit)
You may want to read the FParsec tutorial on parsing JSON, which uses this operator quite intensively.
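To show where |>> string could lead, here is a small sketch of just the integer part of a JSON number. It makes assumptions beyond the snippets above: the single-character alternative only accepts '0' (so that leading zeros are rejected), the sign is applied by multiplication, and jint and parseInt are names invented for this example.

```fsharp
#r "nuget: FParsec" // only needed when running as a script with dotnet fsi
open FParsec

let jint : Parser<int, unit> =
    let sign = stringReturn "-" (-1) <|> preturn 1
    // Parser<char,_> turned into Parser<string,_> so both alternatives agree
    let zero = pchar '0' |>> string
    // a non-zero digit followed by any digits, already a Parser<string,_>
    let nonZero = many1Satisfy2 (fun c -> isDigit c && c <> '0') isDigit
    pipe2 sign (zero <|> nonZero) (fun s digits -> s * int digits)

// Hypothetical wrapper for trying it out.
let parseInt s =
    match run (jint .>> eof) s with
    | Success(n, _, _) -> Some n
    | Failure(_, _, _) -> None
```

Fractions and exponents are left out on purpose; they would be further string-producing parsers combined in the same way.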