Parsing date and time with FParsec

Within a simple query language I'd like to recognize date and time literals, preferably without using delimiters. For example,
CreationDate = 2013-05-13 5:30 PM
I could use a combinator to detect the basic syntax (e.g., yyyy-MM-dd hh:mm tt), but then it needs to be passed to DateTime.TryParse for full validation.
A few questions:
Is there a combinator for "post processing" a parser result, e.g., pstring "1/2/2000" |> (fun s -> try OK(DateTime.Parse s) with _ -> Fail("not a date"))
Is it possible to apply a predicate to a string (as satisfy does to char)?
Is there a better approach for parsing date/time?
UPDATE
Using Guvante's and Stephan's examples, I came up with this:
let dateTimeLiteral =
    let date sep = pipe5 pint32 sep pint32 sep pint32 (fun a _ b _ c -> a, b, c)
    let time =
        (pint32 .>>. (skipChar ':' >>. pint32)) .>>.
            (opt (stringCIReturn " am" false <|> stringCIReturn " pm" true))
    (date (pstring "/") <|> date (pstring "-")) .>>.
        (opt (skipChar ' ' >>. time)) .>> ws
    >>=? (fun ((a, b, c), tt) ->
        let y, m, d = if a > 12 then a, b, c else c, a, b
        let h, n =
            match tt with
            | Some((h, n), tt) ->
                match tt with
                | Some true -> (match h with 12 -> h | _ -> h + 12), n
                | Some false -> (match h with 12 -> h - 12 | _ -> h), n
                | None -> h, n
            | None -> 0, 0
        try preturn (System.DateTime(y, m, d, h, n, 0)) |>> DateTime
        with _ -> fail "Invalid date/time format")

You can easily build a custom combinator or parser that validates parsed input.
If you only want to use combinators ("Haskell-style"), you could use
let pDateString = pstring "1/2/2000"
let pDate1 =
    pDateString
    >>= fun str ->
        try preturn (System.DateTime.Parse(str))
        with _ -> fail "Date format error"
as Guvante just proposed.
If you want to avoid constructing temporary parsers (see preturn ... and fail ... above), you can let the function accept a second parameter and return Reply values directly:
let pDate2 =
    pDateString
    >>= fun str stream ->
        try Reply(System.DateTime.Parse(str))
        with _ -> Reply(Error, messageError "Date format error")
If you want the error location to be at the beginning of the malformed date string, you could replace >>= with >>=?. Note that this also has consequences for error recovery.
If you want to have full control, you can write the parser only using the lower level API, starting with a basic version like the following:
let pDate3 =
    fun stream ->
        let reply = pDateString stream
        if reply.Status = Ok then
            try Reply(System.DateTime.Parse(reply.Result))
            with _ -> Reply(Error, messageError "Date format error")
        else
            Reply(reply.Status, reply.Error)
This last version would also allow you to replace the pDateString parser with code that directly accesses the CharStream interface, which could give you some additional flexibility or performance.

Is there a combinator for "post processing" a parser result
It depends on what you want to do on failure. You can always use |>> to get your DateTime out. Failing is the more interesting case. Given a parser sp that yields the right string (it would be of type Parser<string,'u>), I think your example could be written as:
sp >>= (fun s ->
    match DateTime.TryParse s with
    | true, result -> preturn result
    | false, _ -> fail "not a date")
Here I take the resulting string, call TryParse on it, and return either a preturn or a fail depending on whether it succeeds. I couldn't find a built-in combinator that does exactly this.
Note that using >>=? instead would backtrack to the start of the string if the validation fails.
Is it possible to apply a predicate to a string (as satisfy does for char)?
You would have to call the predicate for every prefix of the input (2, 20, 201, ...), which is usually not ideal. I am pretty sure you could whip up something like this if you wanted, but I don't think it is a good fit for that reason, and handling partial matches becomes harder as well.
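A more practical variant is to parse a candidate string first and then test the whole string once. The following is only a sketch: satisfyString, year and yearOrNumber are hypothetical names, not part of FParsec, and it assumes the FParsec package is referenced. It uses >>=? so that a rejected string leaves the input position untouched, much like satisfy does for a single char.

```fsharp
// Assumes the FParsec package is available, e.g. in a script: #r "nuget: FParsec"
open FParsec

// Hypothetical helper (not part of FParsec): run p, then keep its string
// result only if pred accepts it. Because `fail` consumes no input, >>=?
// backtracks to the position before p when the predicate rejects the string.
let satisfyString (pred: string -> bool) (label: string) (p: Parser<string, 'u>) : Parser<string, 'u> =
    p >>=? fun s -> if pred s then preturn s else fail label

// A run of digits that is accepted only when it is exactly four characters long.
let year : Parser<string, unit> =
    satisfyString (fun s -> s.Length = 4) "four-digit year" (many1Satisfy isDigit)

// Since a rejected year consumed nothing, an alternative can still be tried.
let yearOrNumber : Parser<string, unit> =
    (year |>> fun s -> "year " + s)
    <|> (many1Satisfy isDigit |>> fun s -> "number " + s)
```

With these definitions, run yearOrNumber "2013" takes the first branch, while run yearOrNumber "42" falls through to the second one.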
Is there a better approach for parsing date/time?
The biggest factor is "What do you know about the date/time?" If you know it is in exactly that syntax, then post-processing should be fine (since the error case will hopefully be rare).
If you aren't sure, for instance if PM is optional but the input would still be unambiguous, then you will probably want to break up the definition and combine the pieces at the end. Note that here I have relaxed the character counts a little; you could add some opt to relax it further, or replace pint32 with digit and manual conversions.
let pipe6 = //Implementation left as an exercise
let dash = skipChar '-'
let space = skipChar ' '
let colon = skipChar ':'
pipe6 (pint32 .>> dash)  //Year
      (pint32 .>> dash)  //Month
      (pint32 .>> space) //Day
      (pint32 .>> colon) //Hour
      (pint32 .>> space) //Minute
      (anyString 2)      //AM/PM (anyString needs the number of chars to read)
      (fun year month day hour minute amPm ->
          DateTime(year, month, day,
                   // hour % 12 maps 12 AM to 0 and keeps 12 PM at 12
                   hour % 12 + (if amPm.Equals("PM", StringComparison.InvariantCultureIgnoreCase)
                                then 12 else 0),
                   minute, 0)) //No seconds
Writing all that out I am not sure if you are better or worse off...
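For the pipe6 that is left as an exercise above, one possible sketch is to reuse FParsec's pipe5 and pair up the last two parsers with .>>. (this is just one way to do it, and sixInts is a made-up parser used only to try it out; the FParsec package is assumed to be referenced):

```fsharp
// Assumes the FParsec package is available, e.g. in a script: #r "nuget: FParsec"
open FParsec

// A possible pipe6: the fifth and sixth parsers are combined into a tuple
// so that the built-in pipe5 can do the sequencing.
let pipe6 p1 p2 p3 p4 p5 p6 f =
    pipe5 p1 p2 p3 p4 (p5 .>>. p6) (fun a b c d (e, g) -> f a b c d e g)

// Hypothetical usage: six comma-separated integers collected into a list.
let comma : Parser<unit, unit> = skipChar ','
let sixInts : Parser<int list, unit> =
    pipe6 (pint32 .>> comma) (pint32 .>> comma) (pint32 .>> comma)
          (pint32 .>> comma) (pint32 .>> comma) pint32
          (fun a b c d e g -> [a; b; c; d; e; g])
```

Running run sixInts "1,2,3,4,5,6" should succeed with the six values in order.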

I used the following code to parse a date string like the one below into a DateTime object.
2000-01-01 12:34:56,789
let pipe7 p1 p2 p3 p4 p5 p6 p7 f =
    p1 >>= fun x1 ->
    p2 >>= fun x2 ->
    p3 >>= fun x3 ->
    p4 >>= fun x4 ->
    p5 >>= fun x5 ->
    p6 >>= fun x6 ->
    p7 >>= fun x7 -> preturn (f x1 x2 x3 x4 x5 x6 x7)
// An integer followed by any single separator character
let int_ac = pint32 .>> anyChar
let pDateStr : Parser<DateTime, unit> =
    // The last field is a plain pint32 so that parsing also succeeds
    // when nothing follows the milliseconds
    pipe7 int_ac int_ac int_ac int_ac int_ac int_ac pint32
          (fun y m d h mi s mil -> new DateTime(y, m, d, h, mi, s, mil))

Related

How to parse a recursive left syntax rule with FParsec?

I usually use FParsec for LL grammars, but sometimes it happens that in a whole grammar only one element requires left recursive parsing (so the grammar is no longer LL). Currently I have such a situation, I have a large LL grammar implemented with FParsec, but a small grammar element is bothering me because it obviously cannot be parsed correctly.
The syntax element in question is an access to an array index à la F#, e.g. myArray.[index] where myArray can be any expression and index can be any expression too. It turns out that my function calls use square brackets, not parentheses, and my identifiers can be qualified with dots.
An example of correct syntax for an expression is: std.fold[fn, f[myArray.[0]], std.tail[myArray]].
The .[] syntax element is obviously left recursive, but perhaps there is a trick that allows me to parse it anyway? My minimal code is as follows:
open FParsec
type Name = string list
type Expr =
    (* foo, Example.Bar.fizz *)
    | Variable of Name
    (* 9, 17, -1 *)
    | Integer of int
    (* foo[3, 2], Std.sqrt[2] *)
    | FunCall of Name * Expr list
    (* (a + b), (a + (1 - c)) *)
    | Parens of Expr
    (* myArray.[0], table.[index - 1] *)
    | ArrayAccess of Expr * Expr
    (* a + b *)
    | Addition of Expr * Expr
let opp =
    new OperatorPrecedenceParser<Expr, _, _>()
let pExpr = opp.ExpressionParser
let pName =
    let id =
        identifier (IdentifierOptions(isAsciiIdStart = isAsciiLetter, isAsciiIdContinue = isAsciiLetter))
    sepBy1 id (skipChar '.')
let pVariable = pName |>> Variable
let pInt = pint32 |>> Integer
let pFunCall =
    pipe4
        pName
        (spaces >>. skipChar '[')
        (sepBy (spaces >>. pExpr) (skipChar ','))
        (spaces >>. skipChar ']')
        (fun name _ args _ -> FunCall(name, args))
let pArrayAccess =
    pipe5
        pExpr
        (spaces >>. skipChar '.')
        (spaces >>. skipChar '[')
        (spaces >>. pExpr)
        (spaces >>. skipChar ']')
        (fun expr _ _ index _ -> ArrayAccess(expr, index))
let pParens =
    between (skipChar '(') (skipChar ')') (spaces >>. pExpr)
opp.TermParser <-
    choice [ attempt pFunCall
             pVariable
             pArrayAccess
             pInt
             pParens ]
    .>> spaces
let addInfixOperator str prec assoc mapping =
    opp.AddOperator
    <| InfixOperator(str, spaces, prec, assoc, (), (fun _ leftTerm rightTerm -> mapping leftTerm rightTerm))
addInfixOperator "+" 6 Associativity.Left (fun a b -> Addition(a, b))
let startParser = runParserOnString (pExpr .>> eof) () ""
printfn "%A" <| startParser "std.fold[fn, f[myArray.[0]], std.tail[myArray]]"
One way to do this is as follows: instead of making a list of parsing choices that also lists pArrayAccess like above, which will at some point cause an infinite loop, one can modify pExpr to parse the grammar element in question as an optional element following an expression:
let pExpr =
    parse {
        let! exp = opp.ExpressionParser
        let pArrayAccess =
            between (skipString ".[") (skipString "]") opp.ExpressionParser
        match! opt pArrayAccess with
        | None -> return exp
        | Some index -> return ArrayAccess(exp, index)
    }
After testing, it turns out that this works very well as long as the following two restrictions are respected:
The contents of the square brackets must not contain an access to another array;
An array cannot be accessed a second time in succession (my2DArray.[x].[y]).
This restricts usage somewhat. How can I get around this? Is there a way to do it, or do I have to change the grammar?
Finally, a solution to this problem is quite simple: just expect a list of array access. If the list is empty, then return the initial expression, otherwise fold over all the array accesses and return the result. Here is the implementation:
let rec pExpr =
    parse {
        let! exp = opp.ExpressionParser
        let pArrayAccess =
            between (skipString ".[") (skipString "]") pExpr
        match! many pArrayAccess with
        | [] -> return exp
        | xs -> return List.fold
                    (fun acc curr -> ArrayAccess(acc, curr)) exp xs
    }
This approach meets my needs, so I'm happy with it. If anyone passing by needs something more general that this solution doesn't cover, I refer them to @Martin Freedman's comment about using createParserForwardedToRef().

Getting FParsec to reject unmatched start end tags?

Writing my own XML parser to learn FParsec I need to test that the XML start and end tags match or have the parser fail.
In the code fragment below ...
The parsers xStartTag and xKey return strings which I want to match.
The parser xContent_UntilCloseTag just returns whatever is between the tags.
ws skips whitespace and str(">") skips a '>'.
The PipeN.pipe10 function is an extension of the standard FParsec primitive pipe5 to feed the result of 10 parsers into a function.
All these parsers compile and work.
How can I get the following parser of type Parser<XELEMENT, USER_STATE> constructing the type XELEMENT_CONTENT to fail when the start and end tags do not match?
00 let xElement_Content : Parser<XELEMENT, USER_STATE> =
01 (PipeN.pipe10 ws xStartTag ws xContent_UntilCloseTag ws xKey ws (str ">") ws
02 (fun stream -> getUserState stream)
03 (fun x1 x2_startTag x3 x4_content x5 x6_closeTag x7 x8 x9 userState ->
04 if x2_startTag.head = x6_closeTag
05 then
06 (userState.Deeper(x2_startTag.head), x2_startTag, x4_content)
07 else Reply(FatalError, messageError ("in xElementContent: head tag (%s) does not match close tag (%s)", x2_startTag.head, x6_closeTag))
08 ) |>> C_XELEMENT_CONTENT
09 )
Line 07 throws the compiler error ...
FS0001: All branches of an 'if' expression must return values of the same type as the first branch which here is 'USER_STATE * XHEADandATTRIBUTES_RECORD * string'. This branch returns a value of the type 'Reply<'a>'.
The code compiles fine if I comment out the if-then-else statement (lines 04, 05, and 07) leaving 06. Types agree in that case. I think I understand that.
But I need to throw an error and have the parser fail when the strings returned from xStartTag and xKey don't agree - How?
I don't think knowing the types is necessary for the answer; but just in case here are the various type definitions...
type USER_STATE =
{
tag : string
depth : int
}
type XELEMENT_CONTENT = USER_STATE * XHEADandATTRIBUTES_RECORD * string
type XELEMENT_EMPTY = XHEADandATTRIBUTES_RECORD
type XELEMENT =
| C_XELEMENT_CONTENT of XELEMENT_CONTENT
| C_XELEMENT_NESTED of USER_STATE * XHEADandATTRIBUTES_RECORD * (XELEMENT list)
| C_XELEMENT_EMPTY of XELEMENT_EMPTY
type XELEMENT_NESTED = XHEADandATTRIBUTES_RECORD * (XELEMENT list)
I have reviewed the FParsec documentation in detail (especially the User's Guide on 'Parsing with User State') but maybe I missed something?
I don't think you need to maintain user state to match XML tags. Here's a very simple parser that detects mismatches correctly (but doesn't handle nested tags):
open FParsec
let parseTagOpen =
    pstring "<"
    >>. manySatisfy (fun c -> c <> '>')
    .>> pstring ">"
let parseTagClose =
    pstring "</"
    >>. manySatisfy (fun c -> c <> '>')
    .>> pstring ">"
let parseInput =
    parseTagOpen
    .>>. (manySatisfy (fun c -> c <> '<'))
    .>>. parseTagClose
    >>= (fun ((tag1, content), tag2) -> // *** this is the important part ***
        if tag1 = tag2 then preturn (tag1, content)
        else failFatally "mismatch")
[<EntryPoint>]
let main argv =
    [
        "<moo>baa</moo>"
        "<moo>baa</oink>"
    ] |> Seq.iter (fun input ->
        let result = run parseInput input
        printfn ""
        printfn "%s" input
        printfn "%A" result)
    0

Applying a function returned from a subparser with fparsec

Noob alert!
Ok, I'm trying to build a simple math expression parser in fparsec. Right now all I want it to do is handle strings like this "1+2-3*4/5" and return a double as the result of the evaluation. No spaces, newlines, or parens, and left to right order of operations is fine.
Here's what I have so far:
open System
open FParsec
let number = many1 digit |>> fun ds -> int <| String.Concat(ds)
let op : Parser<int -> int -> int, unit> =
    charReturn '+' (+) <|>
    charReturn '-' (-) <|>
    charReturn '*' ( * ) <|>   // the spaces are required for the multiplication operator
    charReturn '/' (/)
let expression, expressionImpl = createParserForwardedToRef()
do expressionImpl :=
    choice[
        attempt(number .>> op >>. expression);
        number]
let test p str =
    match run (p .>> eof) str with
    | Success(result, _, _) -> printfn "Success: %A" result
    | Failure(result, _, _) -> printfn "Failure: %A" result
[<EntryPoint>]
let main argv =
    test expression "1+1/2*3-4"
    Console.Read() |> ignore
    0
In the first choice of the expression parser, I'm not sure how to apply the function returned by the op parser.
As usual, I find the answer right after posting the question (after 3 hours of searching).
I just changed this line:
attempt(number .>> op >>. expression);
to this:
attempt(pipe3 number op expression (fun x y z -> y x z));
However, I just realized that my expressions parse backwards. Back to the drawing board.
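One way to get strict left-to-right evaluation, sketched here under the assumption that no precedence or parentheses are needed, is to drop the recursion entirely: parse the first number, collect the remaining (operator, number) pairs with many, and fold over them from the left. The eval helper is hypothetical and only exists to try the parser out; the FParsec package is assumed to be referenced.

```fsharp
// Assumes the FParsec package is available, e.g. in a script: #r "nuget: FParsec"
open FParsec

// One or more digits converted to an int (no sign handling).
let number : Parser<int, unit> = many1Satisfy isDigit |>> int

// Each operator character maps to the corresponding integer function.
let op : Parser<int -> int -> int, unit> =
    choice [ charReturn '+' (+)
             charReturn '-' (-)
             charReturn '*' ( * )
             charReturn '/' (/) ]

// Parse the first number, then zero or more (operator, number) pairs,
// and fold from the left so that "1+2-3" evaluates as (1+2)-3.
let expression : Parser<int, unit> =
    number .>>. many (op .>>. number)
    |>> fun (first, rest) -> List.fold (fun acc (f, n) -> f acc n) first rest

// Hypothetical helper: Some result on success, None on a parse error.
let eval input =
    match run (expression .>> eof) input with
    | Success (value, _, _) -> Some value
    | Failure _ -> None
```

For the input from the question, eval "1+1/2*3-4" yields Some -1, that is ((1+1)/2)*3-4 with integer division. Dividing by zero would still raise an exception inside the fold, which this sketch does not handle.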

Use FParsec to parse a self-describing input

I'm using FParsec to parse an input that describes its own format. For example, consider this input:
int,str,int:4,'hello',3
The first part of the input (before the colon) describes the format of the second part of the input. In this case, the format is int, str, int, which means that the actual data consists of three comma-separated values of the given types, so the result should be 4, "hello", 3.
What is the best way to parse something like this with FParsec?
I've pasted my best effort below, but I'm not happy with it. Is there a better way to do this that is cleaner, less stateful, and less reliant on the parse monad? I think this depends on smarter management of UserState, but I don't know how to do it. Thanks.
open FParsec
type State = { Formats : string[]; Index : int32 }
    with static member Default = { Formats = [||]; Index = 0 }
type Value =
    | Integer of int
    | String of string
let parseFormat : Parser<_, State> =
    parse {
        let! formats =
            sepBy
                (pstring "int" <|> pstring "str")
                (skipString ",")
            |>> Array.ofList
        do! updateUserState (fun state -> { state with Formats = formats })
    }
let parseValue format =
    match format with
    | "int" -> pint32 |>> Integer
    | "str" ->
        between
            (skipString "'")
            (skipString "'")
            (manySatisfy (fun c -> c <> '\''))
        |>> String
    | _ -> failwith "Unexpected"
let parseValueByState =
    parse {
        let! state = getUserState
        let format = state.Formats.[state.Index]
        do! setUserState { state with Index = state.Index + 1}
        return! parseValue format
    }
let parseData =
    sepBy
        parseValueByState
        (skipString ",")
let parse =
    parseFormat
    >>. skipString ":"
    >>. parseData
[<EntryPoint>]
let main argv =
    let result = runParserOnString parse State.Default "" "int,str,int:4,'hello',3"
    printfn "%A" result
    0
There seem to be several problems with the original code, so I took the liberty of rewriting it from scratch.
First, several library functions that may appear useful in other FParsec-related projects:
/// Simple Map
/// usage: let z = Map ["hello" => 1; "bye" => 2]
let (=>) x y = x,y
let makeMap x = new Map<_,_>(x)
/// Assumption: the original does not define UserState; unit is used here
type UserState = unit
/// A handy construct allowing NOT to write lengthy type definitions
/// and also avoid Value Restriction error
type Parser<'t> = Parser<'t, UserState>
/// A list combinator, inspired by FParsec's (>>=) combinator
let (<<+) (p1: Parser<'T list>) (p2: Parser<'T>) =
    p1 >>= fun x -> p2 >>= fun y -> preturn (y::x)
/// Runs all parsers listed in the source list;
/// All but the trailing one are also combined with a separator
let allOfSepBy separator parsers : Parser<'T list> =
    let rec fold state =
        function
        | [] -> pzero
        | hd::[] -> state <<+ hd
        | hd::tl -> fold (state <<+ (hd .>> separator)) tl
    fold (preturn []) parsers
    |>> List.rev // reverse the list since we appended to the top
Now, the main code. The basic idea is to run parsing in three steps:
Parse out the keys (which are plain ASCII strings)
Map these keys to actual Value parsers
Run these parsers in order
The rest seems to be commented within the code. :)
/// The resulting type
type Output =
    | Integer of int
    | String of string
/// tag to parser mappings
let mappings =
    [
        "int" => (pint32 |>> Integer)
        "str" => (
            manySatisfy (fun c -> c <> '\'')
            |> between (skipChar '\'') (skipChar '\'')
            |>> String
        )
    ]
    |> makeMap
let myProcess : Parser<Output list> =
    let pKeys =                         // First, we parse out the keys
        many1Satisfy isAsciiLower       // Parse one key; keys are always ASCII strings
        |> sepBy <| (skipChar ',')      // many keys separated by comma
        .>> (skipChar ':')              // all this with a trailing colon
    let pValues = fun keys ->
        keys                            // take the keys list
        |> List.map                     // find the required Value parser
                                        // (NO ERROR CHECK for bad keys)
            (fun p -> Map.find p mappings)
        |> allOfSepBy (skipChar ',')    // they must run in order, comma-separated
    pKeys >>= pValues
Run on string: int,int,str,int,str:4,42,'hello',3,'foobar'
Returned: [Integer 4; Integer 42; String "hello"; Integer 3; String "foobar"]
@bytebuster beat me to it, but I'll still post my solution. The technique is similar to @bytebuster's.
Thanks for an interesting question.
In compilers I believe the preferred technique is to parse the text into an AST and on that run a type-checker. For this example a potentially simpler technique would be that parsing the type definitions returns a set of parsers for the values. These parsers are then applied on the rest of the string.
open FParsec
type Value =
    | Integer of int
    | String of string
type ValueParser = Parser<Value, unit>
let parseIntValue : Parser<Value, unit> =
    pint32 |>> Integer
let parseStringValue : Parser<Value, unit> =
    between
        (skipChar '\'')
        (skipChar '\'')
        (manySatisfy (fun c -> c <> '\''))
    <?> "string"
    |>> String
let parseValueParser : Parser<ValueParser, unit> =
    choice
        [
            skipString "int" >>% parseIntValue
            skipString "str" >>% parseStringValue
        ]
let parseValueParsers : Parser<ValueParser list, unit> =
    sepBy1
        parseValueParser
        (skipChar ',')
// Runs a list of parsers 'ps' separated by 'sep' parser
let sepByList (ps : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
    let rec loop adjust ps =
        match ps with
        | [] -> preturn []
        | h::t ->
            adjust h >>= fun v -> loop (fun pp -> sep >>. pp) t >>= fun vs -> preturn (v::vs)
    loop id ps
let parseLine : Parser<Value list, unit> =
    parseValueParsers .>> skipChar ':' >>= (fun vps -> sepByList vps (skipChar ',')) .>> eof
[<EntryPoint>]
let main argv =
    let s = "int,str,int:4,'hello',3"
    let r = run parseLine s
    printfn "%A" r
    0
Parsing int,str,int:4,'hello',3 yields Success: [Integer 4; String "hello";Integer 3].
Parsing int,str,str:4,'hello',3 (incorrect) yields:
Failure:
Error in Ln: 1 Col: 23
int,str,str:4,'hello',3
                      ^
Expecting: string
I rewrote @FuleSnabel's sepByList as follows to help me understand it better. Does this look right?
let sepByList (parsers : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
    let rec loop adjust parsers =
        parse {
            match parsers with
            | [] -> return []
            | parser :: tail ->
                let! value = adjust parser
                let! values = loop (fun parser -> sep >>. parser) tail
                return value :: values
        }
    loop id parsers

FParsec combinator to turn Parser<char,_> into Parser<string,_>?

I'm certain there's a really simple answer to this, but I've been staring at this all day and I can't figure it out.
As per the tutorial, I'm implementing a JSON parser. To challenge myself, I'm implementing the number parser myself.
This is what I got so far:
let jnumber =
    let neg = stringReturn "-" -1 <|> preturn 1
    let digit = satisfy (isDigit)
    let digit19 = satisfy (fun c -> isDigit c && c <> '0')
    let digits = many1 digit
    let ``int`` =
        digit
        <|> (many1Satisfy2 (fun c -> isDigit c && c <> '0') isDigit)
The trouble is that digit is a Parser<char,_>, whereas the second option for int is a Parser<string,_>. Would I normally just use a combinator to turn digit into a Parser<string,_>, or is there something else I should do?
The |>> operator is what you're looking for. I quote the FParsec reference:
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
The parser p |>> f applies the parser p and returns the result of the function application f x, where x is the result returned by p. p |>> f is an optimized implementation of p >>= fun x -> preturn (f x).
For example:
let jnumber =
    let neg = stringReturn "-" -1 <|> preturn 1
    let digit = satisfy (isDigit)
    let digit19 = satisfy (fun c -> isDigit c && c <> '0')
    let digits = many1 digit
    (digit |>> string) (* The operator is used here *)
    <|> (many1Satisfy2 (fun c -> isDigit c && c <> '0') isDigit)
You may want to read the FParsec tutorial on parsing JSON, which uses this operator quite intensively.

Resources