FParsec combinator to turn Parser<char,_> until Parser<string,_>? - parsing

I'm certain there's a really simple answer to this, but I've been staring at this all day and I can't figure it out.
As per the tutorial, I'm implementing a JSON parser. To challenge myself, I'm implementing the number parser myself.
This is what I got so far:
let jnumber =
let neg = stringReturn "-" -1 <|> preturn 1
let digit = satisfy (isDigit)
let digit19 = satisfy (fun c -> isDigit c && c <> '0')
let digits = many1 digit
let ``int`` =
digit
<|> (many1Satisfy2 (fun c -> isDigit c && c <> '0') isDigit)
The trouble is that digit is a Parser<char,_>, whereas the second option for int is a Parser<string,_>. Would I normally just use a combinator to turn digit into a Parser<char,_>, or is there something else I should do?

The |>> operator is what you're looking for. I quote the FParsec reference:
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
The parser p
|>> f applies the parser p and returns the result of the function
application f x, where x is the result returned by  p.
p |>> f is an
optimized implementation of p >>= fun x -> preturn (f x).
For example:
let jnumber =
let neg = stringReturn "-" -1 <|> preturn 1
let digit = satisfy (isDigit)
let digit19 = satisfy (fun c -> isDigit c && c <> '0')
let digits = many1 digit
(digit |>> string) (* The operator is used here *)
<|> (many1Satisfy2 (fun c -> isDigit c && c <> '0') isDigit)
You may want to read FParsec tutorial on parsing JSON which uses this operator quite intensively.

Related

How to insert an indentation check into the operator precedence parser?

I am working on the parsing stage for the language I am making and am having difficulty with the following.
let test2 = // I'd like this to be an error.
"""
2
+ 2
"""
let result = run (spaces >>. expr) test2
val result : ParserResult<CudaExpr,unit> =
Success: Add (LitInt32 2,LitInt32 2)
I already managed to make the following example when the terms are indented incorrectly
2 +
2
give me an error, but not when the operator is on the wrong indentation level. I need something like a before-parse check.
let operators expr i =
let f expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f expr
f opp.ExpressionParser
The above function is how the operators phase is structured and as you can see, the term parsers get wrapped in a function that does the indentation check, but the last line is faulty.
Here is a simplified example of the full parser.
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsecCS.dll"
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsec.dll"
open FParsec
type Expr =
| V of string
| Add of Expr * Expr
let identifier = many1Satisfy2L isAsciiLetter (fun x -> isAsciiLetter x || isDigit x || x = ''') "identifier" .>> spaces |>> V
let indentations expressions (s: CharStream<_>) =
let i = s.Column
let expr_indent expr (s: CharStream<_>) =
let expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
many1 expr s
expr_indent (expressions i) s
let expr =
let opp = new OperatorPrecedenceParser<_,_,_>()
opp.AddOperator(InfixOperator("+", spaces, 6, Associativity.Left, fun x y -> Add(x,y)))
let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
let rec expr s = indentations (operators identifier) s
expr
let test2 = // I'd like this to be an error.
"""
a
+
b
"""
let result = run (spaces >>. expr) test2
The full parser so far can be found here.
let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
I did not realize it 2.5 weeks ago, but what happens when a new block gets opened and expr s gets called is that the term parser gets overwritten with the new indentation and there is no way to back it up and restore it on exit. I did a bit of looking around and managed to adapt the Pratt top down parsing method for my purposes.
Here is a talk by Douglas Crockford on the method.
let poperator: Parser<_,_> =
let f c = (isAsciiIdContinue c || isAnyOf [|' ';'\t';'\n';'\"';'(';')';'{';'}';'[';']'|] c) = false
(many1Satisfy f .>> spaces)
>>= fun token ->
match dict_operator.TryGetValue token with
| true, x -> preturn x
| false, _ -> fail "unknown operator"
let rec led poperator term left (prec,asoc,m) =
match asoc with
| Associativity.Left | Associativity.None -> tdop poperator term prec |>> m left
| Associativity.Right -> tdop poperator term (prec-1) |>> m left
| _ -> failwith "impossible"
and tdop poperator term rbp =
let rec f left =
poperator >>= fun (prec,asoc,m as v) ->
if rbp < prec then led poperator term left v >>= loop
else pzero
and loop left = attempt (f left) <|>% left
term >>= loop
let operators expr i (s: CharStream<_>) =
let expr_indent expr (s: CharStream<_>) = expr_indent i (<=) expr s
let op s = expr_indent poperator s
let term s = expr_indent expr s
tdop op term 0 s
The led and tdop functions which do the actual precedence parsing are 10 lines long. The above is just a snippet of the full parser for the language I am making - in terms of syntax it is similar to F# and is indentation sensitive. Here is a more straightforward F# translation of Douglas Crockford's Javascript example.

precedence climbing in haskell: parsec mutual recursion error

I'm programming the precedence climbing algorithm in Haskell, but for a reason unknown to me, does not work. I think that Parsec state info is lost at some point, but I don't even know that is the source of the error:
module PrecedenceClimbing where
import Text.Parsec
import Text.Parsec.Char
{-
Algorithm
compute_expr(min_prec):
result = compute_atom()
while cur token is a binary operator with precedence >= min_prec:
prec, assoc = precedence and associativity of current token
if assoc is left:
next_min_prec = prec + 1
else:
next_min_prec = prec
rhs = compute_expr(next_min_prec)
result = compute operator(result, rhs)
return result
-}
type Precedence = Int
data Associativity = LeftAssoc
| RightAssoc
deriving (Eq, Show)
data OperatorInfo = OPInfo Precedence Associativity (Int -> Int -> Int)
mkOperator :: Char -> OperatorInfo
mkOperator = \c -> case c of
'+' -> OPInfo 1 LeftAssoc (+)
'-' -> OPInfo 1 LeftAssoc (-)
'*' -> OPInfo 2 LeftAssoc (*)
'/' -> OPInfo 2 LeftAssoc div
'^' -> OPInfo 3 RightAssoc (^)
getPrecedence :: OperatorInfo -> Precedence
getPrecedence (OPInfo prec _ _) = prec
getAssoc :: OperatorInfo -> Associativity
getAssoc (OPInfo _ assoc _) = assoc
getFun :: OperatorInfo -> (Int -> Int -> Int)
getFun (OPInfo _ _ fun) = fun
number :: Parsec String () Int
number = do
spaces
fmap read $ many1 digit
operator :: Parsec String () OperatorInfo
operator = do
spaces
fmap mkOperator $ oneOf "+-*/^"
computeAtom = do
spaces
number
loop minPrec res = (do
oper <- operator
let prec = getPrecedence oper
if prec >= minPrec
then do
let assoc = getAssoc oper
next_min_prec = if assoc == LeftAssoc
then prec + 1
else prec
rhs <- computeExpr(next_min_prec)
loop minPrec $ getFun oper res rhs
else return res) <|> (return res)
computeExpr :: Int -> Parsec String () Int
computeExpr minPrec = (do
result <- computeAtom
loop minPrec result) <|> (computeAtom)
getResult minPrec = parse (computeExpr minPrec) ""
My program for some reason is only processing the first operation or the first operand depending on the case, but does not go any further
GHCi session:
*PrecedenceClimbing> getResult 1 "46+10"
Right 56
*PrecedenceClimbing> getResult 1 "46+10+1"
Right 56
I'm not sure exactly what's wrong with your code but I'll offer these comments:
(1) These statements are not equivalent:
Generic Imperative: rhs = compute_expr(next_min_prec)
Haskell: rhs <- computeExpr(next_min_prec)
The imperative call to compute_expr will always return. The Haskell call may fail in which case the stuff following the call never happens.
(2) You are really working against Parsec's strengths by trying to parse tokens one at a time in sequence. To see the "Parsec way" of generically parsing expressions with operators of various precedences and associativities, have a look at:
buildExpression
Parsec and Expression Printing
Update
I've posted a solution to http://lpaste.net/165651

Parsing non binary operators with Parsec

Traditionally, arithmetic operators are considered to be binary (left or right associative), thus most tools are dealing only with binary operators.
Is there an easy way to parse arithmetic operators with Parsec, which can have an arbitrary number of arguments?
For example, the following expression should be parsed into the tree
(a + b) + c + d * e + f
Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.
This is a great problem to solve since there's a Text.Parsec.Expr module for just this problem. Your example is actually parseable by the example code in the documentation. I've slightly simplified it here:
module Lib where
import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens
data Expr =
Identifier String
| Multiply Expr Expr
| Add Expr Expr
instance Show Expr where
show (Identifier s) = s
show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"
-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell
-- Infix parser.
infix_ operator func =
Expr.Infix (reserved operator >> return func) Expr.AssocLeft
parser =
Expr.buildExpressionParser table term <?> "expression"
where
table = [[infix_ "*" Multiply], [infix_ "+" Add]]
term =
parens parser
<|> (Identifier <$> identifier)
<?> "term"
Running this in GHCi:
λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)
There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:
data Expr' =
Identifier' String
| Add' [Expr']
| Multiply' [Expr']
deriving (Show)
collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | (f e == False) = [e]
collect e#(Add l r) f =
collect l f ++ collect r f
collect e#(Multiply l r) f =
collect l f ++ collect r f
isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False
isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False
optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e#(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e#(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))
I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.

Applying a function returned from a subparser with fparsec

Noob alert!
Ok, I'm trying to build a simple math expression parser in fparsec. Right now all I want it to do is handle strings like this "1+2-3*4/5" and return a double as the result of the evaluation. No spaces, newlines, or parens, and left to right order of operations is fine.
Here's what I have so far:
let number = many1 digit |>> fun ds -> int <| String.Concat(ds)
let op : Parser<int -> int -> int, unit> =
charReturn '+' (+) <|>
charReturn '-' (-) <|>
charReturn '*' (*) <|>
charReturn '/' (/)
let expression, expressionImpl = createParserForwardedToRef()
do expressionImpl :=
choice[
attempt(number .>> op >>. expression);
number]
let test p str =
match run (p .>> eof) str with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(result, _, _) -> printfn "Failure: %A" result
[<EntryPoint>]
let main argv =
test expression "1+1/2*3-4"
Console.Read() |> ignore
0
In the first choice of the expression parser, I'm not sure how to apply the function returned by the op parser.
As usual, I find the answer right after posting the question (after 3 hours of searching).
I just changed this line:
attempt(number .>> op >>. expression);
to this:
attempt(pipe3 number op expression (fun x y z -> y x z));
However, I just realized that my expressions parse backwards. Back to the drawing board.

Parsing date and time with FParsec

Within a simple query language I'd like to recognize date and time literals, preferably without using delimiters. For example,
CreationDate = 2013-05-13 5:30 PM
I could use a combinator to detect the basic syntax (e.g., yyyy-MM-dd hh:mm tt), but then it needs to be passed to DateTime.TryParse for full validation.
A few questions:
Is there a combinator for "post processing" a parser result, e.g., pstring "1/2/2000" |> (fun s -> try OK(DateTime.Parse s) with _ -> Fail("not a date"))
Is it possible to apply a predicate to a string (as satisfy does to char)?
Is there a better approach for parsing date/time?
UPDATE
Using Guvante's and Stephan's examples, I came up with this:
let dateTimeLiteral =
let date sep = pipe5 pint32 sep pint32 sep pint32 (fun a _ b _ c -> a, b, c)
let time =
(pint32 .>>. (skipChar ':' >>. pint32)) .>>.
(opt (stringCIReturn " am" false <|> stringCIReturn " pm" true))
(date (pstring "/") <|> date (pstring "-")) .>>.
(opt (skipChar ' ' >>. time)) .>> ws
>>=? (fun ((a, b, c), tt) ->
let y, m, d = if a > 12 then a, b, c else c, a, b
let h, n =
match tt with
| Some((h, n), tt) ->
match tt with
| Some true -> (match h with 12 -> h | _ -> h + 12), n
| Some false -> (match h with 12 -> h - 12 | _ -> h), n
| None -> h, n
| None -> 0, 0
try preturn (System.DateTime(y, m, d, h, n, 0)) |>> DateTime
with _ -> fail "Invalid date/time format")
You can easily build a custom combinator or parser that validates parsed input.
If you only want to use combinators ("Haskell-style"), you could use
let pDateString = pstring "1/2/2000"
let pDate1 =
pDateString
>>= fun str ->
try preturn (System.DateTime.Parse(str))
with _ -> fail "Date format error"
as Guvante just proposed.
If you want to avoid construction temporary parsers (see preturn ... and pfail ... above), you can just let the function accept a second parameter and directly return Reply values:
let pDate2 =
pDateString
>>= fun str stream ->
try Reply(System.DateTime.Parse(str))
with _ -> Reply(Error, messageError "Date format error")
If you want the error location to be at the beginning of the malformed date string, you could replace >>= with >>=?. Note that this also has consequences for error recovery.
If you want to have full control, you can write the parser only using the lower level API, starting with a basic version like the following:
let pDate3 =
fun stream ->
let reply = pDateString stream
if reply.Status = Ok then
try Reply(System.DateTime.Parse(reply.Result))
with _ -> Reply(Error, messageError "Date format error")
else
Reply(reply.Status, reply.Error)
This last version would also allow you to replace the pDateString parser with code that directly accesses the CharStream interface, which could give you some additional flexibility or performance.
Is there a combinator for "post processing" a parser result
It depends on what you want to do if you fail. You can always do |>> to get your DateTime out. Failing it is equally interesting, I think your example could be (given a parser sp that gets the correct string, note it would be of type Parser<string,'u>)
sp >>= (fun s -> match DateTime.TryParse s with
| true,result -> preturn result
| false,_ -> fail)
Here I am taking in the resultant string and calling the TryParse method, and returning either a preturn or a fail depending on whether it succeeds. I couldn't find any of the methods that worked exactly like that.
Note that >>=? would cause a backtrack if it failed.
Is it possible to apply a predicate to a string (as satisfy does for char)?
You would have to call the predicate for every character (2, 20, 201) which is usually not ideal. I am pretty sure you could whip up something like this if you wanted, but I don't think it is ideal for that reason, not to mention handling partial matches becomes harder.
Is there a better approach for parsing date/time?
The biggest factor is "What do you know about the date/time?" If you know it is in that syntax exactly then you should be able to use a post process and be fine (since hopefully the error case will be rare)
If you aren't sure, for instance if PM is optional, but would be unambiguously detailed, then you will probably want to break up the definition and combine it at the end. Note that here I have relaxed the character counts a little, you could add some opt to get even more relaxed, or replace the pint32 with digit and manual conversions.
let pipe6 = //Implementation left as an exercise
let dash = skipChar '-'
let space = skipChar ' '
let colon = skipChar ':'
pipe6 (pint32 .>> dash) //Year
(pint32 .>> dash) //Month
(pint32 .>> space) //Day
(pint32 .>> colon) //Hour
(pint32 .>> space) //Minute
(anyString) //AM/PM
(fun year month day hour minute amPm ->
DateTime(year, month, day,
hour + (if amPm.Equals("PM", StringComparison.InvariantCultureIgnoreCase)
then 12 else 0),
minute, 0)) //No seconds
Writing all that out I am not sure if you are better or worse off...
I've used next code to parse given date string into DataTime object.
2000-01-01 12:34:56,789
let pipe7 p1 p2 p3 p4 p5 p6 p7 f =
p1 >>= fun x1 ->
p2 >>= fun x2 ->
p3 >>= fun x3 ->
p4 >>= fun x4 ->
p5 >>= fun x5 ->
p6 >>= fun x6 ->
p7 >>= fun x7 -> preturn (f x1 x2 x3 x4 x5 x6 x7)
let int_ac = pint32 .>> anyChar
let pDateStr : Parser<DateTime, unit> = pipe7 int_ac int_ac int_ac int_ac int_ac int_ac int_ac (fun y m d h mi s mil -> new DateTime(y,m,d,h,mi,s,mil))

Resources