Position information in FParsec - F#

My AST model needs to carry location information (filename, line, index). Is there any built-in way to access this information? From the reference docs, the stream seems to carry the position, but I'd prefer not to have to implement a dummy parser just to save the position and add it everywhere.
Thanks in advance

Parsers are actually type abbreviations for functions from streams to replies:
Parser<_,_> is just CharStream<_> -> Reply<_>
Keeping that in mind, you can easily write a custom parser for positions:
let position : CharStream<_> -> Reply<Position> = fun stream -> Reply(stream.Position)
(* OR *)
let position : Parser<_,_> = fun stream -> Reply stream.Position
and attach position information to every bit you parse using
position .>>. yourParser (*or tuple2 position yourParser*)
The position parser does not consume any input, so it is safe to combine in that way.
You can keep the code change required restricted to a single line and avoid uncontrollable code spread:
type AST =
    | Slash of int64
    | Hash of int64
(* pchar is FParsec's single-character parser; plain char is the F# conversion function *)
let slash : Parser<AST,_> = pchar '/' >>. pint64 |>> Slash
let hash : Parser<AST,_> = pchar '#' >>. pint64 |>> Hash
let ast : Parser<AST,_> = slash <|> hash
(*if this is the final parser used for parsing lists of your ASTs*)
let manyAst : Parser<AST list,_> = many (ast .>> spaces)
let manyAstP : Parser<(Position * AST) list,_> = many ((position .>>. ast) .>> spaces)
(*you can opt in to parse position information for every bit
you parse just by modifying only the combined parser *)
Update: FParsec has a predefined parser for positions:
http://www.quanttec.com/fparsec/reference/charparsers.html#members.getPosition
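As a rough illustration of the built-in parser, here is a minimal sketch that reuses the AST from the answer above. The helper name withPos and the sample input are made up for this example, and the #r "nuget: ..." line assumes an F# script run with dotnet fsi.

```fsharp
#r "nuget: FParsec" // assumption: running as an .fsx script
open FParsec

type AST =
    | Slash of int64
    | Hash of int64

// Hypothetical helper: pair a parser's result with the position where it started.
let withPos (p: Parser<'a, 'u>) : Parser<Position * 'a, 'u> = getPosition .>>. p

let slash : Parser<AST, unit> = pchar '/' >>. pint64 |>> Slash
let hash : Parser<AST, unit> = pchar '#' >>. pint64 |>> Hash

// Only the combined parser changes; slash and hash stay position-free.
let manyAstP : Parser<(Position * AST) list, unit> =
    many (withPos (slash <|> hash) .>> spaces)

match run manyAstP "/1 #22" with
| Success (items, _, _) ->
    for (pos, node) in items do
        printfn "%A at line %d, column %d" node pos.Line pos.Column
| Failure (msg, _, _) -> printfn "%s" msg
```

Position also exposes StreamName and Index, which should cover the filename and index parts of the question, provided the stream is given a name when it is created.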

Related

FParsec match string which have one of 2 patterns

I'm trying to learn FParsec and am trying to match strings which follow one of two patterns.
The string can either be an ordinary string like "string" or it can be a string with one dot in it, like "st.ring".
The parser should look like this: Parser<(string Option * string),unit>. The first string is optional, depending on whether the string is split by a dot or not. The optional string represents the part of the string which comes before the ".".
I have tried a few different things, but I feel this attempt was the closest:
let charstilldot = manyCharsTill anyChar (pstring ".")
let parser = opt(charstilldot) .>>. (many1Chars anyChar)
This works with input like "st.ring" but not "string", since no dot exists in the latter.
I would very much appreciate some help, thank you!
EDIT:
I have a solution which basically parses the arguments in order and swaps them depending on whether there is a dot in the string or not
let colTargetWithoutDot : Parser<string Option,unit> = spaces |>> fun _ -> None
let colTargetWithDot = (pstring "." >>. alphastring) |>> Some
let specificColumn = alphastring .>>. (colTargetWithDot <|> colTargetWithoutDot) |>> (fun (h,t) ->
    match h,t with
    | h,None -> (None,h)
    | h,Some(t) -> (Some(h),t))
However, this is not pretty, so I would still appreciate another solution!
I think the main problem here is that charstilldot consumes characters even when it fails. When a parser fails after consuming input, opt propagates that failure instead of returning None, so many1Chars never gets a chance to run. The easiest way to address this is to wrap the parser in attempt, which rolls back the input when there is no dot:
let charstilldot = attempt (manyCharsTill anyChar (pstring "."))
let parser = opt(charstilldot) .>>. (many1Chars anyChar)
Result:
"str.ing" -> (Some "str", "ing")
"string" -> (None, "string")
I think there are other good solutions as well, but I've tried to give you one that requires the least change to your current code.
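To try the fix in F# Interactive, a small harness along these lines may help. The show helper and the explicit unit user-state annotation are additions for this sketch, not part of the original answer.

```fsharp
#r "nuget: FParsec" // assumption: running as an .fsx script
open FParsec

// attempt rewinds the stream if no "." is found, so opt can return None.
let charstilldot : Parser<string, unit> =
    attempt (manyCharsTill anyChar (pstring "."))

let parser = opt charstilldot .>>. many1Chars anyChar

// Hypothetical helper that prints the outcome for one input.
let show (input: string) =
    match run parser input with
    | Success (r, _, _) -> printfn "%s -> %A" input r
    | Failure (msg, _, _) -> printfn "%s failed:\n%s" input msg

[ "str.ing"; "string" ] |> List.iter show
```

Since manyCharsTill checks for the end parser before each character, only the first dot acts as the separator; any later dots end up in the second string.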

Parsing int or float with FParsec

I'm trying to parse a file, using FParsec, which consists of either float or int values. I'm facing two problems that I can't find a good solution for.
1
Both pint32 and pfloat will successfully parse the same string, but give different answers, e.g pint32 will return 3 when parsing the string "3.0" and pfloat will return 3.0 when parsing the same string. Is it possible to try parsing a floating point value using pint32 and have it fail if the string is "3.0"?
In other words, is there a way to make the following code work:
let parseFloatOrInt lines =
    let rec loop intvalues floatvalues lines =
        match lines with
        | [] -> floatvalues, intvalues
        | line::rest ->
            match run floatWs line with
            | Success (r, _, _) -> loop intvalues (r::floatvalues) rest
            | Failure _ ->
                match run intWs line with
                | Success (r, _, _) -> loop (r::intvalues) floatvalues rest
                | Failure _ -> loop intvalues floatvalues rest
    loop [] [] lines
This piece of code will correctly place all floating point values in the floatvalues list, but because pfloat returns "3.0" when parsing the string "3", all integer values will also be placed in the floatvalues list.
2
The above code example seems a bit clumsy to me, so I'm guessing there must be a better way to do it. I considered combining them using choice, however both parsers must return the same type for that to work. I guess I could make a discriminated union with one option for float and one for int and convert the output from pint32 and pfloat using the |>> operator. However, I'm wondering if there is a better solution?
You're on the right path thinking about defining domain data and separating definition of parsers and their usage on source data. This seems to be a good approach, because as your real-life project grows further, you would probably need more data types.
Here's how I would write it:
/// The resulting type, or DSL
type MyData =
    | IntValue of int
    | FloatValue of float
    | Error // special case for all parse failures
// Then, let's define individual parsers:
let pMyInt =
    pint32
    |>> IntValue
// this is an alternative version of float parser.
// it ensures that the value has non-zero fractional part.
// caveat: the naive approach would treat values like 42.0 as integer
let pMyFloat =
    pfloat
    >>= (fun x -> if x % 1.0 = 0.0 then fail "Not a float" else preturn (FloatValue x))
let pError =
    // this parser must consume some input,
    // otherwise combined with `many` it would hang in a dead loop
    skipAnyChar
    >>. preturn Error
// Now, the combined parser:
let pCombined =
    [ pMyFloat; pMyInt; pError ]    // note, future parsers will be added here;
                                    // mind the order as float supersedes the int,
                                    // and Error must be the last
    |> List.map (fun p -> p .>> ws) // I'm too lazy to add whitespace skipping
                                    // into each individual parser
    |> List.map attempt             // each parser is optional
    |> choice                       // on each iteration, one of the parsers must succeed
    |> many                         // a loop
Note that the code above is capable of working with any source: strings, streams, or whatever. Your real app may need to work with files, but unit testing can be simplified by using just a string list.
// Now, applying the parser somewhere in the code:
let maybeParseResult =
    match run pCombined myStringData with
    | Success(result, _, _) -> Some result
    | Failure(_, _, _) -> None // or anything that indicates general parse failure
UPD. I have edited the code according to comments. pMyFloat was updated to ensure that the parsed value has non-zero fractional part.
FParsec has the numberLiteral parser that can be used to solve the problem.
As a start you can use the example available at the link above:
open FParsec
open FParsec.Primitives
open FParsec.CharParsers
type Number =
    | Int of int64
    | Float of float
// -?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?
let numberFormat = NumberLiteralOptions.AllowMinusSign
                   ||| NumberLiteralOptions.AllowFraction
                   ||| NumberLiteralOptions.AllowExponent
let pnumber : Parser<Number, unit> =
    numberLiteral numberFormat "number"
    |>> fun nl ->
        if nl.IsInteger then Int (int64 nl.String)
        else Float (float nl.String)
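To connect this back to the question's list of lines, here is one possible sketch that applies pnumber to each line and splits the results by case. The parseLine helper and the sample inputs are assumptions made for illustration; lines that do not hold exactly one number are simply dropped.

```fsharp
#r "nuget: FParsec" // assumption: running as an .fsx script
open FParsec

type Number =
    | Int of int64
    | Float of float

let numberFormat =
    NumberLiteralOptions.AllowMinusSign
    ||| NumberLiteralOptions.AllowFraction
    ||| NumberLiteralOptions.AllowExponent

let pnumber : Parser<Number, unit> =
    numberLiteral numberFormat "number"
    |>> fun nl ->
        if nl.IsInteger then Int (int64 nl.String)
        else Float (float nl.String)

// Hypothetical helper: a line must contain exactly one number, optionally padded with whitespace.
let parseLine (line: string) : Number option =
    match run (spaces >>. pnumber .>> spaces .>> eof) line with
    | Success (n, _, _) -> Some n
    | Failure _ -> None

// Split the successfully parsed values into integers and floats.
let ints, floats =
    [ "3"; "3.0"; " -7 "; "abc" ]
    |> List.choose parseLine
    |> List.partition (function Int _ -> true | Float _ -> false)

printfn "ints = %A, floats = %A" ints floats
```

Because the decision is made from the literal's text rather than from the parsed value, "3" and "3.0" land in different cases without running two parsers over the same input.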

How to add a condition that a parsed number must satisfy in FParsec?

I am trying to parse an int32 with FParsec but have an additional restriction that the number must be less than some maximum value. Is there a way to do this without writing my own custom parser (as below), and/or is my custom parser (below) an appropriate way of meeting the requirement?
I ask because most of the built-in library functions seem to revolve around a char satisfying certain predicates and not any other type.
let pRow: Parser<int> =
    let error = messageError ("int parsed larger than maxRows")
    let mutable res = Reply(Error, error)
    fun stream ->
        let reply = pint32 stream
        if reply.Status = Ok && reply.Result <= 1000000 then
            res <- reply
        res
UPDATE
Below is an attempt at a more fitting FParsec solution based on the direction given in the comment below:
let pRow2: Parser<int> =
    pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
Is this the correct way to do it?
You've done excellent research and almost answered your own question.
Generally, there are two approaches:
Unconditionally parse out an int and let later code check it for validity;
Use a guard rule bound to the parser. In this case (>>=) is the right tool.
To make a good choice, ask yourself whether an integer that fails the guard rule should get "another chance" by triggering another parser.
Here's what I mean. Usually, in real-life projects, parsers are combined in some chains. If one parser fails, the following one is attempted. For example, in this question, some programming language is parsed, so it needs something like:
let pContent =
    pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
Theoretically, your DSL may need to differentiate a "small int value" from another type:
/// The resulting type, or DSL
type Output =
    | SmallValue of int
    | LargeValueAndString of int * string
    | Comment of string
let pSmallValue =
    pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
    |>> SmallValue
let pLargeValueAndString =
    pint32 .>> ws .>>. (manyTill ws)
    |>> LargeValueAndString
let pComment =
    manyTill ws
    |>> Comment
let pCombined =
    [ pSmallValue; pLargeValueAndString; pComment]
    |> List.map attempt // each parser is optional
    |> choice           // on each iteration, one of the parsers must succeed
    |> many             // a loop
Built this way, pCombined will return:
"42 ABC" gets parsed as [ SmallValue 42 ; Comment "ABC" ]
"1234567 ABC" gets parsed as [ LargeValueAndString(1234567, "ABC") ]
As we see, the guard rule impacts how the parsers are applied, so the guard rule has to be within the parsing process.
If, however, you don't need such complication (e.g., an int is parsed unconditionally), your first snippet is just fine.
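As a smaller, self-contained illustration of the guard-rule idea, the sketch below wraps pint32 in a bound check and shows why attempt matters when a second parser should get another chance. The names boundedInt, Row, Small and Large are invented for this example.

```fsharp
#r "nuget: FParsec" // assumption: running as an .fsx script
open FParsec

// Hypothetical helper: pint32 guarded by an upper bound.
let boundedInt (maxValue: int) : Parser<int, unit> =
    pint32 >>= fun x ->
        if x <= maxValue then preturn x
        else fail (sprintf "int parsed larger than %d" maxValue)

type Row =
    | Small of int
    | Large of int

// The guard fails only after the digits have been consumed, so attempt is needed
// to rewind the stream before the second alternative is tried.
let pRow : Parser<Row, unit> =
    attempt (boundedInt 1048576 |>> Small) <|> (pint32 |>> Large)

for input in [ "42"; "1234567" ] do
    match run pRow input with
    | Success (r, _, _) -> printfn "%s -> %A" input r
    | Failure (msg, _, _) -> printfn "%s failed:\n%s" input msg
```

Without attempt, <|> would not try the second alternative here, because the first one has already changed the stream position by the time the guard rejects the value.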

F#, FParsec, and Calling a Stream Parser Recursively

I'm developing a multi-part MIME parser using F# and FParsec. I'm developing iteratively, and so this is highly unrefined, brittle code--it only solves my first immediate problem. Red, Green, Refactor.
I'm required to parse a stream rather than a string, which is really throwing me for a loop. Given that constraint, to the best of my understanding, I need to call a parser recursively. How to do that is beyond my ken, at least with the way I've proceeded thus far.
namespace MultipartMIMEParser
open FParsec
open System.IO
type private Post = { contentType : string
                    ; boundary : string
                    ; subtype : string
                    ; content : string }
type MParser (s:Stream) =
    let ($) f x = f x
    let ascii = System.Text.Encoding.ASCII
    let str cs = System.String.Concat (cs:char list)
    let q = "\""
    let qP = pstring q
    let pSemicolon = pstring ";"
    let manyNoDoubleQuote = many $ noneOf q
    let enquoted = between qP qP manyNoDoubleQuote |>> str
    let skip = skipStringCI
    let pContentType = skip "content-type: "
                       >>. manyTill anyChar (attempt $ preturn () .>> pSemicolon)
                       |>> str
    let pBoundary = skip " boundary=" >>. enquoted
    let pSubtype = opt $ pSemicolon >>. skip " type=" >>. enquoted
    let pContent = many anyChar |>> str // TODO: The content parser needs to recurse on the stream.
    let pStream = pipe4 pContentType pBoundary pSubtype pContent
                  $ fun c b t s -> { contentType=c; boundary=b; subtype=t; content=s }
    let result s = match runParserOnStream pStream () "" s ascii with
                   | Success (r,_,_) -> r
                   | Failure (e,_,_) -> failwith (sprintf "%A" e)
    let r = result s
    member p.ContentType = r.contentType
    member p.Boundary = r.boundary
    member p.ContentSubtype = r.subtype
    member p.Content = r.content
The first line of the example POST follows:
content-type: Multipart/related; boundary="RN-Http-Body-Boundary"; type="multipart/related"
It spans a single line in the file. Further sub-parts in the content include content-type values that span multiple lines, so I know I'll have to refine my parsers if I am to reuse them.
Somehow I've got to call pContent with the (string?) results of pBoundary so that I can split the rest of the stream on the appropriate boundaries, and then somehow return multiple parts for the content of the post, each of which will be a separate post, with headers and content (which will obviously have to be something other than a string). My head is spinning. This code already seems far too complex to parse a single line.
Much appreciation for insight and wisdom!
This is a fragment that might get you going in the right direction.
Get your parsers to spit out something with the same base type. I prefer to use F#'s discriminated unions for this purpose. If you really do need to push values into a Post type, then walk the returned AST tree. That's just the way I'd approach it.
#if INTERACTIVE
#r"""..\..\FParsecCS.dll""" // ... edit path as appropriate to bin/debug, etc.
#r"""..\..\FParsec.dll"""
#endif
let packet = @"content-type: Multipart/related; boundary=""RN-Http-Body-Boundary""; type=""multipart/related""
--RN-Http-Body-Boundary
Message-ID: <25845033.1160080657073.JavaMail.webmethods@exshaw>
Mime-Version: 1.0
Content-Type: multipart/related; type=""application/xml"";
boundary=""----=_Part_235_11184805.1160080657052""
------=_Part_235_11184805.1160080657052
Content-Type: Application/XML
Content-Transfer-Encoding: binary
Content-Location: RN-Preamble
Content-ID: <1430586.1160080657050.JavaMail.webmethods@exshaw>"
//XML document begins here...
type AST =
    | Document of AST list
    | Header of AST list
    /// ie. Content-Type is the tag, and it consists of a list of key value pairs
    | Tag of string * AST list
    | KeyValue of string * string
    | Body of string
The AST DU above could represent a first pass of the example data you posted in your other question. It could be finer grained than that, but simpler is normally better. I mean, the ultimate destination in your example is a Post type, and you could achieve that with some simple pattern matching.
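To give an idea of what that pattern matching could look like, here is a sketch that pulls the boundary out of a hand-built tree. The tagValues helper, the sample tree, and the assumption that parameters end up as KeyValue nodes under a Tag are all illustrative guesses about how a parser would populate this AST.

```fsharp
type AST =
    | Document of AST list
    | Header of AST list
    | Tag of string * AST list
    | KeyValue of string * string
    | Body of string

// Hypothetical helper: collect the key/value pairs stored under every tag with the given name.
let rec tagValues (tagName: string) (node: AST) : (string * string) list =
    match node with
    | Document children
    | Header children -> List.collect (tagValues tagName) children
    | Tag (name, children)
        when System.String.Equals (name, tagName, System.StringComparison.OrdinalIgnoreCase) ->
        children |> List.choose (function KeyValue (k, v) -> Some (k, v) | _ -> None)
    | Tag (_, children) -> List.collect (tagValues tagName) children
    | KeyValue _
    | Body _ -> []

// A hand-written tree standing in for what a first parsing pass might return.
let sample =
    Document
        [ Header
            [ Tag ("content-type",
                   [ KeyValue ("boundary", "RN-Http-Body-Boundary")
                     KeyValue ("type", "multipart/related") ]) ]
          Body "..." ]

let boundary =
    tagValues "Content-Type" sample
    |> List.tryFind (fun (k, _) -> k = "boundary")
    |> Option.map snd

printfn "%A" boundary
```

Filling a Post record would then be a matter of running a few lookups like this one over the parsed tree.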

How to parse with between when the closing delimiter can come after a repeating pattern?

How would you use existing FParsec functionality to find a repeating consecutive pattern in the rightmost tag?
It's a legitimate possibility in this context. Pre-parsing + escaping might work, but is there a better solution? Do we need to write a new forward combinator, and if so, what does it look like?
#r"""bin\debug\FParsecCS.dll"""
#r"""bin\debug\FParsec.dll"""
open FParsec
let str = pstring
let phraseEscape = pchar '\\' >>. pchar '"'
let phraseChar = phraseEscape <|> (noneOf "|\"\r\n]") // <- this right square bracket needs to be removed
let phrase = manyChars phraseChar
let wrapped = between (str"[[") (str"]]".>>newline) phrase
run wrapped "[[some text]]\n" // <- works fine
// !! problem
run wrapped "[[array[] d]]\n" // <- that means we can't make ']' invalid in phraseChar
// !! problem
run wrapped "[[array[]]]\n" // <- and this means that the first ]] gets matched, leaving a floating one to break the parser
Sorry to be answering my own question, but...
See composable function phraseTill, and the pend parser that is passed to it of (notFollowedBy(s"]]]")>>.(s"]]"))
#r"""bin\debug\FParsecCS.dll"""
#r"""bin\debug\FParsec.dll"""
open FParsec
let s = pstring
let phraseChar = (noneOf "\r\n")
let phrase = manyChars phraseChar
/// keep eating characters until the pend parser is successful
let phraseTill pend = manyCharsTill phraseChar pend
/// when not followed by triple, a double will truly be the end
let repeatedTo repeatedPtrn ptrn = notFollowedBy(s repeatedPtrn)>>.(s ptrn)
let wrapped = (s"[[")>>.phraseTill (repeatedTo "]]]" "]]")
run wrapped "[[some text]]]"
run wrapped "[[some text]]"
NB. if you try this out in FSharp Interactive (FSI), make sure you have at least one "run wrapped" line when you send your text to FSI to be evaluated (ie. right-click 'Execute In Interactive'). The type only gets inferred / pinned on application in this example. We could have provided explicit definitions at the risk of being more verbose.
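For reference, a version with explicit type annotations might look like the sketch below. Fixing the user state to unit is an assumption made here; with the annotations in place, each definition should type-check on its own, without a run line pinning the types.

```fsharp
#r "nuget: FParsec" // assumption: running as an .fsx script
open FParsec

let s : string -> Parser<string, unit> = pstring
let phraseChar : Parser<char, unit> = noneOf "\r\n"

/// keep eating characters until the pend parser is successful
let phraseTill (pend: Parser<string, unit>) : Parser<string, unit> =
    manyCharsTill phraseChar pend

/// succeeds on ptrn only when the input does not continue as repeatedPtrn
let repeatedTo (repeatedPtrn: string) (ptrn: string) : Parser<string, unit> =
    notFollowedBy (s repeatedPtrn) >>. s ptrn

let wrapped : Parser<string, unit> =
    s "[[" >>. phraseTill (repeatedTo "]]]" "]]")

for input in [ "[[some text]]"; "[[array[]]]" ] do
    match run wrapped input with
    | Success (r, _, _) -> printfn "%s -> %s" input r
    | Failure (msg, _, _) -> printfn "%s failed:\n%s" input msg
```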

Resources