I'm transforming some OCaml code to F# having a problem with the OCaml let...and... which only exists in F# by using a recursive function.
I have the given OCaml code:
let matches s = let chars = explode s in fun c -> mem c chars
let space = matches " \t\n\r"
and punctuiation = matches "() [] {},"
and symbolic = matches "~'!##$%^&*-+=|\\:;<>.?/"
and numeric = matches "0123456789"
and alphanumeric = matches "abcdefghijklmopqrstuvwxyz_'ABCDEFGHIJKLMNOPQRSTUVWXYZ"
which I want to use in these two methods:
let rec lexwhile prop inp = match inp with
c::cs when prop c -> let tok,rest = lexwhile prop cs in c+tok,rest
|_->"",inp
let rec lex inp =
match snd(lexwhile space inp)with
[]->[]
|c::cs ->let prop = if alphanumeric(c) then alphanumeric
else if symbolic(c) then symbolic
else fun c ->false in
let toktl,rest = lexwhile prop cs in
(c+toktl)::lex rest
Has someone any idea how I have to change it that so that I can use it?
It looks like you are trying to translate "Handbook of Practical Logic and Automated Reasoning".
Did you see: An F# version of the book code is now available! Thanks to Eric Taucher, Jack Pappas and Anh-Dung Phan.
You need to look at intro.fs
// pg. 17
// ------------------------------------------------------------------------- //
// Lexical analysis. //
// ------------------------------------------------------------------------- //
let matches s =
let chars =
explode s
fun c -> mem c chars
let space = matches " \t\n\r"
let punctuation = matches "()[]{},"
let symbolic = matches "~`!##$%^&*-+=|\\:;<>.?/"
let numeric = matches "0123456789"
let alphanumeric = matches "abcdefghijklmnopqrstuvwxyz_'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
let rec lexwhile prop inp =
match inp with
| c :: cs when prop c ->
let tok, rest = lexwhile prop cs
c + tok, rest
| _ -> "", inp
let rec lex inp =
match snd <| lexwhile space inp with
| [] -> []
| c :: cs ->
let prop =
if alphanumeric c then alphanumeric
else if symbolic c then symbolic
else fun c -> false
let toktl, rest = lexwhile prop cs
(c + toktl) :: lex rest
I asked many questions here while working on the translation and prefixed them with Converting OCaml to F#:. If you look in the comments you will see how the three of us came to work on this project.
You can just write:
let explode s = [for c in s -> string c]
let matches str strc =
let eStr = explode str
List.contains strc eStr
let space = matches " \t\n\r"
let punctuation = matches "() [] {},"
let symbolic = matches "~'!##$%^&*-+=|\\:;<>.?/"
let numeric = matches "0123456789"
let alphanumeric = matches "abcdefghijklmopqrstuvwxyz_'ABCDEFGHIJKLMNOPQRSTUVWXYZ"
F# tends to be written using lightweight, rather than verbose, syntax so you don't generally need to use the in, begin and end keywords. See: https://msdn.microsoft.com/en-us/library/dd233199.aspx for details about the differences.
Personally, I would probably refactor all of those string -> bool functions into active patterns, e.g.:
let (|Alphanumeric|_|) str =
match matches "abcdefghijklmopqrstuvwxyz_'ABCDEFGHIJKLMNOPQRSTUVWXYZ" str with
|true -> Some str
|false -> None
let (|Symbolic|_|) str =
match matches "~'!##$%^&*-+=|\\:;<>.?/" str with
|true -> Some str
|false -> None
Then you can pattern match, e.g.:
match c with
|Alphanumeric _ -> // alphanumeric case
|Symbolic _ -> // symbolic case
|_ -> // other cases
Related
I have a problem where during the parsing of a stream I get to point where the next N characters need to be parsed by applying a specfic parser multiple times (in sequence).
(stripped down toy) Example:
17<tag><anothertag><a42...
^
|- I'm here
Let's say the 17 indicates that the next N=17 characters make up tags, so I need to repetetively apply my "tagParser" but stop after 17 chars and not consume the rest even if it looks like a tag because that has a different meaning and will be parsed by another parser.
I cannot use many or many1 because that would eat the stream beyond those N characters.
Nor can I use parray because I do not know how many successful applications of that parser are there within the N characters.
I was looking into manyMinMaxSatisfy but could not figure out how to make use of it in this case.
Is there a way to cut N chars of a stream and feed them to some parser? Or is there a way to invoke many applications but up to N chars?
Thanks.
You can use getPosition to make sure you don't go past the specified number of characters. I threw this together (using F# 6) and it seems to work, although simpler/faster solutions may be possible:
let manyLimit nChars p =
parse {
let! startPos = getPosition
let rec loop values =
parse {
let! curPos = getPosition
let nRemain = (startPos.Index + nChars) - curPos.Index
if nRemain = 0 then
return values
elif nRemain > 0 then
let! value = p
return! loop (value :: values)
else
return! fail $"limit exceeded by {-nRemain} chars"
}
let! values = loop []
return values |> List.rev
}
Test code:
let ptag =
between
(skipChar '<')
(skipChar '>')
(manySatisfy (fun c -> c <> '>'))
let parser =
parse {
let! nChars = pint64
let! tags = manyLimit nChars ptag
let! rest = restOfLine true
return tags, rest
}
run parser "17<tag><anothertag><a42..."
|> printfn "%A"
Output is:
Success: (["tag"; "anothertag"], "<a42...")
Quite low-level parser, that operates on raw Reply objects. It reads count of chars, creates substring to feed to tags parser and consumes rest. There's should be an easier way, but I don't have much experience with FParsec
open FParsec
type Tag = Tag of string
let pTag = // parses tag string and constructs 'Tag' object
skipChar '<' >>. many1Satisfy isLetter .>> skipChar '>'
|>> Tag
let pCountPrefixedTags stream =
let count = pint32 stream // read chars count
if count.Status = Ok then
let count = count.Result
// take exactly 'count' chars
let tags = manyMinMaxSatisfy count count (fun _ -> true) stream
if tags.Status = Ok then
// parse substring with tags
let res = run (many1 pTag) tags.Result
match res with
| Success (res, _, _) -> Reply(res)
| Failure (_, error, _) -> Reply(ReplyStatus.Error, error.Messages)
else
Reply(tags.Status, tags.Error)
else
Reply(count.Status, count.Error)
let consumeStream =
many1Satisfy (fun _ -> true)
run (pCountPrefixedTags .>>. consumeStream) "17<tag><anothertag><notTag..."
|> printfn "%A" // Success: ([Tag "tag"; Tag "anothertag"], "<notTag...")
You also can do this without going down to stream level.
open FParsec
let ptag =
between
(skipChar '<')
(skipChar '>')
(manySatisfy (fun c -> c <> '>'))
let tagsFromChars (l: char[]) =
let s = new System.String(l)
match run (many ptag) s with
| Success(result, _, _) -> result
| Failure(errorMsg, _, _) -> []
let parser =
parse {
let! nChars = pint32
let! tags = parray nChars anyChar |>> tagsFromChars
let! rest = restOfLine true
return tags, rest
}
run parser "17<tag><anothertag><a42..."
|> printfn "%A"
I'm trying to learn functional programming and having difficulty expressing a file parsing task functionally.
Let's say I have a text file with the following format:
val_0: <--- "header"
key_0_0 <--- these keys should be set to the "header" or val0
key_0_1
key_0_2
...
...
val_n:
...
key_n_m
How can I end up with a hash table with all keys set to their associated value?
EDIT: My solution. Can anyone improve it?
open Core.Std
let contains s1 s2 =
let re = Str.regexp_string s2 in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false
let read_db f =
let tbl = Caml.Hashtbl.create 123456 in
let lines = In_channel.read_lines f in
let src = ref "" in
List.iter ~f:(fun g -> if contains g ":" then src := else Caml.Hashtbl.add tbl g !src) lines;
tbl
Here's my solution, just for comparison.
let line_opt ic =
try Some (input_line ic) with End_of_file -> None
let fold_lines_in f init fn =
let ic = open_in fn in
let rec go accum =
match line_opt ic with
| None -> accum
| Some line -> go (f accum line)
in
let res = go init in
close_in ic;
res
let hashtable_of_file fn =
let ht = Hashtbl.create 16 in
let itab label line =
let len = String.length line in
if line.[len - 1] = ':' then
String.sub line 0 (len - 1)
else
let () = Hashtbl.add ht line label in
label
in
let _ = fold_lines_in itab "" fn in
ht
Update
(Fixed non-tail-recursive fold implementation, sorry.)
Waving from the future as this requires OCaml library features that were not available at the time the question was asked.
String.ends_with was added in OCaml 4.13.0. And sequences in OCaml 4.07.0.
Setting aside all of the file reading, we can do this without the imperative style Hashtbl using just Map and list folds.
We'll build up a list using fold_left by keeping the current key and a reversed association list of values, then later reversing both the association list and the values in each sublist, and then converting to a Map.
module SM = Map.Make (String)
let data = "val_0:\nhello\nworld\nval_1:\nfoo\nval_2:\nbar"
let data' = String.split_on_char '\n'
data'
|> List.fold_left
(fun (cur_key, lst as acc) line ->
let label = String.ends_with ~suffix:":" line in
match cur_key with
| None when label -> (Some line, (line, [])::lst)
| Some _ when label -> (Some line, (line, [])::lst)
| None -> acc
| Some cur_key' -> (cur_key, let ((key, lst')::tl) = lst in (key, line::lst')::tl)
)
(None, [])
|> snd
|> List.map (fun (k, v) -> (k, List.rev v))
|> List.rev
|> List.to_seq
|> SM.of_seq
If we get run SM.bindings on this value we can see that it has worked:
[("val_0:", ["hello"; "world"]); ("val_1:", ["foo"]); ("val_2:", ["bar"])]
Okay, since my last question elicited no responses, I'm forging ahead in a different direction. Lol!
I can't find any examples beyond the official documentation on managing user state, or accessing the results of a prior parser.
N.b. This code does not compile.
namespace MultipartMIMEParser
open FParsec
open System.IO
type Header = { name : string
; value : string
; addl : (string * string) list option }
type Content = Content of string
| Post of Post list
and Post = { headers : Header list
; content : Content }
type private UserState = { Boundary : string }
with static member Default = { Boundary="" }
module internal P =
let ($) f x = f x
let undefined = failwith "Undefined."
let ascii = System.Text.Encoding.ASCII
let str cs = System.String.Concat (cs:char list)
let makeHeader ((n,v),nvps) = { name=n; value=v; addl=nvps}
let runP p s = match runParserOnStream p UserState.Default "" s ascii with
| Success (r,_,_) -> r
| Failure (e,_,_) -> failwith (sprintf "%A" e)
let blankField = parray 2 newline
let delimited d e =
let pEnd = preturn () .>> e
let part = spaces >>. (manyTill $ noneOf d $ (attempt (preturn () .>> pstring d) <|> pEnd)) |>> str
in part .>>. part
let delimited3 firstDelimiter secondDelimiter thirdDelimiter endMarker =
delimited firstDelimiter endMarker
.>>. opt (many (delimited secondDelimiter endMarker
>>. delimited thirdDelimiter endMarker))
// TODO: This is the parser I'm asking about.
let pHeader =
let includesBoundary s = undefined
let setBoundary b = { Boundary=b }
in delimited3 ":" ";" "=" blankField
|>> makeHeader
>>. fun stream -> if includesBoundary // How do I access the output from makeHeader here?
then stream.UserState <- setBoundary b // I need b to be read from the output of makeHeader.
Reply ()
else Reply ()
let pHeaders = manyTill pHeader $ attempt (preturn () .>> blankField)
// N.b. This is the mess I'm currently wrestling with. It does not compile, and is
// not sound yet.
let rec pContent boundary =
match boundary with
| "" -> // Content is text.
let line = restOfLine false
in pipe2 pHeaders (manyTill line $ attempt (preturn () .>> blankField))
$ fun h c -> { headers=h
; content=Content $ System.String.Join (System.Environment.NewLine,c) }
| _ -> // Content contains boundaries.
let b = "--"+boundary
let p = pipe2 pHeaders (pContent b) $ fun h c -> { headers=h; content=c }
in skipString b >>. manyTill p (attempt (preturn () .>> blankField))
let pStream = runP (pipe2 pHeaders pContent $ fun h c -> { headers=h; content=c })
type MParser (s:Stream) =
let r = P.pStream s
let findHeader name =
match r.headers |> List.tryFind (fun h -> h.name.ToLower() = name) with
| Some h -> h.value
| None -> ""
member p.Boundary =
let isBoundary ((s:string),_) = s.ToLower() = "boundary"
let header = r.headers
|> List.tryFind (fun h -> if h.addl.IsSome
then h.addl.Value |> List.exists isBoundary
else false)
in match header with
| Some h -> h.addl.Value |> List.find isBoundary |> snd
| None -> ""
member p.ContentID = findHeader "content-id"
member p.ContentLocation = findHeader "content-location"
member p.ContentSubtype = findHeader "type"
member p.ContentTransferEncoding = findHeader "content-transfer-encoding"
member p.ContentType = findHeader "content-type"
member p.Content = r.content
member p.Headers = r.headers
member p.MessageID = findHeader "message-id"
member p.MimeVersion = findHeader "mime-version"
A truncated example of the POST I am trying to parse follows:
content-type: Multipart/related; boundary="RN-Http-Body-Boundary"; type="multipart/related"
--RN-Http-Body-Boundary
Message-ID: <25845033.1160080657073.JavaMail.webmethods#exshaw>
Mime-Version: 1.0
Content-Type: multipart/related; type="application/xml";
boundary="----=_Part_235_11184805.1160080657052"
------=_Part_235_11184805.1160080657052
Content-Type: Application/XML
Content-Transfer-Encoding: binary
Content-Location: RN-Preamble
Content-ID: <1430586.1160080657050.JavaMail.webmethods#exshaw>
XML document begins here...
So basically, what you want to do in pHeader is to use the parser as a monad, rather than an applicative. Based on your code style you come from Haskell so I'll assume you know these words. Something like this then:
let pHeader =
let includesBoundary s = undefined
let setBoundary b = { Boundary=b }
in delimited3 ":" ";" "=" blankField
|>> makeHeader
>>= fun header stream ->
if includesBoundary header
then let b = undefined // some expression including header, if I understood correctly
stream.UserState <- setBoundary b
Reply ()
else Reply ()
Or you can write it in a computation expression (which would correspond to do-notation in Haskell):
let pHeader =
let includesBoundary s = undefined
let setBoundary b = { Boundary=b }
parse {
let! header =
delimited3 ":" ";" "=" blankField
|>> makeHeader
return! fun stream ->
if includesBoundary header
then let b = undefined // some expression including header, if I understood correctly
stream.UserState <- setBoundary b
Reply ()
else Reply ()
}
I have the following program that runs. It takes a line of text and splits it into two parts, the first is an identifier and the second is the remainder of the line. My parser for the identifier (factID) takes any string of characters as the identifier, which is not (quite) what I want. What I want is a parser that only succeeds when it encounters two consecutive upper case letters. So for example "AA" should succeed while "A", "A1" or "AAA" should not.
What I can't figure out is how construct a parser that looks for a fixed length token. I thought perhaps CharParsers.next2CharsSatisfy might be the function I am looking for, but I can't figure out how to properly use it.
open FParsec
let test p str =
match run p str with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
let ws = spaces
let str_ws s = pstring s .>> ws
type StringConstant = StringConstant of string * string
let factID =
let isIdentifierFirstChar c = isLetter c
let isIdentifierChar c = isLetter c
many1Satisfy2L isIdentifierFirstChar isIdentifierChar "factID"
let factText =
let isG c = isLetter c || isDigit c || c = ' ' || c = '.'
manySatisfy isG
let factParse = pipe3 factID (str_ws " ") factText
(fun id _ str -> StringConstant(id, str))
[<EntryPoint>]
let main argv =
test factParse "AA This is some text." // This should pass
test factParse "A1 This is some text." // This should fail
test factParse "AAA This is some text." // This passes but I want it to fail
0 // return an integer exit code
I think this would do it
let pFactID = manyMinMaxSatisfy 2 2 Char.IsUpper
I've started learning FParsec. It has a very flexible way to parse numbers; I can provide a set of number formats I want to use:
type Number =
| Numeral of int
| Decimal of float
| Hexadecimal of int
| Binary of int
let numberFormat = NumberLiteralOptions.AllowFraction
||| NumberLiteralOptions.AllowHexadecimal
||| NumberLiteralOptions.AllowBinary
let pnumber =
numberLiteral numberFormat "number"
|>> fun num -> if num.IsHexadecimal then Hexadecimal (int num.String)
elif num.IsBinary then Binary (int num.String)
elif num.IsInteger then Numeral (int num.String)
else Decimal (float num.String)
However, the language I'm trying to parse is a bit strange. A number could be numeral (non-negative int), decimal (non-negative float), hexadecimal (with prefix #x) or binary (with prefix #b):
numeral: 0, 2
decimal: 0.2, 2.0
hexadecimal: #xA04, #x611ff
binary: #b100, #b001
Right now I have to do parsing twice by substituting # by 0 (if necessary) to make use of pnumber:
let number: Parser<_, unit> =
let isDotOrDigit c = isDigit c || c = '.'
let numOrDec = many1Satisfy2 isDigit isDotOrDigit
let hexOrBin = skipChar '#' >>. manyChars (letter <|> digit) |>> sprintf "0%s"
let str = spaces >>. numOrDec <|> hexOrBin
str |>> fun s -> match run pnumber s with
| Success(result, _, _) -> result
| Failure(errorMsg, _, _) -> failwith errorMsg
What is a better way of parsing in this case? Or how can I alter FParsec's CharStream to be able to make conditional parsing easier?
Parsing numbers can be pretty messy if you want to generate good error messages and properly check for overflows.
The following is a simple FParsec implementation of your number parser:
let numeralOrDecimal : Parser<_, unit> =
// note: doesn't parse a float exponent suffix
numberLiteral NumberLiteralOptions.AllowFraction "number"
|>> fun num ->
// raises an exception on overflow
if num.IsInteger then Numeral(int num.String)
else Decimal(float num.String)
let hexNumber =
pstring "#x" >>. many1SatisfyL isHex "hex digit"
|>> fun hexStr ->
// raises an exception on overflow
Hexadecimal(System.Convert.ToInt32(hexStr, 16))
let binaryNumber =
pstring "#b" >>. many1SatisfyL (fun c -> c = '0' || c = '1') "binary digit"
|>> fun hexStr ->
// raises an exception on overflow
Binary(System.Convert.ToInt32(hexStr, 2))
let number =
choiceL [numeralOrDecimal
hexNumber
binaryNumber]
"number literal"
Generating good error messages on overflows would complicate this implementation a bit, as you would ideally also need to backtrack after the error, so that the error position ends up at the start of the number literal (see the numberLiteral docs for an example).
A simple way to gracefully handle possible overflow exception is to use a little exception handling combinator like the following:
let mayThrow (p: Parser<'t,'u>) : Parser<'t,'u> =
fun stream ->
let state = stream.State
try
p stream
with e -> // catching all exceptions is somewhat dangerous
stream.BacktrackTo(state)
Reply(FatalError, messageError e.Message)
You could then write
let number = mayThrow (choiceL [...] "number literal")
I'm not sure what you meant to say with "alter FParsec's CharStream to be able to make conditional parsing easier", but the following sample demonstrates how you could write a low-level implementation that only uses the CharStream methods directly.
type NumberStyles = System.Globalization.NumberStyles
let invariantCulture = System.Globalization.CultureInfo.InvariantCulture
let number: Parser<Number, unit> =
let expectedNumber = expected "number"
let inline isBinary c = c = '0' || c = '1'
let inline hex2int c = (int c &&& 15) + (int c >>> 6)*9
let hexStringToInt (str: string) = // does no argument or overflow checking
let mutable n = 0
for c in str do
n <- n*16 + hex2int c
n
let binStringToInt (str: string) = // does no argument or overflow checking
let mutable n = 0
for c in str do
n <- n*2 + (int c - int '0')
n
let findIndexOfFirstNonNull (str: string) =
let mutable i = 0
while i < str.Length && str.[i] = '0' do
i <- i + 1
i
let isHexFun = id isHex // tricks the compiler into caching the function object
let isDigitFun = id isDigit
let isBinaryFun = id isBinary
fun stream ->
let start = stream.IndexToken
let cs = stream.Peek2()
match cs.Char0, cs.Char1 with
| '#', 'x' ->
stream.Skip(2)
let str = stream.ReadCharsOrNewlinesWhile(isHexFun, false)
if str.Length <> 0 then
let i = findIndexOfFirstNonNull str
let length = str.Length - i
if length < 8 || (length = 8 && str.[i] <= '7') then
Reply(Hexadecimal(hexStringToInt str))
else
stream.Seek(start)
Reply(Error, messageError "hex number literal is too large for 32-bit int")
else
Reply(Error, expected "hex digit")
| '#', 'b' ->
stream.Skip(2)
let str = stream.ReadCharsOrNewlinesWhile(isBinaryFun, false)
if str.Length <> 0 then
let i = findIndexOfFirstNonNull str
let length = str.Length - i
if length < 32 then
Reply(Binary(binStringToInt str))
else
stream.Seek(start)
Reply(Error, messageError "binary number literal is too large for 32-bit int")
else
Reply(Error, expected "binary digit")
| c, _ ->
if not (isDigit c) then Reply(Error, expectedNumber)
else
stream.SkipCharsOrNewlinesWhile(isDigitFun) |> ignore
if stream.Skip('.') then
let n2 = stream.SkipCharsOrNewlinesWhile(isDigitFun)
if n2 <> 0 then
// we don't parse any exponent, as in the other example
let mutable result = 0.
if System.Double.TryParse(stream.ReadFrom(start),
NumberStyles.AllowDecimalPoint,
invariantCulture,
&result)
then Reply(Decimal(result))
else
stream.Seek(start)
Reply(Error, messageError "decimal literal is larger than System.Double.MaxValue")
else
Reply(Error, expected "digit")
else
let decimalString = stream.ReadFrom(start)
let mutable result = 0
if System.Int32.TryParse(stream.ReadFrom(start),
NumberStyles.None,
invariantCulture,
&result)
then Reply(Numeral(result))
else
stream.Seek(start)
Reply(Error, messageError "decimal number literal is too large for 32-bit int")
While this implementation parses hex and binary numbers without the help of system methods, it eventually delegates the parsing of decimal numbers to the Int32.TryParse and Double.TryParse methods.
As I said: it's messy.