FParsec: Keeping line and column numbers - f#

What is the best way to extract the line and column numbers from a given parser so they can be added to an AST, for example?
Thanks!

You can use getPosition, which is a parser that consumes no input and returns the current position. For example:
type WithPos<'T> = { value: 'T; start: Position; finish: Position }
module Position =
/// Get the previous position on the same line.
let leftOf (p: Position) =
if p.Column > 1L then
Position(p.StreamName, p.Index - 1L, p.Line, p.Column - 1L)
else
p
/// Wrap a parser to include the position
let withPos (p: Parser<'T, 'U>) : Parser<WithPos<'T>, 'U> =
// Get the position before and after parsing
pipe3 getPosition p getPosition <| fun start value finish ->
{
value = value
start = start
finish = Position.leftOf finish
}
// Example use:
let s = pstring "test" |> withPos
printfn "%A" <| runParserOnString s () "" "test"
// Prints:
// Success: {value = "test";
// start = (Ln: 1, Col: 1);
// finish = (Ln: 1, Col: 4);}

Related

fparsec - limit number of characters that a parser is applied to

I have a problem where during the parsing of a stream I get to point where the next N characters need to be parsed by applying a specfic parser multiple times (in sequence).
(stripped down toy) Example:
17<tag><anothertag><a42...
^
|- I'm here
Let's say the 17 indicates that the next N=17 characters make up tags, so I need to repetetively apply my "tagParser" but stop after 17 chars and not consume the rest even if it looks like a tag because that has a different meaning and will be parsed by another parser.
I cannot use many or many1 because that would eat the stream beyond those N characters.
Nor can I use parray because I do not know how many successful applications of that parser are there within the N characters.
I was looking into manyMinMaxSatisfy but could not figure out how to make use of it in this case.
Is there a way to cut N chars of a stream and feed them to some parser? Or is there a way to invoke many applications but up to N chars?
Thanks.
You can use getPosition to make sure you don't go past the specified number of characters. I threw this together (using F# 6) and it seems to work, although simpler/faster solutions may be possible:
let manyLimit nChars p =
parse {
let! startPos = getPosition
let rec loop values =
parse {
let! curPos = getPosition
let nRemain = (startPos.Index + nChars) - curPos.Index
if nRemain = 0 then
return values
elif nRemain > 0 then
let! value = p
return! loop (value :: values)
else
return! fail $"limit exceeded by {-nRemain} chars"
}
let! values = loop []
return values |> List.rev
}
Test code:
let ptag =
between
(skipChar '<')
(skipChar '>')
(manySatisfy (fun c -> c <> '>'))
let parser =
parse {
let! nChars = pint64
let! tags = manyLimit nChars ptag
let! rest = restOfLine true
return tags, rest
}
run parser "17<tag><anothertag><a42..."
|> printfn "%A"
Output is:
Success: (["tag"; "anothertag"], "<a42...")
Quite low-level parser, that operates on raw Reply objects. It reads count of chars, creates substring to feed to tags parser and consumes rest. There's should be an easier way, but I don't have much experience with FParsec
open FParsec
type Tag = Tag of string
let pTag = // parses tag string and constructs 'Tag' object
skipChar '<' >>. many1Satisfy isLetter .>> skipChar '>'
|>> Tag
let pCountPrefixedTags stream =
let count = pint32 stream // read chars count
if count.Status = Ok then
let count = count.Result
// take exactly 'count' chars
let tags = manyMinMaxSatisfy count count (fun _ -> true) stream
if tags.Status = Ok then
// parse substring with tags
let res = run (many1 pTag) tags.Result
match res with
| Success (res, _, _) -> Reply(res)
| Failure (_, error, _) -> Reply(ReplyStatus.Error, error.Messages)
else
Reply(tags.Status, tags.Error)
else
Reply(count.Status, count.Error)
let consumeStream =
many1Satisfy (fun _ -> true)
run (pCountPrefixedTags .>>. consumeStream) "17<tag><anothertag><notTag..."
|> printfn "%A" // Success: ([Tag "tag"; Tag "anothertag"], "<notTag...")
You also can do this without going down to stream level.
open FParsec
let ptag =
between
(skipChar '<')
(skipChar '>')
(manySatisfy (fun c -> c <> '>'))
let tagsFromChars (l: char[]) =
let s = new System.String(l)
match run (many ptag) s with
| Success(result, _, _) -> result
| Failure(errorMsg, _, _) -> []
let parser =
parse {
let! nChars = pint32
let! tags = parray nChars anyChar |>> tagsFromChars
let! rest = restOfLine true
return tags, rest
}
run parser "17<tag><anothertag><a42..."
|> printfn "%A"

Indentation has the last word on error messages (UserState) - FParsec

I have a small indentation management module with FParsec (found here); it works wonderfully well, but the only concern is that, when an error is encountered in the stream to be parsed, most of the time, FParsec returns the error message from the indentation manager, i.e. the UserState (correct me if I'm wrong on this point); which is problematic because it makes the errors very blurry, and all the same... How can I display indentation errors only when they are necessary?
Here is the module used for indentation:
module IndentParser
open FParsec
type Indentation =
| Fail
| Any
| Greater of Position
| Exact of Position
| AtLeast of Position
| StartIndent of Position
with
member this.Position = match this with
| Any | Fail -> None
| Greater p -> Some p
| Exact p -> Some p
| AtLeast p -> Some p
| StartIndent p -> Some p
type IndentState<'T> = { Indent : Indentation; UserState : 'T }
type CharStream<'T> = FParsec.CharStream<IndentState<'T>>
type IndentParser<'T, 'UserState> = Parser<'T, IndentState<'UserState>>
let indentState u = {Indent = Any; UserState = u}
let runParser p u s = runParserOnString p (indentState u) "" s
let runParserOnFile p u path = runParserOnFile p (indentState u) path System.Text.Encoding.UTF8
let getIndentation : IndentParser<_,_> =
fun stream -> match stream.UserState with
| {Indent = i} -> Reply i
let getUserState : IndentParser<_,_> =
fun stream -> match stream.UserState with
| {UserState = u} -> Reply u
let putIndentation newi : IndentParser<unit, _> =
fun stream ->
stream.UserState <- {stream.UserState with Indent = newi}
Reply(Unchecked.defaultof<unit>)
let failf fmt = fail << sprintf fmt
let acceptable i (pos : Position) =
match i with
| Any _ -> true
| Fail -> false
| Greater bp -> bp.Column < pos.Column
| Exact ep -> ep.Column = pos.Column
| AtLeast ap -> ap.Column <= pos.Column
| StartIndent _ -> true
let nestableIn i o =
match i, o with
| Greater i, Greater o -> o.Column < i.Column
| Greater i, Exact o -> o.Column < i.Column
| Exact i, Exact o -> o.Column = i.Column
| Exact i, Greater o -> o.Column <= i.Column
| _, _ -> true
let tokeniser p = parse {
let! pos = getPosition
let! i = getIndentation
if acceptable i pos then return! p
else return! fail "incorrect indentation"
}
let nestP i o p = parse {
do! putIndentation i
let! x = p
do! notFollowedBy (tokeniser anyChar) <?> (sprintf "unterminated %A" i)
do! putIndentation o
return x
}
let nest indentor p = parse {
let! outerI = getIndentation
let! curPos = getPosition
let innerI = indentor curPos
if nestableIn innerI outerI
then return! nestP innerI outerI p
else return! nestP Fail outerI p
}
let nestWithPos indentor pos p = parse {
let! outerI = getIndentation
let innerI = indentor pos
if nestableIn innerI outerI
then return! nestP innerI outerI p
else return! nestP Fail outerI p
}
let neglectIndent p = parse {
let! o = getIndentation
do! putIndentation Any
let! x = p
do! putIndentation o
return x
}
let checkIndent<'u> : IndentParser<unit, 'u> = tokeniser (preturn ())
let indented<'a,'u> i (p : Parser<'a,_>) : IndentParser<_, 'u> = parse {
do! putIndentation i
do! spaces
return! tokeniser p
}
/// Allows to check if the position of the parser currently being analyzed (`p`)
/// is on the same line as the defined position (`pos`).
let exact<'a,'u> pos p: IndentParser<'a, 'u> = indented (Exact pos) p
/// Allows to check if the position of the parser currently being analyzed (`p`)
/// is further away than the defined position (`pos`).
let greater<'a,'u> pos p: IndentParser<'a, 'u> = indented (Greater pos) p
/// Allows to check if the position of the parser currently being analyzed (`p`)
/// is on the same OR line further than the defined position (`pos`).
let atLeast<'a,'u> pos p: IndentParser<'a, 'u> = indented (AtLeast pos) p
/// Simply check if the parser (`p`) exists, regardless of its position in the text to be analyzed.
let any<'a,'u> pos p: IndentParser<'a, 'u> = indented Any p
let newline<'u> : IndentParser<unit, 'u> = many (skipAnyOf " \t" <?> "whitespace") >>. newline |>> ignore
let rec blockOf p = parse {
do! spaces
let! pos = getPosition
let! x = exact pos p
let! xs = attempt (exact pos <| blockOf p) <|> preturn []
return x::xs
}
and here is an example of the problem encountered:
open FParsec
open IndentParser
// ---------- AST ----------
type Statement
= Let of string * Expr
and Expr
= Tuple of Expr list
| Literal of Literal
and Literal
= Int of int
| Float of float
| Char of char
// ---------- Parser ----------
let inline pstr's s = stringReturn s s <?> sprintf "`%s`" s
let inline pstr'u s = stringReturn s () <?> sprintf "`%s`" s
let identifier = manySatisfy (fun c -> isLetter c || c = ''')
let comment = pstr'u "//" >>. skipRestOfLine true <?> ""
let numberFormat =
NumberLiteralOptions.AllowBinary
||| NumberLiteralOptions.AllowMinusSign
||| NumberLiteralOptions.AllowHexadecimal
||| NumberLiteralOptions.AllowOctal
||| NumberLiteralOptions.AllowPlusSign
||| NumberLiteralOptions.AllowFraction
let number<'u> : IndentParser<Literal, 'u> =
(numberLiteral numberFormat "number" |>> fun nl ->
if nl.IsInteger then Int(int nl.String)
else Float(float nl.String))
let char<'u> : IndentParser<Literal, 'u> =
((between (pstr'u "'") (pstr'u "'")
(satisfy (fun c -> c <> '\'')) <?> "char literal") |>> Char)
let rec let'parser =
parse { let! pos = getPosition
do! exact pos (pstr'u "let" <?> "let statement")
let! name = greater pos identifier <?> "identifier"
do! greater pos (pstr'u "=" <?> "value assignment")
let! value = greater pos expression
return Let(name, value) }
and tuple'parser =
parse { let! pos = getPosition
do! exact pos (pstr'u "(" <?> "tuple")
let! uplets = greater pos (sepBy1 expression (pstr'u ","))
do! greater pos (pstr'u ")" <?> "right parenthese")
return Tuple uplets }
and literal'parser = attempt number <|> char |>> Literal
and expression =
spaces >>? (attempt tuple'parser <|> literal'parser)
and statement = spaces >>? let'parser .>>? spaces .>>? (attempt comment <|> (spaces >>% ()))
// ---------- Test ----------
System.Console.Clear()
let res = runParser (spaces >>? blockOf statement .>>? (spaces .>>? eof)) () #"
let foo = (0, 1) // it works well
let bar = 887 // it works well
let oof = 'x' // it works well
let rab = // it fail with 'incorrect indentation' (without this comment)
let ofo = (0, 2, // it fail with 'incorrect indentation' (without this comment)
"
printfn "%A" res
It's really annoying...
Would someone explain to me how to solve this problem?

How to execute specific functions with input pattern in F# language

I'm kinda new to F# and trying out a simple calculator app. I take input from the user, and I want to the specific functions to be executed as per the input.
Right now, whenever I take any input from user, the program executes top to bottom. I want it to execute only specific functions matching with the input. Like if input is 6 then body of scientificFun() should be executed. Right now it executes all functions. Please help, I'm kinda stuck on this one!
The code is
open System
let mutable ok = true
while ok do
Console.WriteLine("Choose a operation:\n1.Addition\n2.Substraction\n3.Multiplication\n4.Division\n5.Modulo\n6.Scientific")
let input= Console.ReadLine()
let add =
Console.WriteLine("Ok, how many numbers?")
let mutable count = int32(Console.ReadLine())
let numberArray = Array.create count 0.0
for i in 0 .. numberArray.Length - 1 do
let no = float(Console.ReadLine())
Array.set numberArray i no
Array.sum numberArray
let sub x y = x - y
let mul x y = x * y
let div x y = x / y
let MOD x y = x % y
let scientificFun() =
printfn("1. Exponential")
match input with
| "1" -> printfn("The Result is: %f") (add)
| "6" -> (scientificFun())
| _-> printfn("Choose between 1 and 6")
Console.WriteLine("Would you like to use the calculator again? y/n")
let ans = Console.ReadLine()
if ans = "n" then
ok <- false
else Console.Clear()
You should define add as function: let add() = or let add inputNumbers =
Otherwise this simplified version below only executes the functions corresponding to the input number:
open System
[<EntryPoint>]
let main argv =
// define your functions
let hellofun() =
printfn "%A" "hello"
let worldfun() =
printfn "%A" "world"
let mutable cont = true
let run() = // set up the while loop
while cont do
printfn "%A" "\nChoose an operation:\n 1 hellofunc\n 2 worldfunc\n 3 exit"
let input = Console.ReadLine() // get the input
match input with // pattern match on the input to call the correct function
| "1" -> hellofun()
| "2" -> worldfun()
| "3" -> cont <- false;()
| _ -> failwith "Unknown input"
run() // kick-off the loop
0
The [<EntryPoint>] let main argv = is only necessary if you compile it. Otherwise just execute run().

Use FParsec to parse a self-describing input

I'm using FParsec to parse an input that describes its own format. For example, consider this input:
int,str,int:4,'hello',3
The first part of the input (before the colon) describes the format of the second part of the input. In this case, the format is int, str, int, which means that the actual data consists of three comma-separated values of the given types, so the result should be 4, "hello", 3.
What is the best way to parse something like this with FParsec?
I've pasted my best effort below, but I'm not happy with it. Is there a better way to do this that is cleaner, less stateful, and less reliant on the parse monad? I think this depends on smarter management of UserState, but I don't know how to do it. Thanks.
open FParsec
type State = { Formats : string[]; Index : int32 }
with static member Default = { Formats = [||]; Index = 0 }
type Value =
| Integer of int
| String of string
let parseFormat : Parser<_, State> =
parse {
let! formats =
sepBy
(pstring "int" <|> pstring "str")
(skipString ",")
|>> Array.ofList
do! updateUserState (fun state -> { state with Formats = formats })
}
let parseValue format =
match format with
| "int" -> pint32 |>> Integer
| "str" ->
between
(skipString "'")
(skipString "'")
(manySatisfy (fun c -> c <> '\''))
|>> String
| _ -> failwith "Unexpected"
let parseValueByState =
parse {
let! state = getUserState
let format = state.Formats.[state.Index]
do! setUserState { state with Index = state.Index + 1}
return! parseValue format
}
let parseData =
sepBy
parseValueByState
(skipString ",")
let parse =
parseFormat
>>. skipString ":"
>>. parseData
[<EntryPoint>]
let main argv =
let result = runParserOnString parse State.Default "" "int,str,int:4,'hello',3"
printfn "%A" result
0
There seem to be several problems with the original code, so I took my liberty to rewrite it from scratch.
First, several library functions that may appear useful in other FParsec-related projects:
/// Simple Map
/// usage: let z = Map ["hello" => 1; "bye" => 2]
let (=>) x y = x,y
let makeMap x = new Map<_,_>(x)
/// A handy construct allowing NOT to write lengthy type definitions
/// and also avoid Value Restriction error
type Parser<'t> = Parser<'t, UserState>
/// A list combinator, inspired by FParsec's (>>=) combinator
let (<<+) (p1: Parser<'T list>) (p2: Parser<'T>) =
p1 >>= fun x -> p2 >>= fun y -> preturn (y::x)
/// Runs all parsers listed in the source list;
/// All but the trailing one are also combined with a separator
let allOfSepBy separator parsers : Parser<'T list> =
let rec fold state =
function
| [] -> pzero
| hd::[] -> state <<+ hd
| hd::tl -> fold (state <<+ (hd .>> separator)) tl
fold (preturn []) parsers
|>> List.rev // reverse the list since we appended to the top
Now, the main code. The basic idea is to run parsing in three steps:
Parse out the keys (which are plain ASCII strings)
Map these keys to actual Value parsers
Run these parsers in order
The rest seems to be commented within the code. :)
/// The resulting type
type Output =
| Integer of int
| String of string
/// tag to parser mappings
let mappings =
[
"int" => (pint32 |>> Integer)
"str" => (
manySatisfy (fun c -> c <> '\'')
|> between (skipChar ''') (skipChar ''')
|>> String
)
]
|> makeMap
let myProcess : Parser<Output list> =
let pKeys = // First, we parse out the keys
many1Satisfy isAsciiLower // Parse one key; keys are always ASCII strings
|> sepBy <| (skipChar ',') // many keys separated by comma
.>> (skipChar ':') // all this with trailing semicolon
let pValues = fun keys ->
keys // take the keys list
|> List.map // find the required Value parser
// (NO ERROR CHECK for bad keys)
(fun p -> Map.find p mappings)
|> allOfSepBy (skipChar ',') // they must run in order, comma-separated
pKeys >>= pValues
Run on string: int,int,str,int,str:4,42,'hello',3,'foobar'
Returned: [Integer 4; Integer 42; String "hello"; Integer 3; String "foobar"]
#bytebuster beat me to it but I still post my solution. The technique is similar to #bytebuster.
Thanks for an interesting question.
In compilers I believe the preferred technique is to parse the text into an AST and on that run a type-checker. For this example a potentially simpler technique would be that parsing the type definitions returns a set of parsers for the values. These parsers are then applied on the rest of the string.
open FParsec
type Value =
| Integer of int
| String of string
type ValueParser = Parser<Value, unit>
let parseIntValue : Parser<Value, unit> =
pint32 |>> Integer
let parseStringValue : Parser<Value, unit> =
between
(skipChar '\'')
(skipChar '\'')
(manySatisfy (fun c -> c <> '\''))
<?> "string"
|>> String
let parseValueParser : Parser<ValueParser, unit> =
choice
[
skipString "int" >>% parseIntValue
skipString "str" >>% parseStringValue
]
let parseValueParsers : Parser<ValueParser list, unit> =
sepBy1
parseValueParser
(skipChar ',')
// Runs a list of parsers 'ps' separated by 'sep' parser
let sepByList (ps : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
let rec loop adjust ps =
match ps with
| [] -> preturn []
| h::t ->
adjust h >>= fun v -> loop (fun pp -> sep >>. pp) t >>= fun vs -> preturn (v::vs)
loop id ps
let parseLine : Parser<Value list, unit> =
parseValueParsers .>> skipChar ':' >>= (fun vps -> sepByList vps (skipChar ',')) .>> eof
[<EntryPoint>]
let main argv =
let s = "int,str,int:4,'hello',3"
let r = run parseLine s
printfn "%A" r
0
Parsing int,str,int:4,'hello',3 yields Success: [Integer 4; String "hello";Integer 3].
Parsing int,str,str:4,'hello',3 (incorrect) yields:
Failure:
Error in Ln: 1 Col: 23
int,str,str:4,'hello',3
^
Expecting: string
I rewrote #FuleSnabel's sepByList as follows to help me understand it better. Does this look right?
let sepByList (parsers : Parser<'T, unit> list) (sep : Parser<unit, unit>) : Parser<'T list, unit> =
let rec loop adjust parsers =
parse {
match parsers with
| [] -> return []
| parser :: tail ->
let! value = adjust parser
let! values = loop (fun parser -> sep >>. parser) tail
return value :: values
}
loop id parsers

how to unwrap union value in list in f#

You know that to unwrap a value of a single union type you have to do this:
type Foo = Foo of int*string
let processFoo foo =
let (Foo (t1,t2)) = foo
printfn "%A %A" t1 t2
but my question is: if there a way to do that for lists ?:
let processFooList (foolist:Foo list ) =
let ??? = foolist // how to get a int*string list
...
thanks.
The best way would be to use a function combined with List.map like so
let processFooList (foolist:Foo list ) = foolist |> List.map (function |Foo(t1,t2)->t1,t2)
There's no predefined active pattern for converting the list from Foo to int * string, but you could combine the Named Pattern §7.2 (deconstruct single case union) with the projection into your own single case Active Pattern §7.2.3.
let asTuple (Foo(t1, t2)) = t1, t2 // extract tuple from single Foo
let (|FooList|) = List.map asTuple // apply to list
Use as function argument:
let processFooList (FooList fooList) = // now you can extract tuples from Foo list
... // fooList is an (int * string) list
Use in let-binding:
let (FooList fooList) =
[ Foo(1, "a"); Foo(2, "b") ]
printfn "%A" fooList // prints [(1, "a"); (2, "b")]
Distilling/summarising/restating/reposting the other two answers, your cited line:
let ??? = foolist // how to get a int*string list
Can become:
let ``???`` = foolist |> List.map (function |Foo(x,y) -> x,y)
If you're writing a transformation, you can do the matching in the params having defined an Active Pattern using either of the following:
let (|FooList|) = List.map <| fun (Foo(t1, t2)) -> t1,t2
let (|FooList|) = List.map <| function |Foo(t1, t2) -> t1,t2
which can then be consumed as follows:
let processFooList (fooList:Foo list ) =
// do something with fooList

Resources