Monadic parse with uu-parsinglib - parsing

I'm trying to create a monadic parser using uu-parsinglib. I thought I had it covered, but I'm getting some unexpected results in testing.
A cut-down example of my parser is:
    pType :: Parser ASTType
    pType = addLength 0 $
      do (Amb n_list) <- pName
         let r_list = filter nameFilter n_list
         case r_list of
           (ASTName_IDName a   : []) -> return (ASTType a)
           (ASTName_TypeName a : []) -> return (ASTType a)
           _                         -> pFail
      where
        nameFilter :: ASTName' -> Bool
        nameFilter a =
          case a of
            (ASTName_IDName _)   -> True
            (ASTName_TypeName _) -> True
            _                    -> False

    data ASTType = ASTType ASTName
    data ASTName = Amb [ASTName']
    data ASTName' =
        ASTName_IDName ASTName
      | ASTName_TypeName ASTName
      | ASTName_OtherName ASTName
      | ASTName_Simple String
pName is an ambiguous parser. What I want the pType parser to do is apply a post-filter and return all alternatives that satisfy nameFilter, wrapped as ASTType.
If there are none, it should fail.
(I realise the example I've given will fail if there is more than one valid match in the list, but the example serves its purpose.)
Now, this all works as far as I can see. The problem arises when you use it in more complicated grammars, where odd matches seem to occur. I suspect the problem is the addLength 0 part.
What I would like to do is separate out the monadic and applicative parts: create a monadic parser with the filtering component, and then apply pName using the <**> operator.
Alternatively, I'd settle for a really good explanation of what addLength is doing.

I've put together a fudge/workaround for monadic parsing with uu-parsinglib. The only way I ever use monadic parsers is to analyse an overly generous initial parser and selectively fail its results.
    bind' :: Parser a -> (a -> Parser b) -> Parser b
    bind' a@(P _ _ _ l') b = let (P t nep e _) = (a >>= b) in P t nep e l'
The important thing to remember when using this parser is that the function
    a -> M b
must consume no input. It must either return a transformed version of a, or fail.
WARNING
This has only had minimal testing so far, and its behaviour is not enforced by the types. It is a fudge.

Related

Pattern match list cons quotation

I am trying to write a generalised condition evaluator, similar to what the Lisp/Scheme people call cond, using quotations because they are the easiest way to get call-by-name semantics. I'm having trouble pattern-matching against the list cons operation, and can't seem to find out exactly how to represent it. Here's what I have so far:
    open FSharp.Quotations.Evaluator
    open Microsoft.FSharp.Quotations
    open Microsoft.FSharp.Quotations.Patterns
    let rec cond = function
        | NewUnionCase (Cons, [NewTuple [condition; value]; tail]) ->
            if QuotationEvaluator.Evaluate <| Expr.Cast(condition)
            then QuotationEvaluator.Evaluate <| Expr.Cast(value)
            else cond tail
        | _ -> raise <| MatchFailureException ("cond", 0, 0)
The problem is with the Cons identifier in the first branch of the pattern match--it doesn't exist, and I can't figure out how to represent the list :: data constructor.
What is the correct way to pattern match against the list cons data constructor?
I don't think there is any easy way of writing Cons in the pattern directly, but you can use a when clause to check whether the union case is the case named "Cons" of the list<T> type:
    let rec cond = function
        | NewUnionCase (c, [NewTuple [condition; value]; tail])
            when c.Name = "Cons" && c.DeclaringType.IsGenericType &&
                 c.DeclaringType.GetGenericTypeDefinition() = typedefof<_ list> ->
            Some(condition, value, tail)
        | _ ->
            None
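The when-clause check above can be packaged as a partial active pattern so that the match reads almost like the one in the question. The following is only a sketch: the names ConsCase and clauses are hypothetical, and it collects the (condition, value) pairs instead of evaluating them, so it needs nothing beyond FSharp.Core.

```fsharp
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns

/// Hypothetical active pattern: matches the `::` case of any list<_>
/// and yields the quoted head and tail.
let (|ConsCase|_|) (e: Expr) =
    match e with
    | NewUnionCase (c, [head; tail])
        when c.Name = "Cons"
             && c.DeclaringType.IsGenericType
             && c.DeclaringType.GetGenericTypeDefinition() = typedefof<_ list> ->
        Some (head, tail)
    | _ -> None

/// Walk a quoted list of tuples and collect the (condition, value) pairs.
/// Anything that is not a cons cell of a 2-tuple ends the walk.
let rec clauses (e: Expr) : (Expr * Expr) list =
    match e with
    | ConsCase (NewTuple [condition; value], tail) ->
        (condition, value) :: clauses tail
    | _ -> []

// An untyped quotation of a list literal with two clauses
let found = clauses <@@ [ (false, "a"); (true, "b") ] @@>
let clauseCount = List.length found
```

Keeping the reflection test in one place means the evaluator itself only has to decide what to do with each quoted condition and value.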

Parsing int or float with FParsec

I'm trying to parse a file, using FParsec, which consists of either float or int values. I'm facing two problems that I can't find a good solution for.
1
Both pint32 and pfloat will successfully parse the same string but give different answers, e.g. pint32 will return 3 when parsing the string "3.0" and pfloat will return 3.0 when parsing the same string. Is it possible to try parsing a floating point value using pint32 and have it fail if the string is "3.0"?
In other words, is there a way to make the following code work:
    let parseFloatOrInt lines =
        let rec loop intvalues floatvalues lines =
            match lines with
            | [] -> floatvalues, intvalues
            | line::rest ->
                match run floatWs line with
                | Success (r, _, _) -> loop intvalues (r::floatvalues) rest
                | Failure _ ->
                    match run intWs line with
                    | Success (r, _, _) -> loop (r::intvalues) floatvalues rest
                    | Failure _ -> loop intvalues floatvalues rest
        loop [] [] lines
This piece of code will correctly place all floating point values in the floatvalues list, but because pfloat returns "3.0" when parsing the string "3", all integer values will also be placed in the floatvalues list.
2
The above code example seems a bit clumsy to me, so I'm guessing there must be a better way to do it. I considered combining them using choice, however both parsers must return the same type for that to work. I guess I could make a discriminated union with one option for float and one for int and convert the output from pint32 and pfloat using the |>> operator. However, I'm wondering if there is a better solution?
You're on the right path thinking about defining domain data and separating definition of parsers and their usage on source data. This seems to be a good approach, because as your real-life project grows further, you would probably need more data types.
Here's how I would write it:
    /// The resulting type, or DSL
    type MyData =
        | IntValue of int
        | FloatValue of float
        | Error // special case for all parse failures

    // Then, let's define individual parsers:
    let pMyInt =
        pint32
        |>> IntValue

    // this is an alternative version of float parser.
    // it ensures that the value has non-zero fractional part.
    // caveat: the naive approach would treat values like 42.0 as integers
    let pMyFloat =
        pfloat
        >>= (fun x -> if x % 1.0 = 0.0 then fail "Not a float" else preturn (FloatValue x))

    let pError =
        // this parser must consume some input,
        // otherwise combined with `many` it would hang in a dead loop
        skipAnyChar
        >>. preturn Error

    // Now, the combined parser.
    // `ws` is assumed to be a whitespace parser such as FParsec's `spaces`.
    let pCombined =
        [ pMyFloat; pMyInt; pError ]        // note, future parsers will be added here;
                                            // mind the order as float supersedes the int,
                                            // and Error must be the last
        |> List.map (fun p -> p .>> ws)     // I'm too lazy to add whitespace skipping
                                            // into each individual parser
        |> List.map attempt                 // each parser is optional
        |> choice                           // on each iteration, one of the parsers must succeed
        |> many                             // a loop
Note that the code above can work with any source: strings, streams, or whatever. Your real app may need to work with files, but unit testing can be simplified by using just a string list.
    // Now, applying the parser somewhere in the code:
    let maybeParseResult =
        match run pCombined myStringData with
        | Success(result, _, _) -> Some result
        | Failure(_, _, _) -> None // or anything that indicates general parse failure
UPD. I have edited the code according to comments. pMyFloat was updated to ensure that the parsed value has non-zero fractional part.
FParsec has the numberLiteral parser that can be used to solve the problem.
As a start you can use the example available at the link above:
    open FParsec
    open FParsec.Primitives
    open FParsec.CharParsers

    type Number = Int   of int64
                | Float of float

    // -?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?
    let numberFormat =     NumberLiteralOptions.AllowMinusSign
                       ||| NumberLiteralOptions.AllowFraction
                       ||| NumberLiteralOptions.AllowExponent

    let pnumber : Parser<Number, unit> =
        numberLiteral numberFormat "number"
        |>> fun nl ->
                if nl.IsInteger then Int (int64 nl.String)
                else Float (float nl.String)

How to add a condition that a parsed number must satisfy in FParsec?

I am trying to parse an int32 with FParsec, with the additional restriction that the number must be less than some maximum value. Is there a way to do this without writing my own custom parser (as below), and/or is my custom parser (below) the appropriate way of achieving the requirements?
I ask because most of the built-in library functions seem to revolve around a char satisfying certain predicates and not any other type.
    let pRow: Parser<int, unit> =
        let error = messageError "int parsed larger than maxRows"
        fun stream ->
            let reply = pint32 stream
            // Pass the reply through only if parsing succeeded and the value is in range
            if reply.Status = Ok && reply.Result <= 1000000 then
                reply
            else
                Reply(Error, error)
UPDATE
Below is an attempt at a more fitting FParsec solution based on the direction given in the comment below:
    let pRow2: Parser<int, unit> =
        pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
Is this the correct way to do it?
You've done excellent research and almost answered your own question.
Generally, there are two approaches:
Unconditionally parse out an int and let later code check it for validity;
Use a guard rule bound to the parser. In this case (>>=) is the right tool.
To make a good choice, ask yourself whether an integer that fails the guard rule should get "another chance" by triggering another parser.
Here's what I mean. Usually, in real-life projects, parsers are combined in some chains. If one parser fails, the following one is attempted. For example, in this question, some programming language is parsed, so it needs something like:
    let pContent =
        pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
Theoretically, your DSL may need to differentiate a "small int value" from another type:
    /// The resulting type, or DSL
    type Output =
        | SmallValue of int
        | LargeValueAndString of int * string
        | Comment of string

    // Assumed helpers standing in for the posted `manyTill ws`, which lacks its
    // end parser: `ws` skips whitespace, `pWord` reads a run of non-whitespace.
    let ws = spaces
    let pWord = many1Satisfy (fun c -> not (System.Char.IsWhiteSpace c))

    let pSmallValue =
        pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
        |>> SmallValue
    let pLargeValueAndString =
        pint32 .>> ws .>>. pWord
        |>> LargeValueAndString
    let pComment =
        ws >>. pWord
        |>> Comment
    let pCombined : Parser<Output list, unit> =
        [ pSmallValue; pLargeValueAndString; pComment ]
        |> List.map attempt // each parser is optional
        |> choice           // on each iteration, one of the parsers must succeed
        |> many             // a loop
Built this way, pCombined will return:
"42 ABC" gets parsed as [ SmallValue 42 ; Comment "ABC" ]
"1234567 ABC" gets parsed as [ LargeValueAndString(1234567, "ABC") ]
As we see, the guard rule impacts how the parsers are applied, so the guard rule has to be within the parsing process.
If, however, you don't need such complication (e.g., an int is parsed unconditionally), your first snippet is just fine.
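The guard-rule approach can be factored into a small reusable combinator. This is a sketch under assumptions: guard, maxRows and isOk are hypothetical names that are not part of FParsec; only pint32, >>=, preturn, fail, attempt and run come from the library.

```fsharp
open FParsec

/// Hypothetical helper (not part of FParsec): run `p`, then reject its
/// result with `msg` unless the result satisfies `pred`.
let guard (pred: 'a -> bool) (msg: string) (p: Parser<'a, 'u>) : Parser<'a, 'u> =
    p >>= fun x -> if pred x then preturn x else fail msg

// Limit taken from the second snippet in the question
let maxRows = 1048576

// `attempt` restores the stream position when the guard rejects the number,
// so an alternative joined with <|> or `choice` can re-read the same digits.
let pRow : Parser<int, unit> =
    attempt (guard (fun x -> x <= maxRows) "int parsed larger than maxRows" pint32)

/// Small convenience wrapper for trying the parser on a string.
let isOk (input: string) =
    match run pRow input with
    | Success _ -> true
    | Failure _ -> false
```

Without attempt, a rejected number has already consumed input, so FParsec's <|> would not try the next alternative. That is why the combined parsers above wrap each branch in attempt.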

stopword removal in F#

I am trying to write code that removes stopwords such as "the" and "this" from a string list.
I wrote this code:
    let rec public stopword (a : string list, b : string list) =
        match [a.Head] with
        | ["the"] | ["this"] -> stopword (a.Tail, b)
        | [] -> b
        | _ -> stopword (a.Tail, b @ [a.Head])
I ran this in F# Interactive:
    stopword (["this";"is";"the"], []);;
I got this error:
    This expression was expected to have type string list but here has type 'a * 'b
Match expressions in F# are very powerful, although the syntax is confusing at first.
You need to match the list like so:
    let rec stopword a =
        match a with
        | "the"::t | "this"::t -> stopword t
        | h::t -> h::(stopword t)
        | [] -> []
The actual error is due to the function expecting a tuple argument. You would have to call the function with:
    let result = stopword (["this";"is";"the"], [])
Edit: since the original question was changed, the above answer is no longer valid. The logical error in the actual function is that you match on a single-element list built from the head, so the [] case can never fire; once the tail becomes empty, the next recursive call chokes trying to take the head of the empty list.
The function itself is not implemented correctly, though, and is much more complicated than necessary.
    let isNoStopword (word:string) =
        match word with
        | "the" | "this" -> false
        | _ -> true
    let removeStopword (a : string list) =
        a |> List.filter(isNoStopword)
    let test = removeStopword ["this";"is";"the"]
Others have mentioned the power of pattern matching in this case. In practice, you usually have a set of stopwords you want to remove. And the when guard allows us to pattern match quite naturally:
    let rec removeStopwords (stopwords: Set<string>) = function
        | x::xs when Set.contains x stopwords -> removeStopwords stopwords xs
        | x::xs -> x::(removeStopwords stopwords xs)
        | [] -> []
The problem with this function and @John's answer is that they are not tail-recursive. They run out of stack on a long list that contains only a few stopwords. It's a good idea to use the higher-order functions in the List module, which are tail-recursive:
    let removeStopwords (stopwords: Set<string>) xs =
        xs |> List.filter (stopwords.Contains >> not)
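For comparison, here is a sketch of what a hand-written tail-recursive version could look like, using an accumulator that is reversed once at the end. The name removeStopwordsAcc and the sample data are illustrative only; in practice the List.filter version above is simpler.

```fsharp
/// Tail-recursive variant: kept words are accumulated in reverse order and
/// flipped once at the end, so the recursive call is always the last action.
let removeStopwordsAcc (stopwords: Set<string>) (words: string list) =
    let rec loop acc rest =
        match rest with
        | [] -> List.rev acc
        | x :: xs when Set.contains x stopwords -> loop acc xs
        | x :: xs -> loop (x :: acc) xs
    loop [] words

let stop = set [ "the"; "this" ]

// Small input: only the non-stopwords remain, in their original order
let kept = removeStopwordsAcc stop [ "this"; "is"; "the"; "end" ]

// Large input to exercise the tail call: half of the words are stopwords
let big = List.init 1000000 (fun i -> if i % 2 = 0 then "the" else "word")
let bigKept = removeStopwordsAcc stop big |> List.length
```

Because nothing is left to do after each recursive call, the compiler can turn the loop into iteration, which is what keeps long inputs from exhausting the stack.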

How to implement delay in the maybe computation builder?

Here is what I have so far:
    type Maybe<'a> = option<'a>
    let succeed x = Some(x)
    let fail = None
    let bind rest p =
        match p with
        | None -> fail
        | Some r -> rest r
    let rec whileLoop cond body =
        if cond() then
            match body() with
            | Some() ->
                whileLoop cond body
            | None ->
                fail
        else
            succeed()
    let forLoop (xs : 'T seq) f =
        using (xs.GetEnumerator()) (fun it ->
            whileLoop
                (fun () -> it.MoveNext())
                (fun () -> it.Current |> f)
        )
whileLoop works fine to support for loops, but I don't see how to get while loops supported. Part of the problem is that the translation of while loops uses delay, which I could not figure out in this case. The obvious implementation below is probably wrong, as it does not delay the computation, but runs it instead!
    let delay f = f()
Not having delay also hinders try...with and try...finally.
There are actually two different ways of implementing computation builders in F#. One is to represent delayed computations using the monadic type (if it supports some way of representing delayed computations, like Async<'T> or the unit -> option<'T> type as shown by kkm).
However, you can also use the flexibility of F# computation expressions and use a different type as a return value of Delay. Then you need to modify the Combine operation accordingly and also implement Run member, but it all works out quite nicely:
    type OptionBuilder() =
        member x.Bind(v, f) = Option.bind f v
        member x.Return(v) = Some v
        member x.Zero() = Some ()
        member x.Combine(v, f:unit -> _) = Option.bind f v
        member x.Delay(f : unit -> 'T) = f
        member x.Run(f) = f()
        member x.While(cond, f) =
            if cond() then x.Bind(f(), fun _ -> x.While(cond, f))
            else x.Zero()
    let maybe = OptionBuilder()
The trick is that F# compiler uses Delay when you have a computation that needs to be delayed - that is: 1) to wrap the whole computation, 2) when you sequentially compose computations, e.g. using if inside the computation and 3) to delay bodies of while or for.
In the above definition, the Delay member returns unit -> M<'a> instead of M<'a>, but that's perfectly fine because Combine and While take unit -> M<'a> as their second argument. Moreover, by adding Run that evaluates the function, the result of maybe { .. } block (a delayed function) is evaluated, because the whole block is passed to Run:
    // As usual, the type of 'res' is 'Option<int>'
    let res = maybe {
        // The whole body is passed to `Delay` and then to `Run`
        let! a = Some 3
        let b = ref 0
        while !b < 10 do
            let! n = Some () // This body will be delayed & passed to While
            incr b
        if a = 3 then printfn "got 3"
        else printfn "got something else"
        // Code following `if` is delayed and passed to Combine
        return a }
This is a way to define computation builder for non-delayed types that is most likely more efficient than wrapping type inside a function (as in kkm's solution) and it does not require defining a special delayed version of the type.
Note that this problem does not happen in e.g. Haskell, because that is a lazy language, so it does not need to delay computations explicitly. I think that the F# translation is quite elegant as it allows dealing with both types that are delayed (using Delay that returns M<'a>) and types that represent just an immediate result (using Delay that returns a function & Run).
According to the monadic identities, your delay should always be equivalent to
    let delay f = bind (return ()) f
Since
    val bind : M<'T> -> ('T -> M<'R>) -> M<'R>
    val return : 'T -> M<'T>
delay has the signature
    val delay : (unit -> M<'R>) -> M<'R>
with 'T bound to unit. Note that your bind function has its arguments reversed from the customary order bind p rest. This is technically the same, but it does make the code harder to read.
Since you are defining the monadic type as type Maybe<'a> = option<'a>, there is no delaying a computation, as the type does not wrap any computation at all, only a value. So your definition of delay as let delay f = f() is theoretically correct. But it is not adequate for a while loop: the "body" of the loop will be computed before its "test condition", in fact before the arguments to bind are even bound. To avoid this, you redefine your monad with an extra layer of delay: instead of wrapping a value, you wrap a computation that takes a unit and computes the value.
    type Maybe<'a> = unit -> option<'a>
    // `return` is a reserved word in F#, so the unit function keeps the
    // question's name `succeed`
    let succeed x = fun () -> Some(x)
    let fail = fun() -> None
    let bind p rest =
        match p() with
        | None -> fail
        | Some r -> rest r
Note that the wrapped computation is not run until inside the bind function, i.e. not until after the arguments to bind are themselves bound.
With the above definitions, delay simplifies to
    let delay f = fun () -> f()
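To see the effect of the extra unit -> layer, here is a small self-contained sketch. It is a variation on the definitions above, not a copy: the unit function is called succeed, bind wraps its whole body in a lambda so that even p is not run at bind time, and delay applies the thunk twice so that its result has type Maybe<'a>. The counter runs and the value step are only there to observe when the body executes.

```fsharp
type Maybe<'a> = unit -> option<'a>

let succeed (x: 'a) : Maybe<'a> = fun () -> Some x
let fail () : option<'a> = None

/// Variation for illustration: nothing runs until the resulting Maybe is invoked.
let bind (p: Maybe<'a>) (rest: 'a -> Maybe<'b>) : Maybe<'b> =
    fun () ->
        match p () with
        | None -> None
        | Some r -> rest r ()

/// Wraps a thunk that builds a Maybe into a Maybe, without running the thunk.
let delay (f: unit -> Maybe<'a>) : Maybe<'a> = fun () -> f () ()

// Counter used only to observe when the delayed body actually executes
let runs = ref 0

let step : Maybe<int> =
    delay (fun () ->
        runs.Value <- runs.Value + 1
        succeed 42)

let beforeRun = runs.Value                         // the body has not run yet
let pipeline = bind step (fun n -> succeed (n + 1))
let stillBefore = runs.Value                       // building the pipeline runs nothing
let result = pipeline ()                           // invoking it runs the body once
let afterRun = runs.Value
```

Under these definitions a while loop can re-invoke its body on every iteration, because the body is a function rather than a value that has already been computed.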
