Folding Set to make a new Set - f#

Say I have a type Prop for propositions:
type Prop =
    | P of string
    | Disjunction of Prop * Prop
    | Conjunction of Prop * Prop
    | Negation of Prop
Where:
• A "p" representing the atom P,
• Disjunction(A "P", A "q") representing the proposition P ∨ q.
• Conjunction(A "P", A "q") representing the proposition P ∧ q.
• Negation(A "P") representing the proposition ¬P.
I'm supposed to use a set-based representation of formulas in disjunctive normal form. Since conjunction is commutative, associative and (a ∧ a) is equivalent to a it is convenient to represent a basic conjunct bc by its set of literals litOf(bc).
Here a literal is an atom or the negation of an atom, and a basic conjunct bc is a conjunction of literals.
This leads me to the function for litOf:
let litOf bc =
    Set.fold (fun acc (Con(x, y)) -> Set.add (x, y) acc) Set.empty bc
I'm pretty sure my litOf is wrong, and I get an error on the (Con(x,y)) part saying: "Incomplete pattern matches on this expression. For example, the value 'Dis (_, _)' may indicate a case not covered by the pattern(s).", which I also have no clue how to interpret in this context.
Any hints on how I can proceed?

I assume your example type Prop changed on the way from keyboard to here, and originally looked like this:
type Prop =
    | P of string
    | Dis of Prop * Prop
    | Con of Prop * Prop
    | Neg of Prop
There are several things that tripped you up:
• Set.fold operates on input that is a set, and does something for each element in the set. In your case, the input is a boolean clause, and the output is a set.
• You did not fully define what constitutes a literal. For a conjunction, the set of literals is the union of the literals on the left and on the right side. But what about a disjunction? The compiler error message means exactly that.
Here's what I think you are after:
let rec literals = function
    | P s -> Set.singleton s
    | Dis (x, y) -> Set.union (literals x) (literals y)
    | Con (x, y) -> Set.union (literals x) (literals y)
    | Neg x -> literals x
With that, you will get
> literals (Dis (P "A", Neg (Con (P "B", Con (P "A", P "C")))))
val it : Set<string> = set ["A"; "B"; "C"]
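Note that the function above collects atom names and drops the negation. If you need the set of literals of a basic conjunct as the exercise defines them (an atom or a negated atom), one possible variation is sketched below in OCaml, which is close enough to F# to transcribe. The names prop, LitSet and lit_of are assumptions made for this sketch, and so is the choice to represent a literal as a name paired with a polarity flag. It is not the only way to do it.

```ocaml
(* Same shape as the Prop type above, transcribed to OCaml. *)
type prop =
  | P of string
  | Dis of prop * prop
  | Con of prop * prop
  | Neg of prop

(* Assumption: a literal is an atom name plus its polarity (true = positive). *)
module LitSet = Set.Make (struct
  type t = string * bool
  let compare = compare
end)

(* Literals of a basic conjunct. Every constructor is handled, so the
   compiler has nothing to warn about; inputs that are not basic
   conjuncts are rejected explicitly. *)
let rec lit_of = function
  | P s -> LitSet.singleton (s, true)
  | Neg (P s) -> LitSet.singleton (s, false)
  | Con (x, y) -> LitSet.union (lit_of x) (lit_of y)
  | Dis _ | Neg _ -> invalid_arg "lit_of: not a basic conjunct"

(* a /\ (~b /\ a): the duplicate a collapses because the result is a set. *)
let example = LitSet.elements (lit_of (Con (P "a", Con (Neg (P "b"), P "a"))))
```

Here example evaluates to [("a", true); ("b", false)].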

Related

Is there a way to fix an expression with operators in it after parsing, using a table of associativities and precedences?

I'm currently working on a parser for a simple programming language written in Haskell. I ran into a problem when I tried to allow for binary operators with differing associativities and precedences. Normally this wouldn't be an issue, but since my language allows users to define their own operators, the precedence of operators isn't known by the compiler until the program has already been parsed.
Here are some of the data types I've defined so far:
data Expr
  = Var String
  | Op String Expr Expr
  | ..
data Assoc
  = LeftAssoc
  | RightAssoc
  | NonAssoc
type OpTable =
  Map.Map String (Assoc, Int)
At the moment, the compiler parses all operators as if they were right-associative with equal precedence. So if I give it an expression like a + b * c < d the result will be Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d"))).
I'm trying to write a function called fixExpr which takes an OpTable and an Expr and rearranges the Expr based on the associativities and precedences listed in the OpTable. For example:
operators :: OpTable
operators =
  Map.fromList
    [ ("<", (NonAssoc, 4))
    , ("+", (LeftAssoc, 6))
    , ("*", (LeftAssoc, 7))
    ]
expr :: Expr
expr = Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d")))
fixExpr operators expr should evaluate to Op "<" (Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))) (Var "d").
How do I define the fixExpr function? I've tried multiple solutions and none of them have worked.
An expression e may be an atomic term n (e.g. a variable or literal), a parenthesised expression, or an application of an infix operator ○.
e ⩴ n | (e) | e1 ○ e2
We need the parentheses to know whether the user entered a * b + c, which we happen to associate as a * (b + c) and need to reassociate as (a * b) + c, or if they entered a * (b + c) literally, which should not be reassociated. Therefore I’ll make a small change to the data type:
data Expr
  = Var String
  | Group Expr
  | Op String Expr Expr
  | …
Then the method is simple:
1. The rebracketing of an expression ⟦e⟧ applies recursively to all its subexpressions.
   1.1. ⟦n⟧ = n
   1.2. ⟦(e)⟧ = (⟦e⟧)
   1.3. ⟦e1 ○ e2⟧ = ⦅⟦e1⟧ ○ ⟦e2⟧⦆
2. A single reassociation step ⦅e⦆ removes redundant parentheses on the right, and reassociates nested operator applications leftward in two cases: if the left operator has higher precedence, or if the two operators have equal precedence, and are both left-associative. It leaves nested infix applications alone, that is, associating rightward, in the opposite cases: if the right operator has higher precedence, or the operators have equal precedence and right associativity. If the associativities are mismatched, then the result is undefined.
   2.1. ⦅e ○ n⦆ = e ○ n
   2.2. ⦅e1 ○ (e2)⦆ = ⦅e1 ○ e2⦆
   2.3. ⦅e1 ○ (e2 ● e3)⦆ =
        2.3.1. ⦅e1 ○ e2⦆ ● e3, if:
               a. P(○) > P(●); or
               b. P(○) = P(●) and A(○) = A(●) = L
        2.3.2. e1 ○ (e2 ● e3), if:
               a. P(○) < P(●); or
               b. P(○) = P(●) and A(○) = A(●) = R
        2.3.3. undefined otherwise
NB.: P(o) and A(o) are respectively the precedence and associativity (L or R) of operator o.
This can be translated fairly literally to Haskell:
import Data.Maybe (fromMaybe)
import qualified Data.Map as Map
-- Note: the error message below uses show on Assoc, so Assoc needs deriving Show.
fixExpr operators = reassoc
  where
    -- 1.1
    reassoc e@Var{} = e
    -- 1.2
    reassoc (Group e) = Group (reassoc e)
    -- 1.3
    reassoc (Op o e1 e2) = reassoc' o (reassoc e1) (reassoc e2)
    -- 2.1
    reassoc' o e1 e2@Var{} = Op o e1 e2
    -- 2.2
    reassoc' o e1 (Group e2) = reassoc' o e1 e2
    -- 2.3
    reassoc' o1 e1 r@(Op o2 e2 e3) = case compare prec1 prec2 of
        -- 2.3.1a
        GT -> assocLeft
        -- 2.3.2a
        LT -> assocRight
        EQ -> case (assoc1, assoc2) of
            -- 2.3.1b
            (LeftAssoc, LeftAssoc) -> assocLeft
            -- 2.3.2b
            (RightAssoc, RightAssoc) -> assocRight
            -- 2.3.3
            _ -> error $ concat
              [ "cannot mix ‘", o1
              , "’ ("
              , show assoc1
              , " "
              , show prec1
              , ") and ‘"
              , o2
              , "’ ("
              , show assoc2
              , " "
              , show prec2
              , ") in the same infix expression"
              ]
      where
        (assoc1, prec1) = opInfo o1
        (assoc2, prec2) = opInfo o2
        assocLeft = Op o2 (Group (reassoc' o1 e1 e2)) e3
        assocRight = Op o1 e1 r
    opInfo op = fromMaybe (notFound op) (Map.lookup op operators)
    notFound op = error $ concat
      [ "no precedence/associativity defined for ‘"
      , op
      , "’"
      ]
Note the recursive call in assocLeft: by reassociating the operator applications, we may have revealed another association step, as in a chain of left-associative operator applications like a + b + c + d = (((a + b) + c) + d).
I insert Group constructors in the output for illustration, but they can be removed at this point, since they’re only necessary in the input.
This hasn’t been tested very thoroughly at all, but I think the idea is sound, and should accommodate modifications for more complex situations, even if the code leaves something to be desired.
An alternative that I’ve used is to parse expressions as “flat” sequences of operators applied to terms, and then run a separate parsing pass after name resolution, using e.g. Parsec’s operator precedence parser facility, which would handle these details automatically.
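To check the rules against the example from the question, here is a rough OCaml transcription of the same reassociation step. It is only a sketch under a few assumptions: the operator table is a plain association list instead of a Map, the names op_info and step are made up for this example, and the Group wrappers are left out of the output (as noted above, they are only needed in the input).

```ocaml
type assoc = LeftAssoc | RightAssoc | NonAssoc

type expr =
  | Var of string
  | Group of expr
  | Op of string * expr * expr

(* Operator table from the question: name -> (associativity, precedence). *)
let operators =
  [ ("<", (NonAssoc, 4)); ("+", (LeftAssoc, 6)); ("*", (LeftAssoc, 7)) ]

let op_info o =
  match List.assoc_opt o operators with
  | Some info -> info
  | None -> failwith ("no precedence/associativity defined for " ^ o)

(* Rules 1.1 to 1.3: rebracket every subexpression, bottom-up. *)
let rec reassoc = function
  | Var _ as e -> e
  | Group e -> Group (reassoc e)
  | Op (o, e1, e2) -> step o (reassoc e1) (reassoc e2)

(* Rules 2.1 to 2.3: one reassociation step on "e1 o1 right". *)
and step o1 e1 right =
  match right with
  | Var _ -> Op (o1, e1, right)
  | Group e2 -> step o1 e1 e2
  | Op (o2, e2, e3) ->
    let (a1, p1) = op_info o1 and (a2, p2) = op_info o2 in
    (* Moving o1 to the left may expose another step, hence the recursion. *)
    let left () = Op (o2, step o1 e1 e2, e3) in
    if p1 > p2 then left ()
    else if p1 < p2 then Op (o1, e1, right)
    else
      match a1, a2 with
      | LeftAssoc, LeftAssoc -> left ()
      | RightAssoc, RightAssoc -> Op (o1, e1, right)
      | _ -> failwith ("cannot mix " ^ o1 ^ " and " ^ o2)

(* a + b * c < d, parsed right-nested as in the question. *)
let expr =
  Op ("+", Var "a", Op ("*", Var "b", Op ("<", Var "c", Var "d")))

let fixed = reassoc expr
```

With the table from the question, fixed is Op ("<", Op ("+", Var "a", Op ("*", Var "b", Var "c")), Var "d"), which is the result the question asks for.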

Taking two streams and combining them in OCaml

I want to take two streams of integers in increasing order and combine them into one stream that contains no duplicates and should be in increasing order. I have defined the functionality for streams in the following manner:
type 'a susp = Susp of (unit -> 'a)
let force (Susp f) = f()
type 'a str = {hd : 'a ; tl : ('a str) susp }
let merge s1 s2 = (* must implement *)
The first function suspends computation by wrapping a computation within a function, and the second function evaluates the function and provides me with the result of the computation.
I want to emulate the logic of how you go about combining lists, i.e. match on both lists and check which elements are greater, lesser, or equal and then append (cons) the integers such that the resulting list is sorted.
However, I know I cannot just do this with streams of course as I cannot traverse it like a list, so I think I would need to go integer by integer, compare, and then suspend the computation and keep doing this to build the resulting stream.
I am at a bit of a loss how to implement such logic however, assuming it is how I should be going about this, so if somebody could point me in the right direction that would be great.
Thank you!
If the input sequences are sorted, there is not much difference between merging lists and sequences. Consider the following merge function on lists:
let rec merge s t =
  match s, t with
  | x :: s , [] | [], x :: s -> x :: s
  | [], [] -> s
  | x :: s', y :: t' ->
    if x < y then
      x :: (merge s' t)
    else if x = y then
      x :: (merge s' t')
    else
      y :: (merge s t')
This function is only using two properties of lists:
• the ability to split the potential first element from the rest of the list
• the ability to add an element to the front of the list
This suggests that we could rewrite this function as a functor over the signature
module type seq = sig
  type 'a t
  (* if the seq is non-empty we split the seq into head and tail *)
  val next: 'a t -> ('a * 'a t) option
  (* add back to the front *)
  val cons: 'a -> 'a t -> 'a t
end
Then if we replace the pattern matching on the list with a call to next, and the cons operation with a call to cons, the previous function is transformed into:
module Merge(Any_seq: seq ) = struct
  open Any_seq
  let rec merge s t =
    match next s, next t with
    | Some(x,s), None | None, Some (x,s) ->
      cons x s
    | None, None -> s
    | Some (x,s'), Some (y,t') ->
      if x < y then
        cons x (merge s' t)
      else if x = y then
        cons x (merge s' t')
      else
        cons y (merge s t')
end
Then, with list, our implementation was:
module List_core = struct
  type 'a t = 'a list
  let cons = List.cons
  let next = function
    | [] -> None
    | a :: q -> Some(a,q)
end
module List_implem = Merge(List_core)
which can be tested with
let test = List_implem.merge [1;5;6] [2;4;9]
Implementing the same function for your stream type is then just a matter of writing a similar Stream_core module for stream.
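One caveat for the stream type from the question: it has no empty case, so every stream is infinite, and a strict cons like the one in the signature above would try to evaluate the whole merged tail before returning. A minimal sketch that sidesteps this builds the record directly and puts each recursive call inside a Susp, so it only runs when the tail is forced. The helpers from_step and take are hypothetical and only there to try it out, and the sketch assumes each input stream is strictly increasing.

```ocaml
type 'a susp = Susp of (unit -> 'a)
let force (Susp f) = f ()
type 'a str = { hd : 'a; tl : 'a str susp }

(* Merge two increasing streams, dropping elements that occur in both.
   The recursive calls are suspended, so only one element is produced
   per forced tail. *)
let rec merge s1 s2 =
  if s1.hd < s2.hd then
    { hd = s1.hd; tl = Susp (fun () -> merge (force s1.tl) s2) }
  else if s1.hd > s2.hd then
    { hd = s2.hd; tl = Susp (fun () -> merge s1 (force s2.tl)) }
  else
    { hd = s1.hd; tl = Susp (fun () -> merge (force s1.tl) (force s2.tl)) }

(* Hypothetical helper: start, start + step, start + 2 * step, ... *)
let rec from_step start step =
  { hd = start; tl = Susp (fun () -> from_step (start + step) step) }

(* Hypothetical helper: the first n elements of a stream as a list. *)
let rec take n s =
  if n <= 0 then [] else s.hd :: take (n - 1) (force s.tl)

let evens = from_step 0 2
let threes = from_step 0 3
let first = take 8 (merge evens threes)
```

Here first is [0; 2; 3; 4; 6; 8; 9; 10]: the shared elements 0 and 6 appear only once.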

How to avoid stack overflow during CPS conversion?

I'm writing a transformation from a Scheme subset to a CPS language. It is implemented in F#. On big input programs the conversion fails with a stack overflow.
I'm using a variant of the algorithm described in the paper Compiling with Continuations.
I've tried increasing the maximum stack size of the working thread to 50 MB, and then it works.
Maybe there is some way to modify the algorithm so that I won't need to tune the stack size?
For example, the algorithm transforms
(foo (bar 1) (bar 2))
to
(let ((c1 (cont (r1)
            (let ((c2 (cont (r2)
                        (foo halt r1 r2))))
              (bar c2 2)))))
  (bar c1 1))
where halt is a final continuation which finishes the program.
Maybe your actual problem has a simple solution that avoids heavy stack consumption, so please feel free to add details. However, without more knowledge about your particular code, here is a general approach to reduce the stack consumption of recursive programs, based on trampolines and continuations.
Walker
Here is a typical recursive function that is not trivially tail-recursive, written in Common Lisp because I don't know F#:
(defun walk (form transform join)
  (typecase form
    (cons (funcall join
                   (walk (car form) transform join)
                   (walk (cdr form) transform join)))
    (t (funcall transform form))))
The code is however quite simple, hopefully, and walks a tree made of cons cells:
• if the form is a cons-cell, recursively walk on the car (resp. cdr) and join the results
• otherwise, apply a transform on the value
For example:
(walk '(a (b c d) 3 2 (a 2 1) 0)
      (lambda (u) (and (numberp u) u))
      (lambda (a b) (if a (cons a b) (or a b))))
=> (3 2 (2 1) 0)
The code walks the form and retains only numbers, but preserves (non-empty) nesting.
Calling trace on walk with the above example shows a maximal depth of 8 nested calls.
Continuations and trampoline
Here is an adapted version, called walk/then, that walks a form as previously, and when a result is available, calls then on it. Here then is a continuation.
The function also returns a thunk, i.e. a parameterless closure. What happens is that when we return the closure, the stack is unwound, and when we apply the thunk it will start from a fresh stack, but having advanced in the computation (I usually picture someone walking up an escalator that goes down). The fact that we return a thunk to reduce the number of stack frames is part of the trampoline.
The then function takes a value, namely the result that the current walk eventually will return. The result is thus passed down the stack, and what is returned at each step is a thunk function.
Nesting continuations makes it possible to capture the complex behaviour of transform/join, by pushing the remaining parts of the computation into nested continuations.
(defun walk/then (form transform join then)
  (typecase form
    (cons (lambda ()
            (walk/then (car form) transform join
                       (lambda (v)
                         (walk/then (cdr form) transform join
                                    (lambda (w)
                                      (funcall then (funcall join v w))))))))
    (t (funcall then (funcall transform form)))))
For example, (walk/then (car form) transform join (lambda (v) ...)) reads as follows: walk the car of form with arguments transform and join, and eventually call (lambda (v) ...) on the result; namely, walk down the cdr, and then join both results; eventually, call the input then on the joined result.
What is missing is a way to continually call the returned thunk until exhaustion; here it is with a loop, but this could easily be a tail-recursive function:
(loop for res =
         (walk/then '(a (b c d) 3 2 (a 2 1) 0)
                    (lambda (u) (and (numberp u) u))
                    (lambda (a b) (if a (cons a b) (or a b)))
                    #'identity)
      then (typecase res (function (funcall res)) (t res))
      while (functionp res)
      finally (return res))
The above returns (3 2 (2 1) 0), and the depth of the trace never goes over 2 when tracing walk/then.
See Eli Bendersky's article for another take at this, in Python.
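Since the question is about F#, here is roughly what the same idea might look like in an ML-family language. This is an OCaml sketch, not a translation of your converter: the tree type stands in for the cons cells, and the names step, walk_then, run and deep_tree are assumptions made for this example. An explicit Done/More type replaces the "is the result a function?" test of the Lisp loop.

```ocaml
(* Hypothetical tree type standing in for the cons cells of the Lisp version. *)
type tree = Leaf of int | Node of tree * tree

(* A step is either a final value or a thunk that continues the computation. *)
type 'a step = Done of 'a | More of (unit -> 'a step)

(* Counterpart of walk/then: at every inner node, return a thunk instead of
   recursing, and push the rest of the work into nested continuations. *)
let rec walk_then t transform join k =
  match t with
  | Leaf v -> k (transform v)
  | Node (l, r) ->
    More (fun () ->
      walk_then l transform join (fun a ->
        walk_then r transform join (fun b ->
          k (join a b))))

(* The trampoline: keep calling the returned thunks until a value appears. *)
let rec run = function
  | Done v -> v
  | More thunk -> run (thunk ())

(* Build a left-leaning tree with n inner nodes, without recursion. *)
let deep_tree n =
  let t = ref (Leaf 1) in
  for _i = 1 to n do
    t := Node (!t, Leaf 1)
  done;
  !t

(* Sum the leaves of a tree that is 100_000 levels deep. *)
let total =
  run (walk_then (deep_tree 100_000) (fun v -> v) ( + ) (fun v -> Done v))
```

The tree has 100_000 inner nodes and therefore 100_001 leaves of value 1, so total is 100_001. The pending work lives in heap-allocated closures rather than in stack frames.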
I've converted the algorithm to trampoline form. It looks like an FSM.
There is a loop which looks at the current state, performs some manipulations, and goes to another state. It also uses two stacks for the different kinds of continuations.
Here is the input language (it is a subset of the language I used originally):
// Input language consists of only variables and function applications
type Expr =
    | Var of string
    | App of Expr * Expr list
Here is the target language:
// CPS form - each function gets a continuation,
// added continuation definitions and continuation applications
type Norm =
    | LetCont of name : string * args : string list * body : Norm * inner : Norm
    | FuncCall of func : string * cont : string * args : string list
    | ContCall of cont : string * args : string list
Here is the original algorithm:
// Usual way to make CPS conversion.
// fresh() is assumed to return a new unique name (its definition is not shown).
let rec transform expr cont =
    match expr with
    | App(func, args) ->
        transformMany (func :: args) (fun vars ->
            let func' = List.head vars
            let args' = List.tail vars
            let c = fresh()
            let r = fresh()
            LetCont(c, [r], cont r, FuncCall(func', c, args')))
    | Var(v) -> cont v
and transformMany exprs cont =
    match exprs with
    | e :: rest ->
        transform e (fun e' ->
            transformMany rest (fun rest' ->
                cont (e' :: rest')))
    | _ -> cont []
let transformTop expr =
    transform expr (fun var -> ContCall("halt", [var]))
Here is the modified version:
type Action =
    | ContinuationVar of Expr * (string -> Action)
    | ContinuationExpr of string * (Norm -> Action)
    | TransformMany of string list * Expr list * (string list -> Action)
    | Result of Norm
    | Variable of string
// Make one action at time and return to top loop
let rec transform2 expr =
    match expr with
    | App(func, args) ->
        TransformMany([], func :: args, (fun vars ->
            let func' = List.head vars
            let args' = List.tail vars
            let c = fresh()
            let r = fresh()
            ContinuationExpr(r, fun expr ->
                Result(LetCont(c, [r], expr, FuncCall(func', c, args'))))))
    | Var(v) -> Variable(v)
// We have two stacks here:
// contsVar for continuations accepting variables
// contsExpr for continuations accepting expressions
let transformTop2 expr =
    let rec loop contsVar contsExpr action =
        match action with
        | ContinuationVar(expr, cont) ->
            loop (cont :: contsVar) contsExpr (transform2 expr)
        | ContinuationExpr(var, contExpr) ->
            let contVar = List.head contsVar
            let contsVar' = List.tail contsVar
            loop contsVar' (contExpr :: contsExpr) (contVar var)
        | TransformMany(vars, e :: exprs, cont) ->
            loop contsVar contsExpr (ContinuationVar(e, fun var ->
                TransformMany(var :: vars, exprs, cont)))
        | TransformMany(vars, [], cont) ->
            loop contsVar contsExpr (cont (List.rev vars))
        | Result(r) ->
            match contsExpr with
            | cont :: rest -> loop contsVar rest (cont r)
            | _ -> r
        | Variable(v) ->
            match contsVar with
            | cont :: rest -> loop rest contsExpr (cont v)
            | _ -> failwith "must not be empty"
    let initial = ContinuationVar(expr, fun var -> Result(ContCall("halt", [var])))
    loop [] [] initial

Packrat parsing (memoization via laziness) in OCaml

I'm implementing a packrat parser in OCaml, as per the master's thesis by B. Ford. My parser should receive a data structure that represents the grammar of a language and parse given sequences of symbols.
I'm stuck with the memoization part. The original thesis uses Haskell's lazy evaluation to accomplish linear time complexity. I want to do this (memoization via laziness) in OCaml, but don't know how to do it.
So, how do you memoize functions by lazy evaluations in OCaml?
EDIT: I know what lazy evaluation is and how to exploit it in OCaml. The question is how to use it to memoize functions.
EDIT: The data structure I wrote that represents grammars is:
type ('a, 'b, 'c) expr =
  | Empty of 'c
  | Term of 'a * ('a -> 'c)
  | NTerm of 'b
  | Juxta of ('a, 'b, 'c) expr * ('a, 'b, 'c) expr * ('c -> 'c -> 'c)
  | Alter of ('a, 'b, 'c) expr * ('a, 'b, 'c) expr
  | Pred of ('a, 'b, 'c) expr * 'c
  | NPred of ('a, 'b, 'c) expr * 'c
type ('a, 'b, 'c) grammar = ('a * ('a, 'b, 'c) expr) list
The (not-memoized) function that parses a list of symbols is:
let rec parse g v xs = parse' g (List.assoc v g) xs
and parse' g e xs =
  match e with
  | Empty y -> Parsed (y, xs)
  | Term (x, f) ->
    begin
      match xs with
      | x' :: xs when x = x' -> Parsed (f x, xs)
      | _ -> NoParse
    end
  | NTerm v' -> parse g v' xs
  | Juxta (e1, e2, f) ->
    begin
      match parse' g e1 xs with
      | Parsed (y, xs) ->
        begin
          match parse' g e2 xs with
          | Parsed (y', xs) -> Parsed (f y y', xs)
          | p -> p
        end
      | p -> p
    end
  (* and so on *)
where the type of the return value of parse is defined by
type ('a, 'c) result = Parsed of 'c * ('a list) | NoParse
For example, the grammar of basic arithmetic expressions can be specified as g, in:
type nt = Add | Mult | Prim | Dec | Expr
let zero _ = 0
let g =
  [(Expr, Juxta (NTerm Add, Term ('$', zero), fun x _ -> x));
   (Add, Alter (Juxta (NTerm Mult, Juxta (Term ('+', zero), NTerm Add, fun _ x -> x), (+)), NTerm Mult));
   (Mult, Alter (Juxta (NTerm Prim, Juxta (Term ('*', zero), NTerm Mult, fun _ x -> x), ( * )), NTerm Prim));
   (Prim, Alter (Juxta (Term ('<', zero), Juxta (NTerm Dec, Term ('>', zero), fun x _ -> x), fun _ x -> x), NTerm Dec));
   (Dec, List.fold_left (fun acc d -> Alter (Term (d, (fun c -> int_of_char c - 48)), acc)) (Term ('0', zero)) ['1';'2';'3';])]
The idea of using laziness for memoization is to use not functions, but data structures, for memoization. Laziness means that when you write let x = foo in some_expr, foo will not be evaluated immediately, but only as far as some_expr needs it, and that different occurrences of x in some_expr will share the same thunk: as soon as one of them forces the computation, the result is available to all of them.
This does not work for functions: if you write let f x = foo in some_expr, and call f several times in some_expr, well, each call will be evaluated independently, there is not a shared thunk to store the results.
So you can get memoization by using a data structure instead of a function. Typically, this is done using an associative data structure: instead of computing an a -> b function, you compute a Table a b, where Table is some map from the arguments to the results. One example is this Haskell presentation of Fibonacci:
fib n = fibTable !! n
fibTable = [0,1] ++ map (\n -> fib (n - 1) + fib (n - 2)) [2..]
(You can also write that with tail and zip, but this doesn't make the point clearer.)
See that you do not memoize a function, but a list: it is the list fibTable that does the memoization. You can write this in OCaml as well, for example using the LazyList module of the Batteries library:
open Batteries
module LL = LazyList
let from_2 = LL.seq 2 ((+) 1) (fun _ -> true)
let rec fib n = LL.at fib_table (n - 1) + LL.at fib_table (n - 2)
and fib_table = lazy (LL.Cons (0, LL.cons 1 <| LL.map fib from_2))
However, there is little interest in doing so: as you have seen in the example above, OCaml does not particularly favor call-by-need evaluation -- it's reasonable to use, but not terribly convenient as it was forced to be in Haskell. It is actually equally simple to directly write the cache structure by direct mutation:
open Batteries
let fib =
  let fib_table = DynArray.of_list [0; 1] in
  let get_fib n = DynArray.get fib_table n in
  fun n ->
    for i = DynArray.length fib_table to n do
      DynArray.add fib_table (get_fib (i - 1) + get_fib (i - 2))
    done;
    get_fib n
This example may be ill-chosen, because you need a dynamic structure to store the cache. In the packrat parser case, you're tabulating parsing on a known input text, so you can use plain arrays (indexed by the grammar rules): you would have an array of ('a, 'c) result option for each rule, of the size of the input length and initialized to None. E.g. juxta.(n) represents the result of trying the rule Juxta from input position n, or None if this has not yet been tried.
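To make the per-position table concrete, here is a minimal sketch that uses only the standard library. It is a toy and not the packrat parser itself: the single "rule" just measures how many digits start at each input position. The names digit_runs and evaluations are assumptions made for this example, and each cell is a lazy value rather than an option, so the memoization comes from Lazy.force computing a cell at most once.

```ocaml
(* One lazy cell per input position. Cell i holds the length of the run of
   digits starting at position i; the counter records how many cells were
   actually computed. *)
let digit_runs (s : string) : int Lazy.t array * int ref =
  let n = String.length s in
  let evaluations = ref 0 in
  (* Position n (end of input) always yields 0. *)
  let table = Array.make (n + 1) (lazy 0) in
  for i = 0 to n - 1 do
    table.(i) <- lazy (
      incr evaluations;
      match s.[i] with
      | '0' .. '9' -> 1 + Lazy.force table.(i + 1)
      | _ -> 0)
  done;
  (table, evaluations)

let table, evaluations = digit_runs "123+45"

(* Forcing position 0 computes the cells for positions 0 to 3. *)
let r0 = Lazy.force table.(0)
let after_first = !evaluations

(* Position 1 was already forced as part of position 0: no new work. *)
let r1 = Lazy.force table.(1)
let after_second = !evaluations
```

Here r0 is 3 and r1 is 2, while the counter reads 4 after both lookups, so the second lookup reused the stored result. A real packrat table would hold one such array per grammar rule, with parse results in the cells.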
Laziness is a nice way to present this kind of memoization, but is not always expressive enough: if you need, say, to partially free some part of your result cache to lower memory usage, you will have difficulties if you started from a lazy presentation. See this blog post for a remark on this.
Why do you want to memoize functions? What you want to memoize is, I believe, the parsing result for a given (parsing) expression and a given position in the input stream. You could for instance use OCaml's hash tables (the Hashtbl module) for that.
The lazy keyword.
Here you can find some great examples.
If it fits your use case, you can also use OCaml streams instead of manually generating thunks.

Getting values from a union case

I have something like
type A =
    | X of string
    | Y of int
I have a sequence of X values, [X "foo"; X "boo"; X "moo"].
Is there a shortcut for doing a map to convert it to ["foo"; "boo"; "moo"] without doing a match?
Thanks!
I wouldn't generally use a solution that doesn't handle all cases of the pattern matching (e.g. when using fun (X str) -> ...). It is always a good idea to add a handler, even if it just reports a more informative error (such as that the case was unexpected because it was filtered out before).
You can extend kvb's solution using the function syntax (which is like fun with pattern matching):
List.map (function (X s) -> s | _ -> failwith "Unexpected case")
Alternatively, if you wanted to ignore Y values (so that [X "a"; Y 1; X "b"] becomes ["a"; "b"]) you can use the List.choose function:
List.choose (function (X s) -> Some s | _ -> None)
To write this using list comprehensions, you'd need to use full-blown pattern matching using match, so it would be a bit longer than using higher-order functions.
You can use this:
List.map (fun (X s) -> s)
This does perform a match behind the scenes, but the syntax is nicer. You'll get a warning since the pattern is partial (that is, the function clearly won't be able to handle Y _ instances).
Just for grins, you could do this:
let get<'T> = function
    | X s -> box s :?> 'T
    | Y i -> box i :?> 'T
To get a list of X or Y, but not both.
[X "foo"; X "boo"; X "moo"] |> List.map get<string>
[Y 0; Y 1; Y 2] |> List.map get<int>
[Y 0; X "boo"; Y 2] |> List.map get<int> //oh snap!
If you're okay with boxed values, this works for mixed lists:
let get = function
    | X s -> box s
    | Y i -> box i
[Y 0; X "boo"; Y 2] |> List.map get //no problem
I'm assuming you're interested in unorthodox solutions since you're asking how to avoid pattern matching. ;-)
Here is another general-purpose, yet baroque, solution.
Similar to @kvb's solution, you can use a list comprehension with an incomplete pattern match:
let xl = [X "foo"; X "boo"; X "moo"]
[for X s in xl -> s]
You must match. If you do that a lot, define functions getX and getY:
let getX (X v) = v
let getY (Y v) = v
But be aware that they will raise exceptions if given a value of the wrong case.
