I'm writing a transformation from Scheme subset to CPS language. It is implemented in F#. On big input programs conversion fails by stack overflow.
I'm using some sort of algorithm described in the paper Compiling with Continuations.
I've tried to increase maximum stack size of the working thread up to 50 MB, then it works.
Maybe there some way to modify the algorithm, so that I won't need to tune stack size?
For example, the algorithm transforms
(foo (bar 1) (bar 2))
to
(let ((c1 (cont (r1)
(let ((c2 (cont (r2)
(foo halt r1 r2))))
(bar c2 2)))))
(bar c1 1))
where halt is a final continuation which finishes the program.
Maybe your actual problems has simple solutions to avoid heavy stack consumption, so please don't mind adding details. However, without more knowledge about your particular code, here is a general approach to reduce the stack consumption in a recursive programs, based on trampolines and continuations.
Walker
Here is a typical recursive function that is not trivially tail-recursive, written in Common Lisp because I don't know F#:
(defun walk (form transform join)
(typecase form
(cons (funcall join
(walk (car form) transform join)
(walk (cdr form) transform join)))
(t (funcall transform form))))
The code is however quite simple, hopefully, and walks a tree made of cons cells:
if the form is a cons-cell, recursively walk on the car (resp. cdr) and join the results
Otherwise, apply a transform on the value
For example:
(walk '(a (b c d) 3 2 (a 2 1) 0)
(lambda (u) (and (numberp u) u))
(lambda (a b) (if a (cons a b) (or a b))))
=> (3 2 (2 1) 0)
The code walks the form, and retain only numbers, but preserves (non-empty) nesting.
Calling trace on walk with the above example shows a maximal depth of 8 nested calls.
Continuations and trampoline
Here is an adapted version, called
walk/then, that walks a form as previously, and when a result is
available, calls then on it. Here then is a continuation.
The function also returns a thunk, i.e. a parameterless closure.
What happens is that when we return the closure, the stack is unwound,
and when we apply the thunk it will
start from a fresh stack, but having advanced in the computation
(I usually picture someone walking up an escalator that goes down).
The fact that we return a thunk to reduce the number of stack frames is part of the trampoline.
The then function takes a value, namely
the result that the current walk eventually will return.
The result is thus passed down the stack, and what is
returned at each step is a thunk function.
Nesting continuations allows to capture the complex behaviour of transform/join, by pushing the remaining parts of the computation in nested continuations.
(defun walk/then (form transform join then)
(typecase form
(cons (lambda ()
(walk/then (car form) transform join
(lambda (v)
(walk/then (cdr form) transform join
(lambda (w)
(funcall then (funcall join v w))))))))
(t (funcall then (funcall transform form)))))
For example, (walk/then (car form) transform join (lambda (v) ...)) reads as follows: walk the car of form with
arguments transform and join, and eventually call (lambda (v) ...) on the result; namely, walk down the cdr, and then join both results; eventually, call the input then on the joined result.
What is missing is a way to continually call the returned thunk until exhaustion; here is it
with a loop, but this could easily be a tail-recursive function:
(loop for res =
(walk/then '(a (b c d) 3 2 (a 2 1) 0)
(lambda (u) (and (numberp u) u))
(lambda (a b) (if a (cons a b) (or a b)))
#'identity)
then (typecase res (function (funcall res)) (t res))
while (functionp res)
finally (return res))
The above returns (3 2 (2 1) 0), and the depth of the trace never goes over 2 when tracing walk/then.
See Eli Bendersky's article for another take at this, in Python.
I've converted algorithm to trampoline form. It looks like FSM.
There is a loop, which looks at the current state, makes some manipulations, and goes to another state. Also it uses two stacks for different kind of continuations.
Here is input language (it is a subset of the language I used originally) :
// Input language consists of only variables and function applications
type Expr =
| Var of string
| App of Expr * Expr list
Here is target language:
// CPS form - each function gets a continuation,
// added continuation definitions and continuation applications
type Norm =
| LetCont of name : string * args : string list * body : Norm * inner : Norm
| FuncCall of func : string * cont : string * args : string list
| ContCall of cont : string * args : string list
Here is original algorithm:
// Usual way to make CPS conversion.
let rec transform expr cont =
match expr with
| App(func, args) ->
transformMany (func :: args) (fun vars ->
let func' = List.head vars
let args' = List.tail vars
let c = fresh()
let r = fresh()
LetCont(c, [r], cont r, FuncCall(func', c, args')))
| Var(v) -> cont v
and transformMany exprs cont =
match exprs with
| e :: rest ->
transform e (fun e' ->
transformMany rest (fun rest' ->
cont (e' :: rest')))
| _ -> cont []
let transformTop expr =
transform expr (fun var -> ContCall("halt", [var]))
Here is modified version:
type Action =
| ContinuationVar of Expr * (string -> Action)
| ContinuationExpr of string * (Norm -> Action)
| TransformMany of string list * Expr list * (string list -> Action)
| Result of Norm
| Variable of string
// Make one action at time and return to top loop
let rec transform2 expr =
match expr with
| App(func, args) ->
TransformMany([], func :: args, (fun vars ->
let func' = List.head vars
let args' = List.tail vars
let c = fresh()
let r = fresh()
ContinuationExpr(r, fun expr ->
Result(LetCont(c, [r], expr, FuncCall(func', c, args'))))))
| Var(v) -> Variable(v)
// We have two stacks here:
// contsVar for continuations accepting variables
// contsExpr for continuations accepting expressions
let transformTop2 expr =
let rec loop contsVar contsExpr action =
match action with
| ContinuationVar(expr, cont) ->
loop (cont :: contsVar) contsExpr (transform2 expr)
| ContinuationExpr(var, contExpr) ->
let contVar = List.head contsVar
let contsVar' = List.tail contsVar
loop contsVar' (contExpr :: contsExpr) (contVar var)
| TransformMany(vars, e :: exprs, cont) ->
loop contsVar contsExpr (ContinuationVar(e, fun var ->
TransformMany(var :: vars, exprs, cont)))
| TransformMany(vars, [], cont) ->
loop contsVar contsExpr (cont (List.rev vars))
| Result(r) ->
match contsExpr with
| cont :: rest -> loop contsVar rest (cont r)
| _ -> r
| Variable(v) ->
match contsVar with
| cont :: rest -> loop rest contsExpr (cont v)
| _ -> failwith "must not be empty"
let initial = ContinuationVar(expr, fun var -> Result(ContCall("halt", [var])))
loop [] [] initial
Related
I want to take two streams of integers in increasing order and combine them into one stream that contains no duplicates and should be in increasing order. I have defined the functionality for streams in the following manner:
type 'a susp = Susp of (unit -> 'a)
let force (Susp f) = f()
type 'a str = {hd : 'a ; tl : ('a str) susp }
let merge s1 s2 = (* must implement *)
The first function suspends computation by wrapping a computation within a function, and the second function evaluates the function and provides me with the result of the computation.
I want to emulate the logic of how you go about combining lists, i.e. match on both lists and check which elements are greater, lesser, or equal and then append (cons) the integers such that the resulting list is sorted.
However, I know I cannot just do this with streams of course as I cannot traverse it like a list, so I think I would need to go integer by integer, compare, and then suspend the computation and keep doing this to build the resulting stream.
I am at a bit of a loss how to implement such logic however, assuming it is how I should be going about this, so if somebody could point me in the right direction that would be great.
Thank you!
If the the input sequences are sorted, there is not much difference between merging lists and sequences. Consider the following merge function on lists:
let rec merge s t =
match s, t with
| x :: s , [] | [], x :: s -> x :: s
| [], [] -> s
| x :: s', y :: t' ->
if x < y then
x :: (merge s' t)
else if x = y then
x :: (merge s' t')
else
y :: (merge s t')
This function is only using two properties of lists:
the ability to split the potential first element from the rest of the list
the ability to add an element to the front of the list
This suggests that we could rewrite this function as a functor over the signature
module type seq = sig
type 'a t
(* if the seq is non-empty we split the seq into head and tail *)
val next: 'a t -> ('a * 'a t) option
(* add back to the front *)
val cons: 'a -> 'a t -> 'a t
end
Then if we replace the pattern matching on the list with a call to next, and the cons operation with a call to cons, the previous function is transformed into:
module Merge(Any_seq: seq ) = struct
open Any_seq
let rec merge s t =
match next s, next t with
| Some(x,s), None | None, Some (x,s) ->
cons x s
| None, None -> s
| Some (x,s'), Some (y,t') ->
if x < y then
cons x (merge s' t)
else if x = y then
cons x (merge s' t')
else
cons y (merge s t')
end
Then, with list, our implementation was:
module List_core = struct
type 'a t = 'a list
let cons = List.cons
let next = function
| [] -> None
| a :: q -> Some(a,q)
end
module List_implem = Merge(List_core)
which can be tested with
let test = List_implem.merge [1;5;6] [2;4;9]
Implementing the same function for your stream type is then just a matter of writing a similar Stream_core module for stream.
It is powerful technique using recursion because its strong describable feature. Tail recursion provides more powerful computation than normal recursion because it changes recursion into iteration. Continuation-Passing Style (CPS) can change lots of loop codes into tail recursion. Continuation Monad provides recursion syntax but in essence it is tail recursion, which is iteration. It is supposed to reasonable use Continuation Monad for 100000 factorial. Here is the code.
type ContinuationBuilder() =
member b.Bind(x, f) = fun k -> x (fun x -> f x k)
member b.Return x = fun k -> k x
member b.ReturnFrom x = x
(*
type ContinuationBuilder =
class
new : unit -> ContinuationBuilder
member Bind : x:(('d -> 'e) -> 'f) * f:('d -> 'g -> 'e) -> ('g -> 'f)
member Return : x:'b -> (('b -> 'c) -> 'c)
member ReturnFrom : x:'a -> 'a
end
*)
let cont = ContinuationBuilder()
//val cont : ContinuationBuilder
let fac n =
let rec loop n =
cont {
match n with
| n when n = 0I -> return 1I
| _ -> let! x = fun f -> f n
let! y = loop (n - 1I)
return x * y
}
loop n (fun x -> x)
let x2 = fac 100000I
There is wrong message: "Process is terminated due to StackOverflowException."
What is wrong with 100000 factorial using ContinuationMonad?
You need to compile the project in Release mode or check the "Generate tail calls" option in project properties (or use --tailcalls+ if you're running the compiler via command line).
By default, tail call optimization is not enabled in Debug mode. The reason is that, if tail-calls are enabled, you will not see as useful information about stack traces. So, disabling them by default gives you more pleasant debugging experience (even in Debug mode, the compiler optimizes tail-recursive functions that call themselves, which handles most situations).
You probably need to add this memeber to your monad builder:
member this.Delay(mk) = fun c -> mk () c
I'm implementing a packrat parser in OCaml, as per the Master Thesis by B. Ford. My parser should receive a data structure that represents the grammar of a language and parse given sequences of symbols.
I'm stuck with the memoization part. The original thesis uses Haskell's lazy evaluation to accomplish linear time complexity. I want to do this (memoization via laziness) in OCaml, but don't know how to do it.
So, how do you memoize functions by lazy evaluations in OCaml?
EDIT: I know what lazy evaluation is and how to exploit it in OCaml. The question is how to use it to memoize functions.
EDIT: The data structure I wrote that represents grammars is:
type ('a, 'b, 'c) expr =
| Empty of 'c
| Term of 'a * ('a -> 'c)
| NTerm of 'b
| Juxta of ('a, 'b, 'c) expr * ('a, 'b, 'c) expr * ('c -> 'c -> 'c)
| Alter of ('a, 'b, 'c) expr * ('a, 'b, 'c) expr
| Pred of ('a, 'b, 'c) expr * 'c
| NPred of ('a, 'b, 'c) expr * 'c
type ('a, 'b, 'c) grammar = ('a * ('a, 'b, 'c) expr) list
The (not-memoized) function that parse a list of symbols is:
let rec parse g v xs = parse' g (List.assoc v g) xs
and parse' g e xs =
match e with
| Empty y -> Parsed (y, xs)
| Term (x, f) ->
begin
match xs with
| x' :: xs when x = x' -> Parsed (f x, xs)
| _ -> NoParse
end
| NTerm v' -> parse g v' xs
| Juxta (e1, e2, f) ->
begin
match parse' g e1 xs with
| Parsed (y, xs) ->
begin
match parse' g e2 xs with
| Parsed (y', xs) -> Parsed (f y y', xs)
| p -> p
end
| p -> p
end
( and so on )
where the type of the return value of parse is defined by
type ('a, 'c) result = Parsed of 'c * ('a list) | NoParse
For example, the grammar of basic arithmetic expressions can be specified as g, in:
type nt = Add | Mult | Prim | Dec | Expr
let zero _ = 0
let g =
[(Expr, Juxta (NTerm Add, Term ('$', zero), fun x _ -> x));
(Add, Alter (Juxta (NTerm Mult, Juxta (Term ('+', zero), NTerm Add, fun _ x -> x), (+)), NTerm Mult));
(Mult, Alter (Juxta (NTerm Prim, Juxta (Term ('*', zero), NTerm Mult, fun _ x -> x), ( * )), NTerm Prim));
(Prim, Alter (Juxta (Term ('<', zero), Juxta (NTerm Dec, Term ('>', zero), fun x _ -> x), fun _ x -> x), NTerm Dec));
(Dec, List.fold_left (fun acc d -> Alter (Term (d, (fun c -> int_of_char c - 48)), acc)) (Term ('0', zero)) ['1';'2';'3';])]
The idea of using lazyness for memoization is use not functions, but data structures, for memoization. Lazyness means that when you write let x = foo in some_expr, foo will not be evaluated immediately, but only as far as some_expr needs it, but that different occurences of xin some_expr will share the same trunk: as soon as one of them force computation, the result is available to all of them.
This does not work for functions: if you write let f x = foo in some_expr, and call f several times in some_expr, well, each call will be evaluated independently, there is not a shared thunk to store the results.
So you can get memoization by using a data structure instead of a function. Typically, this is done using an associative data structure: instead of computing a a -> b function, you compute a Table a b, where Table is some map from the arguments to the results. One example is this Haskell presentation of fibonacci:
fib n = fibTable !! n
fibTable = [0,1] ++ map (\n -> fib (n - 1) + fib (n - 2)) [2..]
(You can also write that with tail and zip, but this doesn't make the point clearer.)
See that you do not memoize a function, but a list: it is the list fibTable that does the memoization. You can write this in OCaml as well, for example using the LazyList module of the Batteries library:
open Batteries
module LL = LazyList
let from_2 = LL.seq 2 ((+) 1) (fun _ -> true)
let rec fib n = LL.at fib_table (n - 1) + LL.at fib_table (n - 2)
and fib_table = lazy (LL.Cons (0, LL.cons 1 <| LL.map fib from_2))
However, there is little interest in doing so: as you have seen in the example above, OCaml does not particularly favor call-by-need evaluation -- it's reasonable to use, but not terribly convenient as it was forced to be in Haskell. It is actually equally simple to directly write the cache structure by direct mutation:
open Batteries
let fib =
let fib_table = DynArray.of_list [0; 1] in
let get_fib n = DynArray.get fib_table n in
fun n ->
for i = DynArray.length fib_table to n do
DynArray.add fib_table (get_fib (i - 1) + get_fib (i - 2))
done;
get_fib n
This example may be ill-chosen, because you need a dynamic structure to store the cache. In the packrat parser case, you're tabulating parsing on a known input text, so you can use plain arrays (indexed by the grammar rules): you would have an array of ('a, 'c) result option for each rule, of the size of the input length and initialized to None. Eg. juxta.(n) represents the result of trying the rule Juxta from input position n, or None if this has not yet been tried.
Lazyness is a nice way to present this kind of memoization, but is not always expressive enough: if you need, say, to partially free some part of your result cache to lower memory usage, you will have difficulties if you started from a lazy presentation. See this blog post for a remark on this.
Why do you want to memoize functions? What you want to memoize is, I believe, the parsing result for a given (parsing) expression and a given position in the input stream. You could for instance use Ocaml's Hashtables for that.
The lazy keyword.
Here you can find some great examples.
If it fits your use case, you can also use OCaml streams instead of manually generating thunks.
I've been working with FParsec lately and I found that the lack of generic parsers is a major stopping point for me. My goal for this little library is simplicity as well as support for generic input. Can you think of any additions that would improve this or is anything particularly bad?
open LazyList
type State<'a, 'b> (input:LazyList<'a>, data:'b) =
member this.Input = input
member this.Data = data
type Result<'a, 'b, 'c> =
| Success of 'c * State<'a, 'b>
| Failure of string * State<'a, 'b>
type Parser<'a,'b, 'c> = State<'a, 'b> -> Result<'a, 'b, 'c>
let (>>=) left right state =
match left state with
| Success (result, state) -> (right result) state
| Failure (message, _) -> Result<'a, 'b, 'd>.Failure (message, state)
let (<|>) left right state =
match left state with
| Success (_, _) as result -> result
| Failure (_, _) -> right state
let (|>>) parser transform state =
match parser state with
| Success (result, state) -> Success (transform result, state)
| Failure (message, _) -> Failure (message, state)
let (<?>) parser errorMessage state =
match parser state with
| Success (_, _) as result -> result
| Failure (_, _) -> Failure (errorMessage, state)
type ParseMonad() =
member this.Bind (f, g) = f >>= g
member this.Return x s = Success(x, s)
member this.Zero () s = Failure("", s)
member this.Delay (f:unit -> Parser<_,_,_>) = f()
let parse = ParseMonad()
Backtracking
Surprisingly it didn't take too much code to implement what you describe. It is a bit sloppy but seems to work quite well.
let (>>=) left right state =
seq {
for res in left state do
match res with
| Success(v, s) ->
let v =
right v s
|> List.tryFind (
fun res ->
match res with
| Success (_, _) -> true
| _ -> false
)
match v with
| Some v -> yield v
| None -> ()
} |> Seq.toList
let (<|>) left right state =
left state # right state
Backtracking Part 2
Switched around the code to use lazy lists and tail-call optimized recursion.
let (>>=) left right state =
let rec readRight lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
let rec readLeft lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) ->
match readRight (right r s) with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> readLeft xs
| Failure (m, s) -> readLeft xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
readLeft (left state)
let (<|>) (left:Parser<'a, 'b, 'c>) (right:Parser<'a, 'b, 'c>) state =
LazyList.delayed (fun () -> left state)
|> LazyList.append
<| LazyList.delayed (fun () -> right state)
I think that one important design decision that you'll need to make is whether you want to support backtracking in your parsers or not (I don't remember much about parsing theory, but this probably specifies the types of languages that your parser can handle).
Backtracking. In your implementation, a parser can either fail (the Failure case) or produce exactly one result (the Success case). An alternative option is to generate zero or more results (for example, represent results as seq<'c>). Sorry if this is something you already considered :-), but anyway...
The difference is that your parser always follows the first possible option. For example, if you write something like the following:
let! s1 = (str "ab" <|> str "a")
let! s2 = str "bcd"
Using your implementation, this will fail for input "abcd". It will choose the first branch of the <|> operator, which will then process first two characters and the next parser in the sequence will fail. An implementation based on sequences would be able to backtrack and follow the second path in <|> and parse the input.
Combine. Another idea that occurs to me is that you could also add Combine member to your parser computation builder. This is a bit subtle (because you need to understand how computation expressions are translated), but it can be sometimes useful. If you add:
member x.Combine(a, b) = a <|> b
member x.ReturnFrom(p) = p
You can then write recursive parsers nicely:
let rec many p acc =
parser { let! r = p // Parse 'p' at least once
return! many p (r::acc) // Try parsing 'p' multiple times
return r::acc |> List.rev } // If fails, return the result
I am new to F# and was reading about tail recursive functions and was hoping someone could give me two different implementations of a function foo - one that is tail recursive and one that isn't so that I can better understand the principle.
Start with a simple task, like mapping items from 'a to 'b in a list. We want to write a function which has the signature
val map: ('a -> 'b) -> 'a list -> 'b list
Where
map (fun x -> x * 2) [1;2;3;4;5] == [2;4;6;8;10]
Start with non-tail recursive version:
let rec map f = function
| [] -> []
| x::xs -> f x::map f xs
This isn't tail recursive because function still has work to do after making the recursive call. :: is syntactic sugar for List.Cons(f x, map f xs).
The function's non-recursive nature might be a little more obvious if I re-wrote the last line as | x::xs -> let temp = map f xs; f x::temp -- obviously its doing work after the recursive call.
Use an accumulator variable to make it tail recursive:
let map f l =
let rec loop acc = function
| [] -> List.rev acc
| x::xs -> loop (f x::acc) xs
loop [] l
Here's we're building up a new list in a variable acc. Since the list gets built up in reverse, we need to reverse the output list before giving it back to the user.
If you're in for a little mind warp, you can use continuation passing to write the code more succinctly:
let map f l =
let rec loop cont = function
| [] -> cont []
| x::xs -> loop ( fun acc -> cont (f x::acc) ) xs
loop id l
Since the call to loop and cont are the last functions called with no additional work, they're tail-recursive.
This works because the continuation cont is captured by a new continuation, which in turn is captured by another, resulting in a sort of tree-like data structure as follows:
(fun acc -> (f 1)::acc)
((fun acc -> (f 2)::acc)
((fun acc -> (f 3)::acc)
((fun acc -> (f 4)::acc)
((fun acc -> (f 5)::acc)
(id [])))))
which builds up a list in-order without requiring you to reverse it.
For what its worth, start writing functions in non-tail recursive way, they're easier to read and work with.
If you have a big list to go through, use an accumulator variable.
If you can't find a way to use an accumulator in a convenient way and you don't have any other options at your disposal, use continuations. I personally consider non-trivial, heavy use of continuations hard to read.
An attempt at a shorter explanation than in the other examples:
let rec foo n =
match n with
| 0 -> 0
| _ -> 2 + foo (n-1)
let rec bar acc n =
match n with
| 0 -> acc
| _ -> bar (acc+2) (n-1)
Here, foo is not tail-recursive, because foo has to call foo recursively in order to evaluate 2+foo(n-1) and return it.
However, bar ís tail-recursive, because bar doesn't have to use the return value of the recursive call in order to return a value. It can just let the recursively called bar return its value immediately (without returning all the way up though the calling stack). The compiler sees this and optimized this by rewriting the recursion into a loop.
Changing the last line in bar into something like | _ -> 2 + (bar (acc+2) (n-1)) would again destroy the function being tail-recursive, since 2 + leads to an action that needs to be done after the recursive call is finished.
Here is a more obvious example, compare it to what you would normally do for a factorial.
let factorial n =
let rec fact n acc =
match n with
| 0 -> acc
| _ -> fact (n-1) (acc*n)
fact n 1
This one is a bit complex, but the idea is that you have an accumulator that keeps a running tally, rather than modifying the return value.
Additionally, this style of wrapping is usually a good idea, that way your caller doesn't need to worry about seeding the accumulator (note that fact is local to the function)
I'm learning F# too.
The following are non-tail recursive and tail recursive function to calculate the fibonacci numbers.
Non-tail recursive version
let rec fib = function
| n when n < 2 -> 1
| n -> fib(n-1) + fib(n-2);;
Tail recursive version
let fib n =
let rec tfib n1 n2 = function
| 0 -> n1
| n -> tfib n2 (n2 + n1) (n - 1)
tfib 0 1 n;;
Note: since the fibanacci number could grow really fast you could replace last line tfib 0 1 n to
tfib 0I 1I n to take advantage of Numerics.BigInteger Structure in F#
Also, when testing, don't forget that indirect tail recursion (tailcall) is turned off by default when compiling in Debug mode. This can cause tailcall recursion to overflow the stack in Debug mode but not in Release mode.