How to combine state and continuation monads in F# - f#

I'm trying to sum a tree using the Task Parallel Library where child tasks are spawned only until the tree is traversed until a certain depth, and otherwise it sums the remaining child nodes using continuation passing style, to avoid stack overflows.
However, the code looks pretty ugly - it would be nice to use a state monad to carry the current depth around, but the state monad isn't tail recursive. Alternatively, how would I modify the continuation monad to carry around the state? Or create a combination of the state and continuation monads?
let sumTreeParallelDepthCont tree cont =
let rec sumRec tree depth cont =
let newDepth = depth - 1
match tree with
| Leaf(num) -> cont num
| Branch(left, right) ->
if depth <= 0 then
sumTreeContMonad left (fun leftM ->
sumTreeContMonad right (fun rightM ->
cont (leftM + rightM )))
else
let leftTask = Task.Factory.StartNew(fun () ->
let leftResult = ref 0
sumRec left newDepth (fun leftM ->
leftResult := leftM)
!leftResult
)
let rightTask = Task.Factory.StartNew(fun () ->
let rightResult = ref 0
sumRec right newDepth (fun rightM ->
rightResult := rightM)
!rightResult
)
cont (leftTask.Result + rightTask.Result)
sumRec tree 4 cont // 4 levels deep
I've got a little more detail on this blog post: http://taumuon-jabuka.blogspot.co.uk/2012/06/more-playing-with-monads.html

I think it is important to first understand what your requirements are.
The sequential version of the algorithm does not need to keep the depth (because it always processes the rest of the tree). However, it needs to use continuations because the tree can be large.
The parallel version, on the other hand, needs to keep the depth (because you only want to make limited number of recursive calls), but it does not need to use continuations (because the depth is quite limited and when you start a new task, it does not keep the stack anyway).
This means that you don't really need to combine the two aspects at all. Then you can rewrite the parallel version in a quite straightforward way:
let sumTreeParallelDepthCont tree =
let rec sumRec tree depth =
match tree with
| Leaf(num) -> num
| tree when depth <= 0 ->
sumTreeContMonad tree id
| Branch(left, right) ->
let leftTask = Task.Factory.StartNew(fun () -> sumRec left (depth + 1))
let rightResult = sumRec right (depth + 1)
leftTask.Result + rightResult
sumRec tree 4 // 4 levels deep
There is no need to duplicate the code from sumTreeContMonad because you can just call it on the current tree in the case tree when depth <= 0.
This also avoids using reference cells by creating Task<int> instead of Task and I modified the algorithm to only spawn one background task and do the second part of the work on the current thread.

In my eyes, the depth looks fine, the ugly bit is the ref cells and assignments. I am unclear why you need them; I think just passing id (identity function) as the cont parameter means that sumRec will return the value, and then you won't need the ref cells. (I may be wrong, this is analysis-at-a-glance.)
(I also would get rid of newDepth and just inline (depth-1) at the recursive call sites, as a matter of style.)
Finally, I've no idea what sumTreeContMonad is, but it appears that you could just use sumRec t -1 k instead of sumTreeContMonad t k and it would work the same.
(If your blog had code, rather than pictures of code, I might just post my own code with these refinements, but I don't feel like transcribing the data types and such. Why post pictures?)

Related

checking if enough elements in a F# list

Right now I have a few instances like this:
let doIt someList =
if someList |> List.truncate 2 |> List.length >= 2 then
someList[0] + someList[1]
else
0
I need to grab the top 2 elements of a list quite often to see changes, but in some cases I don't have enough elements and I need to make sure there are at least 2.
The best way I've found so far is to truncate the list before getting its length, but this creates allocations for no reason.
Is there a better method?
I think I would suggest pattern matching in this case:
let doIt someList =
match someList with
| a :: b :: _ -> a + b
| _ -> 0
Here, a and b are the ints in the list, while _ represents a discarded of list int. This way you don't have to pull the first two elements out of the list with an index, as they are already available as a and b. The last case of the match catches any pattern that was not matched earlier, such as cases with zero, one or three-or-more elements.
This should be a cheap operation, as F# lists are implemented as a singly linked list. So [a;b;c;d] would be represented as a::(b::(c::(d::[]))). a and b are matched, while the rest (c::(d::[])) is left untouched (and is put in the _ slot). It does not need to create a new list to do so.

How to count number of non-empty nodes in binary tree in F#

Consider the binary tree algebraic datatype
type btree = Empty | Node of btree * int * btree
and a new datatype deļ¬ned as follows:
type finding = NotFound | Found of int
Heres my code so far:
let s = Node (Node(Empty, 5, Node(Empty, 2, Empty)), 3, Node (Empty, 6, Empty))
(*
(3)
/ \
(5) (6)
/ \ | \
() (2) () ()
/ \
() ()
*)
(* size: btree -> int *)
let rec size t =
match t with
Empty -> false
| Node (t1, m, t2) -> if (m != Empty) then sum+1 || (size t1) || (size t2)
let num = occurs s
printfn "There are %i nodes in the tree" num
This probably isn't close, I took a function that would find if an integer existed in a tree and tried changing the code for what I was trying to do.
I am very new to using F# and would appreciate any help. I am trying to count all non empty nodes in the tree. For example the tree I'm using should print the value 4.
I did not run the compiler on your code, but I believe this does even compile.
However your idea to use a pattern match in a recursive function is good.
As rmunn commented, you want to determine the number of nodes in each case:
An empty tree has no nodes, hence the result is zero.
A non-empty tree, has at least the root node plus the count of its left and right subtrees.
So something along the lines of the following should work
let rec size t =
match t with
| Empty -> 0
| Node (t1, _, t2) -> 1 + (size t1) + (size t2)
The most important detail here is, that you do not need a global variable sum to store any intermediate values. The whole idea of a recursive function is that those intermediate values are the results of recursive calls.
As a remark, your tree in the comment should look like this, I believe.
(*
(3)
/ \
(5) (6)
/ \ | \
() (2) () ()
/ \
() ()
*)
Edit: I misread the misaligned () as leaves of an empty tree, where in fact they are leaves of the subtree (2). So it was just an ASCII art issue :-)
Friedrich already posted a simple version of the size function that will work for most trees. However, the solution is not "tail-recursive", so it can cause a Stack Overflow for large trees. In functional programming languages like F#, recursion is often the preferred technique for things like counting and other aggregate functions. However, recursive functions generally consume a stack frame for each recursive call. This means that for large structures, the call stack can be exhausted before the function completes. In order to avoid this problem, compilers can optimize functions that are considered "tail-recursive" so that they use only one stack frame regardless of how many times they recurse. Unfortunately, this optimization cannot just be implemented for any recursive algorithm. It requires that the recursive call be the last thing that the function does, thereby ensuring that the compiler does not have to worry about jumping back into the function after the call, allowing it to overwrite the stack frame instead of adding another one.
In order to change the size function to be tail-recursive, we need some way to avoid having to call it twice in the case of a non-empty node, so that the call can be the last step of the function, instead of the addition between the two calls in Friedrich's solution. This can be accomplished using a couple different techniques, generally either using an accumulator or using Continuation Passing Style. The simpler solution is often to use an accumulator to keep track of the total size instead of having it be the return value, while Continuation Passing Style is a more general solution that can handle more complex recursive algorithms.
In order to make an accumulator pattern work for a tree where we have to sum both the left and right sub-trees, we need some way to make one tail-call at the end of the function, while still making sure that both sub-trees are evaluated. A simple way to do that is to also accumulate the right sub-trees in addition to the total count, so we can make subsequent tail-calls to evaluate those trees while evaluating the left sub-trees first. That solution might look something like this:
let size t =
let rec size acc ts = function
| Empty ->
match ts with
| [] -> acc
| head :: tail -> head |> size acc tail
| Node (t1, _, t2) ->
t1 |> size (acc + 1) (t2 :: ts)
t |> size 0 []
This adds the acc parameter and the ts parameter to represent the total count and remaining unevaluated sub-trees. When we hit a populated node, we evaluate the left sub-tree while adding the right sub-tree to our list of trees to evaluate later. When we hit the an empty node, we start evaluating any ts we've accumulated, until we have no further populated nodes or unevaluated sub-trees. This isn't the best possible solution for computing the tree-size, and most real solutions would use Continuation Passing Style to make it tail-recusive, but that should make a good exercise as you get more familiar with the language.

why do continuations avoid stackoverflow?

I've been trying to understand continuations / CPS and from what I can gather it builds up a delayed computation, once we get to the end of the list we invoke the final computation.
What I don't understand is why CPS prevents stackoverflow when it seems analogous to building up a nested function as per the naive approach in Example 1. Sorry for the long post but tried to show the idea (and possibly where it goes wrong) from basics:
So:
let list1 = [1;2;3]
Example 1: "Naive approach"
let rec sumList = function
|[] -> 0
|h::t -> h + sumList t
So when this runs, iteratively it results in:
1 + sumList [2;3]
1 + (2 + sumList [3])
1 + (2 + (3 + 0))
So the nesting (and overflow issues) can be overcome by Tail Recursion - running an accumulator i.e.
"Example 2: Tail Recursion"
let sumListACC lst =
let rec loop l acc =
match l with
|[] -> acc
|h::t -> loop t (h + acc)
loop lst 0
i.e,
sumList[2;3] (1+0)
sumList[3] (2+1)
sumList[] (3+3)
So because the accumulator is evaluated at each step, there is no nesting and we avoid bursting the stack. Clear!
Next comes CPS, I understand this is required when we already have an accumulator but the function is not tail recursive e.g. with Foldback. Although not required in the above example, applying CPS to this problem gives:
"Example 3: CPS"
let sumListCPS lst =
let rec loop l cont =
match l with
|[] -> cont 0
|h::t -> loop t (fun x -> cont( h + x))
loop lst (fun x -> x)
To my understanding, iteratively this could be written as:
loop[2;3] (fun x -> cont (1+x))
loop[3] (fun x ->cont (1+x) -> cont(2+x))
loop[] (fun x -> cont (1+x) -> cont(2+x) -> cont (3+x)
which then reduces sequentially from the right with the final x = 0 i.e:
cont(1+x)-> cont(2+x) -> cont (3+0)
cont(1+x)-> cont(2+x) -> 3
cont(1+x) -> cont (2+3)
...
cont (1+5) -> 6
which I suppose is analogous to:
cont(1+cont(2+cont(3+0)))
(1+(2+(3+0)))
correction to original post - realised that it is evaluated from the right, as for example replacing cont(h +x) with cont(h+2*x) yields 17 for the above example consistent with: (1+2*(2+2*(3+2*0)))
i.e. exactly where we started in example 1, based on this since we still need to keep track of where we came from why does using it prevent the overflow issue that example 1 suffers from?
As I know it doesn't, where have I gone wrong?
I've read the following posts (multiple times) but the above confusion remains.
http://www.markhneedham.com/blog/2009/06/22/f-continuation-passing-style/
http://codebetter.com/matthewpodwysocki/2008/08/13/recursing-on-recursion-continuation-passing/
http://lorgonblog.wordpress.com/2008/04/05/catamorphisms-part-one/
What happens is quite simple.
.NET (and other platforms, but we're discussing F# right now) stores information in two locations: the stack (for value types, for pointer to objects, and for keeping track of function calls) and the heap (for objects).
In regular non-tail recursion, you keep track of your progress in the stack (quite obviously). In CPS, you keep track of your progress in lambda functions (which are on the heap!), and tail recursion optimization makes sure that the stack stays clear of any tracking.
As the heap is significantly larger than the stack, it is (in some cases) better to move the tracking from the stack to the heap - via CPS.

Custom Operator for Lag or Standard Deviation

What is the proper way to extend the available operators when using RX?
I'd like to build out some operations that I think would be useful.
The first operation is simply the standard deviation of a series.
The second operation is the nth lagged value i.e. if we are lagging 2 and our series is A B C D E F when F is pushed the lag would be D when A is pushed the lag would be null/empty when B is pushed the lag would be null/empty when C is pushed the Lag would be A
Would it make sense to base these types of operators off of the built-ins from rx.codeplex.com or is there an easier way?
In idiomatic Rx, arbitrary delays can be composed by Zip.
let lag (count : int) o =
let someo = Observable.map Some o
let delayed = Observable.Repeat(None, count).Concat(someo)
Observable.Zip(someo, delayed, (fun c d -> d))
As for a rolling buffer, the most efficient way is to simply use a Queue/ResizeArray of fixed size.
let rollingBuffer (size : int) o =
Observable.Create(fun (observer : IObserver<_>) ->
let buffer = new Queue<_>(size)
o |> Observable.subscribe(fun v ->
buffer.Enqueue(v)
if buffer.Count = size then
observer.OnNext(buffer.ToArray())
buffer.Dequeue() |> ignore
)
)
For numbers |> rollingBuffer 3 |> log:
seq [0L; 1L; 2L]
seq [1L; 2L; 3L]
seq [2L; 3L; 4L]
seq [3L; 4L; 5L]
...
For pairing adjacent values, you can just use Observable.pairwise
let delta (a, b) = b - a
let deltaStream = numbers |> Observable.pairwise |> Observable.map(delta)
Observable.Scan is more concise if you want to apply a rolling calculation .
Some of these are easier than others (as usual). For a 'lag' by count (rather than time) you just create a sliding window by using Observable.Buffer equivalent to the size of 'lag', then take the first element of the result list.
So far lag = 3, the function is:
obs.Buffer(3,1).Select(l => l.[0])
This is pretty straightforward to turn into an extension function. I don't know if it is efficient in that it reuses the same list, but in most cases that shouldn't matter. I know you want F#, the translation is straightforward.
For running aggregates, you can usually use Observable.Scan to get a 'running' value. This is calculated based on all values seen so far (and is pretty straightforward to implement) - ie all you have to implement each subsequent element is the previous aggregate and the new element.
If for whatever reason you need a running aggregate based on a sliding window, then we get into more difficult domain. Here you first need an operation that can give you a sliding window - this is covered by Buffer above. However, then you need to know which values have been removed from this window, and which have been added.
As such, I recommend a new Observable function that maintains an internal window based on existing window + new value, and returns new window + removed value + added value. You can write this using Observable.Scan (I recommend an internal Queue for efficient implementation). It should take a function for determining which values to remove given a new value (this way it can be parameterised for sliding by time or by count).
At that point, Observable.Scan can again be used to take the old aggregate + window + removed values + added value and give a new aggregate.
Hope this helps, I do realise it's a lot of words. If you can confirm the requirement, I can help out with the actual extension method for that specific use case.
For lag, you could do something like
module Observable =
let lag n obs =
let buf = System.Collections.Generic.Queue()
obs |> Observable.map (fun x ->
buf.Enqueue(x)
if buf.Count > n then Some(buf.Dequeue())
else None)
This:
Observable.Range(1, 9)
|> Observable.lag 2
|> Observable.subscribe (printfn "%A")
|> ignore
prints:
<null>
<null>
Some 1
Some 2
Some 3
Some 4
Some 5
Some 6
Some 7

Pairwise Sequence Processing to compare db tables

Consider the following Use case:
I want to iterate through 2 db tables in parallel and find differences and gaps/missing records in either table. Assume that 1) pk of table is an Int ID field; 2) the tables are read in ID order; 3) records may be missing from either table (with corresponding sequence gaps).
I'd like to do this in a single pass over each db - using lazy reads. (My initial version of this program uses sequence objects and the data reader - unfortunately makes multiple passes over each db).
I've thought of using pairwise sequence processing and use Seq.skip within the iterations to try and keep the table processing in sync. However apparently this is very slow as I Seq.skip has a high overhead (creating new sequences under the hood) so this could be a problem with a large table (say 200k recs).
I imagine this is a common design pattern (compare concurrent data streams from different sources) and am interested in feedback/comments/links to similar projects.
Anyone care to comment?
Here's my (completely untested) take, doing a single pass over both tables:
let findDifferences readerA readerB =
let idsA, idsB =
let getIds (reader:System.Data.Common.DbDataReader) =
reader |> LazyList.unfold (fun reader ->
if reader.Read ()
then Some (reader.GetInt32 0, reader)
else None)
getIds readerA, getIds readerB
let onlyInA, onlyInB = ResizeArray<_>(), ResizeArray<_>()
let rec impl a b =
let inline handleOnlyInA idA as' = onlyInA.Add idA; impl as' b
let inline handleOnlyInB idB bs' = onlyInB.Add idB; impl a bs'
match a, b with
| LazyList.Cons (idA, as'), LazyList.Cons (idB, bs') ->
if idA < idB then handleOnlyInA idA as'
elif idA > idB then handleOnlyInB idB bs'
else impl as' bs'
| LazyList.Nil, LazyList.Nil -> () // termination condition
| LazyList.Cons (idA, as'), _ -> handleOnlyInA idA as'
| _, LazyList.Cons (idB, bs') -> handleOnlyInB idB bs'
impl idsA idsB
onlyInA.ToArray (), onlyInB.ToArray ()
This takes two DataReaders (one for each table) and returns two int[]s which indicate the IDs that were only present in their respective table. The code assumes that the ID field is of type int and is at ordinal index 0.
Also note that this code uses LazyList from the F# PowerPack, so you'll need to get that if you don't already have it. If you're targeting .NET 4.0 then I strongly recommend getting the .NET 4.0 binaries which I've built and hosted here, as the binaries from the F# PowerPack site only target .NET 2.0 and sometimes don't play nice with VS2010 SP1 (see this thread for more info: Problem with F# Powerpack. Method not found error).
When you use sequences, any lazy function adds some overhead on the sequence. Calling Seq.skip thousands of times on the same sequence will clearly be slow.
You can use Seq.zip or Seq.map2 to process two sequences at a time:
> Seq.map2 (+) [1..3] [10..12];;
val it : seq<int> = seq [11; 13; 15]
If the Seq module is not enough, you might need to write your own function.
I'm not sure if I understand what you try to do, but this sample function might help you:
let fct (s1: seq<_>) (s2: seq<_>) =
use e1 = s1.GetEnumerator()
use e2 = s2.GetEnumerator()
let rec walk () =
// do some stuff with the element of both sequences
printfn "%d %d" e1.Current e2.Current
if cond1 then // move in both sequences
if e1.MoveNext() && e2.MoveNext() then walk ()
else () // end of a sequence
elif cond2 then // move to the next element of s1
if e1.MoveNext() then walk()
else () // end of s1
elif cond3 then // move to the next element of s2
if e2.MoveNext() then walk ()
else () // end of s2
// we need at least one element in each sequence
if e1.MoveNext() && e2.MoveNext() then walk()
Edit :
The previous function was meant to extend functionality of the Seq module, and you'll probably want to make it a high-order function. As ildjarn said, using LazyList can lead to cleaner code:
let rec merge (l1: LazyList<_>) (l2: LazyList<_>) =
match l1, l2 with
| LazyList.Cons(h1, t1), LazyList.Cons(h2, t2) ->
if h1 <= h2 then LazyList.cons h1 (merge t1 l2)
else LazyList.cons h2 (merge l1 t2)
| LazyList.Nil, l2 -> l2
| _ -> l1
merge (LazyList.ofSeq [1; 4; 5; 7]) (LazyList.ofSeq [1; 2; 3; 6; 8; 9])
But I still think you should separate the iteration of your data, from the processing. Writing a high-order function to iterate is a good idea (at the end, it's not annoying if the iterator function code uses mutable enumerators).

Resources