Custom Operator for Lag or Standard Deviation - F#

What is the proper way to extend the available operators when using RX?
I'd like to build out some operations that I think would be useful.
The first operation is simply the standard deviation of a series.
The second operation is the nth lagged value, i.e. if we are lagging by 2 and our series is A B C D E F, then: when A is pushed the lag would be null/empty, when B is pushed the lag would be null/empty, when C is pushed the lag would be A, and when F is pushed the lag would be D.
Would it make sense to base these types of operators off of the built-ins from rx.codeplex.com or is there an easier way?

In idiomatic Rx, an arbitrary lag can be composed with Zip:
let lag (count : int) o =
    let someo = Observable.map Some o
    let delayed = Observable.Repeat(None, count).Concat(someo)
    Observable.Zip(someo, delayed, (fun c d -> d))
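For example (my own usage sketch, assuming numbers is the same int64 stream used below):
// The first two results are None; after that, each result is the value from two elements earlier
numbers
|> lag 2
|> Observable.subscribe (printfn "%A")
|> ignore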
As for a rolling buffer, the most efficient way is to simply use a Queue/ResizeArray of fixed size.
let rollingBuffer (size : int) o =
    Observable.Create(fun (observer : IObserver<_>) ->
        let buffer = new Queue<_>(size)
        o |> Observable.subscribe(fun v ->
            buffer.Enqueue(v)
            if buffer.Count = size then
                observer.OnNext(buffer.ToArray())
                buffer.Dequeue() |> ignore
        )
    )
For numbers |> rollingBuffer 3 |> log:
seq [0L; 1L; 2L]
seq [1L; 2L; 3L]
seq [2L; 3L; 4L]
seq [3L; 4L; 5L]
...
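Since the question also asks for standard deviation, one option (my own sketch, assuming the source values are floats and that each window is emitted as an array, as in the code above) is to map a per-window calculation over rollingBuffer:
// Population standard deviation of a single window
let stdDev (xs: float[]) =
    let m = Array.average xs
    sqrt (xs |> Array.averageBy (fun x -> (x - m) ** 2.0))

// A stream of the standard deviation of the last 10 values
let rollingStdDev values =
    values |> rollingBuffer 10 |> Observable.map stdDev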
For pairing adjacent values, you can just use Observable.pairwise
let delta (a, b) = b - a
let deltaStream = numbers |> Observable.pairwise |> Observable.map(delta)
Observable.Scan is more concise if you want to apply a rolling calculation.
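For example (a minimal sketch of my own, using the F# Observable.scan helper), a running total of the deltas above is just:
// Emits the cumulative sum of all deltas seen so far
let runningTotal = deltaStream |> Observable.scan (+) 0L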

Some of these are easier than others (as usual). For a 'lag' by count (rather than time), you just create a sliding window using Observable.Buffer with a size equal to the 'lag', then take the first element of each resulting list.
So for lag = 3, the function is:
obs.Buffer(3, 1).Select(l => l[0])
This is pretty straightforward to turn into an extension function. I don't know whether it is efficient (that is, whether it reuses the same list), but in most cases that shouldn't matter. I know you want F#; the translation is straightforward.
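For reference, a possible F# rendering of that expression (a sketch of mine; it assumes System.Reactive.Linq is opened so the Buffer and Select extension methods are in scope):
open System.Reactive.Linq

// Mirrors obs.Buffer(3, 1).Select(l => l[0]) from above
let lagged (obs: System.IObservable<'a>) =
    obs.Buffer(3, 1).Select(fun (window: System.Collections.Generic.IList<'a>) -> window.[0])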
For running aggregates, you can usually use Observable.Scan to get a 'running' value. This is calculated based on all the values seen so far, and is pretty straightforward to implement: all you need to compute each subsequent element is the previous aggregate and the new element.
If for whatever reason you need a running aggregate based on a sliding window, then we get into more difficult domain. Here you first need an operation that can give you a sliding window - this is covered by Buffer above. However, then you need to know which values have been removed from this window, and which have been added.
As such, I recommend a new Observable function that maintains an internal window based on existing window + new value, and returns new window + removed value + added value. You can write this using Observable.Scan (I recommend an internal Queue for efficient implementation). It should take a function for determining which values to remove given a new value (this way it can be parameterised for sliding by time or by count).
At that point, Observable.Scan can again be used to take the old aggregate + window + removed values + added value and give a new aggregate.
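A rough sketch of the window-maintenance part (my own; windowed, evict and byCount are made-up names, and an immutable list stands in for the Queue recommended above):
// Emits (new window, removed values, added value) for every source element.
// 'evict' decides which elements fall out once the new value has been appended,
// which is what lets the same combinator slide by count or by time.
let windowed evict source =
    source
    |> Observable.scan (fun (window, _, _) v ->
        let kept, removed = evict (window @ [v])
        (kept, removed, Some v)) ([], [], None)

// Count-based eviction: keep the most recent n elements, report the rest as removed.
// usage: source |> windowed (byCount 5)
let byCount n window =
    let extra = max 0 (List.length window - n)
    (List.skip extra window, List.truncate extra window)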
Hope this helps, I do realise it's a lot of words. If you can confirm the requirement, I can help out with the actual extension method for that specific use case.

For lag, you could do something like
module Observable =
    let lag n obs =
        let buf = System.Collections.Generic.Queue()
        obs |> Observable.map (fun x ->
            buf.Enqueue(x)
            if buf.Count > n then Some(buf.Dequeue())
            else None)
This:
Observable.Range(1, 9)
|> Observable.lag 2
|> Observable.subscribe (printfn "%A")
|> ignore
prints:
<null>
<null>
Some 1
Some 2
Some 3
Some 4
Some 5
Some 6
Some 7

Related

What's the most "functional" way to select a subset from this array?

I'd like to get more comfortable with functional programming, and the first educational task I've set myself is converting a program that computes audio frequencies from C# to F#. The meat of the original application is a big "for" loop that selects a subset of the values in a large array; which values are taken depends on the last accepted value and a ranked list of the values seen since then. There are a few variables that persist between iterations to track progress toward determining the next value.
My first attempt at making this loop more "functional" involved a tail-recursive function whose arguments included the array, the result set so far, the ranked list of values recently seen, and a few other items that need to persist between executions. This seems clunky, and I don't feel like I've gained anything by turning everything that used to be a variable into a parameter on this recursive function.
How would a functional programming master approach this kind of task? Is this an exceptional situation in which a "pure" functional approach doesn't quite fit, and am I wrong for eschewing mutable variables just because I feel they reduce the "purity" of my function? Maybe they don't make it less pure since they only exist inside that function's scope. I don't have a feel for that yet.
Here's an attempted distillation of the code, with some "let" statements and the actual components of state removed ("temp" is the intermediate result array that needs to be processed):
let fif (_,_,_,_,fif) = fif
temp
|> Array.fold (fun (a, b, c, tentativeNextVals, acc) curVal ->
    if (hasProperty curVal c) then
        // do not consider current value
        (a, b, c, Seq.empty, acc)
    else
        if (hasOtherProperty curVal b) then
            // add current value to tentative list
            (a, b, c, tentativeNextVals.Concat [curVal], acc)
        else
            // accept a new value
            let newAcceptedVal = chooseNextVal (tentativeNextVals.Concat [curVal])
            (newC, newB, newC, Seq.empty, acc.Concat [newAcceptedVal])
    ) (0, 0, 0, Seq.empty, Seq.empty)
|> fif
Something like this using fold?
let filter list =
    List.fold (fun statevar element -> if condition statevar then statevar else element) initialvalue list
Try using Seq.skip and Seq.take:
let subset (min, max) seq =
    seq
    |> Seq.skip (min)
    |> Seq.take (max - min)
This function will accept arrays but return a sequence, so you can convert it back using Array.ofSeq.
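For example (my own illustration):
let arr = [| 0 .. 9 |]
let middle = arr |> subset (2, 6) |> Array.ofSeq   // [|2; 3; 4; 5|]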
PS: If your goal is to keep your program functional, the most important rule is this: avoid mutability as much as you can. This means that you probably shouldn't be using arrays; use lists, which are immutable. If you're using an array for its fast random access, go for it; just be sure never to set indices.

How to make this loop more functional without introducing too much overhead

let mutable res = 1
for i in a .. b do
    res <- res * myarray.[i]
res
Do I have to use something like
Array.fold (*) 1 (Array.sub myarray a (b - a + 1))
, which I believe is rather slow and not that concise?
Don't know if you'll find it any better, but you could do:
Seq.fold (fun r i -> r * myarray.[i]) 1 {a .. b}
Daniel's solution is pretty neat and I think it should be nearly as efficient as the for loop, because it does not need to clone the array.
If you want a more concise solution, you can use the slicing indexer instead of Array.sub. It still needs to clone part of the array, but it looks quite neat:
myarray.[a .. b] |> Seq.fold (*) 1
This clones a part of the array because the slicing operation returns an array. You could define your own slicing operation that returns the elements as seq<'T> (and thus does not clone the whole array):
module Array =
    let slice a b (arr:'T[]) =
        seq { for i in a .. b -> arr.[i] }
Using this function, you could write:
myarray |> Array.slice a b |> Seq.fold (*) 1
I believe this more directly expresses the functionality that you're trying to implement. As always with performance, you should measure to see whether you actually need such optimizations or whether the first version is fast enough for your purposes.
If you're concerned with speed, I'd shy away from using seq unless you're prototyping. Either stick with the for loop or rewrite it as a recursive function. The example you gave is simple, but more complex problems are sometimes better expressed as recursion.
let rec rangeProduct a b total (array : _[]) =
    if a <= b then
        rangeProduct (a + 1) b (total * array.[a]) array
    else
        total

let res = myArray |> rangeProduct a b res
There is no overhead here, it's as fast as possible, there is no mutation, and it's functional.

How to combine state and continuation monads in F#

I'm trying to sum a tree using the Task Parallel Library, where child tasks are spawned only until the tree has been traversed to a certain depth; below that depth, the remaining child nodes are summed using continuation passing style, to avoid stack overflows.
However, the code looks pretty ugly - it would be nice to use a state monad to carry the current depth around, but the state monad isn't tail recursive. Alternatively, how would I modify the continuation monad to carry around the state? Or create a combination of the state and continuation monads?
let sumTreeParallelDepthCont tree cont =
    let rec sumRec tree depth cont =
        let newDepth = depth - 1
        match tree with
        | Leaf(num) -> cont num
        | Branch(left, right) ->
            if depth <= 0 then
                sumTreeContMonad left (fun leftM ->
                    sumTreeContMonad right (fun rightM ->
                        cont (leftM + rightM)))
            else
                let leftTask = Task.Factory.StartNew(fun () ->
                    let leftResult = ref 0
                    sumRec left newDepth (fun leftM ->
                        leftResult := leftM)
                    !leftResult
                )
                let rightTask = Task.Factory.StartNew(fun () ->
                    let rightResult = ref 0
                    sumRec right newDepth (fun rightM ->
                        rightResult := rightM)
                    !rightResult
                )
                cont (leftTask.Result + rightTask.Result)
    sumRec tree 4 cont // 4 levels deep
I've got a little more detail on this blog post: http://taumuon-jabuka.blogspot.co.uk/2012/06/more-playing-with-monads.html
I think it is important to first understand what your requirements are.
The sequential version of the algorithm does not need to keep the depth (because it always processes the rest of the tree). However, it needs to use continuations because the tree can be large.
The parallel version, on the other hand, needs to keep the depth (because you only want to make limited number of recursive calls), but it does not need to use continuations (because the depth is quite limited and when you start a new task, it does not keep the stack anyway).
This means that you don't really need to combine the two aspects at all. Then you can rewrite the parallel version in a quite straightforward way:
let sumTreeParallelDepthCont tree =
    let rec sumRec tree depth =
        match tree with
        | Leaf(num) -> num
        | tree when depth <= 0 ->
            sumTreeContMonad tree id
        | Branch(left, right) ->
            let leftTask = Task.Factory.StartNew(fun () -> sumRec left (depth - 1))
            let rightResult = sumRec right (depth - 1)
            leftTask.Result + rightResult
    sumRec tree 4 // 4 levels deep
There is no need to duplicate the code from sumTreeContMonad because you can just call it on the current tree in the case tree when depth <= 0.
This also avoids the reference cells by creating a Task<int> instead of a Task, and I modified the algorithm to spawn only one background task and do the second half of the work on the current thread.
In my eyes, the depth looks fine, the ugly bit is the ref cells and assignments. I am unclear why you need them; I think just passing id (identity function) as the cont parameter means that sumRec will return the value, and then you won't need the ref cells. (I may be wrong, this is analysis-at-a-glance.)
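Concretely (a sketch of my own applying that suggestion, not code from the post), the task bodies would shrink to:
// With id as the continuation, sumRec returns the sum directly, so no ref cells are needed
let leftTask = Task.Factory.StartNew(fun () -> sumRec left newDepth id)
let rightTask = Task.Factory.StartNew(fun () -> sumRec right newDepth id)
cont (leftTask.Result + rightTask.Result)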
(I also would get rid of newDepth and just inline (depth-1) at the recursive call sites, as a matter of style.)
Finally, I've no idea what sumTreeContMonad is, but it appears that you could just use sumRec t (-1) k instead of sumTreeContMonad t k and it would work the same.
(If your blog had code, rather than pictures of code, I might just post my own code with these refinements, but I don't feel like transcribing the data types and such. Why post pictures?)

Apply several aggregate functions with one enumeration

Let's assume I have a series of functions that work on a sequence, and I want to use them together in the following fashion:
let meanAndStandardDeviation data =
    let m = mean data
    let sd = standardDeviation data
    (m, sd)
The code above is going to enumerate the sequence twice. I am interested in a function that will give the same result but enumerate the sequence only once. This function will be something like this:
magicFunction (mean, standardDeviation) data
where the input is a tuple of functions plus a sequence, and the output is the same as for the function above.
Is this possible if the functions mean and standardDeviation are black boxes and I cannot change their implementation?
If I wrote mean and standardDeviation myself, is there a way to make them work together? Maybe by somehow making them keep yielding the input to the next function and hand over the result when they are done?
The only way to do this with just a single iteration when the functions are black boxes is to use the Seq.cache function (which evaluates the sequence once and stores the results in memory) or to convert the sequence to another in-memory representation.
When a function takes seq<T> as an argument, you don't even have a guarantee that it will evaluate it just once - and usual implementations of standard deviation would first calculate the average and then iterate over the sequence again to calculate the squares of errors.
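For instance, a typical black-box implementation might look like the following sketch (my own illustration), which is exactly why it cannot be assumed to enumerate the input only once:
// Two passes over the data: one for the mean, one for the squared errors
let standardDeviation (data: seq<float>) =
    let m = Seq.average data
    sqrt (data |> Seq.averageBy (fun x -> (x - m) ** 2.0))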
I'm not sure if you can calculate standard deviation with just a single pass. However, it is possible to do that if the functions are expressed using fold. For example, calculating the maximum and minimum using two passes looks like this:
let maxv = Seq.fold max Int32.MinValue input
let minv = Seq.fold min Int32.MaxValue input
You can do that using a single pass like this:
Seq.fold (fun (s1, s2) v ->
    (max s1 v, min s2 v)) (Int32.MinValue, Int32.MaxValue) input
The lambda function is a bit ugly, but you can define a combinator to compose two functions:
let par f g (i, j) v = (f i v, g j v)
Seq.fold (par max min) (Int32.MinValue, Int32.MaxValue) input
This approach works for functions that can be defined using fold, which means that they consist of some initial value (Int32.MinValue in the first example) and a function that is used to update the previous state when the next value arrives (plus, possibly, some post-processing of the result). In general, it should be possible to rewrite single-pass functions in this style, but I'm not sure whether this can be done for standard deviation. It can definitely be done for mean:
let (count, sum) = Seq.fold (fun (count, sum) v ->
    (count + 1.0, sum + v)) (0.0, 0.0) input
let mean = sum / count
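For what it's worth, standard deviation can in fact be written in the same single-fold style by also accumulating the sum of squares (a sketch of my own, not from the original answer; Welford's algorithm would be the numerically safer choice):
let count, sum, sumSq =
    Seq.fold (fun (count, sum, sumSq) v ->
        (count + 1.0, sum + v, sumSq + v * v)) (0.0, 0.0, 0.0) input
// Var(X) = E[X^2] - (E[X])^2
let stdDevSinglePass = sqrt (sumSq / count - (sum / count) ** 2.0)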
What we're talking about here is a function with the following signature:
(seq<'a> -> 'b) * (seq<'a> -> 'c) -> seq<'a> -> ('b * 'c)
There is no straightforward way that I can think of that will achieve the above using a single iteration of the sequence if that is the signature of the functions. Well, no way that is more efficient than:
let magicFunc (f1:seq<'a>->'b, f2:seq<'a>->'c) (s:seq<'a>) =
    let cached = s |> Seq.cache
    (f1 cached, f2 cached)
That ensures a single iteration of the underlying sequence itself (which matters if it has side effects or is slow), but it does so by essentially caching the results; the cache is still iterated a second time. Is there anything wrong with that? What are you trying to achieve?

How to efficiently find out if a sequence has at least n items?

Naively using Seq.length may not be good enough, as it will blow up on infinite sequences.
Getting fancier with something like ss |> Seq.truncate n |> Seq.length will work, but behind the scenes it would involve a double traversal of the argument sequence, chunk by chunk, through IEnumerator's MoveNext().
The best approach I was able to come up with so far is:
let hasAtLeast n (ss: seq<_>) =
    let mutable result = true
    use e = ss.GetEnumerator()
    for _ in 1 .. n do result <- e.MoveNext()
    result
This involves only a single traversal of the sequence (more precisely, it performs e.MoveNext() n times) and correctly handles the boundary cases of empty and infinite sequences. I could throw in a few small improvements, like special-casing lists, arrays, and ICollections, or cutting the traversal short, but I wonder whether there is a more effective approach to the problem that I may be missing?
Thank you for your help.
EDIT: With 5 implementation variants of the hasAtLeast function on hand (2 of my own, 2 suggested by Daniel, and one suggested by Ankur), I arranged a benchmark marathon between them. The results, a tie across all implementations, prove that Guvante is right: the simplest composition of existing algorithms is best, and there is no point in over-engineering here.
Factoring in readability as well, I'd use either my own pure F#-based version
let hasAtLeast n (ss: seq<_>) =
    Seq.length (Seq.truncate n ss) >= n
or the fully equivalent LINQ-based one suggested by Ankur, which capitalizes on .NET integration:
let hasAtLeast n (ss: seq<_>) =
    ss.Take(n).Count() >= n
Here's a short, functional solution:
let hasAtLeast n items =
    items
    |> Seq.mapi (fun i x -> (i + 1), x)
    |> Seq.exists (fun (i, _) -> i = n)
Example:
let items = Seq.initInfinite id
items |> hasAtLeast 10000
And here's an optimally efficient one:
let hasAtLeast n (items:seq<_>) =
    use e = items.GetEnumerator()
    let rec loop n =
        if n = 0 then true
        elif e.MoveNext() then loop (n - 1)
        else false
    loop n
Functional programming breaks workloads up into small chunks that do very generic tasks, each doing one simple thing. Determining whether there are at least n items in a sequence is not such a simple task.
You have already found both solutions to this "problem": composing existing algorithms, which works for the majority of cases, and writing your own algorithm to solve the issue.
However, I have to wonder whether your first solution wouldn't work just fine. MoveNext() is certainly called only n times on the original enumerator, Current is never called, and even if MoveNext() is also called on some wrapper class, the performance implications are likely tiny unless n is huge.
EDIT:
I was curious, so I wrote a simple program to test the timing of the two methods. The truncate method was quicker both for a simple infinite sequence and for one with a Sleep(1). It looks like I was right that the hand-rolled version is over-engineering.
I think some clarification is needed to explain what is happening in those methods. Seq.truncate takes a sequence and returns a sequence; other than saving the value of n, it doesn't do anything until enumeration, and during enumeration it counts and stops after n values. Seq.length takes an enumeration and counts, returning the count when it ends. So the sequence is only enumerated once, and the overhead amounts to a couple of method calls and two counters.
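As a small illustration of that behaviour (mine, not from the answer):
// truncate stops enumerating after n items, so even an infinite sequence is safe
Seq.initInfinite id |> Seq.truncate 3 |> Seq.length   // evaluates to 3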
Using LINQ, this would be as simple as:
let hasAtLeast n (ss: seq<_>) =
    ss.Take(n).Count() >= n
F#'s Seq.take blows up if there are not enough elements, which is why LINQ's Take is used here.
Example usage, showing that it traverses the seq only once and only up to the required number of elements:
seq { for i = 0 to 5 do
          printfn "Generating %d" i
          yield i }
|> hasAtLeast 4
|> printfn "%A"
