What's the most "functional" way to select a subset from this array? - f#

I'd like to get more comfortable with functional programming, and the first educational task I've set myself is converting a program that computes audio frequencies from C# to F#. The meat of the original application is a big "for" loop that selects a subset of the values in a large array; which values are taken depends on the last accepted value and a ranked list of the values seen since then. There are a few variables that persist between iterations to track progress toward determining the next value.
My first attempt at making this loop more "functional" involved a tail-recursive function whose arguments included the array, the result set so far, the ranked list of values recently seen, and a few other items that need to persist between executions. This seems clunky, and I don't feel like I've gained anything by turning everything that used to be a variable into a parameter on this recursive function.
How would a functional programming master approach this kind of task? Is this an exceptional situation in which a "pure" functional approach doesn't quite fit, and am I wrong for eschewing mutable variables just because I feel they reduce the "purity" of my function? Maybe they don't make it less pure since they only exist inside that function's scope. I don't have a feel for that yet.
Here's an attempted distillation of the code, with some "let" statements and the actual components of state removed ("temp" is the intermediate result array that needs to be processed):
let fif (_,_,_,_,fif) = fif
temp
|> Array.fold (fun (a, b, c, tentativeNextVals, acc) curVal ->
    if (hasProperty curVal c) then
        // do not consider current value
        (a, b, c, Seq.empty, acc)
    else
        if (hasOtherProperty curVal b) then
            // add current value to tentative list
            (a, b, c, tentativeNextVals.Concat [curVal], acc)
        else
            // accept a new value
            let newAcceptedVal = chooseNextVal (tentativeNextVals.Concat [curVal])
            (newC, newB, newC, Seq.empty, acc.Concat [newAcceptedVal])
    ) (0,0,0,Seq.empty,Seq.empty)
|> fif

Something like this using fold?
let filter list =
    List.fold (fun statevar element -> if condition statevar then statevar else element) initialvalue list

Try using Seq.skip and Seq.take:
let subset (min, max) seq =
    seq
    |> Seq.skip (min)
    |> Seq.take (max - min)
This function will accept arrays but return a sequence, so you can convert it back using Array.ofSeq.
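As a quick illustration of the round trip from array to sequence and back, here is a minimal sketch. The sample array and the name `middle` are made up for the example, and the parameter is renamed to `source` only to avoid shadowing the `seq` keyword-like name.

```fsharp
// Same shape as the subset function above: skip `min` items, then take (max - min).
let subset (min, max) (source: seq<'T>) =
    source
    |> Seq.skip min
    |> Seq.take (max - min)

// Hypothetical sample data.
let values = [| 10; 20; 30; 40; 50 |]

// Arrays are sequences, so they can be piped in directly;
// Array.ofSeq converts the lazy result back into an array.
let middle = values |> subset (1, 4) |> Array.ofSeq
// middle = [|20; 30; 40|]
```

Note that Seq.skip and Seq.take throw when the input has fewer elements than requested, so the bounds need to be valid for the input.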
PS: If your goal is to keep your program functional, the most important rule is this: avoid mutability as much as you can. This means that you probably shouldn't be using arrays; use lists, which are immutable. If you're using an array for its fast random access, go for it; just be sure to never set indices.

Related

Determine if all elements of a list belong to the same DU case

I have a discriminated union with 10-15 cases, all having data in the form of int<'a>:
type MyUnionType =
    | Case1 of int<someUnit>
    | Case2 of int<someUnit>
    | ...
    | CaseN of int<someOtherUnit>
I am new to functional programming and am struggling to write a function with the following signature:
mySum:MyUnionType option list -> MyUnionType option
The function should sum all the ints iff all the Some elements belong to the same DU case. For example:
mySum [Some (Case1 2<a>); Some (Case1 3<a>); None] = Some (Case1 5<a>)
mySum [Some (Case1 2<a>); Some (Case2 3<a>); None] = None
mySum [None] = None
I know about Option.map and List.choose and the likes that can help here, but I'm struggling with determining whether all elements belong to the same case.
Is there an elegant and FP-idiomatic way to write this function? (If it simplifies matters, you can assume the list is never empty.)
(Though I don't have a clear grasp on monoids/monads/morphisms yet, don't be afraid to use the words if relevant, though please stop a bit short of zygohistomorphic prepromorphisms).
First, the code I'm about to present you will be greatly simplified if you remove all the None cases from the list before summing it. So for the rest of my answer, I'm going to assume that you've run your list through a List.choose id step first to get rid of all the None values.
The simplest way to think about this is probably to break it down into a series of single steps. You start by taking the first item of the list to initialize your "sum so far" value. (If there was no first item after running the list through List.choose id, then the list was either empty or contained only Nones, so the sum in that case will be None). Now, if that was the only item of the list, then you've already found the sum of the entire list. Otherwise, you look at the first item of the rest of the list, and ask the following question:
Is that item the same DU case as the sum so far?
If the answer is yes, then you add its value to the sum so far, and keep going through the loop. If the answer is no, then you make the "sum so far" value a None value instead of Some (case). So really, the "is it the same as the sum so far" question is actually two questions:
Is the "sum so far" a real value? (I.e., not None)?
Is the item I'm looking at the same DU case as the sum so far?
If the answer to both of these questions is "yes", then you add up the two values to get a new "sum so far" value. If either answer is "no", then you just set the "sum so far" to None, and your eventual result will be None as well.
Translating that approach into code looks like this:
let addToSum sumSoFar nextItem =
    match sumSoFar with
    | None -> None // Short-circuit if we previously found a mismatch
    | Some x ->
        match x, nextItem with
        | Case1 a, Case1 b -> Some (Case1 (a + b))
        | Case2 a, Case2 b -> Some (Case2 (a + b))
        // ...
        | CaseN a, CaseN b -> Some (CaseN (a + b))
        | _ -> None // Mismatch
Now you need a function to apply a "combining" operation like that to the whole list. (A "combining" operation is any operation that takes two items of the same type and produces a single item of that same type; addition is one such operation, but so is multiplication, and a bunch of other things). There are two basic "apply this combining operation to the whole list" functions in F#, reduce and fold. The difference is that reduce takes the first item of the list as the initial "sum so far" value, and cannot work on an empty list. Whereas fold requires you to supply the initial value of its "sum so far" accumulator, but it can work on an empty list (for an empty list, the result of fold will simply be the initial "sum so far" value that you provided). In your case, since you don't initially know the type that your "sum so far" value should hold, you have to use reduce. So I'd suggest something like this:
let sumMyList values =
    values |> List.choose id |> List.reduce addToSum
Except that List.reduce can't handle an empty list, and if the list you have is entirely None cases, that would blow up. (Can you see why?) So I'll add one more step to it, to handle empty lists:
let reduceSafely filteredValues =
    match filteredValues with
    | [] -> None
    | _ -> filteredValues |> List.reduce addToSum
let sumMyList values =
    values |> List.choose id |> reduceSafely
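One caveat worth checking before using this: List.reduce expects a function of type 'T -> 'T -> 'T, while addToSum as written takes an option for the "sum so far" and a plain case for the next item, so the compiler may reject the reduce call. A minimal sketch of one way around that, assuming a two-case union and a unit of measure `a` invented for the example, is to seed List.fold with the first item wrapped in Some:

```fsharp
// Hypothetical unit of measure and a cut-down two-case union for illustration.
[<Measure>] type a

type MyUnionType =
    | Case1 of int<a>
    | Case2 of int<a>

// Same logic as addToSum above: option accumulator, plain next item.
let addToSum sumSoFar nextItem =
    match sumSoFar with
    | None -> None // a mismatch was already found
    | Some x ->
        match x, nextItem with
        | Case1 p, Case1 q -> Some (Case1 (p + q))
        | Case2 p, Case2 q -> Some (Case2 (p + q))
        | _ -> None // mismatch

let sumMyList values =
    match List.choose id values with
    | [] -> None // empty list, or only None values
    | first :: rest -> List.fold addToSum (Some first) rest
```

With these definitions, `sumMyList [Some (Case1 2<a>); Some (Case1 3<a>); None]` evaluates to `Some (Case1 5<a>)`, and mixing Case1 with Case2 gives None. The pattern match on the filtered list also covers the empty-list case, so no separate safety wrapper is needed in this variant.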
That should get you what you're looking for. And hopefully it's also given you insight into the process of designing a functional solution to your problems.
P.S. I recommend the F# track at http://exercism.io/ if you want more practice in figuring out the functional approach to problem-solving. I learned a lot running through those exercises!

Query Expressions and Lazy Evaluation

I am hoping to understand how query expressions are really evaluated. I have a situation where I'm using a query expression to access a large amount of data from a database. I then interact with this data via a GUI. For example the user might supply an additive factor that I want to apply to one column and then plot. What I'm not clear on is how to structure this so that the same data isn't being pulled from the database each time the GUI updates.
For example:
let a state= query{...}
let results = a "ALASKA"
let calcoutput y = results |> Seq.map (fun x -> x.Temperature + y)
or
let calcoutput state y = a state |> Seq.map (fun x -> x.Temperature + y)
I am not clear if these are actually the same code, and if so am I pulling data from the DB each time I execute calcoutput with a different y (it appears so). Should I be casting the "results" sequence as a List and then using that to avoid this?
You can use the Seq.cache function.
http://msdn.microsoft.com/en-us/library/ee370430.aspx
Quote: "This result sequence will have the same elements as the input sequence. The result can be enumerated multiple times. The input sequence is enumerated at most once and only as far as is necessary. Caching a sequence is typically useful when repeatedly evaluating items in the original sequence is computationally expensive or if iterating the sequence causes side-effects that the user does not want to be repeated multiple times."
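To make the effect concrete, here is a small sketch that uses an in-memory sequence with a counter as a stand-in for the database query. The names `pulls` and `source` and the sample temperatures are invented for the example, and a real query expression may behave differently in details such as connection lifetime.

```fsharp
// Counts how many items are actually pulled from the underlying source.
let pulls = ref 0

// Stand-in for the expensive query: each enumeration would normally re-run it.
let source =
    seq {
        for t in [ 10.0; 12.5; 15.0 ] do
            pulls.Value <- pulls.Value + 1
            yield t
    }

// Cache once; later enumerations read from the cache instead of the source.
let results = Seq.cache source

let calcoutput y = results |> Seq.map (fun t -> t + y) |> Seq.toList

let first = calcoutput 1.0   // enumerates the source and fills the cache
let second = calcoutput 2.0  // served from the cache
// pulls.Value is 3 here, not 6
```

Converting to a list or array up front has a similar effect; the difference is that Seq.cache stays lazy and only reads as far as consumers actually enumerate.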

Is there an existing function to apply a function to each member of a tuple?

I want to apply a function to both members of a homogenous tuple, resulting in another tuple. Following on from my previous question I defined an operator that seemed to make sense to me:
let (||>>) (a,b) f = f a, f b
However, again I feel like this might be a common use case but couldn't find it in the standard library. Does it exist?
I don't think there is any standard library function that does this.
My personal preference would be to avoid too many custom operators (they make code shorter, but they make it harder to read for people who have not seen the definition before). Applying function to both elements of a tuple is logically close to the map operation on lists (which applies a function to all elements of a list), so I would probably define Tuple2.map:
module Tuple2 =
    let map f (a, b) = (f a, f b)
Then you can use the function quite nicely with pipelining:
let nums = (1, 2)
nums |> Tuple2.map (fun x -> x + 1)

Custom Operator for Lag or Standard Deviation

What is the proper way to extend the available operators when using RX?
I'd like to build out some operations that I think would be useful.
The first operation is simply the standard deviation of a series.
The second operation is the nth lagged value. For example, if we are lagging by 2 and our series is A B C D E F: when A is pushed, the lag would be null/empty; when B is pushed, the lag would be null/empty; when C is pushed, the lag would be A; and when F is pushed, the lag would be D.
Would it make sense to base these types of operators off of the built-ins from rx.codeplex.com or is there an easier way?
In idiomatic Rx, arbitrary delays can be composed by Zip.
let lag (count : int) o =
    let someo = Observable.map Some o
    let delayed = Observable.Repeat(None, count).Concat(someo)
    Observable.Zip(someo, delayed, (fun c d -> d))
As for a rolling buffer, the most efficient way is to simply use a Queue/ResizeArray of fixed size.
let rollingBuffer (size : int) o =
    Observable.Create(fun (observer : IObserver<_>) ->
        let buffer = new Queue<_>(size)
        o |> Observable.subscribe(fun v ->
            buffer.Enqueue(v)
            if buffer.Count = size then
                observer.OnNext(buffer.ToArray())
                buffer.Dequeue() |> ignore
        )
    )
For numbers |> rollingBuffer 3 |> log:
seq [0L; 1L; 2L]
seq [1L; 2L; 3L]
seq [2L; 3L; 4L]
seq [3L; 4L; 5L]
...
For pairing adjacent values, you can just use Observable.pairwise
let delta (a, b) = b - a
let deltaStream = numbers |> Observable.pairwise |> Observable.map(delta)
Observable.Scan is more concise if you want to apply a rolling calculation.
Some of these are easier than others (as usual). For a 'lag' by count (rather than time) you just create a sliding window by using Observable.Buffer equivalent to the size of 'lag', then take the first element of the result list.
So for lag = 3, the function is:
obs.Buffer(3,1).Select(l => l[0])
This is pretty straightforward to turn into an extension function. I don't know if it is efficient in that it reuses the same list, but in most cases that shouldn't matter. I know you want F#, the translation is straightforward.
For running aggregates, you can usually use Observable.Scan to get a 'running' value. This is calculated based on all values seen so far (and is pretty straightforward to implement), i.e. all you need in order to compute each subsequent element is the previous aggregate and the new element.
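A running aggregate of this kind can be sketched without the Rx library by using the Observable module that ships with FSharp.Core, with an Event as the source. This is only a stand-in for an Rx observable; the names `ev` and `runningTotals` are invented for the example.

```fsharp
// An F# Event is used as a simple push-based source (it implements IObservable).
let ev = Event<float>()

// Collects the running totals as they are emitted.
let runningTotals = ResizeArray<float>()

ev.Publish
|> Observable.scan (fun total x -> total + x) 0.0   // previous aggregate + new element
|> Observable.subscribe (fun total -> runningTotals.Add total)
|> ignore

// Push three values through the pipeline.
[ 1.0; 2.0; 3.0 ] |> List.iter (fun x -> ev.Trigger x)
// runningTotals now holds 1.0, 3.0, 6.0
```

The same shape works for any aggregate whose next value depends only on the previous aggregate and the incoming element, for example a running count or running maximum.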
If for whatever reason you need a running aggregate based on a sliding window, then we get into more difficult domain. Here you first need an operation that can give you a sliding window - this is covered by Buffer above. However, then you need to know which values have been removed from this window, and which have been added.
As such, I recommend a new Observable function that maintains an internal window based on existing window + new value, and returns new window + removed value + added value. You can write this using Observable.Scan (I recommend an internal Queue for efficient implementation). It should take a function for determining which values to remove given a new value (this way it can be parameterised for sliding by time or by count).
At that point, Observable.Scan can again be used to take the old aggregate + window + removed values + added value and give a new aggregate.
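A rough sketch of the sliding-window idea, collapsed into a single scan for the simple case of a rolling sum over the last `size` values by count. It again uses the FSharp.Core Observable module with an Event as a stand-in for Rx; `rollingSum`, `ev`, and `sums` are hypothetical names, and the internal Queue is created once per call, so this version assumes a single subscriber.

```fsharp
open System.Collections.Generic

// Rolling sum over the most recent `size` values.
// The queue is the internal window; each step knows the added and removed value.
let rollingSum (size: int) (source: System.IObservable<float>) =
    let window = Queue<float>() // shared state: assumes one subscriber
    source
    |> Observable.scan (fun total added ->
        window.Enqueue added
        // When the window overflows, the oldest value leaves it.
        let removed = if window.Count > size then window.Dequeue() else 0.0
        total + added - removed) 0.0

let ev = Event<float>()
let sums = ResizeArray<float>()

ev.Publish
|> rollingSum 3
|> Observable.subscribe (fun s -> sums.Add s)
|> ignore

[ 1.0; 2.0; 3.0; 4.0; 5.0 ] |> List.iter (fun x -> ev.Trigger x)
// sums now holds 1.0, 3.0, 6.0, 9.0, 12.0
```

Sliding by time rather than by count would need a different rule for deciding which values to drop, which is why the text suggests passing that rule in as a function.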
Hope this helps, I do realise it's a lot of words. If you can confirm the requirement, I can help out with the actual extension method for that specific use case.
For lag, you could do something like
module Observable =
    let lag n obs =
        let buf = System.Collections.Generic.Queue()
        obs |> Observable.map (fun x ->
            buf.Enqueue(x)
            if buf.Count > n then Some(buf.Dequeue())
            else None)
This:
Observable.Range(1, 9)
|> Observable.lag 2
|> Observable.subscribe (printfn "%A")
|> ignore
prints:
<null>
<null>
Some 1
Some 2
Some 3
Some 4
Some 5
Some 6
Some 7

Apply several aggregate functions with one enumeration

Let's assume I have a series of functions that work on a sequence, and I want to use them together in the following fashion:
let meanAndStandardDeviation data =
    let m = mean data
    let sd = standardDeviation data
    (m, sd)
The code above is going to enumerate the sequence twice. I am interested in a function that will give the same result but enumerate the sequence only once. This function will be something like this:
magicFunction (mean, standardDeviation) data
where the input is a tuple of functions and a sequence and the output is the same as that of the function above.
Is this possible if the functions mean and standardDeviation are black boxes and I cannot change their implementation?
If I wrote mean and standardDeviation myself, is there a way to make them work together? Maybe somehow making them keep yielding the input to the next function and hand over the result when they are done?
The only way to do this using just a single iteration when the functions are black boxes is to use the Seq.cache function (which evaluates the sequence once and stores the results in memory) or to convert the sequence to other in-memory representation.
When a function takes seq<T> as an argument, you don't even have a guarantee that it will evaluate it just once - and usual implementations of standard deviation would first calculate the average and then iterate over the sequence again to calculate the squares of errors.
I'm not sure if you can calculate standard deviation with just a single pass. However, it is possible to do that if the functions are expressed using fold. For example, calculating maximum and minimum using two passes looks like this:
let maxv = Seq.fold max Int32.MinValue input
let minv = Seq.fold min Int32.MaxValue input
You can do that using a single pass like this:
Seq.fold (fun (s1, s2) v ->
    (max s1 v, min s2 v)) (Int32.MinValue, Int32.MaxValue) input
The lambda function is a bit ugly, but you can define a combinator to compose two functions:
let par f g (i, j) v = (f i v, g j v)
Seq.fold (par max min) (Int32.MinValue, Int32.MaxValue) input
This approach works for functions that can be defined using fold, which means that they consist of some initial value (Int32.MinValue in the first example) and then some function that is used to update the initial (previous) state when it gets the next value (and then possibly some post-processing of the result). In general, it should be possible to rewrite single-pass functions in this style, but I'm not sure if this can be done for standard deviation. It can be definitely done for mean:
let (count, sum) = Seq.fold (fun (count, sum) v ->
    (count + 1.0, sum + v)) (0.0, 0.0) input
let mean = sum / count
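For what it's worth, standard deviation can also be written as a single fold. The sketch below uses Welford's online algorithm, which is a swapped-in technique rather than anything from the code above; the function name `meanAndStdDev` and the sample data are invented for the example, and it computes the population (not sample) standard deviation.

```fsharp
// Single-pass mean and population standard deviation via Welford's algorithm.
// State: (count, running mean, running sum of squared deviations).
let meanAndStdDev (data: seq<float>) =
    let (n, mean, m2) =
        data
        |> Seq.fold (fun (n, mean, m2) x ->
            let n' = n + 1.0
            let delta = x - mean
            let mean' = mean + delta / n'
            (n', mean', m2 + delta * (x - mean'))) (0.0, 0.0, 0.0)
    // For an empty input n is 0.0, so the standard deviation comes out as NaN.
    (mean, sqrt (m2 / n))

// Hypothetical sample data: mean 5.0, population standard deviation 2.0.
let (m, sd) = meanAndStdDev [ 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 ]
```

Because the state is just a tuple updated per element, this fits the same "initial value plus update function" shape described above, so it could be combined with other folds in one pass.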
What we're talking about here is a function with the following signature:
(seq<'a> -> 'b) * (seq<'a> -> 'c) -> seq<'a> -> ('b * 'c)
There is no straightforward way that I can think of that will achieve the above using a single iteration of the sequence if that is the signature of the functions. Well, no way that is more efficient than:
let magicFunc (f1:seq<'a>->'b, f2:seq<'a>->'c) (s:seq<'a>) =
    let cached = s |> Seq.cache
    (f1 cached, f2 cached)
That ensures a single iteration of the sequence itself (perhaps there are side effects, or it's slow), but does so by essentially caching the results. The cache is still iterated another time. Is there anything wrong with that? What are you trying to achieve?
