Is if .. else .. an idiomatic way of writing things in F#?

What would be an F# idiomatic way of writing the following? Or would you leave this as is?
let input = 5
let result =
    if input > 0 && input < 5 then
        let a = CalculateA(input)
        let b = CalculateB(input)
        (a+b)/2
    else
        CalculateC(input)

For a single if ... then ... else ... I'd probably leave it like that. If you had more cases, I'd either use a pattern match with a when guard:
let result =
    match input with
    | _ when input > 0 && input < 5 -> ...
    | _ -> ...
or you might also want to look at active patterns: http://msdn.microsoft.com/en-us/library/dd233248.aspx
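As a rough sketch of the active-pattern route, the range check can be wrapped in a parameterized partial active pattern. The pattern name Between and the lowercase calculateA, calculateB and calculateC stand-ins below are assumptions made for illustration; they are not taken from the question.

```fsharp
// Hypothetical stand-ins for the CalculateA/B/C functions in the question.
let calculateA x = x * 2
let calculateB x = x + 4
let calculateC x = x - 1

// Partial active pattern: matches when lo < value < hi (both bounds exclusive).
let (|Between|_|) lo hi value =
    if value > lo && value < hi then Some value else None

let compute input =
    match input with
    | Between 0 5 x -> (calculateA x + calculateB x) / 2
    | x -> calculateC x
```

With a single range this is arguably more ceremony than the plain if; it tends to pay off only when several named ranges are matched in more than one place.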

There's nothing wrong with the way you've written it but here is another alternative (inspired by Huusom):
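let input = 5
let result =
    // Seq.averageBy needs a result type that supports DivideByInt (e.g. float),
    // so A, B and C must return float here rather than int.
    (if input > 0 && input < 5 then [A; B] else [C])
    |> Seq.averageBy (fun f -> f input)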

This is a minor stylistic change, but I find it more readable:
let input = 5
let result =
    if input > 0 && input < 5 then
        (calculateA input + calculateB input) / 2
    else
        calculateC input

This is not really an answer because Robert is correct. But it looks like you are working with a series of functions, so you could write it like this:
let Calculate input =
    let calc = function | [f] -> f input | fl -> fl |> List.map ((|>) input) |> List.sum |> (fun s -> s / fl.Length)
    if input > 0 && input < 5
    then calc [CalculateA; CalculateB]
    else calc [CalculateC]
You could decompose to something with this signature: ((int -> int) list) -> ((int -> int) list) -> (int -> bool) -> int -> int and then build your function by applying the first 3 parameters.
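One possible reading of that signature, as a minimal sketch: the two function lists come first, then the predicate, then the input. The names averageOf and calculateWith, and the bodies of calculateA, calculateB and calculateC, are hypothetical.

```fsharp
// Hypothetical stand-ins for CalculateA/B/C.
let calculateA x = x * 2
let calculateB x = x + 4
let calculateC x = x - 1

// Integer average of every function in fs applied to input (fs must be non-empty).
let averageOf (fs: (int -> int) list) (input: int) : int =
    (fs |> List.sumBy (fun f -> f input)) / List.length fs

// ((int -> int) list) -> ((int -> int) list) -> (int -> bool) -> int -> int
let calculateWith (whenTrue: (int -> int) list)
                  (whenFalse: (int -> int) list)
                  (predicate: int -> bool)
                  (input: int) : int =
    if predicate input then averageOf whenTrue input else averageOf whenFalse input

// Applying the first three parameters yields the final int -> int function.
let calculate : int -> int =
    calculateWith [calculateA; calculateB] [calculateC] (fun x -> x > 0 && x < 5)
```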

Related

F# : How to test the equality of sequence/list elements?

I would like to test whether all elements in a list/sequence equal something.
For example, a sequence of integers.
I would like to test if ALL elements of the sequence are equal to the same number.
My solution so far looks like an imperative programming solution.
let test seq =
    if Seq.forall (fun num -> num = 1) seq then 1
    elif Seq.forall (fun num -> num = 2) seq then 2
    else None
Your solution is fine! Checking that all elements of a sequence have some value is not something you can nicely express using pattern matching - you have to use when clause, but that's doing exactly the same thing as your code (but with longer syntax). In cases like this, there is absolutely nothing wrong with using if.
You can extend pattern matching by defining custom active patterns, which gives you a nice option here. This is fairly advanced F#, but you can define a custom pattern ForAll n that succeeds when the input is a sequence containing just n values:
let (|ForAll|_|) n seq =
    if Seq.forall (fun num -> num = n) seq then Some() else None
Note that success is represented as Some and failure as None. Now, you can solve your problem very nicely using pattern matching:
let test = function
    | ForAll 1 -> Some 1
    | ForAll 2 -> Some 2
    | _ -> None
This looks quite nice, but it's relying on more advanced features - I would do this if this is something that you need in more than one place. If I needed this just in one place, I'd go with ordinary if.
You can rewrite it using pattern matching with a guard clause:
let testList = [2;2;2]
let isOne x = x = 1
let isTwo x = x = 2
let forAll = function
    | list when list |> List.forall isOne -> Some 1
    | list when list |> List.forall isTwo -> Some 2
    | _ -> None
let res = forAll testList //Some 2
Instead of the function you could use partial application on the equals operator.
> let yes = [1;1;1];;
val yes : int list = [1; 1; 1]
> let no = [1;2;3];;
val no : int list = [1; 2; 3]
> yes |> List.forall ((=) 1);;
val it : bool = true
> no |> List.forall ((=) 1);;
val it : bool = false
Maybe this looks more functional? And I think you should return Some 1 in your code, otherwise you'd get type errors since Option and int are not the same type...
If you want to check if all elements are equal (not just if they equal some constant), you could do this:
> [1;2] |> List.pairwise |> List.forall (fun (a,b) -> a = b)
;;
val it : bool = false
> [1;1;1] |> List.pairwise |> List.forall (fun (a,b) -> a = b)
;;
val it : bool = true
There you split your list into pairs of adjacent elements and check whether the two elements of each pair are equal. By transitivity, this means that all elements are equal.

Get elements between two elements in an F# collection

I'd like to take a List or Array, and given two elements in the collection, get all elements between them. But I want to do this in a circular fashion, such that given a list [1;2;3;4;5;6] and if I ask for the elements that lie between 4 then 2, I get back [5;6;1]
Being used to imperative programming I can easily do this with loops, but I imagine there may be a nicer idiomatic approach to it in F#.
Edit
Here is an approach I came up with, having found the Array.indexed function
let elementsBetween (first:int) (second:int) (elements: array<'T>) =
    let diff = second - first
    elements
    |> Array.indexed
    |> Array.filter (fun (index,element) ->
        if diff = 0 then false
        else if diff > 0 then index > first && index < second
        else if diff < 0 then index > first || index < second
        else false)
This approach will only work with arrays obviously but this seems pretty good. I have a feeling I could clean it up by replacing the if/then/else with pattern matching but am not sure how to do that cleanly.
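If the wrap-around order matters (the question expects [5;6;1], whereas filtering by index returns elements in array order), one possible alternative is to compute the indices with modular arithmetic inside a single match. This is only a sketch: it assumes that first and second are valid indices into the array, as in the function above, and it reuses the same function name purely for illustration.

```fsharp
// Returns the elements strictly between index `first` and index `second`,
// walking forward from `first` and wrapping around the end of the array.
// Assumes 0 <= first, second < elements.Length.
let elementsBetween (first: int) (second: int) (elements: 'T[]) : 'T[] =
    match elements.Length, second - first with
    | 0, _ | _, 0 -> [||]
    | n, diff ->
        // Number of steps from first to second going forward, minus one.
        let count = ((diff - 1) % n + n) % n
        Array.init count (fun k -> elements.[(first + 1 + k) % n])
```

For the array [|1;2;3;4;5;6|], the element 4 sits at index 3 and the element 2 at index 1, so the call is elementsBetween 3 1.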
You should take a look at MSDN, Collections.Seq Module for example.
Let's try to be clever:
let elementsBetween a e1 e2 =
    let aa = a |> Seq.append a
    let i1 = aa |> Seq.findIndex (fun e -> e = e1)
    let i2 = aa |> Seq.skip i1 |> Seq.findIndex (fun e -> e = e2)
    aa |> Seq.skip(i1+1) |> Seq.take(i2-1)
I am not on my normal computer with an f# compiler, so I haven't tested it yet. It should look something like this
[Edit] Thank you @FoggyFinder for showing me https://dotnetfiddle.net/. I have now tested the code below with it.
[Edit] This should find the circular range in a single pass.
let x = [1;2;3;4;5]
let findCircRange l first second =
    let rec findUpTo (l':int list) f (s:int) : (int list * int list) =
        match l' with
        | i::tail ->
            if i = s then tail, (f [])
            else findUpTo tail (fun acc -> f (i::acc)) s
        // In case we are passed an empty list.
        | _ -> [], (f [])
    let remainder, upToStart = findUpTo l id first
    // concatenate the list after start with the list before start.
    let newBuffer = remainder @ upToStart
    snd <| findUpTo newBuffer id second
let values = findCircRange x 4 2
printf "%A" values
findUpTo takes a list (l'), a function for creating a remainder list (f) and a value to look for (s). We recurse through it (tail recursion) to find the list up to the given value and the list after the given value. Wrap the buffer around by appending the end to the remainder. Pass it to the findUpTo again to find up to the end. Return the buffer up to the end.
We pass a function for accumulating found items. This technique allows us to append to the end of the list as the function calls unwind.
Of course, there is no error checking here. We are assuming that start and end do actually exist. That will be left to an exercise for the reader.
Here is a variation using your idea of diff, with a list and list slicing:
<some list>.[x .. y]
let between (first : int) (second : int) (l : 'a list) : 'a list =
    if first < 0 then
        failwith "first cannot be less than zero"
    if second < 0 then
        failwith "second cannot be less than zero"
    if first > (l.Length * 2) then
        failwith "first cannot be greater than length of list times 2"
    if second > (l.Length * 2) then
        failwith "second cannot be greater than length of list times 2"
    let diff = second - first
    match diff with
    | 0 -> []
    | _ when diff > 0 && (abs diff) < l.Length -> l.[(first + 1) .. (second - 1)]
    | _ when diff > 0 -> (l @ l).[(first + 1) .. (second - 1)]
    | _ when diff < 0 && (abs diff) < l.Length -> l.[(second + 1) .. (second + first - 1)]
    | _ when diff < 0 -> (l @ l).[(second + 1) .. (second + first - 1)]

Aggregation function - f# vs c# performance

I have a function that I use a lot and hence the performance needs to be as good as possible. It takes data from Excel and then sums, averages or counts over parts of the data based on whether the data is within a certain period and whether it is a peak hour (Mo-Fr 8-20).
The data is usually around 30,000 rows and 2 columns (hourly date, value). One important feature of the data is that the date column is chronologically ordered.
I have three implementations. The first is C# with extension methods (dead slow, and I'm not going to show it unless somebody is interested).
Then I have this F# implementation:
let ispeak dts =
    let newdts = DateTime.FromOADate dts
    match newdts.DayOfWeek, newdts.Hour with
    | DayOfWeek.Saturday, _ | DayOfWeek.Sunday, _ -> false
    | _, h when h >= 8 && h < 20 -> true
    | _ -> false
let internal isbetween a std edd =
    match a with
    | r when r >= std && r < edd+1. -> true
    | _ -> false
[<ExcelFunction(Name="aggrF")>]
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
    let newd =
        [0 .. (Array.length data) - 1]
        |> List.map (fun i -> (data.[i], data2.[i]))
        |> Seq.filter (fun (date, _) ->
            let dateInRange = isbetween date std edd
            match pob with
            | "Peak" -> ispeak date && dateInRange
            | "Offpeak" -> not(ispeak date) && dateInRange
            | _ -> dateInRange)
    match sac with
    | 0 -> newd |> Seq.averageBy (fun (_, value) -> value)
    | 2 -> newd |> Seq.sumBy (fun (_, value) -> 1.0)
    | _ -> newd |> Seq.sumBy (fun (_, value) -> value)
I see two issues with this:
I need to prepare the data because both date and value are double[]
I do not utilize the knowledge that dates are chronological hence I do unnecessary iterations.
Here comes now what I would call a brute force imperative c# version:
public static bool ispeak(double dats)
{
var dts = System.DateTime.FromOADate(dats);
if (dts.DayOfWeek != DayOfWeek.Sunday & dts.DayOfWeek != DayOfWeek.Saturday & dts.Hour > 7 & dts.Hour < 20)
return true;
else
return false;
}
[ExcelFunction(Description = "Aggregates HFC/EG into average or sum over period, start date inclusive, end date exclusive")]
public static double aggrI(double[] dts, double[] vals, double std, double edd, string pob, double sumavg)
{
double accsum = 0;
int acccounter = 0;
int indicator = 0;
bool peakbool = pob.Equals("Peak", StringComparison.OrdinalIgnoreCase);
bool offpeakbool = pob.Equals("Offpeak", StringComparison.OrdinalIgnoreCase);
bool basebool = pob.Equals("Base", StringComparison.OrdinalIgnoreCase);
for (int i = 0; i < vals.Length; ++i)
{
if (dts[i] >= std && dts[i] < edd + 1)
{
indicator = 1;
if (peakbool && ispeak(dts[i]))
{
accsum += vals[i];
++acccounter;
}
else if (offpeakbool && (!ispeak(dts[i])))
{
accsum += vals[i];
++acccounter;
}
else if (basebool)
{
accsum += vals[i];
++acccounter;
}
}
else if (indicator == 1)
{
break;
}
}
if (sumavg == 0)
{
return accsum / acccounter;
}
else if (sumavg == 2)
{
return acccounter;
}
else
{
return accsum;
}
}
This is much faster (I'm guessing mainly because the loop exits once the period has ended) but obviously less succinct.
My questions:
Is there a way to stop iterations in the f# Seq module for sorted series?
Is there another way to speed up the f# version?
can somebody think of an even better way of doing this?
Thanks a lot!
Update: Speed comparison
I set up a test array with hourly dates from 1/1/13-31/12/15 (roughly 30,000 rows) and corresponding values. I made 150 calls spread out over the date array and repeated this 100 times, i.e. 15,000 function calls:
My C# implementation above (with the string comparisons outside of the loop): 1.36 secs
Matthew's recursion F#: 1.55 secs
Tomas's array F#: 1m40secs
My original F#: 2m20secs
Obviously this is specific to my machine, but it gives an idea, and people asked for it...
I also think one should keep in mind that this doesn't mean recursion or for loops are always faster than Array.map etc. It is just that in this case the other versions do a lot of unnecessary iterations, as they don't have the early exit that the C# and the F# recursion methods have.
Using Array instead of List and Seq makes this about 3-4 times faster. You do not need to generate a list of indices and then map over that to lookup items in the two arrays - instead you can use Array.zip to combine the two arrays into a single one and then use Array.filter.
In general, if you want performance, then using array as your data structure will make sense (unless you have a long pipeline of things). Functions like Array.zip and Array.map can calculate the entire array size, allocate it and then do efficient imperative operation (while still looking functional from the outside).
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
    let newd =
        Array.zip data data2
        |> Array.filter (fun (date, _) ->
            let dateInRange = isbetween date std edd
            match pob with
            | "Peak" -> ispeak date && dateInRange
            | "Offpeak" -> not(ispeak date) && dateInRange
            | _ -> dateInRange)
    match sac with
    | 0 -> newd |> Array.averageBy (fun (_, value) -> value)
    | 2 -> newd |> Array.sumBy (fun (_, value) -> 1.0)
    | _ -> newd |> Array.sumBy (fun (_, value) -> value)
I also changed isbetween - it can be simplified into just an expression and you can mark it inline, but that does not add that much:
let inline isbetween r std edd = r >= std && r < edd+1.
Just for completeness, I tested this with the following code (using F# Interactive):
#time
let d1 = Array.init 1000000 float
let d2 = Array.init 1000000 float
aggrF d1 d2 0.0 1000000.0 "Test" 0
The original version was about ~600ms and the new version using arrays takes between 160ms and 200ms. The version by Matthew takes about ~520ms.
Aside, I spent the last two months at BlueMountain Capital working on a time series/data frame library for F# that would make this a lot simpler. It is work in progress and also the name of the library will change, but you can find it in BlueMountain GitHub. The code would look something like this (it uses the fact that the time series is ordered and uses slicing to get the relevant part before filtering):
let ts = Series(times, values)
ts.[std .. edd] |> Series.filter (fun k _ -> not (ispeak k)) |> Series.mean
Currently, this will not be as fast as direct array operations, but I'll look into that :-).
An immediate way to speed it up would be to combine these:
[0 .. (Array.length data) - 1]
|> List.map (fun i -> (data.[i], data2.[i]))
|> Seq.filter (fun (date, _) ->
into a single list comprehension, and also as the other matthew said, do a single string comparison:
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
    let isValidTime = match pob with
                      | "Peak" -> (fun x -> ispeak x)
                      | "Offpeak" -> (fun x -> not(ispeak x))
                      | _ -> (fun _ -> true)
    let data = [ for i in 0 .. (Array.length data) - 1 do
                     let (date, value) = (data.[i], data2.[i])
                     if isbetween date std edd && isValidTime date then
                         yield (date, value)
                     else
                         () ]
    match sac with
    | 0 -> data |> Seq.averageBy (fun (_, value) -> value)
    | 2 -> float data.Length // convert the count so that every branch returns a float
    | _ -> data |> Seq.sumBy (fun (_, value) -> value)
Or use a tail recursive function:
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
    let isValidTime = match pob with
                      | "Peak" -> (fun x -> ispeak x)
                      | "Offpeak" -> (fun x -> not(ispeak x))
                      | _ -> (fun _ -> true)
    let endDate = edd + 1.0
    let rec aggr i sum count =
        if i >= (Array.length data) || data.[i] >= endDate then
            match sac with
            | 0 -> sum / float(count)
            | 2 -> float(count)
            | _ -> float(sum)
        else if data.[i] >= std && isValidTime data.[i] then
            aggr (i + 1) (sum + data2.[i]) (count + 1)
        else
            aggr (i + 1) sum count
    aggr 0 0.0 0
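On the question of stopping iteration early within the Seq module: for chronologically sorted dates, Seq.skipWhile and Seq.takeWhile may be enough, since Seq.takeWhile stops pulling elements as soon as its predicate fails. The sketch below only covers the sum over the date range, and the function name aggregateSorted is an assumption; peak filtering and the other aggregations are left out, and no timing claim is made for it.

```fsharp
// Sums the values whose date lies in [std, edd + 1), assuming `dates` is sorted ascending.
let aggregateSorted (dates: float[]) (values: float[]) (std: float) (edd: float) : float =
    Seq.zip dates values
    |> Seq.skipWhile (fun (d, _) -> d < std)        // skip everything before the period
    |> Seq.takeWhile (fun (d, _) -> d < edd + 1.0)  // stop at the first date past the period
    |> Seq.sumBy snd
```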

Built in equality of lists using a custom comparison function?

Is there a built-in function which does the following?
let rec listsEqual xl yl f =
    match xl, yl with
    | [], [] -> true
    | [], _ | _, [] -> false
    | xh::xt, yh::yt -> if f xh yh then listsEqual xt yt f else false
Updated, further elaboration: and in general is there any way to tap in to structural comparison but using a custom comparison function?
List.forall2 : (('a -> 'b -> bool) -> 'a list -> 'b list -> bool)
But it takes f before the lists. You can create your function like this:
let listsEqual x y f =
    if List.length x = List.length y then
        List.forall2 f x y
    else
        false
Remember that List.forall2 assumes the lengths are the same; it raises an ArgumentException when they differ.
Concerning Seq.compareWith, you wrote:
"not quite, two problems 1) expects the two sequences be of the same type, 2) doesn't short circuit"
2) is wrong: the function really does short-circuit.
1) is true. Take Seq.compareWith from the F# library, modify (or remove) the type annotation and it will work for sequences of different types.
[<CompiledName("CompareWith")>]
let compareWith (f:'T1 -> 'T2 -> int) (source1 : seq<'T1>) (source2: seq<'T2>) =
    //checkNonNull "source1" source1
    //checkNonNull "source2" source2
    use e1 = source1.GetEnumerator()
    use e2 = source2.GetEnumerator()
    let rec go () =
        let e1ok = e1.MoveNext()
        let e2ok = e2.MoveNext()
        let c = (if e1ok = e2ok then 0 else if e1ok then 1 else -1)
        if c <> 0 then c else
        if not e1ok || not e2ok then 0
        else
            let c = f e1.Current e2.Current
            if c <> 0 then c else
            go ()
    go()
Now, you can send an email to fsbugs (@ microsoft.com) and ask them to remove the type constraint in the next F# release.
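When both lists do have the same element type, the built-in Seq.compareWith can already serve as the equality check. A minimal sketch, where the helper name listsEqualBy is hypothetical: the equality predicate is mapped to 0 (equal) or 1 (different), and a length mismatch also produces a non-zero result.

```fsharp
// True when xs and ys have the same length and eq holds for every pair of
// elements at the same position. Both lists must have the same element type.
let listsEqualBy (eq: 'a -> 'a -> bool) (xs: 'a list) (ys: 'a list) : bool =
    Seq.compareWith (fun x y -> if eq x y then 0 else 1) xs ys = 0
```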

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want to create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/caching)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
    seq {
        for x in 1..N do
            if x % 4 <> 0 && x % 7 <> 0 then
                printfn "producing %d" x
                yield x
    }
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
    LazyList.delayed (fun () ->
        match input with
        | LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
        | LazyList.Cons(x,LazyList.Nil) ->
            LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
        | LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
            let groups = GroupBy comp xs
            if comp x y then
                LazyList.consf
                    (LazyList.consf x (fun () ->
                        let (LazyList.Cons(firstGroup,_)) = groups
                        firstGroup))
                    (fun () ->
                        let (LazyList.Cons(_,otherGroups)) = groups
                        otherGroups)
            else
                LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
    printfn "about to do a group"
    for x in group do
        printfn " %d" x
You seem to want a function that has the signature
('a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but it avoids iterating the input multiple times). It will lose much of the laziness of the input:
// `fun` is a keyword, so the predicate parameter is named f here.
let groupBy (f: 'a -> bool) (input: seq<'a>) =
    seq {
        let cache = ref (new System.Collections.Generic.List<'a>())
        for e in input do
            (!cache).Add(e)
            if not (f e) then
                yield !cache
                cache := new System.Collections.Generic.List<'a>()
        if (!cache).Count > 0 then
            yield !cache
    }
An alternative implementation could pass the cache collection (as seq<'a>) to the function so it can see multiple elements to choose the break points.
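A variant of the caching idea that compares neighbouring elements, which is what the question's gap detection needs, might look like the sketch below. The name groupAdjacent is hypothetical. Each group is buffered into a list, so only the outer sequence stays lazy; that trades away the question's no-caching constraint in exchange for groups that can be enumerated more than once.

```fsharp
// Splits `input` wherever keepTogether returns false for two neighbouring elements.
// Inner groups are materialized as lists; the outer sequence is produced on demand.
let groupAdjacent (keepTogether: 'a -> 'a -> bool) (input: seq<'a>) : seq<'a list> =
    seq {
        use e = input.GetEnumerator()
        if e.MoveNext() then
            let group = ResizeArray<'a>()
            group.Add(e.Current)
            while e.MoveNext() do
                let current = e.Current
                // Compare with the last element of the group collected so far.
                if not (keepTogether group.[group.Count - 1] current) then
                    yield List.ofSeq group
                    group.Clear()
                group.Add(current)
            yield List.ofSeq group
    }
```

For the time-stamped data in the question, the predicate would be something like fun (t1, _) (t2, _) -> t2 - t1 <= timeBetweenContiguousDataPoints.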
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in Data.List:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs@(y : _))
  | comp x y = (x : firstGroup) : otherGroups
  | otherwise = [x] : groups
  where groups@(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperative nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
    let en = sq.GetEnumerator()
    let rec partitions (first:option<_>) =
        seq {
            match first with
            | Some first' -> //'
                (* The following value is always overwritten;
                   it represents the first element of the next subsequence to output, if any *)
                let next = ref None
                (* This function generates a subsequence to output,
                   setting next appropriately as it goes *)
                let rec iter item =
                    seq {
                        yield item
                        if (en.MoveNext()) then
                            let curr = en.Current
                            if (cmp item curr) then
                                yield! iter curr
                            else // consumed one too many - pass it on as the start of the next sequence
                                next := Some curr
                        else
                            next := None
                    }
                yield iter first' (* ' generate the first sequence *)
                yield! partitions !next (* recursively generate all remaining sequences *)
            | None -> () // return an empty sequence if there are no more values
        }
    let first = if en.MoveNext() then Some en.Current else None
    partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
    groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
    let en = sq.GetEnumerator()
    let next() = if en.MoveNext() then Some en.Current else None
    (* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
    let rec seqStartingWith start =
        match next() with
        | Some y when cmp start y ->
            let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
            seq { yield start; yield! fst (Lazy.force rest_next) },
            lazy Lazy.force (snd (Lazy.force rest_next))
        | next -> seq { yield start }, lazy next
    let rec iter start =
        seq {
            match (Lazy.force start) with
            | None -> ()
            | Some start ->
                let (first,next) = seqStartingWith start
                yield first
                yield! iter next
        }
    Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
    // simple unfold that assumes f captured a mutable variable
    let iter f =
        Seq.unfold (fun _ ->
            match f() with
            | Some(x) -> Some(x,())
            | None -> None) ()
    seq {
        let state = ref (Unstarted)
        use ie = input.GetEnumerator()
        let innerMoveNext() =
            match !state with
            | Unstarted ->
                if ie.MoveNext()
                then let cur = ie.Current
                     state := InnerOkay(cur); Some(cur)
                else state := Finished; None
            | InnerOkay(last) ->
                if ie.MoveNext()
                then let cur = ie.Current
                     if f last cur
                     then state := InnerOkay(cur); Some(cur)
                     else state := NeedNewInner(cur); None
                else state := Finished; None
            | NeedNewInner(last) -> state := InnerOkay(last); Some(last)
            | Finished -> None
        let outerMoveNext() =
            match !state with
            | Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
            | InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
            | Finished -> None
        yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
    split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
    let baseTime = DateTime.Now
    seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
    Seq.zip contiguousTimeStamps numbers
    |> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
    printfn "about to do a group"
    for x in group do
        printfn " %A" x
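The double-enumeration issue mentioned above (calling Seq.length and then taking the head) can be observed on any uncached seq. This small sketch is independent of split_up; the counter and the names in it are made up for illustration. It counts how often the body of a sequence expression runs, with and without Seq.cache.

```fsharp
// Counts how many elements the sequence expression has produced so far.
let evaluations = ref 0

let source =
    seq {
        for i in 1 .. 3 do
            evaluations.Value <- evaluations.Value + 1
            yield i
    }

let lengthFirst = Seq.length source   // first enumeration: produces all 3 elements
let headNext = Seq.head source        // second enumeration: produces 1 more element
let afterUncached = evaluations.Value // 4

let cached = Seq.cache source
let lengthCached = Seq.length cached  // fills the cache: 3 more elements
let headCached = Seq.head cached      // served from the cache: nothing new produced
let afterCached = evaluations.Value   // 7
```

With a stateful source such as a shared enumerator, the second enumeration does not merely repeat work; it continues from wherever the first one left off, which is why the test code in the question misbehaves.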
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
    seq {
        for x in 1..N do
            if x % 4 <> 0 && x % 7 <> 0 then
                printfn "producing %d" x
                yield x
    }
let rec GroupBy comp (input:seq<_>) = seq {
    let doneWithThisGroup = ref false
    let areMore = ref true
    use e = input.GetEnumerator()
    let Next() = areMore := e.MoveNext(); !areMore
    // deal with length 0 or 1, seed 'prev'
    if not(e.MoveNext()) then () else
    let prev = ref e.Current
    while !areMore do
        yield seq {
            while not(!doneWithThisGroup) do
                if Next() then
                    let next = e.Current
                    doneWithThisGroup := not(comp !prev next)
                    yield !prev
                    prev := next
                else
                    // end of list, yield final value
                    yield !prev
                    doneWithThisGroup := true }
        doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
    printfn "about to do a group"
    for x in group do
        printfn " %d" x
