How can I test if a sequence is empty in F#?

Consider this F# code which computes a factor of a number:
let n = 340339004337I
// A sequence of all factors:
let factors = seq { 1I .. n / 2I } |> Seq.filter (fun x -> n % x = 0I)
// Pull off the first factor from the sequence:
let factor =
    if factors = seq [] then
        n
    else
        factors |> Seq.nth 0
In other words, if factors is empty, then return n. Otherwise, pull off the first element from factors. The goal is to account for all factors between 1 and (n/2), and n itself since 1 and n are always factors of n.
The factors = seq [] test isn't working. I arrived at this syntax by looking at this:
> seq {1 .. 100} |> Seq.filter (fun x -> false) ;;
val it : seq<int> = seq []
However, I don't think seq [] is actually an empty sequence:
> Seq.empty = seq [] ;;
val it : bool = false
How can I test if a sequence is empty?

Try Seq.isEmpty.
if Seq.isEmpty yourSeqName then doSomething else doSomethingElse
By the way, Seq.empty creates an empty Seq. It doesn't test for one.

Seq.isEmpty
http://msdn.microsoft.com/en-us/library/ee353547.aspx
The problem with your = test, I presume, is that it is comparing two different objects of type IEnumerable<int> for reference-equality.
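As a minimal sketch, the question's snippet could be rewritten with Seq.isEmpty along these lines. The name firstFactorOrSelf is made up for illustration, and the range starts at 2I instead of 1I (an assumption, since a range starting at 1I always contains the factor 1 and is never empty):

```fsharp
// Hypothetical helper: smallest factor in 2 .. n/2, or n itself when there is none.
// Assumption: the range starts at 2I so the filtered sequence can actually be empty.
let firstFactorOrSelf (n: bigint) =
    let factors = seq { 2I .. n / 2I } |> Seq.filter (fun x -> n % x = 0I)
    if Seq.isEmpty factors then n else Seq.head factors

let composite = firstFactorOrSelf 15I // 3I
let prime = firstFactorOrSelf 13I     // 13I
```

Seq.isEmpty followed by Seq.head starts the enumeration twice; matching on Seq.tryHead would probably avoid that.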

Related

How to split a sequence in F# based on another sequence in an idiomatic way

I have, in F#, 2 sequences, each containing distinct integers, strictly in ascending order: listMaxes and numbers.
If not Seq.isEmpty numbers, then it is guaranteed that not Seq.isEmpty listMaxes and Seq.last listMaxes >= Seq.last numbers.
I would like to implement in F# a function that returns a list of list of integers, whose List.length equals Seq.length listMaxes, containing the elements of numbers divided in lists, where the elements of listMaxes limit each group.
For example: called with the arguments
listMaxes = seq [ 25; 56; 65; 75; 88 ]
numbers = seq [ 10; 11; 13; 16; 20; 25; 31; 38; 46; 55; 65; 76; 88 ]
this function should return
[ [10; 11; 13; 16; 20; 25]; [31; 38; 46; 55]; [65]; List.empty; [76; 88] ]
I can implement this function, iterating over numbers only once:
let groupByListMaxes listMaxes numbers =
    if Seq.isEmpty numbers then
        List.replicate (Seq.length listMaxes) List.empty
    else
        List.ofSeq (seq {
            use nbe = numbers.GetEnumerator ()
            ignore (nbe.MoveNext ())
            for lmax in listMaxes do
                yield List.ofSeq (seq {
                    if nbe.Current <= lmax then
                        yield nbe.Current
                        while nbe.MoveNext () && nbe.Current <= lmax do
                            yield nbe.Current
                })
        })
But this code feels unclean, ugly, imperative, and very un-F#-y.
Is there any functional / F#-idiomatic way to achieve this?
Here's a version based on list interpretation, which is quite functional in style. You can use Seq.toList to convert between them, whenever you want to handle that. You could also use Seq.scan in conjunction with Seq.partition ((>=) max) if you want to use only library functions, but beware that it's very very easy to introduce a quadratic complexity in either computation or memory when doing that.
This is linear in both:
let splitAt value lst =
    let rec loop l1 = function
        | [] -> List.rev l1, []
        | h :: t when h > value -> List.rev l1, (h :: t)
        | h :: t -> loop (h :: l1) t
    loop [] lst
let groupByListMaxes listMaxes numbers =
    let rec loop acc lst = function
        | [] -> List.rev acc
        | h :: t ->
            let out, lst' = splitAt h lst
            loop (out :: acc) lst' t
    loop [] numbers listMaxes
It can be done like this with pattern matching and tail recursion:
let groupByListMaxes listMaxes numbers =
    let rec inner acc numbers =
        function
        | [] -> acc |> List.rev
        | max::tail ->
            let taken = numbers |> Seq.takeWhile ((>=) max) |> List.ofSeq
            let n = taken |> List.length
            inner (taken::acc) (numbers |> Seq.skip n) tail
    inner [] numbers (listMaxes |> List.ofSeq)
Update: I also got inspired by fold and came up with the following solution that strictly refrains from converting the input sequences.
let groupByListMaxes maxes numbers =
    let rec inner (acc, (cur, numbers)) max =
        match numbers |> Seq.tryHead with
        // Add n to the current list of n's less
        // than the local max
        | Some n when n <= max ->
            let remaining = numbers |> Seq.tail
            inner (acc, (n::cur, remaining)) max
        // Complete the current list by adding it
        // to the accumulated result and prepare
        // the next list for fold.
        | _ ->
            (List.rev cur)::acc, ([], numbers)
    maxes |> Seq.fold inner ([], ([], numbers)) |> fst |> List.rev
I have found a better implementation myself. Tips for improvements are still welcome.
Dealing with 2 sequences is really a pain. And I really do want to iterate over numbers only once without turning that sequence into a list. But then I realized that turning listMaxes (generally the shorter of the sequences) is less costly. That way only 1 sequence remains, and I can use Seq.fold over numbers.
What should be the state that we want to keep and change while iterating with Seq.fold over numbers? First, it should definitely include the remaining of the listMaxes, yet the previous maxes that we already have surpassed are no longer of interest. Second, the accumulated lists so far, although, like in the other answers, these can be kept in reverse order. More to the point: the state is a couple which has as second element a reversed list of reversed lists of the numbers so far.
let groupByListMaxes listMaxes numbers =
    let rec folder state number =
        match state with
        | m :: maxes, _ when number > m ->
            folder (maxes, List.empty :: snd state) number
        | m :: maxes, [] ->
            fst state, List.singleton (List.singleton number)
        | m :: maxes, h :: t ->
            fst state, (number :: h) :: t
        | [], _ ->
            failwith "Guaranteed not to happen"
    let listMaxesList = List.ofSeq listMaxes
    let initialState = listMaxesList, List.empty
    let reversed = snd (Seq.fold folder initialState numbers)
    let temp = List.rev (List.map List.rev reversed)
    let extraLength = List.length listMaxesList - List.length temp
    let extra = List.replicate extraLength List.empty
    List.concat [temp; extra]
I know this is an old question but I had a very similar problem and I think this is a simple solution:
let groupByListMaxes cs xs =
    List.scan (fun (_, xs) c -> List.partition (fun x -> x <= c) xs)
              ([], xs)
              cs
    |> List.skip 1
    |> List.map fst
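To check this scan-based version against the example in the question, here is a small self-contained sketch. It restates the function with explicit int list parameters (an assumption; the original takes whatever list types are inferred) and feeds it the question's sample data:

```fsharp
// Restated scan-based grouping; the int list annotations are an assumption for this sketch.
let groupByListMaxes (cs: int list) (xs: int list) =
    // Each scan step splits the remaining numbers into (those <= c, the rest).
    List.scan (fun (_, rest) c -> List.partition (fun x -> x <= c) rest) ([], xs) cs
    |> List.skip 1   // drop the initial state
    |> List.map fst  // keep only the group produced at each step

let listMaxes = [ 25; 56; 65; 75; 88 ]
let numbers = [ 10; 11; 13; 16; 20; 25; 31; 38; 46; 55; 65; 76; 88 ]
let grouped = groupByListMaxes listMaxes numbers
// grouped = [[10; 11; 13; 16; 20; 25]; [31; 38; 46; 55]; [65]; []; [76; 88]]
```

Note that List.partition walks the whole remaining list for every maximum, so this is not the single pass over numbers that the question asked for.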

Does Seq.groupBy preserve order within groups?

I want to group a sequence and then take the first occurrence of each element in the group. When I try this
Seq.groupBy f inSeq
|> Seq.map (fun (k,s) -> (k,s|>Seq.take 1|>Seq.exactlyOne))
I find that sometimes I get a different element from s. Is this expected?
Looking at the source of the groupBy implementation, here's the relevant bit:
// Build the groupings
seq |> iter (fun v ->
    let safeKey = keyf v
    let mutable prev = Unchecked.defaultof<_>
    match dict.TryGetValue (safeKey, &prev) with
    | true -> prev.Add v
    | false ->
        let prev = ResizeArray ()
        dict.[safeKey] <- prev
        prev.Add v)
It iterates through the source sequence and adds each value to the list stored for its key. The order within each group therefore follows the order of the input sequence. For the same input sequence, we can expect groupBy to return identical output sequences. This is how tests are coded for groupBy.
If you're seeing variations in the resulting sequences, check the input sequence.
Yes, this is expected. Sequences (seq) aren't guaranteed to be pure. You can define a sequence that will yield different values every time you iterate over them. If you call Seq.take 1 twice, you can get different results.
Consider, as an example, this sequence:
open System
let r = Random ()
let s = seq { yield r.Next(0, 9) }
If you call Seq.take 1 on that, you may get different results:
> s |> Seq.take 1;;
val it : seq<int> = seq [4]
> s |> Seq.take 1;;
val it : seq<int> = seq [1]
Using Seq.head isn't going to help you either:
> s |> Seq.head;;
val it : int = 2
> s |> Seq.head;;
val it : int = 6
If you want to guarantee deterministic behaviour, use a List instead.
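A small sketch of the deterministic case, assuming a fixed list as input (the sample values are made up): Seq.groupBy keeps the input order within each group, so taking the head of each group gives the first occurrence.

```fsharp
// Made-up input; a list is fully materialized, so every enumeration sees the same values.
let input = [ 5; 2; 8; 1; 4; 7 ]

let firstPerGroup =
    input
    |> Seq.groupBy (fun x -> x % 2)                       // key 1 = odd, key 0 = even
    |> Seq.map (fun (key, items) -> key, Seq.head items)  // first element seen for each key
    |> List.ofSeq
// firstPerGroup = [(1, 5); (0, 2)]
```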

Take N elements from sequence with N different indexes in F#

I'm new to F# and looking for a function which takes N indexes and a sequence and gives me N elements. If I have N indexes it should be equal to concat Seq.nth index0, Seq.nth index1 .. Seq.nth indexN, but it should only scan over indexN elements (O(N)) in the sequence and not index0+index1+...+indexN (O(N^2)).
To sum up, I'm looking for something like:
//For performance, the index-list should be ordered on input, be padding between elements instead of indexes or be ordered when entering the function
seq {10 .. 20} |> Seq.takeIndexes [0;5;10]
Result: 10,15,20
I could make this by using seq { yield... } and have a index-counter to tick when some element should be passed out but if F# offers a nice standard way I would rather use that.
Thanks :)...
Addition: I have written the following. It works but isn't pretty. Suggestions are welcome.
let seqTakeIndexes (indexes : int list) (xs : seq<int>) =
    seq {
        //Assume indexes is sorted
        let e = xs.GetEnumerator()
        let i = ref indexes
        let curr = ref 0
        while e.MoveNext() && not (!i).IsEmpty do
            if !curr = List.head !i then
                i := (!i).Tail
                yield e.Current
            curr := !curr + 1
    }
When you want to access elements by index, using sequences isn't a good idea. Sequences are designed to allow sequential iteration. I would convert the necessary part of the sequence to an array and then pick the elements by index:
let takeIndexes ns input =
    // Take only elements that we need to access (sequence could be infinite)
    let arr = input |> Seq.take (1 + Seq.max ns) |> Array.ofSeq
    // Simply pick elements at the specified indices from the array
    seq { for index in ns -> arr.[index] }
seq [10 .. 20] |> takeIndexes [0;5;10]
Regarding your implementation - I don't think it can be made significantly more elegant. This is a general problem when implementing functions that need to take values from multiple sources in an interleaved fashion - there is just no elegant way of writing those!
However, you can write this in a functional way using recursion like this:
open System.Collections.Generic // needed for IEnumerator<_>
let takeIndexes indices (xs:seq<int>) =
    // Iterates over the list of indices recursively
    let rec loop (xe:IEnumerator<_>) idx indices = seq {
        let next = loop xe (idx + 1)
        // If the sequence ends, then end as well
        if xe.MoveNext() then
            match indices with
            | i::indices when idx = i ->
                // We're passing the specified index
                yield xe.Current
                yield! next indices
            | _ ->
                // Keep waiting for the first index from the list
                yield! next indices }
    seq {
        // Note: 'use' guarantees proper disposal of the source sequence
        use xe = xs.GetEnumerator()
        yield! loop xe 0 indices }
seq [10 .. 20] |> takeIndexes [0;5;10]
When you need to scan a sequence and accumulate results in O(n), you can always fall back to Seq.fold:
let takeIndices ind sq =
    let selector (idxLeft, currIdx, results) elem =
        match idxLeft with
        | [] -> (idxLeft, currIdx, results)
        | idx::moreIdx when idx = currIdx -> (moreIdx, currIdx+1, elem::results)
        | idx::_ when idx <> currIdx -> (idxLeft, currIdx+1, results)
        | idx::_ -> invalidOp "Can't get here."
    let (_, _, results) = sq |> Seq.fold selector (ind, 0, [])
    results |> List.rev
seq [10 .. 20] |> takeIndices [0;5;10]
The drawback of this solution is that it will enumerate the sequence to the end, even if it has accumulated all the desired elements already.
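One possible way around that drawback, sketched with library functions only: pair each element with its position and stop once the largest requested index has been passed. The name takeAtIndices is hypothetical, and the sketch returns elements in ascending index order with duplicate indexes collapsed, which may differ from what the other answers do.

```fsharp
// Hypothetical helper, not from any answer above.
// Stops enumerating shortly after the largest requested index, so infinite inputs are fine.
let takeAtIndices (indices: int list) (xs: seq<'a>) : seq<'a> =
    match indices with
    | [] -> Seq.empty
    | _ ->
        let wanted = Set.ofList indices
        let last = Set.maxElement wanted
        xs
        |> Seq.indexed                                   // (position, element) pairs
        |> Seq.takeWhile (fun (i, _) -> i <= last)       // stop after the last wanted index
        |> Seq.filter (fun (i, _) -> wanted.Contains i)
        |> Seq.map snd

let picked = seq { 10 .. 20 } |> takeAtIndices [ 0; 5; 10 ] |> List.ofSeq
// picked = [10; 15; 20]
```

The set lookup costs O(log M) per element, so the whole scan is roughly O(N log M) for N elements up to the largest index and M indexes.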
Here is my shot at this. This solution will only go as far as it needs into the sequence and returns the elements as a list.
let getIndices xs (s:_ seq) =
    let enum = s.GetEnumerator()
    let rec loop i acc = function
        | h::t as xs ->
            if enum.MoveNext() then
                if i = h then
                    loop (i+1) (enum.Current::acc) t
                else
                    loop (i+1) acc xs
            else
                raise (System.IndexOutOfRangeException())
        | _ -> List.rev acc
    loop 0 [] xs
[10..20]
|> getIndices [2;4;8]
// Returns [12;14;18]
The only assumption made here is that the index list you supply is sorted. The function won't work properly otherwise.
Is it a problem, that the returned result is sorted?
This algorithm will work linearly over the input sequence. Just the indices need to be sorted. If the sequence is large, but indices are not so many - it'll be fast.
Complexity is: N -> Max(indices), M -> count of indices: O(N + MlogM) in the worst case.
let seqTakeIndices indexes =
    let rec gather prev idxs xs =
        match idxs with
        | [] -> Seq.empty
        | n::ns -> seq { let left = xs |> Seq.skip (n - prev)
                         yield left |> Seq.head
                         yield! gather n ns left }
    indexes |> List.sort |> gather 0
Here is a List.fold variant, but is more complex to read. I prefer the first:
let seqTakeIndices indices xs =
    let gather (prev, xs, res) n =
        let left = xs |> Seq.skip (n - prev)
        n, left, (Seq.head left)::res
    let _, _, res = indices |> List.sort |> List.fold gather (0, xs, [])
    res
Appended: This is still slower than your variant, but a lot faster than my older variants, because it does not use Seq.skip, which creates new enumerators and was slowing things down a lot.
let seqTakeIndices indices (xs : seq<_>) =
    let enum = xs.GetEnumerator()
    enum.MoveNext() |> ignore
    let rec gather prev idxs =
        match idxs with
        | [] -> Seq.empty
        | n::ns -> seq { if [1..n-prev] |> List.forall (fun _ -> enum.MoveNext()) then
                            yield enum.Current
                            yield! gather n ns }
    indices |> List.sort |> gather 0

Refactor this F# code to tail recursion

I wrote some code to learn F#.
Here is an example:
let nextPrime list=
    let rec loop n=
        match n with
        | _ when (list |> List.filter (fun x -> x <= ( n |> double |> sqrt |> int)) |> List.forall (fun x -> n % x <> 0)) -> n
        | _ -> loop (n+1)
    loop (List.max list + 1)
let rec findPrimes num=
    match num with
    | 1 -> [2]
    | n ->
        let temp = findPrimes <| n-1
        (nextPrime temp ) :: temp
//find 10 primes
findPrimes 10 |> printfn "%A"
I'm very happy that it just works!
I'm a total beginner at recursion.
Recursion is a wonderful thing.
I think findPrimes is not efficient.
Could someone help me refactor findPrimes to be tail recursive, if possible?
BTW, is there some more efficient way to find first n primes?
Regarding the first part of your question, if you want to write a recursive list building function tail-recursively you should pass the list of intermediate results as an extra parameter to the function. In your case this would be something like
let findPrimesTailRecursive num =
    let rec aux acc num =
        match num with
        | 1 -> acc
        | n -> aux ((nextPrime acc)::acc) (n-1)
    aux [2] num
The recursive function aux gathers its results in an extra parameter conveniently called acc (as in acc-umulator). When you reach your ending condition, just spit out the accumulated result. I've wrapped the tail-recursive helper function in another function, so the function signature remains the same.
As you can see, the call to aux is the only, and therefore last, call to happen in the n <> 1 case. It's now tail-recursive and will compile into a while loop.
I've timed your version and mine, generating 2000 primes. My version is 16% faster, but still rather slow. For generating primes, I like to use an imperative array sieve. Not very functional, but very (very) fast.
An alternative is to use an extra continuation argument to make findPrimes tail recursive. This technique always works. It will avoid stack overflows, but probably won't make your code faster.
Also, I put your nextPrime function a little closer to the style I'd use.
let nextPrime list=
    let rec loop n = if list |> List.filter (fun x -> x*x <= n)
                             |> List.forall (fun x -> n % x <> 0)
                     then n
                     else loop (n+1)
    loop (1 + List.head list)
let rec findPrimesC num cont =
    match num with
    | 1 -> cont [2]
    | n -> findPrimesC (n-1) (fun temp -> nextPrime temp :: temp |> cont)
let findPrimes num = findPrimesC num (fun res -> res)
findPrimes 10
As others have said, there are faster ways to generate primes.
Why not simply write:
let isPrime n =
    if n<=1 then false
    else
        let m = int(sqrt (float(n)))
        {2..m} |> Seq.forall (fun i->n%i<>0)
let findPrimes n =
    {2..n} |> Seq.filter isPrime |> Seq.toList
or sieve (very fast):
let generatePrimes max=
    let p = Array.create (max+1) true
    let rec filter i step =
        if i <= max then
            p.[i] <- false
            filter (i+step) step
    {2..int (sqrt (float max))} |> Seq.iter (fun i->filter (i+i) i)
    {2..max} |> Seq.filter (fun i->p.[i]) |> Seq.toArray
BTW, is there some more efficient way to find first n primes?
I described a fast arbitrary-size Sieve of Eratosthenes in F# here that accumulated its results into an ever-growing ResizeArray:
> let primes =
    let a = ResizeArray[2]
    let grow() =
      let p0 = a.[a.Count-1]+1
      let b = Array.create p0 true
      for di in a do
        let rec loop i =
          if i<b.Length then
            b.[i] <- false
            loop(i+di)
        let i0 = p0/di*di
        loop(if i0<p0 then i0+di-p0 else i0-p0)
      for i=0 to b.Length-1 do
        if b.[i] then a.Add(p0+i)
    fun n ->
      while n >= a.Count do
        grow()
      a.[n];;
val primes : (int -> int)
I know that this is a bit late, and an answer was already accepted. However, I believe that a good step-by-step guide to making something tail recursive may be of interest to the OP or anyone else for that matter. Here are some tips that have certainly helped me out. I'm going to use a straightforward example other than prime generation because, as others have stated, there are better ways to generate primes.
Consider a naive implementation of a count function that will create a list of integers counting down from some n. This version is not tail recursive so for long lists you will encounter a stack overflow exception:
let rec countDown = function
    | 0 -> []
    | n -> n :: countDown (n - 1)
(*           ^
             |... the cons operator is in the tail position
                  as such it is evaluated last. this drags
                  stack frames through subsequent recursive
                  calls *)
One way to fix this is to apply continuation passing style with a parameterized function:
let countDown' n =
    let rec countDown n k =
        match n with
        | 0 -> k []           (* v--- this is continuation passing style *)
        | n -> countDown (n - 1) (fun ns -> n :: k ns)
(*             ^
               |... the recursive call is now in tail position *)
    countDown n (fun ns -> ns)
(*              ^
                |... and we initialize k with the identity function *)
Then, refactor this parameterized function into a specialized representation. Notice that the function countDown' is not actually counting down. This is an artifact of the way the continuation is built up when n > 0 and then evaluated when n = 0. If you have something like the first example and you can't figure out how to make it tail recursive, what I'm suggesting is that you write the second one and then try to optimize it to eliminate the function parameter k. That will certainly improve the readability. This is an optimization of the second example:
let countDown'' n =
    let rec countDown n ns =
        match n with
        | 0 -> List.rev ns (* reverse so we are actually counting down again *)
        | n -> countDown (n - 1) (n :: ns)
    countDown n []
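As a quick check of the accumulator version, here is a self-contained sketch. The name countDownAcc is made up for this example, and it assumes n is not negative (a negative n would never reach the 0 case):

```fsharp
// Accumulator-based count down; assumes n >= 0.
let countDownAcc n =
    let rec loop n acc =
        match n with
        | 0 -> List.rev acc               // the accumulator was built in reverse
        | n -> loop (n - 1) (n :: acc)    // tail call: nothing is left to do afterwards
    loop n []

let small = countDownAcc 3                        // [3; 2; 1]
let big = countDownAcc 1000000 |> List.length     // 1000000, without a stack overflow
```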

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want to create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/caching)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
    seq {
        for x in 1..N do
            if x % 4 <> 0 && x % 7 <> 0 then
                printfn "producing %d" x
                yield x
    }
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
    LazyList.delayed (fun () ->
        match input with
        | LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
        | LazyList.Cons(x,LazyList.Nil) ->
            LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
        | LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
            let groups = GroupBy comp xs
            if comp x y then
                LazyList.consf
                    (LazyList.consf x (fun () ->
                        let (LazyList.Cons(firstGroup,_)) = groups
                        firstGroup))
                    (fun () ->
                        let (LazyList.Cons(_,otherGroups)) = groups
                        otherGroups)
            else
                LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
    printfn "about to do a group"
    for x in group do
        printfn " %d" x
You seem to want a function that has signature
('a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest. It is not exactly purist and loses much of the laziness of the input, but it avoids iterating the input multiple times:
let groupBy (f: 'a -> bool) (input: seq<'a>) =
    seq {
        // Buffer for the group currently being collected
        let cache = ref (new System.Collections.Generic.List<'a>())
        for e in input do
            (!cache).Add(e)
            // When f rejects an element, the current group ends with it
            if not (f e) then
                yield !cache
                cache := new System.Collections.Generic.List<'a>()
        // Emit whatever is left over as the final group
        if (!cache).Count > 0 then
            yield !cache
    }
An alternative implementation could pass the cache collection (as seq<'a>) to the function so it can see multiple elements to choose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs@(y : _))
  | comp x y = (x : firstGroup) : otherGroups
  | otherwise = [x] : groups
  where groups@(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperative nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
    let en = sq.GetEnumerator()
    let rec partitions (first:option<_>) =
        seq {
            match first with
            | Some first' -> //'
                (* The following value is always overwritten;
                   it represents the first element of the next subsequence to output, if any *)
                let next = ref None
                (* This function generates a subsequence to output,
                   setting next appropriately as it goes *)
                let rec iter item =
                    seq {
                        yield item
                        if (en.MoveNext()) then
                            let curr = en.Current
                            if (cmp item curr) then
                                yield! iter curr
                            else // consumed one too many - pass it on as the start of the next sequence
                                next := Some curr
                        else
                            next := None
                    }
                yield iter first' (* ' generate the first sequence *)
                yield! partitions !next (* recursively generate all remaining sequences *)
            | None -> () // return an empty sequence if there are no more values
        }
    let first = if en.MoveNext() then Some en.Current else None
    partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
    groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
    let en = sq.GetEnumerator()
    let next() = if en.MoveNext() then Some en.Current else None
    (* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
    let rec seqStartingWith start =
        match next() with
        | Some y when cmp start y ->
            let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
            seq { yield start; yield! fst (Lazy.force rest_next) },
            lazy Lazy.force (snd (Lazy.force rest_next))
        | next -> seq { yield start }, lazy next
    let rec iter start =
        seq {
            match (Lazy.force start) with
            | None -> ()
            | Some start ->
                let (first,next) = seqStartingWith start
                yield first
                yield! iter next
        }
    Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
    // simple unfold that assumes f captured a mutable variable
    let iter f = Seq.unfold (fun _ ->
                    match f() with
                    | Some(x) -> Some(x,())
                    | None -> None) ()
    seq {
        let state = ref (Unstarted)
        use ie = input.GetEnumerator()
        let innerMoveNext() =
            match !state with
            | Unstarted ->
                if ie.MoveNext()
                then let cur = ie.Current
                     state := InnerOkay(cur); Some(cur)
                else state := Finished; None
            | InnerOkay(last) ->
                if ie.MoveNext()
                then let cur = ie.Current
                     if f last cur
                     then state := InnerOkay(cur); Some(cur)
                     else state := NeedNewInner(cur); None
                else state := Finished; None
            | NeedNewInner(last) -> state := InnerOkay(last); Some(last)
            | Finished -> None
        let outerMoveNext() =
            match !state with
            | Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
            | InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
            | Finished -> None
        yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
    split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
    let baseTime = DateTime.Now
    seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
    Seq.zip contiguousTimeStamps numbers
    |> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
    printfn "about to do a group"
    for x in group do
        printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
    seq {
        for x in 1..N do
            if x % 4 <> 0 && x % 7 <> 0 then
                printfn "producing %d" x
                yield x
    }
let rec GroupBy comp (input:seq<_>) = seq {
    let doneWithThisGroup = ref false
    let areMore = ref true
    use e = input.GetEnumerator()
    let Next() = areMore := e.MoveNext(); !areMore
    // deal with length 0 or 1, seed 'prev'
    if not(e.MoveNext()) then () else
    let prev = ref e.Current
    while !areMore do
        yield seq {
            while not(!doneWithThisGroup) do
                if Next() then
                    let next = e.Current
                    doneWithThisGroup := not(comp !prev next)
                    yield !prev
                    prev := next
                else
                    // end of list, yield final value
                    yield !prev
                    doneWithThisGroup := true }
        doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
    printfn "about to do a group"
    for x in group do
        printfn " %d" x
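For comparison, here is a simpler sketch that relaxes one of the question's constraints: the outer sequence stays lazy, but each group is buffered into a list before it is yielded. That sidesteps the shared-enumerator problems described in the answers above, at the cost of holding one group in memory at a time. The name splitWhen and its signature are assumptions made for this example.

```fsharp
// Hypothetical helper: splits `input` wherever `keepTogether prev next` is false.
// Only the outer sequence is lazy; each group is materialized as a list.
let splitWhen (keepTogether: 'a -> 'a -> bool) (input: seq<'a>) : seq<'a list> =
    seq {
        use e = input.GetEnumerator()
        let buffer = ResizeArray<'a>()
        while e.MoveNext() do
            let current = e.Current
            // Close the buffered group when the new element does not belong to it.
            if buffer.Count > 0 && not (keepTogether buffer.[buffer.Count - 1] current) then
                yield List.ofSeq buffer
                buffer.Clear()
            buffer.Add current
        // Emit the last group, if the input was not empty.
        if buffer.Count > 0 then
            yield List.ofSeq buffer
    }

let groups = splitWhen (fun x y -> y = x + 1) [ 1; 2; 3; 5; 6; 9 ] |> List.ofSeq
// groups = [[1; 2; 3]; [5; 6]; [9]]
```

Because the enumerator and the buffer are created inside the sequence expression, enumerating the result a second time starts again from the beginning of the input.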

Resources