Take N elements from sequence with N different indexes in F# - f#

I'm new to F# and looking for a function which take N*indexes and a sequence and gives me N elements. If I have N indexes it should be equal to concat Seq.nth index0, Seq.nth index1 .. Seq.nth indexN but it should only scan over indexN elements (O(N)) in the sequence and not index0+index1+...+indexN (O(N^2)).
To sum up, I'm looking for something like:
//For performance, the index-list should be ordered on input, be padding between elements instead of indexes or be ordered when entering the function
seq {10 .. 20} |> Seq.takeIndexes [0;5;10]
Result: 10,15,20
I could make this by using seq { yield... } and have a index-counter to tick when some element should be passed out but if F# offers a nice standard way I would rather use that.
Thanks :)...
Addition: I have made the following. It works but ain't pretty. Suggestions is welcomed
let seqTakeIndexes (indexes : int list) (xs : seq<int>) =
seq {
//Assume indexes is sorted
let e = xs.GetEnumerator()
let i = ref indexes
let curr = ref 0
while e.MoveNext() && not (!i).IsEmpty do
if !curr = List.head !i then
i := (!i).Tail
yield e.Current
curr := !curr + 1
}

When you want to access elements by index, then using sequences isn't as good idea. Sequences are designed to allow sequential iteration. I would convert the necessary part of the sequence to an array and then pick the elements by index:
let takeIndexes ns input =
// Take only elements that we need to access (sequence could be infinite)
let arr = input |> Seq.take (1 + Seq.max ns) |> Array.ofSeq
// Simply pick elements at the specified indices from the array
seq { for index in ns -> arr.[index] }
seq [10 .. 20] |> takeIndexes [0;5;10]
Regarding your implementation - I don't think it can be made significantly more elegant. This is a general problem when implementing functions that need to take values from multiple sources in an interleaved fashion - there is just no elegant way of writing those!
However, you can write this in a functional way using recursion like this:
let takeIndexes indices (xs:seq<int>) =
// Iterates over the list of indices recursively
let rec loop (xe:IEnumerator<_>) idx indices = seq {
let next = loop xe (idx + 1)
// If the sequence ends, then end as well
if xe.MoveNext() then
match indices with
| i::indices when idx = i ->
// We're passing the specified index
yield xe.Current
yield! next indices
| _ ->
// Keep waiting for the first index from the list
yield! next indices }
seq {
// Note: 'use' guarantees proper disposal of the source sequence
use xe = xs.GetEnumerator()
yield! loop xe 0 indices }
seq [10 .. 20] |> takeIndexes [0;5;10]

When you need to scan a sequence and accumulate results in O(n), you can always fall back to Seq.fold:
let takeIndices ind sq =
let selector (idxLeft, currIdx, results) elem =
match idxLeft with
| [] -> (idxLeft, currIdx, results)
| idx::moreIdx when idx = currIdx -> (moreIdx, currIdx+1, elem::results)
| idx::_ when idx <> currIdx -> (idxLeft, currIdx+1, results)
| idx::_ -> invalidOp "Can't get here."
let (_, _, results) = sq |> Seq.fold selector (ind, 0, [])
results |> List.rev
seq [10 .. 20] |> takeIndices [0;5;10]
The drawback of this solution is that it will enumerate the sequence to the end, even if it has accumulated all the desired elements already.

Here is my shot at this. This solution will only go as far as it needs into the sequence and returns the elements as a list.
let getIndices xs (s:_ seq) =
let enum = s.GetEnumerator()
let rec loop i acc = function
| h::t as xs ->
if enum.MoveNext() then
if i = h then
loop (i+1) (enum.Current::acc) t
else
loop (i+1) acc xs
else
raise (System.IndexOutOfRangeException())
| _ -> List.rev acc
loop 0 [] xs
[10..20]
|> getIndices [2;4;8]
// Returns [12;14;18]
The only assumption made here is that the index list you supply is sorted. The function won't work properly otherwise.

Is it a problem, that the returned result is sorted?
This algorithm will work linearly over the input sequence. Just the indices need to be sorted. If the sequence is large, but indices are not so many - it'll be fast.
Complexity is: N -> Max(indices), M -> count of indices: O(N + MlogM) in the worst case.
let seqTakeIndices indexes =
let rec gather prev idxs xs =
match idxs with
| [] -> Seq.empty
| n::ns -> seq { let left = xs |> Seq.skip (n - prev)
yield left |> Seq.head
yield! gather n ns left }
indexes |> List.sort |> gather 0
Here is a List.fold variant, but is more complex to read. I prefer the first:
let seqTakeIndices indices xs =
let gather (prev, xs, res) n =
let left = xs |> Seq.skip (n - prev)
n, left, (Seq.head left)::res
let _, _, res = indices |> List.sort |> List.fold gather (0, xs, [])
res
Appended: Still slower than your variant, but a lot faster than mine older variants. Because of not using Seq.skip that is creating new enumerators and was slowing down things a lot.
let seqTakeIndices indices (xs : seq<_>) =
let enum = xs.GetEnumerator()
enum.MoveNext() |> ignore
let rec gather prev idxs =
match idxs with
| [] -> Seq.empty
| n::ns -> seq { if [1..n-prev] |> List.forall (fun _ -> enum.MoveNext()) then
yield enum.Current
yield! gather n ns }
indices |> List.sort |> gather 0

Related

How to split a sequence in F# based on another sequence in an idiomatic way

I have, in F#, 2 sequences, each containing distinct integers, strictly in ascending order: listMaxes and numbers.
If not Seq.isEmpty numbers, then it is guaranteed that not Seq.isEmpty listMaxes and Seq.last listMaxes >= Seq.last numbers.
I would like to implement in F# a function that returns a list of list of integers, whose List.length equals Seq.length listMaxes, containing the elements of numbers divided in lists, where the elements of listMaxes limit each group.
For example: called with the arguments
listMaxes = seq [ 25; 56; 65; 75; 88 ]
numbers = seq [ 10; 11; 13; 16; 20; 25; 31; 38; 46; 55; 65; 76; 88 ]
this function should return
[ [10; 11; 13; 16; 20; 25]; [31; 38; 46; 55]; [65]; List.empty; [76; 88] ]
I can implement this function, iterating over numbers only once:
let groupByListMaxes listMaxes numbers =
if Seq.isEmpty numbers then
List.replicate (Seq.length listMaxes) List.empty
else
List.ofSeq (seq {
use nbe = numbers.GetEnumerator ()
ignore (nbe.MoveNext ())
for lmax in listMaxes do
yield List.ofSeq (seq {
if nbe.Current <= lmax then
yield nbe.Current
while nbe.MoveNext () && nbe.Current <= lmax do
yield nbe.Current
})
})
But this code feels unclean, ugly, imperative, and very un-F#-y.
Is there any functional / F#-idiomatic way to achieve this?
Here's a version based on list interpretation, which is quite functional in style. You can use Seq.toList to convert between them, whenever you want to handle that. You could also use Seq.scan in conjunction with Seq.partition ((>=) max) if you want to use only library functions, but beware that it's very very easy to introduce a quadratic complexity in either computation or memory when doing that.
This is linear in both:
let splitAt value lst =
let rec loop l1 = function
| [] -> List.rev l1, []
| h :: t when h > value -> List.rev l1, (h :: t)
| h :: t -> loop (h :: l1) t
loop [] lst
let groupByListMaxes listMaxes numbers =
let rec loop acc lst = function
| [] -> List.rev acc
| h :: t ->
let out, lst' = splitAt h lst
loop (out :: acc) lst' t
loop [] numbers listMaxes
It can be done like this with pattern matching and tail recursion:
let groupByListMaxes listMaxes numbers =
let rec inner acc numbers =
function
| [] -> acc |> List.rev
| max::tail ->
let taken = numbers |> Seq.takeWhile ((>=) max) |> List.ofSeq
let n = taken |> List.length
inner (taken::acc) (numbers |> Seq.skip n) tail
inner [] numbers (listMaxes |> List.ofSeq)
Update: I also got inspired by fold and came up with the following solution that strictly refrains from converting the input sequences.
let groupByListMaxes maxes numbers =
let rec inner (acc, (cur, numbers)) max =
match numbers |> Seq.tryHead with
// Add n to the current list of n's less
// than the local max
| Some n when n <= max ->
let remaining = numbers |> Seq.tail
inner (acc, (n::cur, remaining)) max
// Complete the current list by adding it
// to the accumulated result and prepare
// the next list for fold.
| _ ->
(List.rev cur)::acc, ([], numbers)
maxes |> Seq.fold inner ([], ([], numbers)) |> fst |> List.rev
I have found a better implementation myself. Tips for improvements are still welcome.
Dealing with 2 sequences is really a pain. And I really do want to iterate over numbers only once without turning that sequence into a list. But then I realized that turning listMaxes (generally the shorter of the sequences) is less costly. That way only 1 sequence remains, and I can use Seq.fold over numbers.
What should be the state that we want to keep and change while iterating with Seq.fold over numbers? First, it should definitely include the remaining of the listMaxes, yet the previous maxes that we already have surpassed are no longer of interest. Second, the accumulated lists so far, although, like in the other answers, these can be kept in reverse order. More to the point: the state is a couple which has as second element a reversed list of reversed lists of the numbers so far.
let groupByListMaxes listMaxes numbers =
let rec folder state number =
match state with
| m :: maxes, _ when number > m ->
folder (maxes, List.empty :: snd state) number
| m :: maxes, [] ->
fst state, List.singleton (List.singleton number)
| m :: maxes, h :: t ->
fst state, (number :: h) :: t
| [], _ ->
failwith "Guaranteed not to happen"
let listMaxesList = List.ofSeq listMaxes
let initialState = listMaxesList, List.empty
let reversed = snd (Seq.fold folder initialState numbers)
let temp = List.rev (List.map List.rev reversed)
let extraLength = List.length listMaxesList - List.length temp
let extra = List.replicate extraLength List.empty
List.concat [temp; extra]
I know this is an old question but I had a very similar problem and I think this is a simple solution:
let groupByListMaxes cs xs =
List.scan (fun (_, xs) c -> List.partition (fun x -> x <= c) xs)
([], xs)
cs
|> List.skip 1
|> List.map fst

Finding (and removing) repeating pairs in array

I want a way to get rid of repeating pairs in an array. For my problem, the pairs will be consecutive, and there will be at most one repeating pair.
My current implementation seems too complicated. The elements 3 and 4 form what I'm calling a repeating pair in arr1 below. As a pair, they only appear once in the desired output, arr2. What are some more efficient ways?
let arr1=[|4; 2; 3; 4; 3; 4; 1|]
let n=arr1.Length
let iPlus2IsEqual=Array.map2 (fun x y -> x=y) arr1.[2..] arr1.[..(n-3)]
let consecutive=Array.map2 (fun x y -> x && y) iPlus2IsEqual.[1..] iPlus2IsEqual.[..(n-4)] |> Array.tryFindIndex (fun x -> x)
let dup=if consecutive.IsSome then consecutive.Value+1 else n-1
let arr2=if dup>=n-3 then arr1.[..dup] else Array.append arr1.[..dup] arr1.[(dup+3)..]
>
val arr2 : int [] = [|4; 2; 3; 4; 1|]
We can use recursion like so (it will get multiple repeats for free too)
let rec filterrepeats l =
match l with
|a::b::c::d::t when a=c && b=d -> a::b::(filterrepeats t)
|h::t ->h::(filterrepeats t)
|[] -> []
> filterrepeats [4;2;3;4;3;4;1];;
val it : int list = [4; 2; 3; 4; 1]
This works on lists, so you will need to add a call to Array.toList before you run it.
The above is not tail recursive as the compiler doesn't know what goes on the right hand side of h::(filterrepeats t) until after the function call. You can solve this by using an accumulator like so:
let rec filterrepeats l =
let rec loop l acc =
match l with
|a::b::c::d::t when a=c && b=d ->loop t (b::a::acc)
|h::t ->loop t (h::acc)
|[] -> acc
loop (List.rev l) []
For large arrays this is around 13x faster than your solution:
let inline tryFindDuplicatedPairIndex (xs: _ []) =
let rec loop i x0 x1 x2 =
if i < xs.Length-4 then
let x3 = xs.[i+3]
if x0=x2 && x1=x3 then Some i else
loop (i+1) x1 x2 x3
else None
if xs.Length < 4 then None else
loop 0 xs.[0] xs.[1] xs.[2]
let inline removeDuplicatedPair (xs: _ []) =
match tryFindDuplicatedPairIndex xs with
| None -> Array.copy xs
| Some i ->
let ys = Array.zeroCreate (xs.Length-2)
for j=0 to i-1 do
ys.[j] <- xs.[j]
for j=i+2 to xs.Length-1 do
ys.[j-2] <- xs.[j]
ys
I use inline and test elements individually (i.e. rather than as a tuple: (x0,x1) = (x2,x3)) to try to prevent = from being a generic equality test because that is very slow. I've reused previous array lookups from one iteration to the next. I copy the input array if the output is identical to the input and pre-allocate an array with n-2 elements otherwise. I've hand-rolled the copying to my pre-allocated array to avoid creating any garbage (e.g. instead of Array.append of two slices).
No stack overflow with large list (length >= 100K) and remove all duplicate pairs
let rec distinctPairs list =
List.foldBack (fun x (l,r) -> x::r, l) list ([],[])
|> fun (odds, evens) -> List.zip odds evens
|> Seq.distinct
Not very fast, 1M list take 500ms, anyway faster ?
Only work for list with even length

Rfactor this F# code to tail recursion

I write some code to learning F#.
Here is a example:
let nextPrime list=
let rec loop n=
match n with
| _ when (list |> List.filter (fun x -> x <= ( n |> double |> sqrt |> int)) |> List.forall (fun x -> n % x <> 0)) -> n
| _ -> loop (n+1)
loop (List.max list + 1)
let rec findPrimes num=
match num with
| 1 -> [2]
| n ->
let temp = findPrimes <| n-1
(nextPrime temp ) :: temp
//find 10 primes
findPrimes 10 |> printfn "%A"
I'm very happy that it just works!
I'm totally beginner to recursion
Recursion is a wonderful thing.
I think findPrimes is not efficient.
Someone help me to refactor findPrimes to tail recursion if possible?
BTW, is there some more efficient way to find first n primes?
Regarding the first part of your question, if you want to write a recursive list building function tail-recursively you should pass the list of intermediate results as an extra parameter to the function. In your case this would be something like
let findPrimesTailRecursive num =
let rec aux acc num =
match num with
| 1 -> acc
| n -> aux ((nextPrime acc)::acc) (n-1)
aux [2] num
The recursive function aux gathers its results in an extra parameter conveniently called acc (as in acc-umulator). When you reach your ending condition, just spit out the accumulated result. I've wrapped the tail-recursive helper function in another function, so the function signature remains the same.
As you can see, the call to aux is the only, and therefore last, call to happen in the n <> 1 case. It's now tail-recursive and will compile into a while loop.
I've timed your version and mine, generating 2000 primes. My version is 16% faster, but still rather slow. For generating primes, I like to use an imperative array sieve. Not very functional, but very (very) fast.
An alternative is to use an extra continuation argument to make findPrimes tail recursive. This technique always works. It will avoid stack overflows, but probably won't make your code faster.
Also, I put your nextPrime function a little closer to the style I'd use.
let nextPrime list=
let rec loop n = if list |> List.filter (fun x -> x*x <= n)
|> List.forall (fun x -> n % x <> 0)
then n
else loop (n+1)
loop (1 + List.head list)
let rec findPrimesC num cont =
match num with
| 1 -> cont [2]
| n -> findPrimesC (n-1) (fun temp -> nextPrime temp :: temp |> cont)
let findPrimes num = findPrimesC num (fun res -> res)
findPrimes 10
As others have said, there's faster ways to generate primes.
Why not simply write:
let isPrime n =
if n<=1 then false
else
let m = int(sqrt (float(n)))
{2..m} |> Seq.forall (fun i->n%i<>0)
let findPrimes n =
{2..n} |> Seq.filter isPrime |> Seq.toList
or sieve (very fast):
let generatePrimes max=
let p = Array.create (max+1) true
let rec filter i step =
if i <= max then
p.[i] <- false
filter (i+step) step
{2..int (sqrt (float max))} |> Seq.iter (fun i->filter (i+i) i)
{2..max} |> Seq.filter (fun i->p.[i]) |> Seq.toArray
BTW, is there some more efficient way to find first n primes?
I described a fast arbitrary-size Sieve of Eratosthenes in F# here that accumulated its results into an ever-growing ResizeArray:
> let primes =
let a = ResizeArray[2]
let grow() =
let p0 = a.[a.Count-1]+1
let b = Array.create p0 true
for di in a do
let rec loop i =
if i<b.Length then
b.[i] <- false
loop(i+di)
let i0 = p0/di*di
loop(if i0<p0 then i0+di-p0 else i0-p0)
for i=0 to b.Length-1 do
if b.[i] then a.Add(p0+i)
fun n ->
while n >= a.Count do
grow()
a.[n];;
val primes : (int -> int)
I know that this is a bit late, and an answer was already accepted. However, I believe that a good step by step guide to making something tail recursive may be of interest to the OP or anyone else for that matter. Here are some tips that have certainly helped me out. I'm going to use a strait-forward example other than prime generation because, as others have stated, there are better ways to generate primes.
Consider a naive implementation of a count function that will create a list of integers counting down from some n. This version is not tail recursive so for long lists you will encounter a stack overflow exception:
let rec countDown = function
| 0 -> []
| n -> n :: countDown (n - 1)
(* ^
|... the cons operator is in the tail position
as such it is evaluated last. this drags
stack frames through subsequent recursive
calls *)
One way to fix this is to apply continuation passing style with a parameterized function:
let countDown' n =
let rec countDown n k =
match n with
| 0 -> k [] (* v--- this is continuation passing style *)
| n -> countDown (n - 1) (fun ns -> n :: k ns)
(* ^
|... the recursive call is now in tail position *)
countDown n (fun ns -> ns)
(* ^
|... and we initialize k with the identity function *)
Then, refactor this parameterized function into a specialized representation. Notice that the function countDown' is not actually counting down. This is an artifact of the way the continuation is built up when n > 0 and then evaluated when n = 0. If you have something like the first example and you can't figure out how to make it tail recursive, what I'm suggesting is that you write the second one and then try to optimize it to eliminate the function parameter k. That will certainly improve the readability. This is an optimization of the second example:
let countDown'' n =
let rec countDown n ns =
match n with
| 0 -> List.rev ns (* reverse so we are actually counting down again *)
| n -> countDown (n - 1) (n :: ns)
countDown n []

Remove a single non-unique value from a sequence in F#

I have a sequence of integers representing dice in F#.
In the game in question, the player has a pool of dice and can choose to play one (governed by certain rules) and keep the rest.
If, for example, a player rolls a 6, 6 and a 4 and decides to play one the sixes, is there a simple way to return a sequence with only one 6 removed?
Seq.filter (fun x -> x != 6) dice
removes all of the sixes, not just one.
Non-trivial operations on sequences are painful to work with, since they don't support pattern matching. I think the simplest solution is as follows:
let filterFirst f s =
seq {
let filtered = ref false
for a in s do
if filtered.Value = false && f a then
filtered := true
else yield a
}
So long as the mutable implementation is hidden from the client, it's still functional style ;)
If you're going to store data I would use ResizeArray instead of a Sequence. It has a wealth of functions built in such as the function you asked about. It's simply called Remove. Note: ResizeArray is an abbreviation for the CLI type List.
let test = seq [1; 2; 6; 6; 1; 0]
let a = new ResizeArray<int>(test)
a.Remove 6 |> ignore
Seq.toList a |> printf "%A"
// output
> [1; 2; 6; 1; 0]
Other data type options could be Array
let removeOneFromArray v a =
let i = Array.findIndex ((=)v) a
Array.append a.[..(i-1)] a.[(i+1)..]
or List
let removeOneFromList v l =
let rec remove acc = function
| x::xs when x = v -> List.rev acc # xs
| x::xs -> remove (x::acc) xs
| [] -> acc
remove [] l
the below code will work for a list (so not any seq but it sounds like the sequence your using could be a List)
let rec removeOne value list =
match list with
| head::tail when head = value -> tail
| head::tail -> head::(removeOne value tail)
| _ -> [] //you might wanna fail here since it didn't find value in
//the list
EDIT: code updated based on correct comment below. Thanks P
EDIT: After reading a different answer I thought that a warning would be in order. Don't use the above code for infite sequences but since I guess your players don't have infite dice that should not be a problem but for but for completeness here's an implementation that would work for (almost) any
finite sequence
let rec removeOne value seq acc =
match seq.Any() with
| true when s.First() = value -> seq.Skip(1)
| true -> seq.First()::(removeOne value seq.Skip(1))
| _ -> List.rev acc //you might wanna fail here since it didn't find value in
//the list
However I recommend using the first solution which Im confident will perform better than the latter even if you have to turn a sequence into a list first (at least for small sequences or large sequences with the soughtfor value in the end)
I don't think there is any function that would allow you to directly represent the idea that you want to remove just the first element matching the specified criteria from the list (e.g. something like Seq.removeOne).
You can implement the function in a relatively readable way using Seq.fold (if the sequence of numbers is finite):
let removeOne f l =
Seq.fold (fun (removed, res) v ->
if removed then true, v::res
elif f v then true, res
else false, v::res) (false, []) l
|> snd |> List.rev
> removeOne (fun x -> x = 6) [ 1; 2; 6; 6; 1 ];
val it : int list = [1; 2; 6; 1]
The fold function keeps some state - in this case of type bool * list<'a>. The Boolean flag represents whether we already removed some element and the list is used to accumulate the result (which has to be reversed at the end of processing).
If you need to do this for (possibly) infinite seq<int>, then you'll need to use GetEnumerator directly and implement the code as a recursive sequence expression. This is a bit uglier and it would look like this:
let removeOne f (s:seq<_>) =
// Get enumerator of the input sequence
let en = s.GetEnumerator()
let rec loop() = seq {
// Move to the next element
if en.MoveNext() then
// Is this the element to skip?
if f en.Current then
// Yes - return all remaining elements without filtering
while en.MoveNext() do
yield en.Current
else
// No - return this element and continue looping
yield en.Current
yield! loop() }
loop()
You can try this:
let rec removeFirstOccurrence item screened items =
items |> function
| h::tail -> if h = item
then screened # tail
else tail |> removeFirstOccurrence item (screened # [h])
| _ -> []
Usage:
let updated = products |> removeFirstOccurrence product []

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/cacheing)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
LazyList.delayed (fun () ->
match input with
| LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
| LazyList.Cons(x,LazyList.Nil) ->
LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
| LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
let groups = GroupBy comp xs
if comp x y then
LazyList.consf
(LazyList.consf x (fun () ->
let (LazyList.Cons(firstGroup,_)) = groups
firstGroup))
(fun () ->
let (LazyList.Cons(_,otherGroups)) = groups
otherGroups)
else
LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x
You seem to want a function that has signature
(`a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):
let groupBy (fun: 'a -> bool) (input: seq) =
seq {
let cache = ref (new System.Collections.Generic.List())
for e in input do
(!cache).Add(e)
if not (fun e) then
yield !cache
cache := new System.Collections.Generic.List()
if cache.Length > 0 then
yield !cache
}
An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs#(y : _))
| comp x y = (x : firstGroup) : otherGroups
| otherwise = [x] : groups
where groups#(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let rec partitions (first:option<_>) =
seq {
match first with
| Some first' -> //'
(* The following value is always overwritten;
it represents the first element of the next subsequence to output, if any *)
let next = ref None
(* This function generates a subsequence to output,
setting next appropriately as it goes *)
let rec iter item =
seq {
yield item
if (en.MoveNext()) then
let curr = en.Current
if (cmp item curr) then
yield! iter curr
else // consumed one too many - pass it on as the start of the next sequence
next := Some curr
else
next := None
}
yield iter first' (* ' generate the first sequence *)
yield! partitions !next (* recursively generate all remaining sequences *)
| None -> () // return an empty sequence if there are no more values
}
let first = if en.MoveNext() then Some en.Current else None
partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let next() = if en.MoveNext() then Some en.Current else None
(* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
let rec seqStartingWith start =
match next() with
| Some y when cmp start y ->
let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
seq { yield start; yield! fst (Lazy.force rest_next) },
lazy Lazy.force (snd (Lazy.force rest_next))
| next -> seq { yield start }, lazy next
let rec iter start =
seq {
match (Lazy.force start) with
| None -> ()
| Some start ->
let (first,next) = seqStartingWith start
yield first
yield! iter next
}
Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
// simple unfold that assumes f captured a mutable variable
let iter f = Seq.unfold (fun _ ->
match f() with
| Some(x) -> Some(x,())
| None -> None) ()
seq {
let state = ref (Unstarted)
use ie = input.GetEnumerator()
let innerMoveNext() =
match !state with
| Unstarted ->
if ie.MoveNext()
then let cur = ie.Current
state := InnerOkay(cur); Some(cur)
else state := Finished; None
| InnerOkay(last) ->
if ie.MoveNext()
then let cur = ie.Current
if f last cur
then state := InnerOkay(cur); Some(cur)
else state := NeedNewInner(cur); None
else state := Finished; None
| NeedNewInner(last) -> state := InnerOkay(last); Some(last)
| Finished -> None
let outerMoveNext() =
match !state with
| Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
| InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
| Finished -> None
yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
let baseTime = DateTime.Now
seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
Seq.zip contiguousTimeStamps numbers
|> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
printfn "about to do a group"
for x in group do
printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:seq<_>) = seq {
let doneWithThisGroup = ref false
let areMore = ref true
use e = input.GetEnumerator()
let Next() = areMore := e.MoveNext(); !areMore
// deal with length 0 or 1, seed 'prev'
if not(e.MoveNext()) then () else
let prev = ref e.Current
while !areMore do
yield seq {
while not(!doneWithThisGroup) do
if Next() then
let next = e.Current
doneWithThisGroup := not(comp !prev next)
yield !prev
prev := next
else
// end of list, yield final value
yield !prev
doneWithThisGroup := true }
doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x

Resources