Finding (and removing) repeating pairs in array - f#

I want a way to get rid of repeating pairs in an array. For my problem, the pairs will be consecutive, and there will be at most one repeating pair.
My current implementation seems too complicated. The elements 3 and 4 form what I'm calling a repeating pair in arr1 below. As a pair, they only appear once in the desired output, arr2. What are some more efficient ways?
let arr1=[|4; 2; 3; 4; 3; 4; 1|]
let n=arr1.Length
let iPlus2IsEqual=Array.map2 (fun x y -> x=y) arr1.[2..] arr1.[..(n-3)]
let consecutive=Array.map2 (fun x y -> x && y) iPlus2IsEqual.[1..] iPlus2IsEqual.[..(n-4)] |> Array.tryFindIndex (fun x -> x)
let dup=if consecutive.IsSome then consecutive.Value+1 else n-1
let arr2=if dup>=n-3 then arr1.[..dup] else Array.append arr1.[..dup] arr1.[(dup+3)..]
>
val arr2 : int [] = [|4; 2; 3; 4; 1|]

We can use recursion like so (it will get multiple repeats for free too)
let rec filterrepeats l =
match l with
|a::b::c::d::t when a=c && b=d -> a::b::(filterrepeats t)
|h::t ->h::(filterrepeats t)
|[] -> []
> filterrepeats [4;2;3;4;3;4;1];;
val it : int list = [4; 2; 3; 4; 1]
This works on lists, so you will need to add a call to Array.toList before you run it.
The above is not tail recursive as the compiler doesn't know what goes on the right hand side of h::(filterrepeats t) until after the function call. You can solve this by using an accumulator like so:
let rec filterrepeats l =
let rec loop l acc =
match l with
|a::b::c::d::t when a=c && b=d ->loop t (b::a::acc)
|h::t ->loop t (h::acc)
|[] -> acc
loop (List.rev l) []

For large arrays this is around 13x faster than your solution:
let inline tryFindDuplicatedPairIndex (xs: _ []) =
let rec loop i x0 x1 x2 =
if i < xs.Length-4 then
let x3 = xs.[i+3]
if x0=x2 && x1=x3 then Some i else
loop (i+1) x1 x2 x3
else None
if xs.Length < 4 then None else
loop 0 xs.[0] xs.[1] xs.[2]
let inline removeDuplicatedPair (xs: _ []) =
match tryFindDuplicatedPairIndex xs with
| None -> Array.copy xs
| Some i ->
let ys = Array.zeroCreate (xs.Length-2)
for j=0 to i-1 do
ys.[j] <- xs.[j]
for j=i+2 to xs.Length-1 do
ys.[j-2] <- xs.[j]
ys
I use inline and test elements individually (i.e. rather than as a tuple: (x0,x1) = (x2,x3)) to try to prevent = from being a generic equality test because that is very slow. I've reused previous array lookups from one iteration to the next. I copy the input array if the output is identical to the input and pre-allocate an array with n-2 elements otherwise. I've hand-rolled the copying to my pre-allocated array to avoid creating any garbage (e.g. instead of Array.append of two slices).

No stack overflow with large list (length >= 100K) and remove all duplicate pairs
let rec distinctPairs list =
List.foldBack (fun x (l,r) -> x::r, l) list ([],[])
|> fun (odds, evens) -> List.zip odds evens
|> Seq.distinct
Not very fast, 1M list take 500ms, anyway faster ?
Only work for list with even length

Related

How to split a sequence in F# based on another sequence in an idiomatic way

I have, in F#, 2 sequences, each containing distinct integers, strictly in ascending order: listMaxes and numbers.
If not Seq.isEmpty numbers, then it is guaranteed that not Seq.isEmpty listMaxes and Seq.last listMaxes >= Seq.last numbers.
I would like to implement in F# a function that returns a list of list of integers, whose List.length equals Seq.length listMaxes, containing the elements of numbers divided in lists, where the elements of listMaxes limit each group.
For example: called with the arguments
listMaxes = seq [ 25; 56; 65; 75; 88 ]
numbers = seq [ 10; 11; 13; 16; 20; 25; 31; 38; 46; 55; 65; 76; 88 ]
this function should return
[ [10; 11; 13; 16; 20; 25]; [31; 38; 46; 55]; [65]; List.empty; [76; 88] ]
I can implement this function, iterating over numbers only once:
let groupByListMaxes listMaxes numbers =
if Seq.isEmpty numbers then
List.replicate (Seq.length listMaxes) List.empty
else
List.ofSeq (seq {
use nbe = numbers.GetEnumerator ()
ignore (nbe.MoveNext ())
for lmax in listMaxes do
yield List.ofSeq (seq {
if nbe.Current <= lmax then
yield nbe.Current
while nbe.MoveNext () && nbe.Current <= lmax do
yield nbe.Current
})
})
But this code feels unclean, ugly, imperative, and very un-F#-y.
Is there any functional / F#-idiomatic way to achieve this?
Here's a version based on list interpretation, which is quite functional in style. You can use Seq.toList to convert between them, whenever you want to handle that. You could also use Seq.scan in conjunction with Seq.partition ((>=) max) if you want to use only library functions, but beware that it's very very easy to introduce a quadratic complexity in either computation or memory when doing that.
This is linear in both:
let splitAt value lst =
let rec loop l1 = function
| [] -> List.rev l1, []
| h :: t when h > value -> List.rev l1, (h :: t)
| h :: t -> loop (h :: l1) t
loop [] lst
let groupByListMaxes listMaxes numbers =
let rec loop acc lst = function
| [] -> List.rev acc
| h :: t ->
let out, lst' = splitAt h lst
loop (out :: acc) lst' t
loop [] numbers listMaxes
It can be done like this with pattern matching and tail recursion:
let groupByListMaxes listMaxes numbers =
let rec inner acc numbers =
function
| [] -> acc |> List.rev
| max::tail ->
let taken = numbers |> Seq.takeWhile ((>=) max) |> List.ofSeq
let n = taken |> List.length
inner (taken::acc) (numbers |> Seq.skip n) tail
inner [] numbers (listMaxes |> List.ofSeq)
Update: I also got inspired by fold and came up with the following solution that strictly refrains from converting the input sequences.
let groupByListMaxes maxes numbers =
let rec inner (acc, (cur, numbers)) max =
match numbers |> Seq.tryHead with
// Add n to the current list of n's less
// than the local max
| Some n when n <= max ->
let remaining = numbers |> Seq.tail
inner (acc, (n::cur, remaining)) max
// Complete the current list by adding it
// to the accumulated result and prepare
// the next list for fold.
| _ ->
(List.rev cur)::acc, ([], numbers)
maxes |> Seq.fold inner ([], ([], numbers)) |> fst |> List.rev
I have found a better implementation myself. Tips for improvements are still welcome.
Dealing with 2 sequences is really a pain. And I really do want to iterate over numbers only once without turning that sequence into a list. But then I realized that turning listMaxes (generally the shorter of the sequences) is less costly. That way only 1 sequence remains, and I can use Seq.fold over numbers.
What should be the state that we want to keep and change while iterating with Seq.fold over numbers? First, it should definitely include the remaining of the listMaxes, yet the previous maxes that we already have surpassed are no longer of interest. Second, the accumulated lists so far, although, like in the other answers, these can be kept in reverse order. More to the point: the state is a couple which has as second element a reversed list of reversed lists of the numbers so far.
let groupByListMaxes listMaxes numbers =
let rec folder state number =
match state with
| m :: maxes, _ when number > m ->
folder (maxes, List.empty :: snd state) number
| m :: maxes, [] ->
fst state, List.singleton (List.singleton number)
| m :: maxes, h :: t ->
fst state, (number :: h) :: t
| [], _ ->
failwith "Guaranteed not to happen"
let listMaxesList = List.ofSeq listMaxes
let initialState = listMaxesList, List.empty
let reversed = snd (Seq.fold folder initialState numbers)
let temp = List.rev (List.map List.rev reversed)
let extraLength = List.length listMaxesList - List.length temp
let extra = List.replicate extraLength List.empty
List.concat [temp; extra]
I know this is an old question but I had a very similar problem and I think this is a simple solution:
let groupByListMaxes cs xs =
List.scan (fun (_, xs) c -> List.partition (fun x -> x <= c) xs)
([], xs)
cs
|> List.skip 1
|> List.map fst

F#, implement fold3, fold4, fold_n

I am interested to implement fold3, fold4 etc., similar to List.fold and List.fold2. e.g.
// TESTCASE
let polynomial (x:double) a b c = a*x + b*x*x + c*x*x*x
let A = [2.0; 3.0; 4.0; 5.0]
let B = [1.5; 1.0; 0.5; 0.2]
let C = [0.8; 0.01; 0.001; 0.0001]
let result = fold3 polynomial 0.7 A B C
// 2.0 * (0.7 ) + 1.5 * (0.7 )^2 + 0.8 * (0.7 )^3 -> 2.4094
// 3.0 * (2.4094) + 1.0 * (2.4094)^2 + 0.01 * (2.4094)^3 -> 13.173
// 4.0 * (13.173) + 0.5 * (13.173)^2 + 0.001 * (13.173)^3 -> 141.75
// 5.0 * (141.75) + 0.2 * (141.75)^2 + 0.0001 * (141.75)^3 -> 5011.964
//
// Output: result = 5011.964
My first method is grouping the 3 lists A, B, C, into a list of tuples, and then apply list.fold
let fold3 f x A B C =
List.map3 (fun a b c -> (a,b,c)) A B C
|> List.fold (fun acc (a,b,c) -> f acc a b c) x
// e.g. creates [(2.0,1.5,0.8); (3.0,1.0,0.01); ......]
My second method is to declare a mutable data, and use List.map3
let mutable result = 0.7
List.map3 (fun a b c ->
result <- polynomial result a b c // Change mutable data
// Output intermediate data
result) A B C
// Output from List.map3: [2.4094; 13.17327905; 141.7467853; 5011.963942]
// result mutable: 5011.963942
I would like to know if there are other ways to solve this problem. Thank you.
For fold3, you could just do zip3 and then fold:
let polynomial (x:double) (a, b, c) = a*x + b*x*x + c*x*x*x
List.zip3 A B C |> List.fold polynomial 0.7
But if you want this for the general case, then you need what we call "applicative functors".
First, imagine you have a list of functions and a list of values. Let's assume for now they're of the same size:
let fs = [ (fun x -> x+1); (fun x -> x+2); (fun x -> x+3) ]
let xs = [3;5;7]
And what you'd like to do (only natural) is to apply each function to each value. This is easily done with List.map2:
let apply fs xs = List.map2 (fun f x -> f x) fs xs
apply fs xs // Result = [4;7;10]
This operation "apply" is why these are called "applicative functors". Not just any ol' functors, but applicative ones. (the reason for why they're "functors" is a tad more complicated)
So far so good. But wait! What if each function in my list of functions returned another function?
let f1s = [ (fun x -> fun y -> x+y); (fun x -> fun y -> x-y); (fun x -> fun y -> x*y) ]
Or, if I remember that fun x -> fun y -> ... can be written in the short form of fun x y -> ...
let f1s = [ (fun x y -> x+y); (fun x y -> x-y); (fun x y -> x*y) ]
What if I apply such list of functions to my values? Well, naturally, I'll get another list of functions:
let f2s = apply f1s xs
// f2s = [ (fun y -> 3+y); (fun y -> 5+y); (fun y -> 7+y) ]
Hey, here's an idea! Since f2s is also a list of functions, can I apply it again? Well of course I can!
let ys = [1;2;3]
apply f2s ys // Result: [4;7;10]
Wait, what? What just happened?
I first applied the first list of functions to xs, and got another list of functions as a result. And then I applied that result to ys, and got a list of numbers.
We could rewrite that without intermediate variable f2s:
let f1s = [ (fun x y -> x+y); (fun x y -> x-y); (fun x y -> x*y) ]
let xs = [3;5;7]
let ys = [1;2;3]
apply (apply f1s xs) ys // Result: [4;7;10]
For extra convenience, this operation apply is usually expressed as an operator:
let (<*>) = apply
f1s <*> xs <*> ys
See what I did there? With this operator, it now looks very similar to just calling the function with two arguments. Neat.
But wait. What about our original task? In the original requirements we don't have a list of functions, we only have one single function.
Well, that can be easily fixed with another operation, let's call it "apply first". This operation will take a single function (not a list) plus a list of values, and apply this function to each value in the list:
let applyFirst f xs = List.map f xs
Oh, wait. That's just map. Silly me :-)
For extra convenience, this operation is usually also given an operator name:
let (<|>) = List.map
And now, I can do things like this:
let f x y = x + y
let xs = [3;5;7]
let ys = [1;2;3]
f <|> xs <*> ys // Result: [4;7;10]
Or this:
let f x y z = (x + y)*z
let xs = [3;5;7]
let ys = [1;2;3]
let zs = [1;-1;100]
f <|> xs <*> ys <*> zs // Result: [4;-7;1000]
Neat! I made it so I can apply arbitrary functions to lists of arguments at once!
Now, finally, you can apply this to your original problem:
let polynomial a b c (x:double) = a*x + b*x*x + c*x*x*x
let A = [2.0; 3.0; 4.0; 5.0]
let B = [1.5; 1.0; 0.5; 0.2]
let C = [0.8; 0.01; 0.001; 0.0001]
let ps = polynomial <|> A <*> B <*> C
let result = ps |> List.fold (fun x f -> f x) 0.7
The list ps consists of polynomial instances that are partially applied to corresponding elements of A, B, and C, and still expecting the final argument x. And on the next line, I simply fold over this list of functions, applying each of them to the result of the previous.
You could check the implementation for ideas:
https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs
let fold<'T,'State> (f : 'State -> 'T -> 'State) (acc: 'State) (array:'T[]) =
checkNonNull "array" array
let f = OptimizedClosures.FSharpFunc<_,_,_>.Adapt(f)
let mutable state = acc
for i = 0 to array.Length-1 do
state <- f.Invoke(state,array.[i])
state
here's a few implementations for you:
let fold2<'a,'b,'State> (f : 'State -> 'a -> 'b -> 'State) (acc: 'State) (a:'a array) (b:'b array) =
let mutable state = acc
Array.iter2 (fun x y->state<-f state x y) a b
state
let iter3 f (a: 'a[]) (b: 'b[]) (c: 'c[]) =
let f = OptimizedClosures.FSharpFunc<_,_,_,_>.Adapt(f)
if a.Length <> b.Length || a.Length <> c.Length then failwithf "length"
for i = 0 to a.Length-1 do
f.Invoke(a.[i], b.[i], c.[i])
let altIter3 f (a: 'a[]) (b: 'b[]) (c: 'c[]) =
if a.Length <> b.Length || a.Length <> c.Length then failwithf "length"
for i = 0 to a.Length-1 do
f (a.[i]) (b.[i]) (c.[i])
let fold3<'a,'b,'State> (f : 'State -> 'a -> 'b -> 'c -> 'State) (acc: 'State) (a:'a array) (b:'b array) (c:'c array) =
let mutable state = acc
iter3 (fun x y z->state<-f state x y z) a b c
state
NB. we don't have an iter3, so, implement that. OptimizedClosures.FSharpFunc only allow up to 5 (or is it 7?) params. There are a finite number of type slots available. It makes sense. You can go higher than this, of course, without using the OptimizedClosures stuff.
... anyway, generally, you don't want to be iterating too many lists / arrays / sequences at once. So I'd caution against going too high.
... the better way forward in such cases may be to construct a record or tuple from said lists / arrays, first. Then, you can just use map and iter, which are already baked in. This is what zip / zip3 are all about (see: "(array1.[i],array2.[i],array3.[i])")
let zip3 (array1: _[]) (array2: _[]) (array3: _[]) =
checkNonNull "array1" array1
checkNonNull "array2" array2
checkNonNull "array3" array3
let len1 = array1.Length
if len1 <> array2.Length || len1 <> array3.Length then invalidArg3ArraysDifferent "array1" "array2" "array3" len1 array2.Length array3.Length
let res = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked len1
for i = 0 to res.Length-1 do
res.[i] <- (array1.[i],array2.[i],array3.[i])
res
I'm working with arrays at the moment, so my solution pertained to those. Sorry about that. Here's a recursive version for lists.
let fold3 f acc a b c =
let mutable state = acc
let rec fold3 f a b c =
match a,b,c with
| [],[],[] -> ()
| [],_,_
| _,[],_
| _,_,[] -> failwith "length"
| ahead::atail, bhead::btail, chead::ctail ->
state <- f state ahead bhead chead
fold3 f atail btail ctail
fold3 f a b c
i.e. we define a recursive function within a function which acts upon/mutates/changes the outer scoped mutable acc variable (a closure in functional speak). Finally, this gets returned.
It's pretty cool how much type information gets inferred about these functions. In the array examples above, mostly I was explicit with 'a 'b 'c. This time, we let type inference kick in. It knows we're dealing with lists from the :: operator. That's kind of neat.
NB. the compiler will probably unwind this tail-recursive approach so that it is just a loop behind-the-scenes. Generally, get a correct answer before optimising. Just mentioning this, though, as food for later thought.
I think the existing answers provide great options if you want to generalize folding, which was your original question. However, if I simply wanted to call the polynomial function on inputs specified in A, B and C, then I would probably do not want to introduce fairly complex constructs like applicative functors with fancy operators to my code base.
The problem becomes a lot easier if you transpose the input data, so that rather than having a list [A; B; C] with lists for individual variables, you have a transposed list with inputs for calculating each polynomial. To do this, we'll need the transpose function:
let rec transpose = function
| (_::_)::_ as M -> List.map List.head M :: transpose (List.map List.tail M)
| _ -> []
Now you can create a list with inputs, transpose it and calculate all polynomials simply using List.map:
transpose [A; B; C]
|> List.map (function
| [a; b; c] -> polynomial 0.7 a b c
| _ -> failwith "wrong number of arguments")
There are many ways to solve this problem. Few are mentioned like first zip3 all three list, then run over it. Using Applicate Functors like Fyodor Soikin describes means you can turn any function with any amount of arguments into a function that expects list instead of single arguments. This is a good general solution that works with any numbers of lists.
While this is a general good idea, i'm sometimes shocked that so few use more low-level tools. In this case it is a good idea to use recursion and learn more about recursion.
Recursion here is the right-tool because we have immutable data-types. But you could consider how you would implement it with mutable lists and looping first, if that helps. The steps would be:
You loop over an index from 0 to the amount of elements in the lists.
You check if every list has an element for the index
If every list has an element then you pass this to your "folder" function
If at least one list don't have an element, then you abort the loop
The recursive version works exactly the same. Only that you don't use an index to access the elements. You would chop of the first element from every list and then recurse on the remaining list.
Otherwise List.isEmpty is the function to check if a List is empty. You can chop off the first element with List.head and you get the remaining list with the first element removed by List.tail. This way you can just write:
let rec fold3 f acc l1 l2 l3 =
let h = List.head
let t = List.tail
let empty = List.isEmpty
if (empty l1) || (empty l2) && (empty l3)
then acc
else fold3 f (f acc (h l1) (h l2) (h l3)) (t l1) (t l2) (t l3)
The if line checks if every list has at least one element. If that is true
it executes: f acc (h l1) (h l2) (h l3). So it executes f and passes it the first element of every list as an argument. The result is the new accumulator of
the next fold3 call.
Now that you worked on the first element of every list, you must chop off the first element of every list, and continue with the remaining lists. You achieve that with List.tail or in the above example (t l1) (t l2) (t l3). Those are the next remaining lists for the next fold3 call.
Creating a fold4, fold5, fold6 and so on isn't really hard, and I think it is self-explanatory. My general advice is to learn a little bit more about recursion and try to write recursive List functions without Pattern Matching. Pattern Matching is not always easier.
Some code examples:
fold3 (fun acc x y z -> x + y + z :: acc) [] [1;2;3] [10;20;30] [100;200;300] // [333;222;111]
fold3 (fun acc x y z -> x :: y :: z :: acc) [] [1;2;3] [10;20;30] [100;200;300] // [3; 30; 300; 2; 20; 200; 1; 10; 100]

Take N elements from sequence with N different indexes in F#

I'm new to F# and looking for a function which take N*indexes and a sequence and gives me N elements. If I have N indexes it should be equal to concat Seq.nth index0, Seq.nth index1 .. Seq.nth indexN but it should only scan over indexN elements (O(N)) in the sequence and not index0+index1+...+indexN (O(N^2)).
To sum up, I'm looking for something like:
//For performance, the index-list should be ordered on input, be padding between elements instead of indexes or be ordered when entering the function
seq {10 .. 20} |> Seq.takeIndexes [0;5;10]
Result: 10,15,20
I could make this by using seq { yield... } and have a index-counter to tick when some element should be passed out but if F# offers a nice standard way I would rather use that.
Thanks :)...
Addition: I have made the following. It works but ain't pretty. Suggestions is welcomed
let seqTakeIndexes (indexes : int list) (xs : seq<int>) =
seq {
//Assume indexes is sorted
let e = xs.GetEnumerator()
let i = ref indexes
let curr = ref 0
while e.MoveNext() && not (!i).IsEmpty do
if !curr = List.head !i then
i := (!i).Tail
yield e.Current
curr := !curr + 1
}
When you want to access elements by index, then using sequences isn't as good idea. Sequences are designed to allow sequential iteration. I would convert the necessary part of the sequence to an array and then pick the elements by index:
let takeIndexes ns input =
// Take only elements that we need to access (sequence could be infinite)
let arr = input |> Seq.take (1 + Seq.max ns) |> Array.ofSeq
// Simply pick elements at the specified indices from the array
seq { for index in ns -> arr.[index] }
seq [10 .. 20] |> takeIndexes [0;5;10]
Regarding your implementation - I don't think it can be made significantly more elegant. This is a general problem when implementing functions that need to take values from multiple sources in an interleaved fashion - there is just no elegant way of writing those!
However, you can write this in a functional way using recursion like this:
let takeIndexes indices (xs:seq<int>) =
// Iterates over the list of indices recursively
let rec loop (xe:IEnumerator<_>) idx indices = seq {
let next = loop xe (idx + 1)
// If the sequence ends, then end as well
if xe.MoveNext() then
match indices with
| i::indices when idx = i ->
// We're passing the specified index
yield xe.Current
yield! next indices
| _ ->
// Keep waiting for the first index from the list
yield! next indices }
seq {
// Note: 'use' guarantees proper disposal of the source sequence
use xe = xs.GetEnumerator()
yield! loop xe 0 indices }
seq [10 .. 20] |> takeIndexes [0;5;10]
When you need to scan a sequence and accumulate results in O(n), you can always fall back to Seq.fold:
let takeIndices ind sq =
let selector (idxLeft, currIdx, results) elem =
match idxLeft with
| [] -> (idxLeft, currIdx, results)
| idx::moreIdx when idx = currIdx -> (moreIdx, currIdx+1, elem::results)
| idx::_ when idx <> currIdx -> (idxLeft, currIdx+1, results)
| idx::_ -> invalidOp "Can't get here."
let (_, _, results) = sq |> Seq.fold selector (ind, 0, [])
results |> List.rev
seq [10 .. 20] |> takeIndices [0;5;10]
The drawback of this solution is that it will enumerate the sequence to the end, even if it has accumulated all the desired elements already.
Here is my shot at this. This solution will only go as far as it needs into the sequence and returns the elements as a list.
let getIndices xs (s:_ seq) =
let enum = s.GetEnumerator()
let rec loop i acc = function
| h::t as xs ->
if enum.MoveNext() then
if i = h then
loop (i+1) (enum.Current::acc) t
else
loop (i+1) acc xs
else
raise (System.IndexOutOfRangeException())
| _ -> List.rev acc
loop 0 [] xs
[10..20]
|> getIndices [2;4;8]
// Returns [12;14;18]
The only assumption made here is that the index list you supply is sorted. The function won't work properly otherwise.
Is it a problem, that the returned result is sorted?
This algorithm will work linearly over the input sequence. Just the indices need to be sorted. If the sequence is large, but indices are not so many - it'll be fast.
Complexity is: N -> Max(indices), M -> count of indices: O(N + MlogM) in the worst case.
let seqTakeIndices indexes =
let rec gather prev idxs xs =
match idxs with
| [] -> Seq.empty
| n::ns -> seq { let left = xs |> Seq.skip (n - prev)
yield left |> Seq.head
yield! gather n ns left }
indexes |> List.sort |> gather 0
Here is a List.fold variant, but is more complex to read. I prefer the first:
let seqTakeIndices indices xs =
let gather (prev, xs, res) n =
let left = xs |> Seq.skip (n - prev)
n, left, (Seq.head left)::res
let _, _, res = indices |> List.sort |> List.fold gather (0, xs, [])
res
Appended: Still slower than your variant, but a lot faster than mine older variants. Because of not using Seq.skip that is creating new enumerators and was slowing down things a lot.
let seqTakeIndices indices (xs : seq<_>) =
let enum = xs.GetEnumerator()
enum.MoveNext() |> ignore
let rec gather prev idxs =
match idxs with
| [] -> Seq.empty
| n::ns -> seq { if [1..n-prev] |> List.forall (fun _ -> enum.MoveNext()) then
yield enum.Current
yield! gather n ns }
indices |> List.sort |> gather 0

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/cacheing)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
LazyList.delayed (fun () ->
match input with
| LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
| LazyList.Cons(x,LazyList.Nil) ->
LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
| LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
let groups = GroupBy comp xs
if comp x y then
LazyList.consf
(LazyList.consf x (fun () ->
let (LazyList.Cons(firstGroup,_)) = groups
firstGroup))
(fun () ->
let (LazyList.Cons(_,otherGroups)) = groups
otherGroups)
else
LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x
You seem to want a function that has signature
(`a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):
let groupBy (fun: 'a -> bool) (input: seq) =
seq {
let cache = ref (new System.Collections.Generic.List())
for e in input do
(!cache).Add(e)
if not (fun e) then
yield !cache
cache := new System.Collections.Generic.List()
if cache.Length > 0 then
yield !cache
}
An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs#(y : _))
| comp x y = (x : firstGroup) : otherGroups
| otherwise = [x] : groups
where groups#(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let rec partitions (first:option<_>) =
seq {
match first with
| Some first' -> //'
(* The following value is always overwritten;
it represents the first element of the next subsequence to output, if any *)
let next = ref None
(* This function generates a subsequence to output,
setting next appropriately as it goes *)
let rec iter item =
seq {
yield item
if (en.MoveNext()) then
let curr = en.Current
if (cmp item curr) then
yield! iter curr
else // consumed one too many - pass it on as the start of the next sequence
next := Some curr
else
next := None
}
yield iter first' (* ' generate the first sequence *)
yield! partitions !next (* recursively generate all remaining sequences *)
| None -> () // return an empty sequence if there are no more values
}
let first = if en.MoveNext() then Some en.Current else None
partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let next() = if en.MoveNext() then Some en.Current else None
(* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
let rec seqStartingWith start =
match next() with
| Some y when cmp start y ->
let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
seq { yield start; yield! fst (Lazy.force rest_next) },
lazy Lazy.force (snd (Lazy.force rest_next))
| next -> seq { yield start }, lazy next
let rec iter start =
seq {
match (Lazy.force start) with
| None -> ()
| Some start ->
let (first,next) = seqStartingWith start
yield first
yield! iter next
}
Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
// simple unfold that assumes f captured a mutable variable
let iter f = Seq.unfold (fun _ ->
match f() with
| Some(x) -> Some(x,())
| None -> None) ()
seq {
let state = ref (Unstarted)
use ie = input.GetEnumerator()
let innerMoveNext() =
match !state with
| Unstarted ->
if ie.MoveNext()
then let cur = ie.Current
state := InnerOkay(cur); Some(cur)
else state := Finished; None
| InnerOkay(last) ->
if ie.MoveNext()
then let cur = ie.Current
if f last cur
then state := InnerOkay(cur); Some(cur)
else state := NeedNewInner(cur); None
else state := Finished; None
| NeedNewInner(last) -> state := InnerOkay(last); Some(last)
| Finished -> None
let outerMoveNext() =
match !state with
| Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
| InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
| Finished -> None
yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
let baseTime = DateTime.Now
seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
Seq.zip contiguousTimeStamps numbers
|> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
printfn "about to do a group"
for x in group do
printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:seq<_>) = seq {
let doneWithThisGroup = ref false
let areMore = ref true
use e = input.GetEnumerator()
let Next() = areMore := e.MoveNext(); !areMore
// deal with length 0 or 1, seed 'prev'
if not(e.MoveNext()) then () else
let prev = ref e.Current
while !areMore do
yield seq {
while not(!doneWithThisGroup) do
if Next() then
let next = e.Current
doneWithThisGroup := not(comp !prev next)
yield !prev
prev := next
else
// end of list, yield final value
yield !prev
doneWithThisGroup := true }
doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x

Get a random subset from a set in F#

I am trying to think of an elegant way of getting a random subset from a set in F#
Any thoughts on this?
Perhaps this would work: say we have a set of 2x elements and we need to pick a subset of y elements. Then if we could generate an x sized bit random number that contains exactly y 2n powers we effectively have a random mask with y holes in it. We could keep generating new random numbers until we get the first one satisfying this constraint but is there a better way?
If you don't want to convert to an array you could do something like this. This is O(n*m) where m is the size of the set.
open System
let rnd = Random(0);
let set = Array.init 10 (fun i -> i) |> Set.of_array
let randomSubSet n set =
seq {
let i = set |> Set.to_seq |> Seq.nth (rnd.Next(set.Count))
yield i
yield! set |> Set.remove i
}
|> Seq.take n
|> Set.of_seq
let result = set |> randomSubSet 3
for x in result do
printfn "%A" x
Agree with #JohannesRossel. There's an F# shuffle-an-array algorithm here you can modify suitably. Convert the Set into an array, and then loop until you've selected enough random elements for the new subset.
Not having a really good grasp of F# and what might be considered elegant there, you could just do a shuffle on the list of elements and select the first y. A Fisher-Yates shuffle even helps you in this respect as you also only need to shuffle y elements.
rnd must be out of subset function.
let rnd = new Random()
let rec subset xs =
let removeAt n xs = ( Seq.nth (n-1) xs, Seq.append (Seq.take (n-1) xs) (Seq.skip n xs) )
match xs with
| [] -> []
| _ -> let (rem, left) = removeAt (rnd.Next( List.length xs ) + 1) xs
let next = subset (List.of_seq left)
if rnd.Next(2) = 0 then rem :: next else next
Do you mean a random subset of any size?
For the case of a random subset of a specific size, there's a very elegant answer here:
Select N random elements from a List<T> in C#
Here it is in pseudocode:
RandomKSubset(list, k):
n = len(list)
needed = k
result = {}
for i = 0 to n:
if rand() < needed / (n-i)
push(list[i], result)
needed--
return result
Using Seq.fold to construct using lazy evaluation random sub-set:
let rnd = new Random()
let subset2 xs = let insertAt n xs x = Seq.concat [Seq.take n xs; seq [x]; Seq.skip n xs]
let randomInsert xs = insertAt (rnd.Next( (Seq.length xs) + 1 )) xs
xs |> Seq.fold randomInsert Seq.empty |> Seq.take (rnd.Next( Seq.length xs ) + 1)

Resources