I am working through the Project Euler puzzles (https://projecteuler.net/) using F#. It turns out that a lot of the puzzles require the generation of prime numbers, which is an expensive computation.
To make my life easier, I want to create a module that will generate prime numbers, but also write them to a text file so I just look them up the next time I need them.
Here is my prime generator module:
module Utilities
open System
open System.IO
open System.Reflection
module Primes =
//This is where the cache file is
let private cachePath =
Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)
+ "\PrimeCache.txt"
//Parses the file to an int list
let private readCache : int list =
let text = File.ReadAllText(cachePath).Trim()
text.Split([|','|])
|> Seq.filter(fun w -> w |> String.length > 0)
|> Seq.map(fun w -> Int32.Parse(w))
|> Seq.toList
//Writes an int list to the file
let writeCache cache =
let text = String.Join(",", cache |> Seq.map(fun n -> n.ToString()))
File.WriteAllText(cachePath, text)
//In-memory cache
let mutable cache = []
cache <- readCache
let private appendElement element list =
element::(list |> List.rev) |> List.rev
let private isPrimeInner (n : int) =
if n < 2 then false
else
let len = cache.Length
//Factor can't be greater than the square root
let factorLimit = Convert.ToInt32(Math.Ceiling(Math.Sqrt(Convert.ToDouble(n))))
let mutable i = 0
let mutable stop = false
let mutable isPrime = true
while stop = false do
//Get the next element from cache
let p = cache.[i]
i <- i+1
//Don't go past last index of cache
if i >= len
then stop <- true
else ()
//Don't check primes > sqrt(n)
if p > factorLimit
then stop <- true
else ()
//If its divisible by any prime, its not a prime
if n % p = 0
then
stop <- true
isPrime <- false
else ()
//Add primes to cache
if isPrime
then cache <- appendElement n cache
else ()
isPrime
let isPrime (n : int) : bool =
let result = isPrimeInner n
result
let getPrimes : int seq =
seq {
for p in cache do
yield p
let next = if cache.Length > 0
then cache.[cache.Length-1]+1
else 2
for n = next to Int32.MaxValue do
if isPrimeInner n
then yield n
else ()
}
let saveCache : unit =
writeCache cache
Here is the code for the problem I am currently working on that consumes this. (The actual problem is to sum all primes under 2,000,000, but I am testing on 1000 for now):
module Problem10
let getAnswer =
let limit = 1000
let primes = Utilities.Primes.getPrimes
|> Seq.takeWhile(fun p -> p < limit)
let result = primes |> Seq.sum
Utilities.Primes.saveCache
result
This will give the right answer, but will save an empty cache file back to the disk. If I step through it in the debugger, the call to saveCache is one of the first things that is hit, before any breakpoints I set inside isPrimeInner. If I switch the calling code to this:
module Problem10
let getAnswer =
let limit = 1000
let primes = Utilities.Primes.getPrimes
|> Seq.takeWhile(fun p -> p < limit)
let result = primes |> Seq.sum
let x = Utilities.Primes.cache
Utilities.Primes.writeCache x
result
the cache will be properly saved with new primes.
I've seen this kind of behavior before in C# when misusing iterator blocks. I don't understand how these two versions of the calling code are any different at runtime. What am I doing wrong here?
I am attempting to generate a series of guesses for the second Taxicab number. What I want to do is is call the Attempt function on a series of integers in a finite sequence. I have my two questions about implementation in the comments.
A taxi cab number, in case your wondering, is the least number that satisfied the sum of 2 unique cubes in for n unique sets of 2 unique cubes. Ta(2) is 1729.
[<EntryPoint>]
let main argv =
let Attempt (start : int) =
let stop = start+20
let integerList = [start..stop]
let list = List.init 3 (fun x -> integerList.[x])
//Is there a simple way to make initialize the list with random indices of integerList?
let Cube x = x*x*x
let newlist = list |> List.map (fun x -> Cube x)
let partitionList (x : List<int>) (y : int) = List.sum [x.[y];x.[y+1]]
let intLIST = [0..2]
let partitionList' = [for i in intLIST do yield partitionList newlist i]
let x = Set.ofList partitionList'
let y = Set.ofList partitionList'
//I was going to try to use some kind of equality operator to determine whether the two sets were equal, which could tell me whether we had actually found a Taxicab number by the weakened definition.
System.Console.Write(list)
System.Console.Write(newlist)
let rnd = System.Random()
//My primary question is how can I convert a random to an integer to use in start for the function Attempt?
System.Console.ReadKey() |> ignore
printfn("%A") argv
0
Dirty way to initialize list with random indexes of another list:
let randomIndexes count myList =
let rand = System.Random()
seq {
for n = 1 to count do
yield rand.Next(List.length myList) }
|> Seq.distinct
//|> Seq.sort // if you need them sorted
|> List.ofSeq
let result = randomIndexes 5 [3;2;4;5]
printfn "%A" result
Creating a Parallel.ForEach expression of this form:
let low = max 1 (k-m)
let high = min (k-1) n
let rangesize = (high+1-low)/(PROCS*3)
Parallel.ForEach(Partitioner.Create(low, high+1, rangesize), (fun j ->
let i = k - j
if x.[i-1] = y.[j-1] then
a.[i] <- b.[i-1] + 1
else
a.[i] <- max c.[i] c.[i-1]
)) |> ignore
Causes me to receive the error: No overloads match for method 'ForEach'. However I am using the Parallel.ForEach<TSource> Method (Partitioner<TSource>, Action<TSource>) and it seems right to me. Am I missing something?
Edited: I am trying to obtain the same results as the code below (that does not use a Partitioner):
let low = max 1 (k-m)
let high = min (k-1) n
let rangesize = (high+1-low)/(PROCS*3)
let A = [| low .. high |]
Parallel.ForEach(A, fun (j:int) ->
let i = k - j
if x.[i-1] = y.[j-1] then
a.[i] <- b.[i-1] + 1
else
a.[i] <- max c.[i] c.[i-1]
) |> ignore
Are you sure that you have opened all necessary namespaces, all the values you are using (low, high and PROCS) are defined and that your code does not accidentally redefine some of the names that you're using (like Partitioner)?
I created a very simple F# script with this code and it seems to be working fine (I refactored the code to create a partitioner called p, but that does not affect the behavior):
open System.Threading.Tasks
open System.Collections.Concurrent
let PROCS = 10
let low, high = 0, 100
let p = Partitioner.Create(low, high+1, high+1-low/(PROCS*3))
Parallel.ForEach(p, (fun j ->
printfn "%A" j // Print the desired range (using %A as it is a tuple)
)) |> ignore
It is important that the value j is actually a pair of type int * int, so if the body uses it in a wrong way (e.g. as an int), you will get the error. In that case, you can add a type annotation to j and you would get a more useful error elsewhere:
Parallel.ForEach(p, (fun (j:int * int) ->
printfn "%d" j // Error here, because `j` is used as an int, but it is a pair!
)) |> ignore
This means that if you want to perform something for all j values in the original range, you need to write something like this:
Parallel.ForEach(p, (fun (loJ, hiJ) ->
for j in loJ .. hiJ - 1 do // Iterate over all js in this partition
printfn "%d" j // process the current j
)) |> ignore
Aside, I guess that the last argument to Partitioner.Create should actually be (high+1-low)/(PROCS*3) - you probably want to divide the total number of steps, not just the low value.
IE,
What am I doing wrong here? Does it have to to with lists, sequences and arrays and the way the limitations work?
So here is the setup: I'm trying to generate some primes. I see that there are a billion text files of a billion primes. The question isn't why...the question is how are the guys using python calculating all of the primes below 1,000,000 in milliseconds on this post...and what am I doing wrong with the following F# code?
let sieve_primes2 top_number =
let numbers = [ for i in 2 .. top_number do yield i ]
let sieve (n:int list) =
match n with
| [x] -> x,[]
| hd :: tl -> hd, List.choose(fun x -> if x%hd = 0 then None else Some(x)) tl
| _ -> failwith "Pernicious list error."
let rec sieve_prime (p:int list) (n:int list) =
match (sieve n) with
| i,[] -> i::p
| i,n' -> sieve_prime (i::p) n'
sieve_prime [1;0] numbers
With the timer on in FSI, I get 4.33 seconds worth of CPU for 100000... after that, it all just blows up.
Your sieve function is slow because you tried to filter out composite numbers up to top_number. With Sieve of Eratosthenes, you only need to do so until sqrt(top_number) and remaining numbers are inherently prime. Suppose we havetop_number = 1,000,000, your function does 78498 rounds of filtering (the number of primes until 1,000,000) while the original sieve only does so 168 times (the number of primes until 1,000).
You can avoid generating even numbers except 2 which cannot be prime from the beginning. Moreover, sieve and sieve_prime can be merged into a recursive function. And you could use lightweight List.filter instead of List.choose.
Incorporating above suggestions:
let sieve_primes top_number =
let numbers = [ yield 2
for i in 3..2..top_number -> i ]
let rec sieve ns =
match ns with
| [] -> []
| x::xs when x*x > top_number -> ns
| x::xs -> x::sieve (List.filter(fun y -> y%x <> 0) xs)
sieve numbers
In my machine, the updated version is very fast and it completes within 0.6s for top_number = 1,000,000.
Based on my code here: stackoverflow.com/a/8371684/124259
Gets the first 1 million primes in 22 milliseconds in fsi - a significant part is probably compiling the code at this point.
#time "on"
let limit = 1000000
//returns an array of all the primes up to limit
let table =
let table = Array.create limit true //use bools in the table to save on memory
let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
let mutable curfactor = 1;
while curfactor < tlimit-2 do
curfactor <- curfactor+2
if table.[curfactor] then //simple optimisation
let mutable v = curfactor*2
while v < limit do
table.[v] <- false
v <- v + curfactor
let out = Array.create (100000) 0 //this needs to be greater than pi(limit)
let mutable idx = 1
out.[0]<-2
let mutable curx=1
while curx < limit-2 do
curx <- curx + 2
if table.[curx] then
out.[idx]<-curx
idx <- idx+1
out
There have been several good answers both as to general trial division algorithm using lists (#pad) and in choice of an array for a sieving data structure using the Sieve of Eratosthenes (SoE) (#John Palmer and #Jon Harrop). However, #pad's list algorithm isn't particularly fast and will "blow up" for larger sieving ranges and #John Palmer's array solution is somewhat more complex, uses more memory than necessary, and uses external mutable state so is not different than if the program were written in an imperative language such as C#.
EDIT_ADD: I've edited the below code (old code with line comments) modifying the sequence expression to avoid some function calls so as to reflect more of an "iterator style" and while it saved 20% of the speed it still doesn't come close to that of a true C# iterator which is about the same speed as the "roll your own enumerator" final F# code. I've modified the timing information below accordingly. END_EDIT
The following true SoE program only uses 64 KBytes of memory to sieve primes up to a million (due to only considering odd numbers and using the packed bit BitArray) and still is almost as fast as #John Palmer's program at about 40 milliseconds to sieve to one million on a i7 2700K (3.5 GHz), with only a few lines of code:
open System.Collections
let primesSoE top_number=
let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
let SQRTLMT = (int(sqrt (double top_number))-3)/2
let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
seq { for i = -1 to BFLMT do if i<0 then yield 2u
elif buf.[i] then yield uint32(3+i+i) }
// seq { yield 2u; yield! seq { 0..BFLMT } |> Seq.filter (fun i->buf.[i])
// |> Seq.map (fun i->uint32 (i+i+3)) }
primesSOE 1000000u |> Seq.length;;
Almost all of the elapsed time is spent in the last two lines enumerating the found primes due to the inefficiency of the sequence run time library as well as the cost of enumerating itself at about 28 clock cycles per function call and return with about 16 function calls per iteration. This could be reduced to only a few function calls per iteration by rolling our own iterator, but the code is not as concise; note that in the following code there is no mutable state exposed other than the contents of the sieving array and the reference variable necessary for the iterator implementation using object expressions:
open System
open System.Collections
open System.Collections.Generic
let primesSoE top_number=
let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
let SQRTLMT = (int(sqrt (double top_number))-3)/2
let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
let nmrtr() =
let i = ref -2
let rec nxti() = i:=!i+1;if !i<=BFLMT && not buf.[!i] then nxti() else !i<=BFLMT
let inline curr() = if !i<0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
else let v = uint32 !i in v+v+3u
{ new IEnumerator<_> with
member this.Current = curr()
interface IEnumerator with
member this.Current = box (curr())
member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
interface IDisposable with
member this.Dispose() = () }
{ new IEnumerable<_> with
member this.GetEnumerator() = nmrtr()
interface IEnumerable with
member this.GetEnumerator() = nmrtr() :> IEnumerator }
primesSOE 1000000u |> Seq.length;;
The above code takes about 8.5 milliseconds to sieve the primes to a million on the same machine due to greatly reducing the number of function calls per iteration to about three from about 16. This is about the same speed as C# code written in the same style. It's too bad that F#'s iterator style as I used in the first example doesn't automatically generate the IEnumerable boiler plate code as C# iterators do, but I guess that is the intention of sequences - just that they are so damned inefficient as to speed performance due to being implemented as sequence computation expressions.
Now, less than half of the time is expended in enumerating the prime results for a much better use of CPU time.
What am I doing wrong here?
You've implemented a different algorithm that goes through each possible value and uses % to determine if it needs to be removed. What you're supposed to be doing is stepping through with a fixed increment removing multiples. That would be asymptotically.
You cannot step through lists efficiently because they don't support random access so use arrays.
Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/cacheing)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
LazyList.delayed (fun () ->
match input with
| LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
| LazyList.Cons(x,LazyList.Nil) ->
LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
| LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
let groups = GroupBy comp xs
if comp x y then
LazyList.consf
(LazyList.consf x (fun () ->
let (LazyList.Cons(firstGroup,_)) = groups
firstGroup))
(fun () ->
let (LazyList.Cons(_,otherGroups)) = groups
otherGroups)
else
LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x
You seem to want a function that has signature
(`a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):
let groupBy (fun: 'a -> bool) (input: seq) =
seq {
let cache = ref (new System.Collections.Generic.List())
for e in input do
(!cache).Add(e)
if not (fun e) then
yield !cache
cache := new System.Collections.Generic.List()
if cache.Length > 0 then
yield !cache
}
An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs#(y : _))
| comp x y = (x : firstGroup) : otherGroups
| otherwise = [x] : groups
where groups#(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let rec partitions (first:option<_>) =
seq {
match first with
| Some first' -> //'
(* The following value is always overwritten;
it represents the first element of the next subsequence to output, if any *)
let next = ref None
(* This function generates a subsequence to output,
setting next appropriately as it goes *)
let rec iter item =
seq {
yield item
if (en.MoveNext()) then
let curr = en.Current
if (cmp item curr) then
yield! iter curr
else // consumed one too many - pass it on as the start of the next sequence
next := Some curr
else
next := None
}
yield iter first' (* ' generate the first sequence *)
yield! partitions !next (* recursively generate all remaining sequences *)
| None -> () // return an empty sequence if there are no more values
}
let first = if en.MoveNext() then Some en.Current else None
partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let next() = if en.MoveNext() then Some en.Current else None
(* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
let rec seqStartingWith start =
match next() with
| Some y when cmp start y ->
let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
seq { yield start; yield! fst (Lazy.force rest_next) },
lazy Lazy.force (snd (Lazy.force rest_next))
| next -> seq { yield start }, lazy next
let rec iter start =
seq {
match (Lazy.force start) with
| None -> ()
| Some start ->
let (first,next) = seqStartingWith start
yield first
yield! iter next
}
Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
// simple unfold that assumes f captured a mutable variable
let iter f = Seq.unfold (fun _ ->
match f() with
| Some(x) -> Some(x,())
| None -> None) ()
seq {
let state = ref (Unstarted)
use ie = input.GetEnumerator()
let innerMoveNext() =
match !state with
| Unstarted ->
if ie.MoveNext()
then let cur = ie.Current
state := InnerOkay(cur); Some(cur)
else state := Finished; None
| InnerOkay(last) ->
if ie.MoveNext()
then let cur = ie.Current
if f last cur
then state := InnerOkay(cur); Some(cur)
else state := NeedNewInner(cur); None
else state := Finished; None
| NeedNewInner(last) -> state := InnerOkay(last); Some(last)
| Finished -> None
let outerMoveNext() =
match !state with
| Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
| InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
| Finished -> None
yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
let baseTime = DateTime.Now
seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
Seq.zip contiguousTimeStamps numbers
|> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
printfn "about to do a group"
for x in group do
printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:seq<_>) = seq {
let doneWithThisGroup = ref false
let areMore = ref true
use e = input.GetEnumerator()
let Next() = areMore := e.MoveNext(); !areMore
// deal with length 0 or 1, seed 'prev'
if not(e.MoveNext()) then () else
let prev = ref e.Current
while !areMore do
yield seq {
while not(!doneWithThisGroup) do
if Next() then
let next = e.Current
doneWithThisGroup := not(comp !prev next)
yield !prev
prev := next
else
// end of list, yield final value
yield !prev
doneWithThisGroup := true }
doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x