Does anybody know the reason why QuotationEvaluator is so slow?

It's common knowledge in the F# community that the PowerPack's quotation compiling facility produces very slow code, so slow in fact that it performs even worse than naive interpretation. I've been looking into the reasons for this, but I haven't been able to find a convincing answer so far. There have been claims that this happens either because of inefficient representation of things like pattern matches in quotations or because of an inherent inefficiency with Expression Trees used by the library. I'd like to illustrate why I think neither is true with a simple example:
#r "FSharp.Powerpack.Linq.dll"
open System
open System.Linq.Expressions
open Microsoft.FSharp.Quotations.Patterns
let powerpack = Microsoft.FSharp.Linq.QuotationEvaluator.Compile <@ 1 + 1 @>
// explicitly rewrite above quotation with expression trees
let expressionTree =
    let (Call(_,addM,_)) = <@ 1 + 1 @>
    let constExpr (x : 'T) = Expression.Constant(box x, typeof<'T>)
    let eval = Expression.Call(addM, constExpr 1, constExpr 1)
    let lambda = Expression.Lambda<Func<int>>(eval)
    lambda.Compile()
// reflection - based evaluation
let reflection =
    let (Call(_,addM,_)) = <@ 1 + 1 @>
    fun () -> addM.Invoke(null, [| 1 :> obj ; 1 :> obj |]) :?> int
#time
// QuotationEvaluator ~ 2.5 secs
for i in 1 .. 1000000 do
    powerpack () |> ignore
// native evaluation ~ 1 msec
for i in 1 .. 1000000 do
    (fun () -> 1 + 1) () |> ignore
// reflection evaluation ~ 700 msec
for i in 1 .. 1000000 do
    reflection () |> ignore
// naive expression tree ~ 19 msec
for i in 1 .. 1000000 do
    expressionTree.Invoke () |> ignore
Something is clearly going wrong here. The question is, what?
EDIT: the same behaviour also occurs with the FSharpx.Linq compiler

Below is the implementation of the compile:
let CompileImpl (e: #Expr, eraseEquality) =
    let ty = e.Type
    let e = Expr.NewDelegate(GetFuncType([|typeof<unit>; ty |]), [new Var("unit",typeof<unit>)],e)
    let linqExpr = Conv (e,eraseEquality)
    let linqExpr = (linqExpr :?> LambdaExpression)
    let d = linqExpr.Compile()
    (fun () ->
        try
            d.DynamicInvoke [| box () |]
        with :? System.Reflection.TargetInvocationException as exn ->
            raise exn.InnerException)
Notice the use of DynamicInvoke on the delegate, which is much slower than Invoke and the reason for the result you are getting.
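To see the gap between the two call styles in isolation, here is a minimal sketch that times both on a hand-made delegate. The delegate `d`, the helper `time` and the iteration count are illustrative assumptions, not part of the PowerPack code, and the absolute numbers will vary by machine and runtime.

```fsharp
open System
open System.Diagnostics

// A strongly typed delegate, standing in for the compiled lambda (hypothetical example).
let d = Func<int>(fun () -> 1 + 1)

// Small timing helper (hypothetical name).
let time label (f: unit -> unit) =
    let sw = Stopwatch.StartNew()
    f ()
    printfn "%s: %d ms" label sw.ElapsedMilliseconds

let iterations = 1000000

// Direct, statically typed call.
time "Invoke" (fun () ->
    for _ in 1 .. iterations do
        d.Invoke() |> ignore)

// Late-bound call: goes through reflection and boxes the result on every call.
time "DynamicInvoke" (fun () ->
    for _ in 1 .. iterations do
        d.DynamicInvoke() |> ignore)
```

Both calls return the same value; only the dispatch path differs, which is why the choice inside CompileImpl matters so much for a tight loop.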

Related

Speed up FsCheck Arbitrary generation

I'm writing some generators and an Arbitrary, but it is too slow (see also the GC numbers). I think there is an error in my code, but I can't figure out where. Or is my approach (map2 with a fold) "weird"?
Generators:
type Generators () =
    static let notAllowed = Array.append [| '0'..'9' |] [| '\n'; '\r'; '['; ']'; '/'; |]
    static let containsInvalidValues (s : string) = s.IndexOfAny(notAllowed) <> -1
    static member positiveIntsGen() = Arb.generate<PositiveInt> |> Gen.map int
    static member separatorStringGen() =
        Arb.generate<NonEmptyString>
        |> Gen.suchThat (fun s -> s.Get.Length < 5 && not (s.Get |> containsInvalidValues))
Arbitrary:
let manyNumbersNewLineCustomDelimiterStrInput =
    Gen.map2 (fun (ints : int[]) (nes : NonEmptyString) ->
                  Array.fold (fun acc num ->
                                  if num % 2 = 0 then acc + "," + num.ToString()
                                  else if num % 3 = 0 then acc + "\n" + num.ToString()
                                  else acc + "\n" + num.ToString()) ("//[" + nes.Get + "]\n") ints )
             (Generators.array12OfIntsGen())
             (Generators.separatorStringGen())
    |> Arb.fromGen
The configuration has MaxTest = 500 and it takes ~5 minutes to complete.
Output (using #time):
StrCalcTest.get_When pass an string that starts with "//[" and contains "]\n" use the multicharacter value between them as separator-Ok, passed 500 tests.
Real: 00:07:03.467, CPU: 00:07:03.296, GC gen0: 75844, gen1: 71968, gen2: 4
Without actually testing anything, my guess would be that the problematic part is this:
Arb.generate<NonEmptyString>
|> Gen.suchThat (fun s -> s.Get.Length < 5 && not (s.Get |> containsInvalidValues))
This means that you generate strings and then filter out all those that do not satisfy the conditions. But if the conditions are too restrictive, FsCheck might need to generate a very large number of strings before it gets some that pass the filter.
If you can express the rule constructively, so that everything you generate is already a valid string, then I think it should be faster.
Could you, for example, generate a number n (for the string length) followed by n values of type char (that satisfy your conditions) and then append them to form the separator string? (I think FsCheck's gen { .. } computation might be a nice way of writing that.)
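The idea above could be sketched roughly as follows. This is only an illustration under assumptions: it pins FsCheck 2.16.5 from NuGet (the 2.x API), and `allowedChars` is a hypothetical whitelist chosen for the example rather than the full set of characters the original filter would accept.

```fsharp
#r "nuget: FsCheck, 2.16.5"
open FsCheck

// Hypothetical whitelist: no digits, no '\n', '\r', '[', ']' or '/'.
let allowedChars = [ 'a' .. 'z' ] @ [ 'A' .. 'Z' ] @ [ '*'; '%'; ';'; '#'; '@' ]

// Builds a separator of length 1 to 4 directly from valid characters,
// so nothing has to be generated and then thrown away.
let separatorStringGen : Gen<string> =
    gen {
        let! n = Gen.choose (1, 4)
        let! chars = Gen.arrayOfLength n (Gen.elements allowedChars)
        return System.String(chars)
    }
```

Because every generated value is valid by construction, there is no Gen.suchThat retry loop; whether that accounts for all of the slowdown in the question is not something this sketch establishes.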

How do I write a computation expression builder that accumulates a value and also allows standard language constructs?

I have a computation expression builder that builds up a value as you go, and has many custom operations. However, it does not allow for standard F# language constructs, and I'm having a lot of trouble figuring out how to add this support.
To give a stand-alone example, here's a dead-simple and fairly pointless computation expression that builds F# lists:
type Items<'a> = Items of 'a list
type ListBuilder() =
    member x.Yield(()) = Items []
    [<CustomOperation("add")>]
    member x.Add(Items current, item:'a) =
        Items [ yield! current; yield item ]
    [<CustomOperation("addMany")>]
    member x.AddMany(Items current, items: seq<'a>) =
        Items [ yield! current; yield! items ]
let listBuilder = ListBuilder()
let build (Items items) = items
I can use this to build lists just fine:
let stuff =
    listBuilder {
        add 1
        add 5
        add 7
        addMany [ 1..10 ]
        add 42
    }
    |> build
However, this is a compiler error:
listBuilder {
    let x = 5 * 39
    add x
}
// This expression was expected to have type unit, but
// here has type int.
And so is this:
listBuilder {
    for x = 1 to 50 do
        add x
}
// This control construct may only be used if the computation expression builder
// defines a For method.
I've read all the documentation and examples I can find, but there's something I'm just not getting. Every .Bind() or .For() method signature I try just leads to more and more confusing compiler errors. Most of the examples I can find either build up a value as you go along, or allow for regular F# language constructs, but I haven't been able to find one that does both.
If someone could point me in the right direction by showing me how to take this example and add support in the builder for let bindings and for loops (at minimum - using, while and try/catch would be great, but I can probably figure those out if someone gets me started) then I'll be able to gratefully apply the lesson to my actual problem.
The best place to look is the spec. For example,
b {
    let x = e
    op x
}
gets translated to
T(let x = e in op x, [], fun v -> v, true)
=> T(op x, {x}, fun v -> let x = e in v, true)
=> [| op x, let x = e in b.Yield(x) |]{x}
=> b.Op(let x = e in b.Yield(x), x)
So this shows where things have gone wrong, though it doesn't present an obvious solution. Clearly, Yield needs to be generalized since it needs to take arbitrary tuples (based on how many variables are in scope). Perhaps more subtly, it also shows that x is not in scope in the call to add (see that unbound x as the second argument to b.Op?). To allow your custom operators to use bound variables, their arguments need to have the [<ProjectionParameter>] attribute (and take functions from arbitrary variables as arguments), and you'll also need to set MaintainsVariableSpace to true if you want bound variables to be available to later operators. This will change the final translation to:
b.Op(let x = e in b.Yield(x), fun x -> x)
Building up from this, it seems that there's no way to avoid passing the set of bound values along to and from each operation (though I'd love to be proven wrong) - this will require you to add a Run method to strip those values back off at the end. Putting it all together, you'll get a builder which looks like this:
type ListBuilder() =
    member x.Yield(vars) = Items [],vars
    [<CustomOperation("add",MaintainsVariableSpace=true)>]
    member x.Add((Items current,vars), [<ProjectionParameter>]f) =
        Items (current @ [f vars]),vars
    [<CustomOperation("addMany",MaintainsVariableSpace=true)>]
    member x.AddMany((Items current, vars), [<ProjectionParameter>]f) =
        Items (current @ f vars),vars
    member x.Run(l,_) = l
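As a quick check of how this reads at the call site, here is a sketch that repeats the builder above and uses a let binding together with the custom operations. The values and the name `stuff` are illustrative; the Items type, `listBuilder` and `build` are taken from the question.

```fsharp
type Items<'a> = Items of 'a list

type ListBuilder() =
    // Carries the bound variables alongside the accumulated items.
    member x.Yield(vars) = Items [], vars
    [<CustomOperation("add", MaintainsVariableSpace = true)>]
    member x.Add((Items current, vars), [<ProjectionParameter>] f) =
        Items (current @ [f vars]), vars
    [<CustomOperation("addMany", MaintainsVariableSpace = true)>]
    member x.AddMany((Items current, vars), [<ProjectionParameter>] f) =
        Items (current @ f vars), vars
    // Strips the variable space off again at the end.
    member x.Run(l, _) = l

let listBuilder = ListBuilder()
let build (Items items) = items

let stuff =
    listBuilder {
        let x = 5 * 39
        add x
        add 7
        addMany [ x .. x + 2 ]
    }
    |> build
```

Note that this only covers let bindings; a for loop would still need a For member, which this sketch does not attempt.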
The most complete examples I've seen are in §6.3.10 of the spec, especially this one:
/// Computations that can cooperatively yield by returning a continuation
type Eventually<'T> =
    | Done of 'T
    | NotYetDone of (unit -> Eventually<'T>)
[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module Eventually =
    /// The bind for the computations. Stitch 'k' on to the end of the computation.
    /// Note combinators like this are usually written in the reverse way,
    /// for example,
    /// e |> bind k
    let rec bind k e =
        match e with
        | Done x -> NotYetDone (fun () -> k x)
        | NotYetDone work -> NotYetDone (fun () -> bind k (work()))
    /// The return for the computations.
    let result x = Done x
    type OkOrException<'T> =
        | Ok of 'T
        | Exception of System.Exception
    /// The catch for the computations. Stitch try/with throughout
    /// the computation and return the overall result as an OkOrException.
    let rec catch e =
        match e with
        | Done x -> result (Ok x)
        | NotYetDone work ->
            NotYetDone (fun () ->
                let res = try Ok(work()) with | e -> Exception e
                match res with
                | Ok cont -> catch cont // note, a tailcall
                | Exception e -> result (Exception e))
    /// The delay operator.
    let delay f = NotYetDone (fun () -> f())
    /// The stepping action for the computations.
    let step c =
        match c with
        | Done _ -> c
        | NotYetDone f -> f ()
    // The rest of the operations are boilerplate.
    /// The tryFinally operator.
    /// This is boilerplate in terms of "result", "catch" and "bind".
    let tryFinally e compensation =
        catch (e)
        |> bind (fun res -> compensation();
                            match res with
                            | Ok v -> result v
                            | Exception e -> raise e)
    /// The tryWith operator.
    /// This is boilerplate in terms of "result", "catch" and "bind".
    let tryWith e handler =
        catch e
        |> bind (function Ok v -> result v | Exception e -> handler e)
    /// The whileLoop operator.
    /// This is boilerplate in terms of "result" and "bind".
    let rec whileLoop gd body =
        if gd() then body |> bind (fun v -> whileLoop gd body)
        else result ()
    /// The sequential composition operator
    /// This is boilerplate in terms of "result" and "bind".
    let combine e1 e2 =
        e1 |> bind (fun () -> e2)
    /// The using operator.
    let using (resource: #System.IDisposable) f =
        tryFinally (f resource) (fun () -> resource.Dispose())
    /// The forLoop operator.
    /// This is boilerplate in terms of "catch", "result" and "bind".
    let forLoop (e:seq<_>) f =
        let ie = e.GetEnumerator()
        tryFinally (whileLoop (fun () -> ie.MoveNext())
                              (delay (fun () -> let v = ie.Current in f v)))
                   (fun () -> ie.Dispose())
// Give the mapping for F# computation expressions.
type EventuallyBuilder() =
    member x.Bind(e,k) = Eventually.bind k e
    member x.Return(v) = Eventually.result v
    member x.ReturnFrom(v) = v
    member x.Combine(e1,e2) = Eventually.combine e1 e2
    member x.Delay(f) = Eventually.delay f
    member x.Zero() = Eventually.result ()
    member x.TryWith(e,handler) = Eventually.tryWith e handler
    member x.TryFinally(e,compensation) = Eventually.tryFinally e compensation
    member x.For(e:seq<_>,f) = Eventually.forLoop e f
    member x.Using(resource,e) = Eventually.using resource e
The tutorial at "F# for fun and profit" is first class in this regard.
http://fsharpforfunandprofit.com/posts/computation-expressions-intro/
Following a similar struggle to Joel's (and not finding §6.3.10 of the spec that helpful) my issue with getting the For construct to generate a list came down to getting types to line up properly (no special attributes required). In particular I was slow to realise that For would build a list of lists, and therefore need flattening, despite the best efforts of the compiler to put me right. Examples that I found on the web were always wrappers around seq{}, using the yield keyword, repeated use of which invokes a call to Combine, which does the flattening. In case a concrete example helps, the following excerpt uses for to build a list of integers - my ultimate aim being to create lists of components for rendering in a GUI (with some additional laziness thrown in). Also In depth talk on CE here which elaborates on kvb's points above.
module scratch
type Dispatcher = unit -> unit
type viewElement = int
type lazyViews = Lazy<list<viewElement>>
type ViewElementsBuilder() =
    member x.Return(views: lazyViews) : list<viewElement> = views.Value
    member x.Yield(v: viewElement) : list<viewElement> = [v]
    member x.ReturnFrom(viewElements: list<viewElement>) = viewElements
    member x.Zero() = list<viewElement>.Empty
    member x.Combine(listA:list<viewElement>, listB: list<viewElement>) = List.concat [listA; listB]
    member x.Delay(f) = f()
    member x.For(coll:seq<'a>, forBody: 'a -> list<viewElement>) : list<viewElement> =
        // seq {for v in coll do yield! f v} |> List.ofSeq
        Seq.map forBody coll |> Seq.collect id |> List.ofSeq
let ve = new ViewElementsBuilder()
let makeComponent(m: int, dispatch: Dispatcher) : viewElement = m
let makeComponents() : list<viewElement> = [77; 33]
let makeViewElements() : list<viewElement> =
    let model = {| Scores = [33;23;22;43;] |> Seq.ofList; Trainer = "John" |}
    let d:Dispatcher = fun() -> () // Does nothing here, but will be used to raise messages from UI
    ve {
        for score in model.Scores do
            yield makeComponent (score, d)
            yield makeComponent (score * 100 / 50 , d)
        if model.Trainer = "John" then
            return lazy
                [ makeComponent (12, d)
                  makeComponent (13, d)
                ]
        else
            return lazy
                [ makeComponent (14, d)
                  makeComponent (15, d)
                ]
        yield makeComponent (33, d)
        return! makeComponents()
    }

How to define x++ (where x: int ref) in F#?

I currently use this function
let inc (i : int ref) =
    let res = !i
    i := res + 1
    res
to write things like
let str = input.[inc index]
How can I define an increment operator ++, so that I could write
let str = input.[index++]
You cannot define postfix operators in F# - see 4.4 Operators and Precedence. If you agree to making it prefix instead, then you can define, for example,
let (++) x = incr x; !x
and use it as below:
let y = ref 1
(++) y;;
val y : int ref = {contents = 2;}
UPDATE: as fpessoa pointed out ++ cannot be used as a genuine prefix operator, indeed (see here and there for the rules upon characters and character sequences comprising valid F# prefix operators).
Interestingly, the unary + can be overloaded for the purpose:
let (~+) x = incr x; !x
allowing
let y = ref 1
+y;;
val y : int ref = {contents = 2;}
Nevertheless, it makes sense to mention that the idea of iterating an array like below
let v = [| 1..5 |]
let i = ref -1
v |> Seq.iter (fun _ -> printfn "%d" v.[+i])
for the sake of "readability" looks at least strange in comparison with the idiomatic functional manner
[|1..5|] |> Seq.iter (printfn "%d")
which some commenters on the original question had already pointed out.
I was trying to write it as a prefix operator as suggested, but you can't define (++) as a proper prefix operator, i.e., run things like ++y without the () as you could for example for (!+):
let (!+) (i : int ref) = incr i; !i
let v = [| 1..5 |]
let i = ref -1
[1..5] |> Seq.iter (fun _ -> printfn "%d" v.[!+i])
Sorry, but I guess the answer is that actually you can't do even that.

Project Euler #14 attempt fails with StackOverflowException

I recently started solving Project Euler problems in Scala, however when I got to problem 14, I got the StackOverflowError, so I rewrote my solution in F#, since (I am told) the F# compiler, unlike Scala's (which produces Java bytecode), translates recursive calls into loops.
My question to you therefore is, how is it possible that the code below throws the StackOverflowException after reaching some number above 113000? I think that the recursion doesn't have to be a tail recursion in order to be translated/optimized, does it?
I tried several rewrites of my code, but without success. I really don't want to have to write the code in imperative style using loops, and I don't think I could turn the len function to be tail-recursive, even if that was the problem preventing it from being optimized.
module Problem14 =
    let lenMap = Dictionary<'int,'int>()
    let next n =
        if n % 2 = 0 then n/2
        else 3*n+1
    let memoize(num:int, lng:int):int =
        lenMap.[num]<-lng
        lng
    let rec len(num:int):int =
        match num with
        | 1 -> 1
        | _ when lenMap.ContainsKey(num) -> lenMap.[num]
        | _ -> memoize(num, 1+(len (next num)))
    let cand = seq{ for i in 999999 .. -1 .. 1 -> i}
    let tuples = cand |> Seq.map(fun n -> (n, len(n)))
    let Result = tuples |> Seq.maxBy(fun n -> snd n) |> fst
NOTE: I am aware that the code above is very far from optimal and several lines could be a lot simpler, but I am not very proficient in F# and did not bother looking up ways to simplify it and make it more elegant (yet).
Thank you.
Your current code runs without error and gets the correct result if I change all the int to int64 and append an L after every numeric literal (e.g. -1L). If the actual problem is that you're overflowing a 32-bit integer, I'm not sure why you get a StackOverflowException.
module Problem14 =
    let lenMap = System.Collections.Generic.Dictionary<_,_>()
    let next n =
        if n % 2L = 0L then n/2L
        else 3L*n+1L
    let memoize(num, lng) =
        lenMap.[num]<-lng
        lng
    let rec len num =
        match num with
        | 1L -> 1L
        | _ when lenMap.ContainsKey(num) -> lenMap.[num]
        | _ -> memoize(num, 1L+(len (next num)))
    let cand = seq{ for i in 999999L .. -1L .. 1L -> i}
    let tuples = cand |> Seq.map(fun n -> (n, len(n)))
    let Result = tuples |> Seq.maxBy(fun n -> snd n) |> fst
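One possible explanation for the StackOverflowException, offered as an assumption rather than something confirmed above: once 3*n+1 wraps around in a 32-bit int the value turns negative, the sequence may then never reach 1, and the non-tail-recursive len keeps recursing until the stack runs out. The sketch below only checks the overflow part, using int64 arithmetic to find the smallest start value whose sequence exceeds Int32.MaxValue. The names `peak` and `firstOverflowingStart` are made up for the example.

```fsharp
let next (n: int64) = if n % 2L = 0L then n / 2L else 3L * n + 1L

/// Largest value reached on the way from `start` down to 1.
let peak (start: int64) =
    // Tail-recursive, so this helper itself cannot overflow the stack.
    let rec go n best =
        if n = 1L then best
        else
            let n' = next n
            go n' (max best n')
    go start start

/// Smallest start value below one million whose sequence leaves the Int32 range.
let firstOverflowingStart =
    seq { 1L .. 999999L }
    |> Seq.find (fun n -> peak n > int64 System.Int32.MaxValue)

printfn "first start that overflows Int32: %d" firstOverflowingStart
```

If the printed value falls in the region where the failure was observed, that supports the overflow theory; it does not by itself prove what happens after the wrap-around.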

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want to create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/caching)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
    seq {
        for x in 1..N do
            if x % 4 <> 0 && x % 7 <> 0 then
                printfn "producing %d" x
                yield x
    }
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
    LazyList.delayed (fun () ->
        match input with
        | LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
        | LazyList.Cons(x,LazyList.Nil) ->
            LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
        | LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
            let groups = GroupBy comp xs
            if comp x y then
                LazyList.consf
                    (LazyList.consf x (fun () ->
                        let (LazyList.Cons(firstGroup,_)) = groups
                        firstGroup))
                    (fun () ->
                        let (LazyList.Cons(_,otherGroups)) = groups
                        otherGroups)
            else
                LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
    printfn "about to do a group"
    for x in group do
        printfn " %d" x
You seem to want a function that has signature
('a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. it takes a function and a sequence, then breaks up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest. It is not exactly purist, and it loses much of the laziness of the input, but it avoids iterating the input multiple times:
let groupBy (pred: 'a -> bool) (input: seq<'a>) =
    seq {
        // current group; a new list is started after every break point
        let cache = ref (new System.Collections.Generic.List<'a>())
        for e in input do
            (!cache).Add(e)
            if not (pred e) then
                yield (!cache :> seq<'a>)
                cache := new System.Collections.Generic.List<'a>()
        // emit the trailing group, if any
        if (!cache).Count > 0 then
            yield (!cache :> seq<'a>)
    }
An alternative implementation could pass the cache collection (as seq<'a>) to the function so it can see multiple elements to choose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in Data.List (not in the Prelude itself). The documentation for the related group function reads:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs@(y : _))
  | comp x y = (x : firstGroup) : otherGroups
  | otherwise = [x] : groups
  where groups@(firstGroup : otherGroups) = groupBy1 comp xs
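For readers more at home in F#, the same consecutive-element grouping can be sketched on plain F# lists. This is an eager version written for illustration only, so it does not meet the laziness constraint of the question, and unlike the Haskell code it returns an empty list (not a list containing one empty group) for empty input.

```fsharp
/// Groups neighbouring elements for which `comp previous next` holds.
/// Eager sketch on lists; not a lazy seq-of-seq solution.
let groupBy1 (comp: 'a -> 'a -> bool) (xs: 'a list) : 'a list list =
    List.foldBack
        (fun x groups ->
            match groups with
            // x belongs to the group that starts with its right-hand neighbour y
            | (y :: _ as g) :: rest when comp x y -> (x :: g) :: rest
            // otherwise x starts a group of its own
            | _ -> [ x ] :: groups)
        xs
        []

let example = groupBy1 (fun a b -> b = a + 1) [ 1; 2; 3; 5; 6; 9 ]
```

Folding from the right mirrors the Haskell definition, where the groups of the tail are computed first and the head is then either prepended to the first group or given a new one.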
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperative nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
    let en = sq.GetEnumerator()
    let rec partitions (first:option<_>) =
        seq {
            match first with
            | Some first' -> //'
                (* The following value is always overwritten;
                   it represents the first element of the next subsequence to output, if any *)
                let next = ref None
                (* This function generates a subsequence to output,
                   setting next appropriately as it goes *)
                let rec iter item =
                    seq {
                        yield item
                        if (en.MoveNext()) then
                            let curr = en.Current
                            if (cmp item curr) then
                                yield! iter curr
                            else // consumed one too many - pass it on as the start of the next sequence
                                next := Some curr
                        else
                            next := None
                    }
                yield iter first' (* ' generate the first sequence *)
                yield! partitions !next (* recursively generate all remaining sequences *)
            | None -> () // return an empty sequence if there are no more values
        }
    let first = if en.MoveNext() then Some en.Current else None
    partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
    groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
    let en = sq.GetEnumerator()
    let next() = if en.MoveNext() then Some en.Current else None
    (* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
    let rec seqStartingWith start =
        match next() with
        | Some y when cmp start y ->
            let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
            seq { yield start; yield! fst (Lazy.force rest_next) },
            lazy Lazy.force (snd (Lazy.force rest_next))
        | next -> seq { yield start }, lazy next
    let rec iter start =
        seq {
            match (Lazy.force start) with
            | None -> ()
            | Some start ->
                let (first,next) = seqStartingWith start
                yield first
                yield! iter next
        }
    Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
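The double-enumeration effect described above can be seen in isolation with a small sketch. The counter `produced` and the three-element `source` are made up for the example, and Seq.cache is shown only for contrast, since the question rules caching out.

```fsharp
// Counts how many elements the source has actually produced.
let produced = ref 0

let source =
    seq {
        for i in 1 .. 3 do
            produced.Value <- produced.Value + 1
            yield i
    }

let len = Seq.length source      // enumerates all three elements
let first = Seq.head source      // starts a second, independent enumeration
let producedUncached = produced.Value

// Same calls against a cached wrapper: the source is only walked once.
produced.Value <- 0
let cached = Seq.cache source
let lenCached = Seq.length cached
let firstCached = Seq.head cached
let producedCached = produced.Value

printfn "uncached: %d elements produced, cached: %d" producedUncached producedCached
```

With a stateful enumerator shared between inner sequences, as in the solutions in this thread, that second enumeration does not simply repeat work but observes already-consumed state, which is why the test code misbehaves.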
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
    // simple unfold that assumes f captured a mutable variable
    let iter f = Seq.unfold (fun _ ->
                                match f() with
                                | Some(x) -> Some(x,())
                                | None -> None) ()
    seq {
        let state = ref (Unstarted)
        use ie = input.GetEnumerator()
        let innerMoveNext() =
            match !state with
            | Unstarted ->
                if ie.MoveNext()
                then let cur = ie.Current
                     state := InnerOkay(cur); Some(cur)
                else state := Finished; None
            | InnerOkay(last) ->
                if ie.MoveNext()
                then let cur = ie.Current
                     if f last cur
                     then state := InnerOkay(cur); Some(cur)
                     else state := NeedNewInner(cur); None
                else state := Finished; None
            | NeedNewInner(last) -> state := InnerOkay(last); Some(last)
            | Finished -> None
        let outerMoveNext() =
            match !state with
            | Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
            | InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
            | Finished -> None
        yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
    split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
    let baseTime = DateTime.Now
    seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
    Seq.zip contiguousTimeStamps numbers
    |> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
    printfn "about to do a group"
    for x in group do
        printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
    seq {
        for x in 1..N do
            if x % 4 <> 0 && x % 7 <> 0 then
                printfn "producing %d" x
                yield x
    }
let rec GroupBy comp (input:seq<_>) = seq {
    let doneWithThisGroup = ref false
    let areMore = ref true
    use e = input.GetEnumerator()
    let Next() = areMore := e.MoveNext(); !areMore
    // deal with length 0 or 1, seed 'prev'
    if not(e.MoveNext()) then () else
    let prev = ref e.Current
    while !areMore do
        yield seq {
            while not(!doneWithThisGroup) do
                if Next() then
                    let next = e.Current
                    doneWithThisGroup := not(comp !prev next)
                    yield !prev
                    prev := next
                else
                    // end of list, yield final value
                    yield !prev
                    doneWithThisGroup := true }
        doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
    printfn "about to do a group"
    for x in group do
        printfn " %d" x
