Learning to use F#, and I'm trying to get familiar with the match expression. I expect the below code to pick two consecutive days out of the week, the current day and the day after. It only picks out the current day. What am I missing here?
DayOfWeek array:
let days = [|DayOfWeek.Sunday, true;
DayOfWeek.Monday, false;
DayOfWeek.Tuesday, true;
DayOfWeek.Wednesday, true;
DayOfWeek.Thursday, true;
DayOfWeek.Friday, true;
DayOfWeek.Saturday, true;|]
Match expression:
let curDate = DateTime.Now
let validDates =
[
for i in days do
match i with
| day, true ->
match day with
| x when int x = int curDate.DayOfWeek ||
int x > int curDate.DayOfWeek
&& int x - int curDate.DayOfWeek = 1 ->
yield
x
| _ -> ()
|_ -> ()
]
Your solution seems extremely convoluted to me, and like others have mentioned it only works if the underlying int value of tomorrow's DayOfWeek is one greater than today's. As you know, the week is a cycle so that logic won't always hold true. I don't want to spoonfeed, but there is a much easier solution:
let today = DateTime.Now.DayOfWeek
let days = [|DayOfWeek.Sunday, true;
DayOfWeek.Monday, false;
DayOfWeek.Tuesday, true;
DayOfWeek.Wednesday, true;
DayOfWeek.Thursday, true;
DayOfWeek.Friday, true;
DayOfWeek.Saturday, true;|]
let today_and_tomorrow =
let idx_today = Array.findIndex (fun (day, _) -> day = today) days
days.[idx_today], days.[idx_today + 1 % days.Length]
For me, the difficulty is with using pattern matching.
Here is how I would do it without, allowing you to take any number of days, not just two.
open System
let next count days day =
seq { while true do yield! days } // make the days array infinite
|> Seq.skipWhile (fun (d, _) -> d <> day) // skip until we find our day
|> Seq.filter (fun (_, incl) -> incl) // get rid of 'false' days
|> Seq.take count // take the next 'count' of days
|> Seq.map (fun (d, _) -> d) // we only care about the day now, so a simple map gets rid of the boolean
Using your array of days, I get the following:
DayOfWeek.Sunday
|> next 2 days
val it : seq<DayOfWeek> = seq [Sunday; Tuesday]
and
DayOfWeek.Thursday
|> next 3 days
val it : seq<DayOfWeek> = seq [Thursday; Friday; Saturday]
and
DayOfWeek.Sunday
|> next 10000 days
|> Seq.iter (printfn "%A")
Well, I'm not going to print what this one does, you'll just have to use your imagination. :)
I hope that helps!
Edit I made it handle an infinite number of days.
I think you can write this a lot easier by using the Enum-caps of F#/.net:
open System;;
let weekdayAfter (day : DateTime) : DayOfWeek =
int day.DayOfWeek
|> (fun i -> (i+1) % 7)
|> Microsoft.FSharp.Core.LanguagePrimitives.EnumOfValue<_, _>
let today_and_tomorrow =
let today = DateTime.Today
(today.DayOfWeek, weekdayAfter today)
And if you really want to use pattern-matching then why not go with the readable/obvious solution:
let dayAfter (day : DateTime) =
match day.DayOfWeek with
| DayOfWeek.Sunday -> DayOfWeek.Monday
| DayOfWeek.Monday -> DayOfWeek.Tuesday
| DayOfWeek.Tuesday -> DayOfWeek.Wednesday
| DayOfWeek.Wednesday -> DayOfWeek.Thursday
| DayOfWeek.Thursday -> DayOfWeek.Friday
| DayOfWeek.Friday -> DayOfWeek.Saturday
| DayOfWeek.Saturday -> DayOfWeek.Sunday
| _ -> failwith "should never happen"
Related
I have been doing a CodeWars exercise which can also be seen at dev.to.
The essence of it is:
There is a line for the self-checkout machines at the supermarket. Your challenge is to write a function that calculates the total amount of time required for the rest of the customers to check out!
INPUT
customers : an array of positive integers representing the line. Each integer represents a customer, and its value is the amount of time they require to check out.
n : a positive integer, the number of checkout tills.
RULES
There is only one line serving many machines, and
The order of the line never changes, and
The front person in the line (i.e. the first element in the array/list) proceeds to a machine as soon as it becomes free.
OUTPUT
The function should return an integer, the total time required.
The answer I came up with works - but it is highly imperative.
open System.Collections.Generic
open System.Linq
let getQueueTime (customerArray: int list) n =
let mutable d = new Dictionary<string,int>()
for i in 1..n do
d.Add(sprintf "Line%d" <| i, 0)
let getNextAvailableSupermarketLineName(d:Dictionary<string,int>) =
let mutable lowestValue = -1
let mutable lineName = ""
for myLineName in d.Keys do
let myValue = d.Item(myLineName)
if lowestValue = -1 || myValue <= lowestValue then
lowestValue <- myValue
lineName <- myLineName
lineName
for x in customerArray do
let lineName = getNextAvailableSupermarketLineName d
let lineTotal = d.Item(lineName)
d.Item(lineName) <- lineTotal + x
d.Values.Max()
So my question is ... is this OK F# code or should it be written in a functional way? And if the latter, how? (I started off trying to do it functionally but didn't get anywhere).
is this OK F# code or should it be written in a functional way?
That's a subjective question, so can't be answered. I'm assuming, however, that since you're doing an exercise, it's in order to learn. Learning functional programming takes years for most people (it did for me), but F# is a great language because it enables you learn gradually.
You can, however, simplify the algorithm. Think of a till as a number. The number represents the instant it's ready. At the beginning, you initialise them all to 0:
let tills = List.replicate n 0
where n is the number of tills. At the beginning, they're all ready at time 0. If, for example, n is 3, the tills are:
> List.replicate 3 0;;
val it : int list = [0; 0; 0]
Now you consider the next customer in the line. For each customer, you have to pick a till. You pick the one that is available first, i.e. with the lowest number. Then you need to 'update' the list of counters.
In order to do that, you'll need a function to 'update' a list at a particular index, which isn't part of the base library. You can define it yourself, however:
module List =
let set idx v = List.mapi (fun i x -> if i = idx then v else x)
For example, if you want to 'update' the second element to 3, you can do it like this:
> List.replicate 3 0 |> List.set 1 3;;
val it : int list = [0; 3; 0]
Now you can write a function that updates the set of tills given their current state and a customer (represented by a duration, which is also a number).
let next tills customer =
let earliestTime = List.min tills
let idx = List.findIndex (fun c -> earliestTime = c) tills
List.set idx (earliestTime + customer) tills
First, the next function finds the earliestTime in tills by using List.min. Then it finds the index of that value. Finally, it 'updates' that till by adding its current state to the customer duration.
Imagine that you have two tills and the customers [2;3;10]:
> List.replicate 2 0;;
val it : int list = [0; 0]
> List.replicate 2 0 |> fun tills -> next tills 2;;
val it : int list = [2; 0]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3;;
val it : int list = [2; 3]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3
|> fun tills -> next tills 10;;
val it : int list = [12; 3]
You'll notice that you can keep calling the next function for all the customers in the line. That's called a fold. This gives you the final state of the tills. The final step is to return the value of the till with the highest value, because that represents the time it finished. The overall function, then, is:
let queueTime line n =
let next tills customer =
let earliestTime = List.min tills
let idx = List.findIndex (fun c -> earliestTime = c) tills
List.set idx (earliestTime + customer) tills
let tills = List.replicate n 0
let finalState = List.fold next tills line
List.max finalState
Here's some examples, taken from the original exercise:
> queueTime [5;3;4] 1;;
val it : int = 12
> queueTime [10;2;3;3] 2;;
val it : int = 10
> queueTime [2;3;10] 2;;
val it : int = 12
This solution is based entirely on immutable data, and all functions are pure, so that's a functional solution.
Here is a version that resembles your version, with all the mutability removed:
let getQueueTime (customerArray: int list) n =
let updateWith f key map =
let v = Map.find key map
map |> Map.add key (f v)
let initialLines = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> Map.ofList
let getNextAvailableSupermarketLineName(d:Map<string,int>) =
let lowestLine = d |> Seq.minBy (fun l -> l.Value)
lowestLine.Key
let lines =
customerArray
|> List.fold (fun linesState x ->
let lineName = getNextAvailableSupermarketLineName linesState
linesState |> updateWith (fun l -> l + x) lineName) initialLines
lines |> Seq.map (fun l -> l.Value) |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
Those loops with mutable "outer state" can be swapped for either recursive functions or folds/reduce, here I suspect recursive functions would be nicer.
I've swapped out Dictionary for the immutable Map, but it feels like more trouble than it's worth here.
Update - here is a compromise solution I think reads well:
let getQueueTime (customerArray: int list) n =
let d = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> dict
let getNextAvailableSupermarketLineName(d:IDictionary<string,int>) =
let lowestLine = d |> Seq.minBy (fun l -> l.Value)
lowestLine.Key
customerArray
|> List.iter (fun x ->
let lineName = getNextAvailableSupermarketLineName d
d.Item(lineName) <- d.Item(lineName) + 1)
d.Values |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
I believe there is a more natural functional solution if you approach it freshly, but I wanted to evolve your current solution.
This is less an attempt at answering than an extended comment on Mark Seemann's otherwise excellent answer. If we do not restrict ourselves to standard library functions, the slightly cumbersome determination of the index with List.findIndex can be avoided. Instead, we may devise a function that replaces the first occurrence of a value in a list with a new value.
The implementation of our bespoke List.replace involves recursion, with an accumulator to hold the values before we encounter the first occurrence. When found, the accumulator needs to be reversed and also to have the new value and the tail of the original list appended. Both of this can be done in one operation: List.fold being fed the new value and tail of the original list as initial state while the elements of the accumulator are prepended in the loop, thereby restoring their order.
module List =
// Replace the first occurrence of a specific object in a list
let replace oldValue newValue source =
let rec aux acc = function
| [] -> List.rev acc
| x::xs when x = oldValue ->
(newValue::xs, acc)
||> List.fold (fun xs x -> x::xs)
| x::xs -> aux (x::acc) xs
aux [] source
let queueTime customers n =
(List.init n (fun _ -> 0), customers)
||> List.fold (fun xs customer ->
let x = List.min xs
List.replace x (x + customer) xs )
|> List.max
queueTime [5;3;4] 1 // val it : int = 12
queueTime [10;2;3;3] 2 // val it : int = 10
queueTime [2;3;10] 2 // val it : int = 12
I would like to test whether all of elements in a list/sequence equals something
For example,a sequence of integers.
I would like to test if ALL element of the sequence equals to the same number.
My solution so far looks like imperative programming solution.
let test seq =
if Seq.forall(fun num -> num =1) then 1
elif Seq.forall(fun num-> num = 2) then 2
else None
Your solution is fine! Checking that all elements of a sequence have some value is not something you can nicely express using pattern matching - you have to use when clause, but that's doing exactly the same thing as your code (but with longer syntax). In cases like this, there is absolutely nothing wrong with using if.
You can extend pattern matching by definining custom active patterns, which gives you a nice option here. This is fairly advanced F#, but you can define a custom pattern ForAll n that succeeds when the input is a sequence containing just n values:
let (|ForAll|_|) n seq =
if Seq.forall (fun num -> num = n) seq then Some() else None
Note that success is represented as Some and failure as None. Now, you can solve your problem very nicely using pattern matching:
let test = function
| ForAll 1 -> Some 1
| ForAll 2 -> Some 2
| _ -> None
This looks quite nice, but it's relying on more advanced features - I would do this if this is something that you need in more than one place. If I needed this just in one place, I'd go with ordinary if.
You can rewrite it using pattern matching with a guard clause:
let testList = [2;2;2]
let isOne x = x = 1
let isTwo x = x = 2
let forAll = function
| list when list |> List.forall isOne -> Some 1
| list when list |> List.forall isTwo -> Some 2
| _ -> None
let res = forAll testList //Some 2
Instead of the function you could use partial application on the equals operator.
> let yes = [1;1;1];;
val yes : int list = [1; 1; 1]
> let no = [1;2;3];;
val no : int list = [1; 2; 3]
> yes |> List.forall ((=) 1);;
val it : bool = true
> no |> List.forall ((=) 1);;
val it : bool = false
Maybe this looks more functional? And I think you should return Some 1 in your code, otherwise you'd get type errors since Option and int are not the same type...
If you want to check if all elements are equal (not just if they equal some constant), you could do this:
> [1;2] |> List.pairwise |> List.forall (fun (a,b) -> a = b)
;;
val it : bool = false
> [1;1;1] |> List.pairwise |> List.forall (fun (a,b) -> a = b)
;;
val it : bool = true
There you split your list into tuples and checks if the tuples are equal. This means transitively that all elements are equal.
I have a function that I use a lot and hence the performance needs to be as good as possible. It takes data from excel and then sums, averages or counts over parts of the data based on whether the data is within a certain period and whether it is a peak hour (Mo-Fr 8-20).
The data is usually around 30,000 rows and 2 columns (hourly date, value). One important feature of the data is that the date column is chronologically ordered
I have three implementations, c# with extension methods (dead slow and I m not going to show it unless somebody is interested).
Then I have this f# implementation:
let ispeak dts =
let newdts = DateTime.FromOADate dts
match newdts.DayOfWeek, newdts.Hour with
| DayOfWeek.Saturday, _ | DayOfWeek.Sunday, _ -> false
| _, h when h >= 8 && h < 20 -> true
| _ -> false
let internal isbetween a std edd =
match a with
| r when r >= std && r < edd+1. -> true
| _ -> false
[<ExcelFunction(Name="aggrF")>]
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
let newd =
[0 .. (Array.length data) - 1]
|> List.map (fun i -> (data.[i], data2.[i]))
|> Seq.filter (fun (date, _) ->
let dateInRange = isbetween date std edd
match pob with
| "Peak" -> ispeak date && dateInRange
| "Offpeak" -> not(ispeak date) && dateInRange
| _ -> dateInRange)
match sac with
| 0 -> newd |> Seq.averageBy (fun (_, value) -> value)
| 2 -> newd |> Seq.sumBy (fun (_, value) -> 1.0)
| _ -> newd |> Seq.sumBy (fun (_, value) -> value)
I see two issues with this:
I need to prepare the data because both date and value are double[]
I do not utilize the knowledge that dates are chronological hence I do unnecessary iterations.
Here comes now what I would call a brute force imperative c# version:
public static bool ispeak(double dats)
{
var dts = System.DateTime.FromOADate(dats);
if (dts.DayOfWeek != DayOfWeek.Sunday & dts.DayOfWeek != DayOfWeek.Saturday & dts.Hour > 7 & dts.Hour < 20)
return true;
else
return false;
}
[ExcelFunction(Description = "Aggregates HFC/EG into average or sum over period, start date inclusive, end date exclusive")]
public static double aggrI(double[] dts, double[] vals, double std, double edd, string pob, double sumavg)
{
double accsum = 0;
int acccounter = 0;
int indicator = 0;
bool peakbool = pob.Equals("Peak", StringComparison.OrdinalIgnoreCase);
bool offpeakbool = pob.Equals("Offpeak", StringComparison.OrdinalIgnoreCase);
bool basebool = pob.Equals("Base", StringComparison.OrdinalIgnoreCase);
for (int i = 0; i < vals.Length; ++i)
{
if (dts[i] >= std && dts[i] < edd + 1)
{
indicator = 1;
if (peakbool && ispeak(dts[i]))
{
accsum += vals[i];
++acccounter;
}
else if (offpeakbool && (!ispeak(dts[i])))
{
accsum += vals[i];
++acccounter;
}
else if (basebool)
{
accsum += vals[i];
++acccounter;
}
}
else if (indicator == 1)
{
break;
}
}
if (sumavg == 0)
{
return accsum / acccounter;
}
else if (sumavg == 2)
{
return acccounter;
}
else
{
return accsum;
}
}
This is much faster (I m guessing mainly because of the exit of loop when period ended) but oviously less succinct.
My questions:
Is there a way to stop iterations in the f# Seq module for sorted series?
Is there another way to speed up the f# version?
can somebody think of an even better way of doing this?
Thanks a lot!
Update: Speed comparison
I set up a test array with hourly dates from 1/1/13-31/12/15 (roughly 30,000 rows) and corresponding values. I made 150 calls spread out over the date array and repeated this 100 times - 15000 function calls:
My csharp implementation above (with string.compare outside of loop)
1.36 secs
Matthews recursion fsharp
1.55 secs
Tomas array fsharp
1m40secs
My original fsharp
2m20secs
Obviously this is always subjective to my machine but gives an idea and people asked for it...
I also think one should keep in mind this doesnt mean recursion or for loops are always faster than array.map etc, just in this case it does a lot of unnecessary iterations as it doesnt have the early exit from iterations that the c# and the f# recursion method have
Using Array instead of List and Seq makes this about 3-4 times faster. You do not need to generate a list of indices and then map over that to lookup items in the two arrays - instead you can use Array.zip to combine the two arrays into a single one and then use Array.filter.
In general, if you want performance, then using array as your data structure will make sense (unless you have a long pipeline of things). Functions like Array.zip and Array.map can calculate the entire array size, allocate it and then do efficient imperative operation (while still looking functional from the outside).
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
let newd =
Array.zip data data2
|> Array.filter (fun (date, _) ->
let dateInRange = isbetween date std edd
match pob with
| "Peak" -> ispeak date && dateInRange
| "Offpeak" -> not(ispeak date) && dateInRange
| _ -> dateInRange)
match sac with
| 0 -> newd |> Array.averageBy (fun (_, value) -> value)
| 2 -> newd |> Array.sumBy (fun (_, value) -> 1.0)
| _ -> newd |> Array.sumBy (fun (_, value) -> value)
I also changed isbetween - it can be simplified into just an expression and you can mark it inline, but that does not add that much:
let inline isbetween r std edd = r >= std && r < edd+1.
Just for completeness, I tested this with the following code (using F# Interactive):
#time
let d1 = Array.init 1000000 float
let d2 = Array.init 1000000 float
aggrF d1 d2 0.0 1000000.0 "Test" 0
The original version was about ~600ms and the new version using arrays takes between 160ms and 200ms. The version by Matthew takes about ~520ms.
Aside, I spent the last two months at BlueMountain Capital working on a time series/data frame library for F# that would make this a lot simpler. It is work in progress and also the name of the library will change, but you can find it in BlueMountain GitHub. The code would look something like this (it uses the fact that the time series is ordered and uses slicing to get the relevant part before filtering):
let ts = Series(times, values)
ts.[std .. edd] |> Series.filter (fun k _ -> not (ispeak k)) |> Series.mean
Currently, this will not be as fast as direct array operations, but I'll look into that :-).
An immediate way to speed it up would be to combine these:
[0 .. (Array.length data) - 1]
|> List.map (fun i -> (data.[i], data2.[i]))
|> Seq.filter (fun (date, _) ->
into a single list comprehension, and also as the other matthew said, do a single string comparison:
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
let isValidTime = match pob with
| "Peak" -> (fun x -> ispeak x)
| "Offpeak" -> (fun x -> not(ispeak x))
| _ -> (fun _ -> true)
let data = [ for i in 0 .. (Array.length data) - 1 do
let (date, value) = (data.[i], data2.[i])
if isbetween date std edd && isValidTime date then
yield (date, value)
else
() ]
match sac with
| 0 -> data |> Seq.averageBy (fun (_, value) -> value)
| 2 -> data.Length
| _ -> data |> Seq.sumBy (fun (_, value) -> value)
Or use a tail recursive function:
let aggrF (data:float[]) (data2:float[]) std edd pob sac =
let isValidTime = match pob with
| "Peak" -> (fun x -> ispeak x)
| "Offpeak" -> (fun x -> not(ispeak x))
| _ -> (fun _ -> true)
let endDate = edd + 1.0
let rec aggr i sum count =
if i >= (Array.length data) || data.[i] >= endDate then
match sac with
| 0 -> sum / float(count)
| 2 -> float(count)
| _ -> float(sum)
else if data.[i] >= std && isValidTime data.[i] then
aggr (i + 1) (sum + data2.[i]) (count + 1)
else
aggr (i + 1) sum count
aggr 0 0.0 0
I recently started solving Project Euler problems in Scala, however when I got to problem 14, I got the StackOverflowError, so I rewrote my solution in F#, since (I am told) the F# compiler, unlike Scala's (which produces Java bytecode), translates recursive calls into loops.
My question to you therefore is, how is it possible that the code below throws the StackOverflowException after reaching some number above 113000? I think that the recursion doesn't have to be a tail recursion in order to be translated/optimized, does it?
I tried several rewrites of my code, but without success. I really don't want to have to write the code in imperative style using loops, and I don't think I could turn the len function to be tail-recursive, even if that was the problem preventing it from being optimized.
module Problem14 =
let lenMap = Dictionary<'int,'int>()
let next n =
if n % 2 = 0 then n/2
else 3*n+1
let memoize(num:int, lng:int):int =
lenMap.[num]<-lng
lng
let rec len(num:int):int =
match num with
| 1 -> 1
| _ when lenMap.ContainsKey(num) -> lenMap.[num]
| _ -> memoize(num, 1+(len (next num)))
let cand = seq{ for i in 999999 .. -1 .. 1 -> i}
let tuples = cand |> Seq.map(fun n -> (n, len(n)))
let Result = tuples |> Seq.maxBy(fun n -> snd n) |> fst
NOTE: I am aware that the code below is very far from optimal and several lines could be a lot simpler, but I am not very proficient in F# and did not bother looking up ways to simplify it and make it more elegant (yet).
Thank you.
Your current code runs without error and gets the correct result if I change all the int to int64 and append an L after every numeric literal (e.g. -1L). If the actual problem is that you're overflowing a 32-bit integer, I'm not sure why you get a StackOverflowException.
module Problem14 =
let lenMap = System.Collections.Generic.Dictionary<_,_>()
let next n =
if n % 2L = 0L then n/2L
else 3L*n+1L
let memoize(num, lng) =
lenMap.[num]<-lng
lng
let rec len num =
match num with
| 1L -> 1L
| _ when lenMap.ContainsKey(num) -> lenMap.[num]
| _ -> memoize(num, 1L+(len (next num)))
let cand = seq{ for i in 999999L .. -1L .. 1L -> i}
let tuples = cand |> Seq.map(fun n -> (n, len(n)))
let Result = tuples |> Seq.maxBy(fun n -> snd n) |> fst
Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/cacheing)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
LazyList.delayed (fun () ->
match input with
| LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
| LazyList.Cons(x,LazyList.Nil) ->
LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
| LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
let groups = GroupBy comp xs
if comp x y then
LazyList.consf
(LazyList.consf x (fun () ->
let (LazyList.Cons(firstGroup,_)) = groups
firstGroup))
(fun () ->
let (LazyList.Cons(_,otherGroups)) = groups
otherGroups)
else
LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x
You seem to want a function that has signature
(`a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):
let groupBy (fun: 'a -> bool) (input: seq) =
seq {
let cache = ref (new System.Collections.Generic.List())
for e in input do
(!cache).Add(e)
if not (fun e) then
yield !cache
cache := new System.Collections.Generic.List()
if cache.Length > 0 then
yield !cache
}
An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs#(y : _))
| comp x y = (x : firstGroup) : otherGroups
| otherwise = [x] : groups
where groups#(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let rec partitions (first:option<_>) =
seq {
match first with
| Some first' -> //'
(* The following value is always overwritten;
it represents the first element of the next subsequence to output, if any *)
let next = ref None
(* This function generates a subsequence to output,
setting next appropriately as it goes *)
let rec iter item =
seq {
yield item
if (en.MoveNext()) then
let curr = en.Current
if (cmp item curr) then
yield! iter curr
else // consumed one too many - pass it on as the start of the next sequence
next := Some curr
else
next := None
}
yield iter first' (* ' generate the first sequence *)
yield! partitions !next (* recursively generate all remaining sequences *)
| None -> () // return an empty sequence if there are no more values
}
let first = if en.MoveNext() then Some en.Current else None
partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let next() = if en.MoveNext() then Some en.Current else None
(* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
let rec seqStartingWith start =
match next() with
| Some y when cmp start y ->
let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
seq { yield start; yield! fst (Lazy.force rest_next) },
lazy Lazy.force (snd (Lazy.force rest_next))
| next -> seq { yield start }, lazy next
let rec iter start =
seq {
match (Lazy.force start) with
| None -> ()
| Some start ->
let (first,next) = seqStartingWith start
yield first
yield! iter next
}
Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
// simple unfold that assumes f captured a mutable variable
let iter f = Seq.unfold (fun _ ->
match f() with
| Some(x) -> Some(x,())
| None -> None) ()
seq {
let state = ref (Unstarted)
use ie = input.GetEnumerator()
let innerMoveNext() =
match !state with
| Unstarted ->
if ie.MoveNext()
then let cur = ie.Current
state := InnerOkay(cur); Some(cur)
else state := Finished; None
| InnerOkay(last) ->
if ie.MoveNext()
then let cur = ie.Current
if f last cur
then state := InnerOkay(cur); Some(cur)
else state := NeedNewInner(cur); None
else state := Finished; None
| NeedNewInner(last) -> state := InnerOkay(last); Some(last)
| Finished -> None
let outerMoveNext() =
match !state with
| Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
| InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
| Finished -> None
yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
let baseTime = DateTime.Now
seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
Seq.zip contiguousTimeStamps numbers
|> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
printfn "about to do a group"
for x in group do
printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:seq<_>) = seq {
let doneWithThisGroup = ref false
let areMore = ref true
use e = input.GetEnumerator()
let Next() = areMore := e.MoveNext(); !areMore
// deal with length 0 or 1, seed 'prev'
if not(e.MoveNext()) then () else
let prev = ref e.Current
while !areMore do
yield seq {
while not(!doneWithThisGroup) do
if Next() then
let next = e.Current
doneWithThisGroup := not(comp !prev next)
yield !prev
prev := next
else
// end of list, yield final value
yield !prev
doneWithThisGroup := true }
doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x