Help Needed Creating a Binary Tree Given Truth Table - f#

First, in order to provide full disclosure, I want to point out that this is related to homework in a Machine Learning class. This question is not the homework question and instead is something I need to figure out in order to complete the bigger problem of creating an ID3 Decision Tree Algorithm.
I need to generate tree similar to the following when given a truth table
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
learnedTree is of type BinaryTree which I've defined as follows:
type BinaryTree =
| Leaf of int
| Node of int * string * BinaryTree * BinaryTree
ID3 algorithms take into account various equations to determine where to split the tree, and I've got all that figured out, I'm just having trouble creating the learned tree from my truth table. For example if I have the following table
A1 | A2 | A3 | Class
1 0 0 1
0 1 0 1
0 0 0 0
1 0 1 0
0 0 0 0
1 1 0 1
0 1 1 0
And I decide to split on attribute A1 I would end up with the following:
(A1 = 1) A1 (A1 = 0)
A2 | A3 | Class A2 | A3 | Class
0 0 1 1 0 1
0 1 0 0 0 0
1 0 1 0 0 0
0 1 1
Then I would split the left side and split the right side, and continue the recursive pattern until the leaf nodes are pure and I end up with a tree similar to the following based on the splitting.
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
Here is what I've kind of "hacked" together thus far, but I think I might be way off:
let rec createTree (listToSplit : list<list<float>>) index =
let leftSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
let rightSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
if leftSideSplit.Length > 0 then
let pureCheck = isListPure leftSideSplit
if pureCheck = 0 then
printfn "%s" "Pure left node class 0"
createTree leftSideSplit (index + 1)
else if pureCheck = 1 then
printfn "%s" "Pure left node class 1"
createTree leftSideSplit (index + 1)
else
printfn "%s - %A" "Recursing Left" leftSideSplit
createTree leftSideSplit (index + 1)
else printfn "%s" "Pure left node class 0"
Should I be using pattern matching instead? Any tips/ideas/help? Thanks a bunch!

Edit: I've since posted an implementation of ID3 on my blog at:
http://blogs.msdn.com/chrsmith
Hey Jim, I've been wanting to write a blog post implementing ID3 in F# for a while - thanks for giving me an execute. While this code doesn't implement the algorithm full (or correctly), it should be sufficient for getting you started.
In general you have the right approach - representing each branch as a discriminated union case is good. And like Brian said, List.partition is definitely a handy function. The trick to making this work correctly is all in determining the optimal attribute/value pair to split on - and to do that you'll need to calculate information gain via entropy, etc.
type Attribute = string
type Value = string
type Record =
{
Weather : string
Temperature : string
PlayTennis : bool
}
override this.ToString() =
sprintf
"{Weather = %s, Temp = %s, PlayTennis = %b}"
this.Weather
this.Temperature
this.PlayTennis
type Decision = Attribute * Value
type DecisionTreeNode =
| Branch of Decision * DecisionTreeNode * DecisionTreeNode
| Leaf of Record list
// ------------------------------------
// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maxamize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data =
// Just group by weather, then by temperature
let uniqueWeathers =
List.fold
(fun acc item -> Set.add item.Weather acc)
Set.empty
data
let uniqueTemperatures =
List.fold
(fun acc item -> Set.add item.Temperature acc)
Set.empty
data
if uniqueWeathers.Count = 1 then
let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
let left, right =
List.partition
(fun item -> item.Temperature = uniqueTemperatures.MinimumElement)
data
(bestSplit, left, right)
else
let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
let left, right =
List.partition
(fun item -> item.Weather = uniqueWeathers.MinimumElement)
data
(bestSplit, left, right)
let rec determineBranch data =
if List.length data < 4 then
Leaf(data)
else
// Use the entropy function to break the dataset on
// the category / value that best splits the data
let bestDecision, leftBranch, rightBranch = bestSplit data
Branch(
bestDecision,
determineBranch leftBranch,
determineBranch rightBranch)
// ------------------------------------
let rec printID3Result indent branch =
let padding = new System.String(' ', indent)
match branch with
| Leaf(data) ->
data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
| Branch(decision, lhs, rhs) ->
printfn "%sBranch predicate [%A]" padding decision
printfn "%sWhere predicate is true:" padding
printID3Result (indent + 4) lhs
printfn "%sWhere predicate is false:" padding
printID3Result (indent + 4) rhs
// ------------------------------------
let dataset =
[
{ Weather = "windy"; Temperature = "hot"; PlayTennis = false }
{ Weather = "windy"; Temperature = "cool"; PlayTennis = false }
{ Weather = "nice"; Temperature = "cool"; PlayTennis = true }
{ Weather = "nice"; Temperature = "cold"; PlayTennis = true }
{ Weather = "humid"; Temperature = "hot"; PlayTennis = false }
]
printfn "Given input list:"
dataset |> List.iter (printfn "%A")
printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result

You can use List.partition instead of your two List.choose calls.
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html
(or now http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx )
It isn't clear to me that pattern matching will buy you much here; the input type (list of lists) and processing (partitioning and 'pureness' check) doesn't really lend itself to that.
And of course when you finally get the 'end' (a pure list) you need to create a tree, and then presumably this function will create a Leaf when the input only has one 'side' and it's 'pure', but create a Node out of the left-side and right-side results for every other input. Maybe. I didn't quite grok the algorithm completely.
Hopefully that will help steer you a little bit. May be useful to draw up a few smaller sample inputs and outputs to help work out the various cases of the function body.

Thanks Brian & Chris! I was actually able to figure this out and I ended up with the following. This calculates the information gain for determining the best place to split. I'm sure there are probably better ways for me to arrive at this solution especially around the chosen data structures, but this is a start. I plan to refine things later.
#light
open System
let trainList =
[
[1.;0.;0.;1.;];
[0.;1.;0.;1.;];
[0.;0.;0.;0.;];
[1.;0.;1.;0.;];
[0.;0.;0.;0.;];
[1.;1.;0.;1.;];
[0.;1.;1.;0.;];
[1.;0.;0.;1.;];
[0.;0.;0.;0.;];
[1.;0.;0.;1.;];
]
type BinaryTree =
| Leaf of int
| Node of int * string * BinaryTree * BinaryTree
let entropyList nums =
let sumOfnums =
nums
|> Seq.sum
nums
|> Seq.map (fun x -> if x=0.00 then x else (-((x/sumOfnums) * Math.Log(x/sumOfnums, 2.))))
|> Seq.sum
let entropyBinaryList (dataListOfLists:list<list<float>>) =
let classList =
dataListOfLists
|> List.map (fun x -> x.Item(x.Length - 1))
let ListOfNo =
classList
|> List.choose (fun x -> if x = 0. then Some(x) else None)
let ListOfYes =
classList
|> List.choose (fun x -> if x = 1. then Some(x) else None)
let numberOfYes : float = float ListOfYes.Length
let numberOfNo : float = float ListOfNo.Length
let ListOfNumYesAndSumNo = [numberOfYes; numberOfNo]
entropyList ListOfNumYesAndSumNo
let conditionalEntropy (dataListOfLists:list<list<float>>) attributeNumber =
let NoAttributeList =
dataListOfLists
|> List.choose (fun x -> if x.Item(attributeNumber) = 0. then Some(x) else None)
let YesAttributeList =
dataListOfLists
|> List.choose (fun x -> if x.Item(attributeNumber) = 1. then Some(x) else None)
let numberOfYes : float = float YesAttributeList.Length
let numberOfNo : float = float NoAttributeList.Length
let noConditionalEntropy = (entropyBinaryList NoAttributeList) * (numberOfNo/(numberOfNo + numberOfYes))
let yesConditionalEntropy = (entropyBinaryList YesAttributeList) * (numberOfYes/(numberOfNo + numberOfYes))
[noConditionalEntropy; yesConditionalEntropy]
let findBestSplitIndex(listOfInstances : list<list<float>>) =
let IGList =
[0..(listOfInstances.Item(0).Length - 2)]
|> List.mapi (fun i x -> (i, (entropyBinaryList listOfInstances) - (List.sum (conditionalEntropy listOfInstances x))))
IGList
|> List.maxBy snd
|> fst
let isListPure (listToCheck : list<list<float>>) =
let splitList = listToCheck |> List.choose (fun x -> if x.Item(x.Length - 1) = 1. then Some(x) else None)
if splitList.Length = listToCheck.Length then 1
else if splitList.Length = 0 then 0
else -1
let rec createTree (listToSplit : list<list<float>>) =
let pureCheck = isListPure listToSplit
if pureCheck = 0 then
printfn "%s" "Pure - Leaf(0)"
else if pureCheck = 1 then
printfn "%s" "Pure - Leaf(1)"
else
printfn "%A - is not pure" listToSplit
if listToSplit.Length > 1 then // There are attributes we can split on
// Chose best place to split list
let splitIndex = findBestSplitIndex(listToSplit)
printfn "spliting at index %A" splitIndex
let leftSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 1. then Some(x) else None)
let rightSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 0. then Some(x) else None)
createTree leftSideSplit
createTree rightSideSplit
else
printfn "%s" "Not Pure, but can't split choose based on heuristics - Leaf(0 or 1)"

Related

Imperative to Functional

I have been doing a CodeWars exercise which can also be seen at dev.to.
The essence of it is:
There is a line for the self-checkout machines at the supermarket. Your challenge is to write a function that calculates the total amount of time required for the rest of the customers to check out!
INPUT
customers : an array of positive integers representing the line. Each integer represents a customer, and its value is the amount of time they require to check out.
n : a positive integer, the number of checkout tills.
RULES
There is only one line serving many machines, and
The order of the line never changes, and
The front person in the line (i.e. the first element in the array/list) proceeds to a machine as soon as it becomes free.
OUTPUT
The function should return an integer, the total time required.
The answer I came up with works - but it is highly imperative.
open System.Collections.Generic
open System.Linq
let getQueueTime (customerArray: int list) n =
let mutable d = new Dictionary<string,int>()
for i in 1..n do
d.Add(sprintf "Line%d" <| i, 0)
let getNextAvailableSupermarketLineName(d:Dictionary<string,int>) =
let mutable lowestValue = -1
let mutable lineName = ""
for myLineName in d.Keys do
let myValue = d.Item(myLineName)
if lowestValue = -1 || myValue <= lowestValue then
lowestValue <- myValue
lineName <- myLineName
lineName
for x in customerArray do
let lineName = getNextAvailableSupermarketLineName d
let lineTotal = d.Item(lineName)
d.Item(lineName) <- lineTotal + x
d.Values.Max()
So my question is ... is this OK F# code or should it be written in a functional way? And if the latter, how? (I started off trying to do it functionally but didn't get anywhere).
is this OK F# code or should it be written in a functional way?
That's a subjective question, so can't be answered. I'm assuming, however, that since you're doing an exercise, it's in order to learn. Learning functional programming takes years for most people (it did for me), but F# is a great language because it enables you learn gradually.
You can, however, simplify the algorithm. Think of a till as a number. The number represents the instant it's ready. At the beginning, you initialise them all to 0:
let tills = List.replicate n 0
where n is the number of tills. At the beginning, they're all ready at time 0. If, for example, n is 3, the tills are:
> List.replicate 3 0;;
val it : int list = [0; 0; 0]
Now you consider the next customer in the line. For each customer, you have to pick a till. You pick the one that is available first, i.e. with the lowest number. Then you need to 'update' the list of counters.
In order to do that, you'll need a function to 'update' a list at a particular index, which isn't part of the base library. You can define it yourself, however:
module List =
let set idx v = List.mapi (fun i x -> if i = idx then v else x)
For example, if you want to 'update' the second element to 3, you can do it like this:
> List.replicate 3 0 |> List.set 1 3;;
val it : int list = [0; 3; 0]
Now you can write a function that updates the set of tills given their current state and a customer (represented by a duration, which is also a number).
let next tills customer =
let earliestTime = List.min tills
let idx = List.findIndex (fun c -> earliestTime = c) tills
List.set idx (earliestTime + customer) tills
First, the next function finds the earliestTime in tills by using List.min. Then it finds the index of that value. Finally, it 'updates' that till by adding its current state to the customer duration.
Imagine that you have two tills and the customers [2;3;10]:
> List.replicate 2 0;;
val it : int list = [0; 0]
> List.replicate 2 0 |> fun tills -> next tills 2;;
val it : int list = [2; 0]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3;;
val it : int list = [2; 3]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3
|> fun tills -> next tills 10;;
val it : int list = [12; 3]
You'll notice that you can keep calling the next function for all the customers in the line. That's called a fold. This gives you the final state of the tills. The final step is to return the value of the till with the highest value, because that represents the time it finished. The overall function, then, is:
let queueTime line n =
let next tills customer =
let earliestTime = List.min tills
let idx = List.findIndex (fun c -> earliestTime = c) tills
List.set idx (earliestTime + customer) tills
let tills = List.replicate n 0
let finalState = List.fold next tills line
List.max finalState
Here's some examples, taken from the original exercise:
> queueTime [5;3;4] 1;;
val it : int = 12
> queueTime [10;2;3;3] 2;;
val it : int = 10
> queueTime [2;3;10] 2;;
val it : int = 12
This solution is based entirely on immutable data, and all functions are pure, so that's a functional solution.
Here is a version that resembles your version, with all the mutability removed:
let getQueueTime (customerArray: int list) n =
let updateWith f key map =
let v = Map.find key map
map |> Map.add key (f v)
let initialLines = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> Map.ofList
let getNextAvailableSupermarketLineName(d:Map<string,int>) =
let lowestLine = d |> Seq.minBy (fun l -> l.Value)
lowestLine.Key
let lines =
customerArray
|> List.fold (fun linesState x ->
let lineName = getNextAvailableSupermarketLineName linesState
linesState |> updateWith (fun l -> l + x) lineName) initialLines
lines |> Seq.map (fun l -> l.Value) |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
Those loops with mutable "outer state" can be swapped for either recursive functions or folds/reduce, here I suspect recursive functions would be nicer.
I've swapped out Dictionary for the immutable Map, but it feels like more trouble than it's worth here.
Update - here is a compromise solution I think reads well:
let getQueueTime (customerArray: int list) n =
let d = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> dict
let getNextAvailableSupermarketLineName(d:IDictionary<string,int>) =
let lowestLine = d |> Seq.minBy (fun l -> l.Value)
lowestLine.Key
customerArray
|> List.iter (fun x ->
let lineName = getNextAvailableSupermarketLineName d
d.Item(lineName) <- d.Item(lineName) + 1)
d.Values |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
I believe there is a more natural functional solution if you approach it freshly, but I wanted to evolve your current solution.
This is less an attempt at answering than an extended comment on Mark Seemann's otherwise excellent answer. If we do not restrict ourselves to standard library functions, the slightly cumbersome determination of the index with List.findIndex can be avoided. Instead, we may devise a function that replaces the first occurrence of a value in a list with a new value.
The implementation of our bespoke List.replace involves recursion, with an accumulator to hold the values before we encounter the first occurrence. When found, the accumulator needs to be reversed and also to have the new value and the tail of the original list appended. Both of this can be done in one operation: List.fold being fed the new value and tail of the original list as initial state while the elements of the accumulator are prepended in the loop, thereby restoring their order.
module List =
// Replace the first occurrence of a specific object in a list
let replace oldValue newValue source =
let rec aux acc = function
| [] -> List.rev acc
| x::xs when x = oldValue ->
(newValue::xs, acc)
||> List.fold (fun xs x -> x::xs)
| x::xs -> aux (x::acc) xs
aux [] source
let queueTime customers n =
(List.init n (fun _ -> 0), customers)
||> List.fold (fun xs customer ->
let x = List.min xs
List.replace x (x + customer) xs )
|> List.max
queueTime [5;3;4] 1 // val it : int = 12
queueTime [10;2;3;3] 2 // val it : int = 10
queueTime [2;3;10] 2 // val it : int = 12

Tail Recursion in F# : Stack Overflow

I'm trying to implement Kosaraju's algorithm on a large graph
as part of an assignment [MOOC Algo I Stanford on Coursera]
https://en.wikipedia.org/wiki/Kosaraju%27s_algorithm
The current code works on a small graph, but I'm hitting Stack Overflow during runtime execution.
Despite having read the relevant chapter in Expert in F#, or other available examples on websites and SO, i still don't get how to use continuation to solve this problem
Below is the full code for general purpose, but it will already fail when executing DFSLoop1 and the recursive function DFSsub inside. I think I'm not making the function tail recursive [because of the instructions
t<-t+1
G.[n].finishingtime <- t
?]
but i don't understand how i can implement the continuation properly.
When considering only the part that fails, DFSLoop1 is taking as argument a graph to which we will apply Depth-First Search. We need to record the finishing time as part of the algo to proceed to the second part of the algo in a second DFS Loop (DFSLoop2) [of course we are failing before that].
open System
open System.Collections.Generic
open System.IO
let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - SCC.txt";;
// let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - test1.txt";;
// val x : string [] =
let splitAtTab (text:string)=
text.Split [|'\t';' '|]
let splitIntoKeyValue (A: int[]) =
(A.[0], A.[1])
let parseLine (line:string)=
line
|> splitAtTab
|> Array.filter (fun s -> not(s=""))
|> Array.map (fun s-> (int s))
|> splitIntoKeyValue
let y =
x |> Array.map parseLine
//val it : (int * int) []
type Children = int[]
type Node1 =
{children : Children ;
mutable finishingtime : int ;
mutable explored1 : bool ;
}
type Node2 =
{children : Children ;
mutable leader : int ;
mutable explored2 : bool ;
}
type DFSgraphcore = Dictionary<int,Children>
let directgraphcore = new DFSgraphcore()
let reversegraphcore = new DFSgraphcore()
type DFSgraph1 = Dictionary<int,Node1>
let reversegraph1 = new DFSgraph1()
type DFSgraph2 = Dictionary<int,Node2>
let directgraph2 = new DFSgraph2()
let AddtoGraph (G:DFSgraphcore) (n,c) =
if not(G.ContainsKey n) then
let node = [|c|]
G.Add(n,node)
else
let c'= G.[n]
G.Remove(n) |> ignore
G.Add (n, Array.append c' [|c|])
let inline swaptuple (a,b) = (b,a)
y|> Array.iter (AddtoGraph directgraphcore)
y|> Array.map swaptuple |> Array.iter (AddtoGraph reversegraphcore)
for i in directgraphcore.Keys do
if reversegraphcore.ContainsKey(i) then do
let node = {children = reversegraphcore.[i] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
else
let node = {children = [||] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
directgraphcore.Clear |> ignore
reversegraphcore.Clear |> ignore
// for i in reversegraph1.Keys do printfn "%d %A" i reversegraph1.[i].children
printfn "pause"
Console.ReadKey() |> ignore
let num_nodes =
directgraphcore |> Seq.length
let DFSLoop1 (G:DFSgraph1) =
let mutable t = 0
let mutable s = -1
let mutable k = num_nodes
let rec DFSsub (G:DFSgraph1)(n:int) (cont:int->int) =
//how to make it tail recursive ???
G.[n].explored1 <- true
// G.[n].leader <- s
for j in G.[n].children do
if not(G.[j].explored1) then DFSsub G j cont
t<-t+1
G.[n].finishingtime <- t
// end of DFSsub
for i in num_nodes .. -1 .. 1 do
printfn "%d" i
if not(G.[i].explored1) then do
s <- i
( DFSsub G i (fun s -> s) ) |> ignore
// printfn "%d %d" i G.[i].finishingtime
DFSLoop1 reversegraph1
printfn "pause"
Console.ReadKey() |> ignore
for i in directgraphcore.Keys do
let node = {children =
directgraphcore.[i]
|> Array.map (fun k -> reversegraph1.[k].finishingtime) ;
leader = -1 ;
explored2= false ;
}
directgraph2.Add (reversegraph1.[i].finishingtime,node)
let z = 0
let DFSLoop2 (G:DFSgraph2) =
let mutable t = 0
let mutable s = -1
let mutable k = num_nodes
let rec DFSsub (G:DFSgraph2)(n:int) (cont:int->int) =
G.[n].explored2 <- true
G.[n].leader <- s
for j in G.[n].children do
if not(G.[j].explored2) then DFSsub G j cont
t<-t+1
// G.[n].finishingtime <- t
// end of DFSsub
for i in num_nodes .. -1 .. 1 do
if not(G.[i].explored2) then do
s <- i
( DFSsub G i (fun s -> s) ) |> ignore
// printfn "%d %d" i G.[i].leader
DFSLoop2 directgraph2
printfn "pause"
Console.ReadKey() |> ignore
let table = [for i in directgraph2.Keys do yield directgraph2.[i].leader]
let results = table |> Seq.countBy id |> Seq.map snd |> Seq.toList |> List.sort |> List.rev
printfn "%A" results
printfn "pause"
Console.ReadKey() |> ignore
Here is a text file with a simple graph example
1 4
2 8
3 6
4 7
5 2
6 9
7 1
8 5
8 6
9 7
9 3
(the one which is causing overflow is 70Mo big with around 900,000 nodes)
EDIT
to clarify a few things first
Here is the "pseudo code"
Input: a directed graph G = (V,E), in adjacency list representation. Assume that the vertices V are labeled
1, 2, 3, . . . , n.
1. Let Grev denote the graph G after the orientation of all arcs have been reversed.
2. Run the DFS-Loop subroutine on Grev, processing vertices according to the given order, to obtain a
finishing time f(v) for each vertex v ∈ V .
3. Run the DFS-Loop subroutine on G, processing vertices in decreasing order of f(v), to assign a leader
to each vertex v ∈ V .
4. The strongly connected components of G correspond to vertices of G that share a common leader.
Figure 2: The top level of our SCC algorithm. The f-values and leaders are computed in the first and second
calls to DFS-Loop, respectively (see below).
Input: a directed graph G = (V,E), in adjacency list representation.
1. Initialize a global variable t to 0.
[This keeps track of the number of vertices that have been fully explored.]
2. Initialize a global variable s to NULL.
[This keeps track of the vertex from which the last DFS call was invoked.]
3. For i = n downto 1:
[In the first call, vertices are labeled 1, 2, . . . , n arbitrarily. In the second call, vertices are labeled by
their f(v)-values from the first call.]
(a) if i not yet explored:
i. set s := i
ii. DFS(G, i)
Figure 3: The DFS-Loop subroutine.
Input: a directed graph G = (V,E), in adjacency list representation, and a source vertex i ∈ V .
1. Mark i as explored.
[It remains explored for the entire duration of the DFS-Loop call.]
2. Set leader(i) := s
3. For each arc (i, j) ∈ G:
(a) if j not yet explored:
i. DFS(G, j)
4. t + +
5. Set f(i) := t
Figure 4: The DFS subroutine. The f-values only need to be computed during the first call to DFS-Loop, and
the leader values only need to be computed during the second call to DFS-Loop.
EDIT
i have amended the code, with the help of an experienced programmer (a lisper but who has no experience in F#) simplifying somewhat the first part to have more quickly an example without bothering about non-relevant code for this discussion.
The code focuses only on half of the algo, running DFS once to get finishing times of the reversed tree.
This is the first part of the code just to create a small example
y is the original tree. the first element of a tuple is the parent, the second is the child. But we will be working with the reverse tree
open System
open System.Collections.Generic
open System.IO
let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - SCC.txt";;
// let x = File.ReadAllLines "C:\Users\Fagui\Documents\GitHub\Learning Fsharp\Algo Stanford I\PA 4 - test1.txt";;
// val x : string [] =
let splitAtTab (text:string)=
text.Split [|'\t';' '|]
let splitIntoKeyValue (A: int[]) =
(A.[0], A.[1])
let parseLine (line:string)=
line
|> splitAtTab
|> Array.filter (fun s -> not(s=""))
|> Array.map (fun s-> (int s))
|> splitIntoKeyValue
// let y =
// x |> Array.map parseLine
//let y =
// [|(1, 4); (2, 8); (3, 6); (4, 7); (5, 2); (6, 9); (7, 1); (8, 5); (8, 6);
// (9, 7); (9, 3)|]
// let y = Array.append [|(1,1);(1,2);(2,3);(3,1)|] [|for i in 4 .. 10000 do yield (i,4)|]
let y = Array.append [|(1,1);(1,2);(2,3);(3,1)|] [|for i in 4 .. 99999 do yield (i,i+1)|]
//val it : (int * int) []
type Children = int list
type Node1 =
{children : Children ;
mutable finishingtime : int ;
mutable explored1 : bool ;
}
type Node2 =
{children : Children ;
mutable leader : int ;
mutable explored2 : bool ;
}
type DFSgraphcore = Dictionary<int,Children>
let directgraphcore = new DFSgraphcore()
let reversegraphcore = new DFSgraphcore()
type DFSgraph1 = Dictionary<int,Node1>
let reversegraph1 = new DFSgraph1()
let AddtoGraph (G:DFSgraphcore) (n,c) =
if not(G.ContainsKey n) then
let node = [c]
G.Add(n,node)
else
let c'= G.[n]
G.Remove(n) |> ignore
G.Add (n, List.append c' [c])
let inline swaptuple (a,b) = (b,a)
y|> Array.iter (AddtoGraph directgraphcore)
y|> Array.map swaptuple |> Array.iter (AddtoGraph reversegraphcore)
// définir reversegraph1 = ... with....
for i in reversegraphcore.Keys do
let node = {children = reversegraphcore.[i] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
for i in directgraphcore.Keys do
if not(reversegraphcore.ContainsKey(i)) then do
let node = {children = [] ;
finishingtime = -1 ;
explored1 = false ;
}
reversegraph1.Add (i,node)
directgraphcore.Clear |> ignore
reversegraphcore.Clear |> ignore
// for i in reversegraph1.Keys do printfn "%d %A" i reversegraph1.[i].children
printfn "pause"
Console.ReadKey() |> ignore
let num_nodes =
directgraphcore |> Seq.length
So basically the graph is (1->2->3->1)::(4->5->6->7->8->....->99999->10000)
and the reverse graph is (1->3->2->1)::(10000->9999->....->4)
here is the main code written in direct style
//////////////////// main code is below ///////////////////
let DFSLoop1 (G:DFSgraph1) =
let mutable t = 0
let mutable s = -1
let rec iter (n:int) (f:'a->unit) (list:'a list) : unit =
match list with
| [] -> (t <- t+1) ; (G.[n].finishingtime <- t)
| x::xs -> f x ; iter n f xs
let rec DFSsub (G:DFSgraph1) (n:int) : unit =
let my_f (j:int) : unit = if not(G.[j].explored1) then (DFSsub G j)
G.[n].explored1 <- true
iter n my_f G.[n].children
for i in num_nodes .. -1 .. 1 do
// printfn "%d" i
if not(G.[i].explored1) then do
s <- i
DFSsub G i
printfn "%d %d" i G.[i].finishingtime
// End of DFSLoop1
DFSLoop1 reversegraph1
printfn "pause"
Console.ReadKey() |> ignore
its not tail recursive, so we use continuations, here is the same code adapted to CPS style:
//////////////////// main code is below ///////////////////
let DFSLoop1 (G:DFSgraph1) =
let mutable t = 0
let mutable s = -1
let rec iter_c (n:int) (f_c:'a->(unit->'r)->'r) (list:'a list) (cont: unit->'r) : 'r =
match list with
| [] -> (t <- t+1) ; (G.[n].finishingtime <- t) ; cont()
| x::xs -> f_c x (fun ()-> iter_c n f_c xs cont)
let rec DFSsub (G:DFSgraph1) (n:int) (cont: unit->'r) : 'r=
let my_f_c (j:int)(cont:unit->'r):'r = if not(G.[j].explored1) then (DFSsub G j cont) else cont()
G.[n].explored1 <- true
iter_c n my_f_c G.[n].children cont
for i in maxnum_nodes .. -1 .. 1 do
// printfn "%d" i
if not(G.[i].explored1) then do
s <- i
DFSsub G i id
printfn "%d %d" i G.[i].finishingtime
DFSLoop1 reversegraph1
printfn "faré"
printfn "pause"
Console.ReadKey() |> ignore
both codes compile and give the same results for the small example (the one in comment) or the same tree that we are using , with a smaller size (1000 instead of 100000)
so i don't think its a bug in the algo here, we've got the same tree structure, just a bigger tree is causing problems. it looks to us the continuations are well written. we've typed the code explicitly. and all calls end with a continuation in all cases...
We are looking for expert advice !!! thanks !!!
I did not try to understand the whole code snippet, because it is fairly long, but you'll certainly need to replace the for loop with an iteration implemented using continuation passing style. Something like:
let rec iterc f cont list =
match list with
| [] -> cont ()
| x::xs -> f x (fun () -> iterc f cont xs)
I didn't understand the purpose of cont in your DFSub function (it is never called, is it?), but the continuation based version would look roughly like this:
let rec DFSsub (G:DFSgraph2)(n:int) cont =
G.[n].explored2 <- true
G.[n].leader <- s
G.[n].children
|> iterc
(fun j cont -> if not(G.[j].explored2) then DFSsub G j cont else cont ())
(fun () -> t <- t + 1)
Overflowing the stack when you recurse through hundreds of thousands of entries isn't bad at all, really. A lot of programming language implementations will choke on much shorter recursions than that. You're having serious programmer problems — nothing to be ashamed of!
Now if you want to do deeper recursions than your implementation will handle, you need to transform your algorithm so it is iterative and/or tail-recursive (the two are isomorphic — except that tail-recursion allows for decentralization and modularity, whereas iteration is centralized and non-modular).
To transform an algorithm from recursive to tail-recursive, which is an important skill to possess, you need to understand the state that is implicitly stored in a stack frame, i.e. those free variables in the function body that change across the recursion, and explicitly store them in a FIFO queue (a data structure that replicates your stack, and can be implemented trivially as a linked list). Then you can pass that linked list of reified frame variables as an argument to your tail recursive functions.
In more advanced cases where you have many tail recursive functions each with a different kind of frame, instead of simple self-recursion, you may need to define some mutually recursive data types for the reified stack frames, instead of using a list. But I believe Kosaraju's algorithm only involves self-recursive functions.
OK, so the code given above was the RIGHT code !
the problem lies with the compiler of F#
here is some words about it from Microsoft
http://blogs.msdn.com/b/fsharpteam/archive/2011/07/08/tail-calls-in-fsharp.aspx
Basically, be careful with the settings, in default mode, the compiler may NOT make automatically the tail calls. To do so, in VS2015, go to the Solution Explorer, right click with the mouse and click on "Properties" (the last element of the scrolling list)
Then in the new window, click on "Build" and tick the box "Generate tail calls"
It is also to check if the compiler did its job looking at the disassembly using
ILDASM.exe
you can find the source code for the whole algo in my github repository
https://github.com/FaguiCurtain/Learning-Fsharp/blob/master/Algo%20Stanford/Algo%20Stanford/Kosaraju_cont.fs
on a performance point of view, i'm not very satisfied. The code runs on 36 seconds on my laptop. From the forum with other fellow MOOCers, C/C++/C# typically executes in subsecond to 5s, Java around 10-15, Python around 20-30s.
So my implementation is clearly not optimized. I am now happy to hear about tricks to make it faster !!! thanks !!!!

F# stream of armstrong numbers

I am seeking help, mainly because I am very new to F# environment. I need to use F# stream to generate an infinite stream of Armstrong Numbers. Can any one help with this one. I have done some mambo jumbo but I have no clue where I'm going.
type 'a stream = | Cons of 'a * (unit -> 'a stream)
let rec take n (Cons(x, xsf)) =
if n = 0 then []
else x :: take (n-1) (xsf());;
//to test if two integers are equal
let test x y =
match (x,y) with
| (x,y) when x < y -> false
| (x,y) when x > y -> false
| _ -> true
//to check for armstrong number
let check n =
let mutable m = n
let mutable r = 0
let mutable s = 0
while m <> 0 do
r <- m%10
s <- s+r*r*r
m <- m/10
if (test n s) then true else false
let rec armstrong n =
Cons (n, fun () -> if check (n+1) then armstrong (n+1) else armstrong (n+2))
let pos = armstrong 0
take 5 pos
To be honest your code seems a bit like a mess.
The most basic version I could think of is this:
let isArmstrong (a,b,c) =
a*a*a + b*b*b + c*c*c = (a*100+b*10+c)
let armstrongs =
seq {
for a in [0..9] do
for b in [0..9] do
for c in [0..9] do
if isArmstrong (a,b,c) then yield (a*100+b*10+c)
}
of course assuming a armstrong number is a 3-digit number where the sum of the cubes of the digits is the number itself
this will yield you:
> Seq.toList armstrongs;;
val it : int list = [0; 1; 153; 370; 371; 407]
but it should be easy to add a wider range or remove the one-digit numbers (think about it).
general case
the problem seems so interesting that I choose to implement the general case (see here) too:
let numbers =
let rec create n =
if n = 0 then [(0,[])] else
[
for x in [0..9] do
for (_,xs) in create (n-1) do
yield (n, x::xs)
]
Seq.initInfinite create |> Seq.concat
let toNumber (ds : int list) =
ds |> List.fold (fun s d -> s*10I + bigint d) 0I
let armstrong (m : int, ds : int list) =
ds |> List.map (fun d -> bigint d ** m) |> List.sum
let leadingZero =
function
| 0::_ -> true
| _ -> false
let isArmstrong (m : int, ds : int list) =
if leadingZero ds then false else
let left = armstrong (m, ds)
let right = toNumber ds
left = right
let armstrongs =
numbers
|> Seq.filter isArmstrong
|> Seq.map (snd >> toNumber)
but the numbers get really sparse quickly and using this will soon get you out-of-memory but the
first 20 are:
> Seq.take 20 armstrongs |> Seq.map string |> Seq.toList;;
val it : string list =
["0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9"; "153"; "370"; "371";
"407"; "1634"; "8208"; "9474"; "54748"; "92727"; "93084"]
remark/disclaimer
this is the most basic version - you can get big speed/performance if you just enumerate all numbers and use basic math to get and exponentiate the digits ;) ... sure you can figure it out

How to refactor F# code to not use a mutable accumulator?

The following F# code gives the correct answer to Project Euler problem #7:
let isPrime num =
let upperDivisor = int32(sqrt(float num)) // Is there a better way?
let rec evaluateModulo a =
if a = 1 then
true
else
match num % a with
| 0 -> false
| _ -> evaluateModulo (a - 1)
evaluateModulo upperDivisor
let mutable accumulator = 1 // Would like to avoid mutable values.
let mutable number = 2 // ""
while (accumulator <= 10001) do
if (isPrime number) then
accumulator <- accumulator + 1
number <- number + 1
printfn "The 10001st prime number is %i." (number - 1) // Feels kludgy.
printfn ""
printfn "Hit any key to continue."
System.Console.ReadKey() |> ignore
I'd like to avoid the mutable values accumulator and number. I'd also like to refactor the while loop into a tail recursive function. Any tips?
Any ideas on how to remove the (number - 1) kludge which displays the result?
Any general comments about this code or suggestions on how to improve it?
Loops are nice, but its more idiomatic to abstract away loops as much as possible.
let isPrime num =
let upperDivisor = int32(sqrt(float num))
match num with
| 0 | 1 -> false
| 2 -> true
| n -> seq { 2 .. upperDivisor } |> Seq.forall (fun x -> num % x <> 0)
let primes = Seq.initInfinite id |> Seq.filter isPrime
let nthPrime n = Seq.nth n primes
printfn "The 10001st prime number is %i." (nthPrime 10001)
printfn ""
printfn "Hit any key to continue."
System.Console.ReadKey() |> ignore
Sequences are your friend :)
You can refer my F# for Project Euler Wiki:
I got this first version:
let isPrime n =
if n=1 then false
else
let m = int(sqrt (float(n)))
let mutable p = true
for i in 2..m do
if n%i =0 then p <- false
// ~~ I want to break here!
p
let rec nextPrime n =
if isPrime n then n
else nextPrime (n+1)
let problem7 =
let mutable result = nextPrime 2
for i in 2..10001 do
result <- nextPrime (result+1)
result
In this version, although looks nicer, but I still does not early break the loop when the number is not a prime. In Seq module, exist and forall methods support early stop:
let isPrime n =
if n<=1 then false
else
let m = int(sqrt (float(n)))
{2..m} |> Seq.exists (fun i->n%i=0) |> not
// or equivalently :
// {2..m} |> Seq.forall (fun i->n%i<>0)
Notice in this version of isPrime, the function is finally mathematically correct by checking numbers below 2.
Or you can use a tail recursive function to do the while loop:
let isPrime n =
let m = int(sqrt (float(n)))
let rec loop i =
if i>m then true
else
if n%i = 0 then false
else loop (i+1)
loop 2
A more functional version of problem7 is to use Seq.unfold to generate an infinite prime sequence and take nth element of this sequence:
let problem7b =
let primes =
2 |> Seq.unfold (fun p ->
let next = nextPrime (p+1) in
Some( p, next ) )
Seq.nth 10000 primes
Here's my solution, which uses a tail-recursive loop pattern which always allows you to avoid mutables and gain break functionality: http://projecteulerfun.blogspot.com/2010/05/problem-7-what-is-10001st-prime-number.html
let problem7a =
let isPrime n =
let nsqrt = n |> float |> sqrt |> int
let rec isPrime i =
if i > nsqrt then true //break
elif n % i = 0 then false //break
//loop while neither of the above two conditions are true
//pass your state (i+1) to the next call
else isPrime (i+1)
isPrime 2
let nthPrime n =
let rec nthPrime i p count =
if count = n then p //break
//loop while above condition not met
//pass new values in for p and count, emulating state
elif i |> isPrime then nthPrime (i+2) i (count+1)
else nthPrime (i+2) p count
nthPrime 1 1 0
nthPrime 10001
Now, to specifically address some of the questions you had in your solution.
The above nthPrime function allows you to find primes at an arbitrary position, this is how it would look adapted to your approach of finding specifically the 1001 prime, and using your variable names (the solution is tail-recursive and doesn't use mutables):
let prime1001 =
let rec nthPrime i number accumulator =
if accumulator = 1001 then number
//i is prime, so number becomes i in our next call and accumulator is incremented
elif i |> isPrime then prime1001 (i+2) i (accumulator+1)
//i is not prime, so number and accumulator do not change, just advance i to the next odd
else prime1001 (i+2) number accumulator
prime1001 1 1 0
Yes, there is a better way to do square roots: write your own generic square root implementation (reference this and this post for G implementation):
///Finds the square root (integral or floating point) of n
///Does not work with BigRational
let inline sqrt_of (g:G<'a>) n =
if g.zero = n then g.zero
else
let mutable s:'a = (n / g.two) + g.one
let mutable t:'a = (s + (n / s)) / g.two
while t < s do
s <- t
let step1:'a = n/s
let step2:'a = s + step1
t <- step2 / g.two
s
let inline sqrtG n = sqrt_of (G_of n) n
let sqrtn = sqrt_of gn //this has suffix "n" because sqrt is not strictly integral type
let sqrtL = sqrt_of gL
let sqrtI = sqrt_of gI
let sqrtF = sqrt_of gF
let sqrtM = sqrt_of gM

F#: How do i split up a sequence into a sequence of sequences

Background:
I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a method to split the sequence up into a sequence of sequences so that each subsequence contains contiguous data (split the input-sequence at the gaps).
Constraints:
The return value must be a sequence of sequences to ensure that elements are only produced as needed (cannot use list/array/cacheing)
The solution must NOT be O(n^2), probably ruling out a Seq.take - Seq.skip pattern (cf. Brian's post)
Bonus points for a functionally idiomatic approach (since I want to become more proficient at functional programming), but it's not a requirement.
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ...
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F#-idioms, or possibly because there are some language-constructs that I have not yet been exposed to.
// Test data
let numbers = {1.0..1000.0}
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers ->baseTime.AddMinutes(n)}
let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items
let timeBetweenContiguousValues = (new TimeSpan(0,1,0))
dataWithOccationalHoles |> groupContiguousDataPoints timeBetweenContiguousValues |> Seq.iteri (fun i sequence -> printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd(Seq.hd sequence)))
I think this does what you want
dataWithOccationalHoles
|> Seq.pairwise
|> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
|> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) -> (index+indexres,(time1,elem1),(time2,elem2)) ) (0,(baseTime,-1.0),(baseTime,-1.0))
|> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
|> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
|> PSeq.groupBy(fst)
|> Seq.map(snd>>Seq.map(snd))
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> =
LazyList.delayed (fun () ->
match input with
| LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
| LazyList.Cons(x,LazyList.Nil) ->
LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
| LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
let groups = GroupBy comp xs
if comp x y then
LazyList.consf
(LazyList.consf x (fun () ->
let (LazyList.Cons(firstGroup,_)) = groups
firstGroup))
(fun () ->
let (LazyList.Cons(_,otherGroups)) = groups
otherGroups)
else
LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x
You seem to want a function that has signature
(`a -> bool) -> seq<'a> -> seq<seq<'a>>
I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.
Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):
let groupBy (fun: 'a -> bool) (input: seq) =
seq {
let cache = ref (new System.Collections.Generic.List())
for e in input do
(!cache).Add(e)
if not (fun e) then
yield !cache
cache := new System.Collections.Generic.List()
if cache.Length > 0 then
yield !cache
}
An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _ [] = [[]]
groupBy1 _ [x] = [[x]]
groupBy1 comp (x : xs#(y : _))
| comp x y = (x : firstGroup) : otherGroups
| otherwise = [x] : groups
where groups#(firstGroup : otherGroups) = groupBy1 comp xs
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).
(EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)
Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let rec partitions (first:option<_>) =
seq {
match first with
| Some first' -> //'
(* The following value is always overwritten;
it represents the first element of the next subsequence to output, if any *)
let next = ref None
(* This function generates a subsequence to output,
setting next appropriately as it goes *)
let rec iter item =
seq {
yield item
if (en.MoveNext()) then
let curr = en.Current
if (cmp item curr) then
yield! iter curr
else // consumed one too many - pass it on as the start of the next sequence
next := Some curr
else
next := None
}
yield iter first' (* ' generate the first sequence *)
yield! partitions !next (* recursively generate all remaining sequences *)
| None -> () // return an empty sequence if there are no more values
}
let first = if en.MoveNext() then Some en.Current else None
partitions first
let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) =
groupBy (fun (t,_) (t',_) -> t' - t <= time)
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
let en = sq.GetEnumerator()
let next() = if en.MoveNext() then Some en.Current else None
(* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
let rec seqStartingWith start =
match next() with
| Some y when cmp start y ->
let rest_next = lazy seqStartingWith y // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
seq { yield start; yield! fst (Lazy.force rest_next) },
lazy Lazy.force (snd (Lazy.force rest_next))
| next -> seq { yield start }, lazy next
let rec iter start =
seq {
match (Lazy.force start) with
| None -> ()
| Some start ->
let (first,next) = seqStartingWith start
yield first
yield! iter next
}
Seq.cache (iter (lazy next()))
Below is some code that does what I think you want. It is not idiomatic F#.
(It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)
But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.
Anyway, here's the code:
type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
// f() = true means the neighbors should be kept together
// f() = false means they should be split
let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
// simple unfold that assumes f captured a mutable variable
let iter f = Seq.unfold (fun _ ->
match f() with
| Some(x) -> Some(x,())
| None -> None) ()
seq {
let state = ref (Unstarted)
use ie = input.GetEnumerator()
let innerMoveNext() =
match !state with
| Unstarted ->
if ie.MoveNext()
then let cur = ie.Current
state := InnerOkay(cur); Some(cur)
else state := Finished; None
| InnerOkay(last) ->
if ie.MoveNext()
then let cur = ie.Current
if f last cur
then state := InnerOkay(cur); Some(cur)
else state := NeedNewInner(cur); None
else state := Finished; None
| NeedNewInner(last) -> state := InnerOkay(last); Some(last)
| Finished -> None
let outerMoveNext() =
match !state with
| Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
| InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
| Finished -> None
yield! iter outerMoveNext }
open System
let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
// Test data
let numbers = {1 .. 15}
let contiguousTimeStamps =
let baseTime = DateTime.Now
seq { for n in numbers -> baseTime.AddMinutes(float n)}
let holeyData =
Seq.zip contiguousTimeStamps numbers
|> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
printfn "Consuming..."
for group in grouped_data do
printfn "about to do a group"
for x in group do
printfn " %A" x
Ok, here's an answer I'm not unhappy with.
(EDIT: I am unhappy - it's wrong! No time to try to fix right now though.)
It uses a bit of imperative state, but it is not too difficult to follow (provided you recall that '!' is the F# dereference operator, and not 'not'). It is as lazy as possible, and takes a seq as input and returns a seq of seqs as output.
let N = 20
let data = // produce some arbitrary data with holes
seq {
for x in 1..N do
if x % 4 <> 0 && x % 7 <> 0 then
printfn "producing %d" x
yield x
}
let rec GroupBy comp (input:seq<_>) = seq {
let doneWithThisGroup = ref false
let areMore = ref true
use e = input.GetEnumerator()
let Next() = areMore := e.MoveNext(); !areMore
// deal with length 0 or 1, seed 'prev'
if not(e.MoveNext()) then () else
let prev = ref e.Current
while !areMore do
yield seq {
while not(!doneWithThisGroup) do
if Next() then
let next = e.Current
doneWithThisGroup := not(comp !prev next)
yield !prev
prev := next
else
// end of list, yield final value
yield !prev
doneWithThisGroup := true }
doneWithThisGroup := false }
let result = data |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
printfn "about to do a group"
for x in group do
printfn " %d" x

Resources