Related
I have been doing a CodeWars exercise which can also be seen at dev.to.
The essence of it is:
There is a line for the self-checkout machines at the supermarket. Your challenge is to write a function that calculates the total amount of time required for the rest of the customers to check out!
INPUT
customers : an array of positive integers representing the line. Each integer represents a customer, and its value is the amount of time they require to check out.
n : a positive integer, the number of checkout tills.
RULES
There is only one line serving many machines, and
The order of the line never changes, and
The front person in the line (i.e. the first element in the array/list) proceeds to a machine as soon as it becomes free.
OUTPUT
The function should return an integer, the total time required.
The answer I came up with works - but it is highly imperative.
open System.Collections.Generic
open System.Linq
let getQueueTime (customerArray: int list) n =
let mutable d = new Dictionary<string,int>()
for i in 1..n do
d.Add(sprintf "Line%d" <| i, 0)
let getNextAvailableSupermarketLineName(d:Dictionary<string,int>) =
let mutable lowestValue = -1
let mutable lineName = ""
for myLineName in d.Keys do
let myValue = d.Item(myLineName)
if lowestValue = -1 || myValue <= lowestValue then
lowestValue <- myValue
lineName <- myLineName
lineName
for x in customerArray do
let lineName = getNextAvailableSupermarketLineName d
let lineTotal = d.Item(lineName)
d.Item(lineName) <- lineTotal + x
d.Values.Max()
So my question is ... is this OK F# code or should it be written in a functional way? And if the latter, how? (I started off trying to do it functionally but didn't get anywhere).
is this OK F# code or should it be written in a functional way?
That's a subjective question, so can't be answered. I'm assuming, however, that since you're doing an exercise, it's in order to learn. Learning functional programming takes years for most people (it did for me), but F# is a great language because it enables you learn gradually.
You can, however, simplify the algorithm. Think of a till as a number. The number represents the instant it's ready. At the beginning, you initialise them all to 0:
let tills = List.replicate n 0
where n is the number of tills. At the beginning, they're all ready at time 0. If, for example, n is 3, the tills are:
> List.replicate 3 0;;
val it : int list = [0; 0; 0]
Now you consider the next customer in the line. For each customer, you have to pick a till. You pick the one that is available first, i.e. with the lowest number. Then you need to 'update' the list of counters.
In order to do that, you'll need a function to 'update' a list at a particular index, which isn't part of the base library. You can define it yourself, however:
module List =
let set idx v = List.mapi (fun i x -> if i = idx then v else x)
For example, if you want to 'update' the second element to 3, you can do it like this:
> List.replicate 3 0 |> List.set 1 3;;
val it : int list = [0; 3; 0]
Now you can write a function that updates the set of tills given their current state and a customer (represented by a duration, which is also a number).
let next tills customer =
let earliestTime = List.min tills
let idx = List.findIndex (fun c -> earliestTime = c) tills
List.set idx (earliestTime + customer) tills
First, the next function finds the earliestTime in tills by using List.min. Then it finds the index of that value. Finally, it 'updates' that till by adding its current state to the customer duration.
Imagine that you have two tills and the customers [2;3;10]:
> List.replicate 2 0;;
val it : int list = [0; 0]
> List.replicate 2 0 |> fun tills -> next tills 2;;
val it : int list = [2; 0]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3;;
val it : int list = [2; 3]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3
|> fun tills -> next tills 10;;
val it : int list = [12; 3]
You'll notice that you can keep calling the next function for all the customers in the line. That's called a fold. This gives you the final state of the tills. The final step is to return the value of the till with the highest value, because that represents the time it finished. The overall function, then, is:
let queueTime line n =
let next tills customer =
let earliestTime = List.min tills
let idx = List.findIndex (fun c -> earliestTime = c) tills
List.set idx (earliestTime + customer) tills
let tills = List.replicate n 0
let finalState = List.fold next tills line
List.max finalState
Here's some examples, taken from the original exercise:
> queueTime [5;3;4] 1;;
val it : int = 12
> queueTime [10;2;3;3] 2;;
val it : int = 10
> queueTime [2;3;10] 2;;
val it : int = 12
This solution is based entirely on immutable data, and all functions are pure, so that's a functional solution.
Here is a version that resembles your version, with all the mutability removed:
let getQueueTime (customerArray: int list) n =
let updateWith f key map =
let v = Map.find key map
map |> Map.add key (f v)
let initialLines = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> Map.ofList
let getNextAvailableSupermarketLineName(d:Map<string,int>) =
let lowestLine = d |> Seq.minBy (fun l -> l.Value)
lowestLine.Key
let lines =
customerArray
|> List.fold (fun linesState x ->
let lineName = getNextAvailableSupermarketLineName linesState
linesState |> updateWith (fun l -> l + x) lineName) initialLines
lines |> Seq.map (fun l -> l.Value) |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
Those loops with mutable "outer state" can be swapped for either recursive functions or folds/reduce, here I suspect recursive functions would be nicer.
I've swapped out Dictionary for the immutable Map, but it feels like more trouble than it's worth here.
Update - here is a compromise solution I think reads well:
let getQueueTime (customerArray: int list) n =
let d = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> dict
let getNextAvailableSupermarketLineName(d:IDictionary<string,int>) =
let lowestLine = d |> Seq.minBy (fun l -> l.Value)
lowestLine.Key
customerArray
|> List.iter (fun x ->
let lineName = getNextAvailableSupermarketLineName d
d.Item(lineName) <- d.Item(lineName) + 1)
d.Values |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
I believe there is a more natural functional solution if you approach it freshly, but I wanted to evolve your current solution.
This is less an attempt at answering than an extended comment on Mark Seemann's otherwise excellent answer. If we do not restrict ourselves to standard library functions, the slightly cumbersome determination of the index with List.findIndex can be avoided. Instead, we may devise a function that replaces the first occurrence of a value in a list with a new value.
The implementation of our bespoke List.replace involves recursion, with an accumulator to hold the values before we encounter the first occurrence. When found, the accumulator needs to be reversed and also to have the new value and the tail of the original list appended. Both of this can be done in one operation: List.fold being fed the new value and tail of the original list as initial state while the elements of the accumulator are prepended in the loop, thereby restoring their order.
module List =
// Replace the first occurrence of a specific object in a list
let replace oldValue newValue source =
let rec aux acc = function
| [] -> List.rev acc
| x::xs when x = oldValue ->
(newValue::xs, acc)
||> List.fold (fun xs x -> x::xs)
| x::xs -> aux (x::acc) xs
aux [] source
let queueTime customers n =
(List.init n (fun _ -> 0), customers)
||> List.fold (fun xs customer ->
let x = List.min xs
List.replace x (x + customer) xs )
|> List.max
queueTime [5;3;4] 1 // val it : int = 12
queueTime [10;2;3;3] 2 // val it : int = 10
queueTime [2;3;10] 2 // val it : int = 12
I'm trying to build F# equivalent code for the Python code appearing here, I've the below code:
let tripleExponentialSmoothing series slen alpha beta gamma nPreds =
let result : float list = []
let mutable smooth = 0.
let mutable trend = 0.
let seasonals = initialSeasonalComponents series 12
for i in 0..(series.Length+nPreds-1) do
match i with
| 0 -> // initial values
smooth <- series |> Array.head |> float
trend <- initialTrend series slen
result |> List.append [series |> Array.head |> float] |> ignore
| i when i >= series.Length -> // we are forecasting
let m = i - series.Length + 1
result |> List.append [(smooth + float m * trend) + seasonals.Item(i%slen)] |> ignore
| _ ->
let v = series |> Array.head |> float
let lastSmooth = smooth
smooth <- alpha*(v-seasonals.Item(i%slen)) + (1.-alpha)*(smooth+trend)
trend <- beta * (smooth-lastSmooth) + (1.-beta)*trend
seasonals.Item(i%slen) <- gamma*(v-smooth) + (1.-gamma)*seasonals.Item(i%slen)
result |> List.append [smooth + trend + seasonals.Item(i%slen)] |> ignore
result
For which I got the below error:
warning FS0020: The result of this expression is implicitly ignored.
Consider using 'ignore' to discard this value explicitly, e.g. 'expr
|> ignore', or 'let' to bind the result to a name, e.g. 'let result =
expr'.
I tried to write the above as conversion of the below Python code:
def triple_exponential_smoothing(series, slen, alpha, beta, gamma, n_preds):
result = []
seasonals = initial_seasonal_components(series, slen)
for i in range(len(series)+n_preds):
if i == 0: # initial values
smooth = series[0]
trend = initial_trend(series, slen)
result.append(series[0])
continue
if i >= len(series): # we are forecasting
m = i - len(series) + 1
result.append((smooth + m*trend) + seasonals[i%slen])
else:
val = series[i]
last_smooth, smooth = smooth, alpha*(val-seasonals[i%slen]) + (1-alpha)*(smooth+trend)
trend = beta * (smooth-last_smooth) + (1-beta)*trend
seasonals[i%slen] = gamma*(val-smooth) + (1-gamma)*seasonals[i%slen]
result.append(smooth+trend+seasonals[i%slen])
return result
What the wrong things I did, and what is the correct code equivalent to the mentioned Python's one.
You are running the for as a side effect.
Maybe you want to a seq expression, I think in Python you have generators but here in F# remember that everything is an expression.
let tripleExponentialSmoothing series slen alpha beta gamma nPreds =
let mutable smooth = 0.
let mutable trend = 0.
let seasonals = initialSeasonalComponents series 12 |> Dictionary
seq {
for i in 0..(series.Length+nPreds-1) do
match i with
| 0 -> // initial values
smooth <- series |> Array.head |> float
trend <- initialTrend series slen
yield series |> Array.head |> float
| i when i >= series.Length -> // we are forecasting
let m = i - series.Length + 1
yield (smooth + float m * trend) + seasonals.[i%slen]
| _ ->
let v = series |> Array.head |> float
let lastSmooth = smooth
smooth <- alpha*(v-seasonals.[i%slen]) + (1.-alpha)*(smooth+trend)
trend <- beta * (smooth-lastSmooth) + (1.-beta)*trend
seasonals.[i%slen] <- gamma*(v-smooth) + (1.-gamma)*seasonals.[i%slen]
yield smooth + trend + seasonals.[i%slen] }
So, a sequence expressions is of the form seq { expr } and inside the expression you use yield to yield results.
As mentioned in the other answer, you are trying to mutate the list of results in the for loop, but this is not possible, because F# lists are immutable by default.
If you want to directly follow the style of the Python code, you can use mutable ResizeArray:
let tripleExponentialSmoothing series slen alpha beta gamma nPreds =
let result = ResizeArray<_>()
let mutable smooth = 0.
let mutable trend = 0.
let seasonals = initialSeasonalComponents series 12
for i in 0..(series.Length+nPreds-1) do
match i with
| 0 -> // initial values
smooth <- series |> Array.head |> float
trend <- initialTrend series slen
result.Add(series |> Array.head |> float)
| i when i >= series.Length -> // we are forecasting
let m = i - series.Length + 1
result.Add((smooth + float m * trend) + seasonals.Item(i%slen))
| _ ->
let v = series |> Array.head |> float
let lastSmooth = smooth
smooth <- alpha*(v-seasonals.Item(i%slen)) + (1.-alpha)*(smooth+trend)
trend <- beta * (smooth-lastSmooth) + (1.-beta)*trend
seasonals.Item(i%slen) <- gamma*(v-smooth) + (1.-gamma)*seasonals.Item(i%slen)
result.Add(smooth + trend + seasonals.Item(i%slen))
This is not very idiomatic F# code, but it solves your immediate problem. For a more functional solution, you can go with sequence expressions as mentioned by Gustavo and yield results one by one, but you still keep smooth and trend as mutable variables, so there probably is even nicer way of doing this - but that's hard to guess without knowing more about your algorithm.
Suppose we have a pair of input arrays, or a list of (key, value) tuples if you prefer. What's an elegant and performant way to combine values that have indices falling in a certain interval? For example, if the interval (or 'bin') size is 10 then the values of all indices from 0 < x <= 10 would be combined, as would the values of indices from 10 < x <= 20 and so on. I want:
let interval = 10
let index = [| 6; 12; 18; 24 |]
let value = [| a; b; c; d |]
result = [| a; b + c; d |]
The crudest way to do this would be to use a whole lot of if, else if statements (the index range has a defined upper limit). I got close with
for i = 0 to index.Length do
result.[Math.Floor(index.[i]/10] += value.[Math.Floor(index.[i]/10]
but this is doing 0 <= x < 10, not 0 < x <= 10.
I also tried assuming the indices are ordered and evenly spaced, with
for i = 1 : ( index.Length - 1 ) / valuesPerBin
valueRange = ((i-1)*valuesPerBin + 1) : i*valuesPerBin )
result(i) = sum(value(valueRange))
which is nice but obviously breaks if there is a non integer number of values per bin.
What's the best way of doing this in F#? Is there a name or an existing function for what I'm trying to do?
let interval = 10
let index = [6;12;18;24]
let value =[101;102;103;104]
let intervals = List.map (fun e -> e/interval) index
let keys = List.map2(fun e1 e2 -> (e1,e2)) intervals value
let skeys = Seq.ofList keys
let result = skeys
|>Seq.groupBy (fun p -> fst p)
|>Seq.map (fun p -> snd p)
|>Seq.map(fun s -> Seq.sumBy (fun p -> snd p) s)
result will be [101;205;104] (as a Seq).
If you want to convert to an array, apply Seq.toArray.
Is it what you wanted ?
Adapt the surrounding code to use
0 <= x < 10 instead of 0 < x <= 10. In my case this was just a simple definition change in another function, allowing me to use
for i = 0 to index.Length do
result.[Math.Floor(index.[i]/10] += value.[Math.Floor(index.[i]/10], which is much simpler and terser syntax than the alternatives.
I am trying to create a save/load function for 2d objects that has been drawn into a form.
type circle = { X : int; Y : int; Diameter : int; Brush : Brush}
type Square = { X : int; Y : int; Length : int; Height: int; Brush : Brush}
When i create the object i put them into 2 lists 1 for each type.
My initial thought was to read and write these objects to a textfile, see below:
saveFile.Click.Add(fun _ ->
for c in listOfCircles do
myfile.WriteLine("Circle," + c.X.ToString() + "," + c.Y.ToString() + "," + c.Diameter.ToString() + "," + c.Brush.ToString())
for s in listOfSquares do
myfile.WriteLine("Square," + s.X.ToString() + "," + s.Y.ToString() + "," + s.Height.ToString() + "," + s.Length.ToString() + "," + s.Brush.ToString())
myfile.Close() // close the file
And in the textfile it looks like this
Circle,200,200,50,System.Drawing.SolidBrush
Square,50,55,45,55,System.Drawing.SolidBrush
From here i want to read these values and then be able to parse them and recreate the objects by adding the objects the lists and redraw them.
let readCircle =
System.IO.File.ReadAllLines path
|> Array.choose (fun s ->
match s.Split ',' with
| [|x; y ; z ; b ; _|] when x = "Circle" -> Some (y, z, b)
| _ -> None )
let readSquare =
System.IO.File.ReadAllLines path
|> Array.choose (fun s ->
match s.Split ',' with
| [|x; y ; z ; b ; a ; _|] when x = "Square" -> Some (y, z, b, a)
| _ -> None )
These functions gives me
val readCircle : (string * string * string) [] = [|("200", "200", "50")|]
val readSquare : (string * string * string * string) [] = [|("50", "55", "45", "55")|]
The problem i have now is im not sure how to obtain the values from the array. Beneath is example with multiple circles.
val readCircle : (string * string * string) [] = [|("200", "200", "50"); ("200", "200","50")|]
Any ideas or comments about how to go from here/how to resolve this issue is very appreciated! Question summary: how could i get the values from the array and put them in for example my already created add functions, see below:
listOfCircles.Add({ X = 200; Y = 200; Diameter = 50; Brush = Brushes.Black})
You could convert the arrays of string tuples you have using Array.map, e.g.
[|("200", "200", "50"); ("200", "200","50")|]
|> Array.map (fun (x,y,d) -> {X = int32 x; Y = int32 y; Diameter = int32 d; Brush = Brushes.Black})
It might be a bit clearer if you converted to circle or square as you parsed the file, then you would have an array of circle or an array of square that you can add directly to your lists.
let readCircle =
System.IO.File.ReadAllLines path
|> Array.choose (fun s ->
match s.Split ',' with
| [|t; x; y; d; _|] when t = "Circle" ->
Some {X = int32 x; Y = int32 y; Diameter = int32 d; Brush = Brushes.Red}
| _ -> None )
But... if you wanted to make larger changes, you could use discriminated unions to represent your shapes, they would then share a common type of Shape and you could parse circles and squares in the same function.
type Shape =
| Circle of X : int * Y : int * Diameter : int * Brush : Brush
| Square of X : int * Y : int * Length : int * Height: int * Brush : Brush
let readShapes (data: string array) =
data
|> Array.choose (fun s ->
match s.Split ',' with
| [|t; x; y; d; _|] when t = "Circle" ->
Some (Circle(X = int32 x, Y = int32 y, Diameter = int32 d, Brush = Brushes.Red))
| [|t; x; y; l; h; _|] when t = "Square" ->
Some (Square(X = int32 x, Y = int32 y, Length = int32 l, Height = int32 h, Brush = Brushes.Red))
| _ -> None )
let listOfShapes = ResizeArray<_>()
let testInput = """
Circle,200,200,50,System.Drawing.SolidBrush
Square,50,55,45,55,System.Drawing.SolidBrush"""
testInput.Split('\n') // System.IO.File.ReadAllLines path
|> readShapes
|> Array.iter (listOfShapes.Add)
Which would result in
val it : System.Collections.Generic.List<Shape> =
seq
[Circle (200,200,50,System.Drawing.SolidBrush {Color = Color [Red];});
Square (50,55,45,55,System.Drawing.SolidBrush {Color = Color [Red];})]
You could then use pattern matching to draw each type of shape
let drawShape shape =
match shape with
| Circle(x,y,d,b) ->
printfn "Pretend I just drew a circle at %d,%d with diameter %d." x y d
| Square(x,y,l,h,b) ->
printfn "Pretend I just drew a rectangle at %d,%d that was %d long and %d high." x y l h
listOfShapes |> Seq.iter drawShape
Giving
Pretend I just drew a circle at 200,200 with diameter 50.
Pretend I just drew a rectangle at 50,55 that was 45 long and 55 high.
If I understand your goal, this is how I would go about it. I've only implemented Circle; you'll need to modify it to handle Square.
open System
open System.Collections.Generic
open System.Drawing
open System.IO
let memoize f =
let cache = Dictionary()
fun key ->
match cache.TryGetValue(key) with
| true, value -> value
| _ ->
let value = f key
cache.Add(key, value)
value
let getBrush =
memoize (fun name -> typeof<Brushes>.GetProperty(name).GetValue(null) :?> SolidBrush)
type Circle =
{ X : int
Y : int
Diameter : int
Brush : SolidBrush } with
override this.ToString() =
sprintf "Circle,%d,%d,%d,%s" this.X this.Y this.Diameter this.Brush.Color.Name
static member Parse(s: string) =
match s.Split(',') with
| [|"Circle";x;y;diameter;brushName|] -> {X=int x; Y=int y; Diameter=int diameter; Brush=getBrush brushName}
| _ -> invalidArg "s" "Cannot parse string"
let writeShapesToFile fileName shapes =
File.WriteAllLines(fileName, Seq.map (sprintf "%O") shapes)
let readShapesFromFile fileName =
File.ReadAllLines(fileName) |> Array.map Circle.Parse
Also, you might consider using a class hierarchy instead of records since much of the structure and behavior of Circle and Square are shared.
This is fun - I approached it in a totally different way than Daniel (but I agree with him that you classes might be a better approach for your shapes). Instead, I took advantage of discriminated unions (and there are better ways to do this - more later):
First, I define a type for a list of parameters for making a shape:
type Parameter =
| Label of string
| Number of int
Now let's convert a string to a parameter:
let toParameter s =
match Int32.TryParse(s) with
| (true, i) -> Number(i)
| (_, _) -> Label(s)
Now to convert a list of strings to a list of Parameter:
let stringListToParameterList stringlist = stringlist |> List.map(function s -> toParameter s)
Now to convert a comma-separated string to a list of string:
let commastringToList (s:string) = s.Split(',') |> Array.toList
OK - great - let's define your records and a master Shape:
type circlerec = { X : int; Y : int; Diameter : int; Brush : Brush}
type squarerec = { X : int; Y : int; Length : int; Height: int; Brush : Brush}
type Shape =
| Circle of circlerec
| Square of squarerec
With this we need a way to make a Shape from a parameter list. This is brute force, but it reads well enough:
let toShape list =
match list with
| Label("Circle") :: Number(x) :: Number(y) :: Number(diam) :: Label(colorName) :: [] ->
Circle({X = x; Y = y; Diameter = diam; Brush = new SolidBrush(Color.FromName(colorName)); })
| Label("Circle") :: rest -> raise <| new ArgumentException("parse error:expected Circle num num num color but got " + list.ToString())
| Label("Square") :: Number(x) :: Number(y) :: Number(length) :: Number(height) :: Label(colorName) :: [] ->
Square({X = x; Y = y; Length = length; Height = height; Brush = new SolidBrush(Color.FromName(colorName)); })
| Label("Square") :: rest -> raise <| new ArgumentException("parse error:expected Square num num num num color but got " + list.ToString())
| _ -> raise <| new ArgumentException("parse error: unknown shape: " + list.ToString())
It's dense, but I'm using F#'s pattern matching to spot the various parameters for each shape. Note that you could now do things like have Square,x,y,size,colorName in your file and make a Square where Length and Height are equal to size by just adding in the pattern.
Finally comes the piece de resistance, converting your file into shapes:
let toShapes path =
System.IO.File.ReadAllLines path |> Array.toList |>
List.map(function s -> s |> commastringToList |>
stringListToParameterList |> toShape)
which maps every line in the file to a list of string which then maps each line to a shape, but piping the comma string to the list converter and then through the parameter list and then to a shape.
Now where this is bad is that the error checking is pretty horrid and that the Parameter type should really include Pigment of Color, which would allow you to look at the string that comes in and if it's valid Color name, map it to a Pigment else a Label.
First, in order to provide full disclosure, I want to point out that this is related to homework in a Machine Learning class. This question is not the homework question and instead is something I need to figure out in order to complete the bigger problem of creating an ID3 Decision Tree Algorithm.
I need to generate tree similar to the following when given a truth table
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
learnedTree is of type BinaryTree which I've defined as follows:
type BinaryTree =
| Leaf of int
| Node of int * string * BinaryTree * BinaryTree
ID3 algorithms take into account various equations to determine where to split the tree, and I've got all that figured out, I'm just having trouble creating the learned tree from my truth table. For example if I have the following table
A1 | A2 | A3 | Class
1 0 0 1
0 1 0 1
0 0 0 0
1 0 1 0
0 0 0 0
1 1 0 1
0 1 1 0
And I decide to split on attribute A1 I would end up with the following:
(A1 = 1) A1 (A1 = 0)
A2 | A3 | Class A2 | A3 | Class
0 0 1 1 0 1
0 1 0 0 0 0
1 0 1 0 0 0
0 1 1
Then I would split the left side and split the right side, and continue the recursive pattern until the leaf nodes are pure and I end up with a tree similar to the following based on the splitting.
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
Here is what I've kind of "hacked" together thus far, but I think I might be way off:
let rec createTree (listToSplit : list<list<float>>) index =
let leftSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
let rightSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
if leftSideSplit.Length > 0 then
let pureCheck = isListPure leftSideSplit
if pureCheck = 0 then
printfn "%s" "Pure left node class 0"
createTree leftSideSplit (index + 1)
else if pureCheck = 1 then
printfn "%s" "Pure left node class 1"
createTree leftSideSplit (index + 1)
else
printfn "%s - %A" "Recursing Left" leftSideSplit
createTree leftSideSplit (index + 1)
else printfn "%s" "Pure left node class 0"
Should I be using pattern matching instead? Any tips/ideas/help? Thanks a bunch!
Edit: I've since posted an implementation of ID3 on my blog at:
http://blogs.msdn.com/chrsmith
Hey Jim, I've been wanting to write a blog post implementing ID3 in F# for a while - thanks for giving me an execute. While this code doesn't implement the algorithm full (or correctly), it should be sufficient for getting you started.
In general you have the right approach - representing each branch as a discriminated union case is good. And like Brian said, List.partition is definitely a handy function. The trick to making this work correctly is all in determining the optimal attribute/value pair to split on - and to do that you'll need to calculate information gain via entropy, etc.
type Attribute = string
type Value = string
type Record =
{
Weather : string
Temperature : string
PlayTennis : bool
}
override this.ToString() =
sprintf
"{Weather = %s, Temp = %s, PlayTennis = %b}"
this.Weather
this.Temperature
this.PlayTennis
type Decision = Attribute * Value
type DecisionTreeNode =
| Branch of Decision * DecisionTreeNode * DecisionTreeNode
| Leaf of Record list
// ------------------------------------
// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maxamize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data =
// Just group by weather, then by temperature
let uniqueWeathers =
List.fold
(fun acc item -> Set.add item.Weather acc)
Set.empty
data
let uniqueTemperatures =
List.fold
(fun acc item -> Set.add item.Temperature acc)
Set.empty
data
if uniqueWeathers.Count = 1 then
let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
let left, right =
List.partition
(fun item -> item.Temperature = uniqueTemperatures.MinimumElement)
data
(bestSplit, left, right)
else
let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
let left, right =
List.partition
(fun item -> item.Weather = uniqueWeathers.MinimumElement)
data
(bestSplit, left, right)
let rec determineBranch data =
if List.length data < 4 then
Leaf(data)
else
// Use the entropy function to break the dataset on
// the category / value that best splits the data
let bestDecision, leftBranch, rightBranch = bestSplit data
Branch(
bestDecision,
determineBranch leftBranch,
determineBranch rightBranch)
// ------------------------------------
let rec printID3Result indent branch =
let padding = new System.String(' ', indent)
match branch with
| Leaf(data) ->
data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
| Branch(decision, lhs, rhs) ->
printfn "%sBranch predicate [%A]" padding decision
printfn "%sWhere predicate is true:" padding
printID3Result (indent + 4) lhs
printfn "%sWhere predicate is false:" padding
printID3Result (indent + 4) rhs
// ------------------------------------
let dataset =
[
{ Weather = "windy"; Temperature = "hot"; PlayTennis = false }
{ Weather = "windy"; Temperature = "cool"; PlayTennis = false }
{ Weather = "nice"; Temperature = "cool"; PlayTennis = true }
{ Weather = "nice"; Temperature = "cold"; PlayTennis = true }
{ Weather = "humid"; Temperature = "hot"; PlayTennis = false }
]
printfn "Given input list:"
dataset |> List.iter (printfn "%A")
printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result
You can use List.partition instead of your two List.choose calls.
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html
(or now http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx )
It isn't clear to me that pattern matching will buy you much here; the input type (list of lists) and processing (partitioning and 'pureness' check) doesn't really lend itself to that.
And of course when you finally get the 'end' (a pure list) you need to create a tree, and then presumably this function will create a Leaf when the input only has one 'side' and it's 'pure', but create a Node out of the left-side and right-side results for every other input. Maybe. I didn't quite grok the algorithm completely.
Hopefully that will help steer you a little bit. May be useful to draw up a few smaller sample inputs and outputs to help work out the various cases of the function body.
Thanks Brian & Chris! I was actually able to figure this out and I ended up with the following. This calculates the information gain for determining the best place to split. I'm sure there are probably better ways for me to arrive at this solution especially around the chosen data structures, but this is a start. I plan to refine things later.
#light
open System
let trainList =
[
[1.;0.;0.;1.;];
[0.;1.;0.;1.;];
[0.;0.;0.;0.;];
[1.;0.;1.;0.;];
[0.;0.;0.;0.;];
[1.;1.;0.;1.;];
[0.;1.;1.;0.;];
[1.;0.;0.;1.;];
[0.;0.;0.;0.;];
[1.;0.;0.;1.;];
]
type BinaryTree =
| Leaf of int
| Node of int * string * BinaryTree * BinaryTree
let entropyList nums =
let sumOfnums =
nums
|> Seq.sum
nums
|> Seq.map (fun x -> if x=0.00 then x else (-((x/sumOfnums) * Math.Log(x/sumOfnums, 2.))))
|> Seq.sum
let entropyBinaryList (dataListOfLists:list<list<float>>) =
let classList =
dataListOfLists
|> List.map (fun x -> x.Item(x.Length - 1))
let ListOfNo =
classList
|> List.choose (fun x -> if x = 0. then Some(x) else None)
let ListOfYes =
classList
|> List.choose (fun x -> if x = 1. then Some(x) else None)
let numberOfYes : float = float ListOfYes.Length
let numberOfNo : float = float ListOfNo.Length
let ListOfNumYesAndSumNo = [numberOfYes; numberOfNo]
entropyList ListOfNumYesAndSumNo
let conditionalEntropy (dataListOfLists:list<list<float>>) attributeNumber =
let NoAttributeList =
dataListOfLists
|> List.choose (fun x -> if x.Item(attributeNumber) = 0. then Some(x) else None)
let YesAttributeList =
dataListOfLists
|> List.choose (fun x -> if x.Item(attributeNumber) = 1. then Some(x) else None)
let numberOfYes : float = float YesAttributeList.Length
let numberOfNo : float = float NoAttributeList.Length
let noConditionalEntropy = (entropyBinaryList NoAttributeList) * (numberOfNo/(numberOfNo + numberOfYes))
let yesConditionalEntropy = (entropyBinaryList YesAttributeList) * (numberOfYes/(numberOfNo + numberOfYes))
[noConditionalEntropy; yesConditionalEntropy]
let findBestSplitIndex(listOfInstances : list<list<float>>) =
let IGList =
[0..(listOfInstances.Item(0).Length - 2)]
|> List.mapi (fun i x -> (i, (entropyBinaryList listOfInstances) - (List.sum (conditionalEntropy listOfInstances x))))
IGList
|> List.maxBy snd
|> fst
let isListPure (listToCheck : list<list<float>>) =
let splitList = listToCheck |> List.choose (fun x -> if x.Item(x.Length - 1) = 1. then Some(x) else None)
if splitList.Length = listToCheck.Length then 1
else if splitList.Length = 0 then 0
else -1
let rec createTree (listToSplit : list<list<float>>) =
let pureCheck = isListPure listToSplit
if pureCheck = 0 then
printfn "%s" "Pure - Leaf(0)"
else if pureCheck = 1 then
printfn "%s" "Pure - Leaf(1)"
else
printfn "%A - is not pure" listToSplit
if listToSplit.Length > 1 then // There are attributes we can split on
// Chose best place to split list
let splitIndex = findBestSplitIndex(listToSplit)
printfn "spliting at index %A" splitIndex
let leftSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 1. then Some(x) else None)
let rightSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 0. then Some(x) else None)
createTree leftSideSplit
createTree rightSideSplit
else
printfn "%s" "Not Pure, but can't split choose based on heuristics - Leaf(0 or 1)"