Why is my precondition being ignored on my Property-based test? - f#

The precondition for my test is the following:
fun rowCount -> rowCount >= 0
Thus, my actual test is:
[<Fact>]
let ``number of cells in grid equals rowcount squared`` () =
    Check.QuickThrowOnFailure <|
        fun rowCount -> rowCount >= 0 ==>
                        fun rowCount -> rowCount |> createGrid
                                                 |> Map.toList
                                                 |> List.length = rowCount * rowCount
However, my test continues to fail:
Result Message: System.Exception : Falsifiable, after 3 tests (1 shrink) (StdGen (985619705,296133555)):
Original:
1
-1
Shrunk:
0
-1
Domain:
let createGrid rowCount =
    [for x in 0..rowCount-1 do
        for y in 0..rowCount-1 do
            yield { X=x; Y=y; State=Dead }
    ]|> List.map (fun c -> (c.X, c.Y), { X=c.X; Y=c.Y; State=Dead })
     |> Map.ofList
[UPDATE]
I've also tried:
let precondition rowCount =
    rowCount >= 0
let ``some property`` rowCount =
    precondition rowCount ==> rowCount |> createGrid
                                       |> Map.toList
                                       |> List.length = rowCount * rowCount
[<Fact>]
let ``number of cells in grid equals rowcount squared`` () =
    Check.QuickThrowOnFailure <| ``some property``
However, I receive the following error:
Type mismatch. Expecting a
    Property -> 'a
but given a
    int -> Map<(int * int),Cell>
The type 'Property' does not match the type 'int'

As @FyodorSoikin points out in his comment, you have two nested functions that each take a rowCount.
The second rowCount shadows the first one, but the ==> precondition only constrains the first rowCount. Thus, the rowCount value actually used in the property body is still unbounded, which is why the failure report lists two values (1 and -1).
Make the test simpler, and it'll work:
open Xunit
open FsCheck
[<Fact>]
let ``number of cells in grid equals rowcount squared`` () =
    Check.QuickThrowOnFailure <| fun rowCount ->
        rowCount >= 0 ==>
            (rowCount
             |> createGrid
             |> Map.toList
             |> List.length = rowCount * rowCount)
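The shadowing can be reproduced without FsCheck. In this minimal sketch (the names nested, guardHolds and body are made up for illustration), the outer lambda evaluates the guard and returns an inner lambda that declares its own rowCount, so the guard says nothing about the value the inner function later receives:

```fsharp
// Hypothetical illustration: two nested lambdas that both name their parameter rowCount.
let nested =
    fun rowCount ->
        let guardHolds = rowCount >= 0      // looks at the OUTER rowCount
        let body = fun rowCount -> rowCount // sees only the INNER rowCount
        guardHolds, body

// Feed the two values from the failure report above: 1 to the outer, -1 to the inner.
let guardHolds, body = nested 1
let seenByBody = body (-1)
printfn "guard holds: %b, body received: %d" guardHolds seenByBody
```

The guard passes for 1, yet the body still runs with -1, which matches the pair of values in the falsification message.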

Related

Imperative to Functional

I have been doing a CodeWars exercise which can also be seen at dev.to.
The essence of it is:
There is a line for the self-checkout machines at the supermarket. Your challenge is to write a function that calculates the total amount of time required for the rest of the customers to check out!
INPUT
customers : an array of positive integers representing the line. Each integer represents a customer, and its value is the amount of time they require to check out.
n : a positive integer, the number of checkout tills.
RULES
There is only one line serving many machines, and
The order of the line never changes, and
The front person in the line (i.e. the first element in the array/list) proceeds to a machine as soon as it becomes free.
OUTPUT
The function should return an integer, the total time required.
The answer I came up with works - but it is highly imperative.
open System.Collections.Generic
open System.Linq
let getQueueTime (customerArray: int list) n =
    let mutable d = new Dictionary<string,int>()
    for i in 1..n do
        d.Add(sprintf "Line%d" <| i, 0)
    let getNextAvailableSupermarketLineName(d:Dictionary<string,int>) =
        let mutable lowestValue = -1
        let mutable lineName = ""
        for myLineName in d.Keys do
            let myValue = d.Item(myLineName)
            if lowestValue = -1 || myValue <= lowestValue then
                lowestValue <- myValue
                lineName <- myLineName
        lineName
    for x in customerArray do
        let lineName = getNextAvailableSupermarketLineName d
        let lineTotal = d.Item(lineName)
        d.Item(lineName) <- lineTotal + x
    d.Values.Max()
So my question is ... is this OK F# code or should it be written in a functional way? And if the latter, how? (I started off trying to do it functionally but didn't get anywhere).
is this OK F# code or should it be written in a functional way?
That's a subjective question, so it can't be answered. I'm assuming, however, that since you're doing an exercise, it's in order to learn. Learning functional programming takes years for most people (it did for me), but F# is a great language because it enables you to learn gradually.
You can, however, simplify the algorithm. Think of a till as a number. The number represents the instant it's ready. At the beginning, you initialise them all to 0:
let tills = List.replicate n 0
where n is the number of tills. At the beginning, they're all ready at time 0. If, for example, n is 3, the tills are:
> List.replicate 3 0;;
val it : int list = [0; 0; 0]
Now you consider the next customer in the line. For each customer, you have to pick a till. You pick the one that is available first, i.e. with the lowest number. Then you need to 'update' the list of counters.
In order to do that, you'll need a function to 'update' a list at a particular index, which isn't part of the base library. You can define it yourself, however:
module List =
    let set idx v = List.mapi (fun i x -> if i = idx then v else x)
For example, if you want to 'update' the second element to 3, you can do it like this:
> List.replicate 3 0 |> List.set 1 3;;
val it : int list = [0; 3; 0]
Now you can write a function that updates the set of tills given their current state and a customer (represented by a duration, which is also a number).
let next tills customer =
    let earliestTime = List.min tills
    let idx = List.findIndex (fun c -> earliestTime = c) tills
    List.set idx (earliestTime + customer) tills
First, the next function finds the earliestTime in tills by using List.min. Then it finds the index of that value. Finally, it 'updates' that till by adding its current state to the customer duration.
Imagine that you have two tills and the customers [2;3;10]:
> List.replicate 2 0;;
val it : int list = [0; 0]
> List.replicate 2 0 |> fun tills -> next tills 2;;
val it : int list = [2; 0]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3;;
val it : int list = [2; 3]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3
|> fun tills -> next tills 10;;
val it : int list = [12; 3]
You'll notice that you can keep calling the next function for all the customers in the line. That's called a fold. This gives you the final state of the tills. The final step is to return the value of the till with the highest value, because that represents the time it finished. The overall function, then, is:
let queueTime line n =
    let next tills customer =
        let earliestTime = List.min tills
        let idx = List.findIndex (fun c -> earliestTime = c) tills
        List.set idx (earliestTime + customer) tills
    let tills = List.replicate n 0
    let finalState = List.fold next tills line
    List.max finalState
Here are some examples, taken from the original exercise:
> queueTime [5;3;4] 1;;
val it : int = 12
> queueTime [10;2;3;3] 2;;
val it : int = 10
> queueTime [2;3;10] 2;;
val it : int = 12
This solution is based entirely on immutable data, and all functions are pure, so that's a functional solution.
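To watch the fold at work, List.scan can be used in place of List.fold: it returns every intermediate state instead of only the last one. The sketch below is self-contained, so it uses a local setAt helper (a made-up name) as a stand-in for the List.set function defined earlier:

```fsharp
// Local stand-in for the List.set helper: 'update' the element at idx.
let setAt idx v = List.mapi (fun i x -> if i = idx then v else x)

// Same logic as the next function above.
let next tills customer =
    let earliestTime = List.min tills
    let idx = List.findIndex (fun c -> earliestTime = c) tills
    setAt idx (earliestTime + customer) tills

// List.scan yields the initial state followed by the state after each customer.
let states = List.scan next (List.replicate 2 0) [2; 3; 10]
states |> List.iter (printfn "%A")
```

The last element of states is what List.fold would have returned.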
Here is a version that resembles your version, with all the mutability removed:
let getQueueTime (customerArray: int list) n =
    let updateWith f key map =
        let v = Map.find key map
        map |> Map.add key (f v)
    let initialLines = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> Map.ofList
    let getNextAvailableSupermarketLineName(d:Map<string,int>) =
        let lowestLine = d |> Seq.minBy (fun l -> l.Value)
        lowestLine.Key
    let lines =
        customerArray
        |> List.fold (fun linesState x ->
            let lineName = getNextAvailableSupermarketLineName linesState
            linesState |> updateWith (fun l -> l + x) lineName) initialLines
    lines |> Seq.map (fun l -> l.Value) |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
Those loops with mutable "outer state" can be swapped for either recursive functions or folds; here I suspect recursive functions would be nicer.
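For comparison, here is one way the recursive variant might look. It is only a sketch: instead of named lines it keeps a plain list of till times, re-sorted on every step so the earliest till is at the head, and it assumes n is at least 1.

```fsharp
let getQueueTime (customers: int list) n =
    // tills: the time at which each till becomes free
    let rec loop tills remaining =
        match remaining with
        | [] -> List.max tills
        | customer :: rest ->
            match List.sort tills with
            | earliest :: others -> loop ((earliest + customer) :: others) rest
            | [] -> failwith "At least one till is required"
    loop (List.replicate n 0) customers

getQueueTime [5; 3; 4] 1 |> printfn "%i"
```

The accumulator parameter tills plays the role that the mutable dictionary plays in the loop-based version.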
I've swapped out Dictionary for the immutable Map, but it feels like more trouble than it's worth here.
Update - here is a compromise solution I think reads well:
open System.Collections.Generic
let getQueueTime (customerArray: int list) n =
    // The built-in dict function returns a read-only dictionary, so use a mutable Dictionary instead
    let d = Dictionary<string,int>()
    for i in 1..n do
        d.[sprintf "Line%d" i] <- 0
    let getNextAvailableSupermarketLineName(d:IDictionary<string,int>) =
        let lowestLine = d |> Seq.minBy (fun l -> l.Value)
        lowestLine.Key
    customerArray
    |> List.iter (fun x ->
        let lineName = getNextAvailableSupermarketLineName d
        // Add the customer's checkout time x to the chosen line
        d.[lineName] <- d.[lineName] + x)
    d.Values |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
I believe there is a more natural functional solution if you approach it freshly, but I wanted to evolve your current solution.
This is less an attempt at answering than an extended comment on Mark Seemann's otherwise excellent answer. If we do not restrict ourselves to standard library functions, the slightly cumbersome determination of the index with List.findIndex can be avoided. Instead, we may devise a function that replaces the first occurrence of a value in a list with a new value.
The implementation of our bespoke List.replace involves recursion, with an accumulator to hold the values before we encounter the first occurrence. When found, the accumulator needs to be reversed and also to have the new value and the tail of the original list appended. Both of these can be done in one operation: List.fold is fed the new value and the tail of the original list as its initial state, while the elements of the accumulator are prepended in the loop, thereby restoring their order.
module List =
    // Replace the first occurrence of a specific object in a list
    let replace oldValue newValue source =
        let rec aux acc = function
            | [] -> List.rev acc
            | x::xs when x = oldValue ->
                (newValue::xs, acc)
                ||> List.fold (fun xs x -> x::xs)
            | x::xs -> aux (x::acc) xs
        aux [] source
let queueTime customers n =
    (List.init n (fun _ -> 0), customers)
    ||> List.fold (fun xs customer ->
        let x = List.min xs
        List.replace x (x + customer) xs )
    |> List.max
queueTime [5;3;4] 1 // val it : int = 12
queueTime [10;2;3;3] 2 // val it : int = 10
queueTime [2;3;10] 2 // val it : int = 12

F# Can I use negative indices on Arrays. (like in Python)?

Edit 2021:
Yes, you can since F# 5.0; see the docs.
Original Question:
I would like to use negative indices on arrays so that I can write myThings.[-2] <- "sth" to set the second-to-last item. Is this possible?
I tried this but it fails to compile with:
Method overrides and interface implementations are not permitted here
type ``[]``<'T> with
    /// Allows for negative index too (like python)
    override this.Item
        with get i = if i<0 then this.[this.Length+i] else this.[i]
        and set i v = if i<0 then this.[this.Length+i] <- v else this.[i] <- v
I know, I could use new members like myThings.SetItemNegative(-2,"sth") but this is not as nice as using the index notation:
type ``[]``<'T> with
    /// Allows for negative index too (like python)
    member this.GetItemNegative (i) =
        if i<0 then this.[this.Length+i] else this.[i]
    member this.SetItemNegative (i,v) =
        if i<0 then this.[this.Length+i] <- v else this.[i] <- v
Unfortunately, existing methods in a type take priority over extension members defined later.
It doesn't make much sense, but that's the way it currently is; you can read more about it in this issue: https://github.com/dotnet/fsharp/issues/3692#issuecomment-334297164
That's why, if you define such an extension, it will be ignored and, what's worse, silently ignored!
Anyway, there are some proposals to add something similar to negative slices to F#.
Gus explained that existing members of 'T array can not be overwritten.
A workaround is extending 'T seq.
For my F# scripts this is good enough. I am not sure if this is a good idea in general though.
open System
open System.Collections.Generic
open System.Runtime.CompilerServices
//[<assembly:Extension>] do()
/// Converts negative indices to positive ones
/// e.g.: -1 is last item .
let negIdx i len =
    let ii = if i<0 then len+i else i
    if ii<0 || ii >= len then failwithf "Cannot get index %d of Seq with %d items" i len
    ii
let get i (xs:seq<'T>) : 'T =
    match xs with
    //| :? ('T[]) as xs -> xs.[negIdx i xs.Length] // covered by IList
    | :? ('T IList) as xs -> xs.[negIdx i xs.Count]
    | :? ('T list) as xs -> List.item (negIdx i (List.length xs)) xs
    | _ -> Seq.item (negIdx i (Seq.length xs)) xs
let set i x (xs:seq<_>) :unit =
    match xs with
    | :? ('T[]) as xs -> xs.[negIdx i xs.Length]<- x // why not covered by IList?
    | :? ('T IList) as xs -> xs.[negIdx i xs.Count] <- x
    | _ -> failwithf "Cannot set items on this Seq (is it a dict, lazy or immutable ?)"
//[<Extension>]
type Collections.Generic.IEnumerable<'T> with
    [<Extension>]
    ///Allows for negative indices too (like Python)
    member this.Item
        with get i = get i this
        and set i x = set i x this
    ///Allows for negative indices too.
    ///The resulting seq includes the item at slice-ending-index. like F# range expressions include the last integer e.g.: 0..5
    [<Extension>]
    member this.GetSlice(startIdx,endIdx) : 'T seq = // to use slicing notation e.g. : xs.[ 1 .. -2]
        let count = Seq.length this
        let st = match startIdx with None -> 0 | Some i -> if i<0 then count+i else i
        let len = match endIdx with None -> count-st | Some i -> if i<0 then count+i-st+1 else i-st+1
        if st < 0 || st > count-1 then
            let err = sprintf "GetSlice: Start index %d is out of range. Allowed values are -%d up to %d for Seq of %d items" startIdx.Value count (count-1) count
            raise (IndexOutOfRangeException(err))
        if st+len > count then
            let err = sprintf "GetSlice: End index %d is out of range. Allowed values are -%d up to %d for Seq of %d items" endIdx.Value count (count-1) count
            raise (IndexOutOfRangeException(err))
        if len < 0 then
            let en = match endIdx with None -> count-1 | Some i -> if i<0 then count+i else i
            let err = sprintf "GetSlice: Start index '%A' (= %d) is bigger than end index '%A'(= %d) for Seq of %d items" startIdx st endIdx en count
            raise (IndexOutOfRangeException(err))
        this |> Seq.skip st |> Seq.take len
// usage :
let modify (xs:seq<_>) =
    xs.[-1] <- 0 // set last
    xs.[-2] <- 0 // set second last
    xs
let slice (xs:seq<_>) =
    xs.[-3 .. -1] // last 3 items
modify [|0..9|]
slice [|0..9|]
You cannot extend 'T[], but wouldn't an operator taking _[] as an operand do it?
let (?) (this : _[]) i =
    if i<0 then this.[this.Length+i] else this.[i]
// val ( ? ) : this:'a [] -> i:int -> 'a
let (?<-) (this : _[]) i v =
    if i<0 then this.[this.Length+i] <- v else this.[i] <- v
// val ( ?<- ) : this:'a [] -> i:int -> v:'a -> unit
[|1..3|]?(-1)
// val it : int = 3
let a = [|1..3|] in a?(-1) <- 0; a
// val it : int [] = [|1; 2; 0|]
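If neither custom operators nor type extensions appeal, plain module functions are another option. The sketch below is an assumption-laden illustration, not an established API: the module name NegIndex and its functions normalize, get and set are invented here. Like the operators above, it does no extra bounds checking, so an out-of-range index still raises the usual array exception.

```fsharp
// Hypothetical helper module; the names are made up for this example.
module NegIndex =
    // Map a possibly negative index onto 0 .. length-1, so that -1 is the last item.
    let normalize length i = if i < 0 then length + i else i
    let get i (arr: 'T[]) = arr.[normalize arr.Length i]
    let set i v (arr: 'T[]) = arr.[normalize arr.Length i] <- v

let things = [| 1 .. 5 |]
things |> NegIndex.set (-2) 0              // overwrite the second-to-last item
let last = things |> NegIndex.get (-1)     // read the last item
printfn "%A, last = %d" things last
```

Taking the array as the final parameter keeps the functions pipeline-friendly, at the cost of losing the index notation the question asks for.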

Finding an index of a max value of a list in F#

I'm trying to write a function that takes a list for example
let list = [5;23;29;1]
let x = max list // This will return 2 because 29 will be the max value and it's "indexed" at position 2
I'm not sure about how to go about writing the max function
Since my list will only contain four elements I currently have some code like this
let list = (1, newMap1 |> getScore) :: (2, newMap2 |> getScore) :: (3, newMap3 |> getScore) :: (4, newMap4 |> getScore) :: []
I consider this a terrible approach but I'm still stuck on how to return (x, _) after I find the max of (_, y). I'm very confident with imperative approaches but I'm stumped on how to do this functionally
There are a couple of ways to do this. At the low level, you can write a recursive function that iterates and pattern matches over the list. This is a good exercise if you are learning F#.
Similarly, you can implement this using the fold function. Here, the idea is that we keep some state, consisting of the "best value" and the index of the best value. At each step, we either keep the original information, or update it:
let _, maxValue, maxIndex =
    list |> List.fold (fun (index, maxSoFar, maxIndex) v ->
        if v > maxSoFar then (index+1, v, index+1)
        else (index+1, maxSoFar, maxIndex)) (-1, System.Int32.MinValue, -1)
Finally, the shortest option I can think of is to use mapi and maxBy functions:
list
|> Seq.mapi (fun i v -> i, v)
|> Seq.maxBy snd
Here's an answer only using pattern matching and recursion.
let list = [5;23;29;1]
let rec findIndexOfMaxValue (maxValue:int) indexOfMaxValue currentIndex aList =
    match aList with
    | [] -> indexOfMaxValue
    | head::tail -> match head with
                    | head when head > maxValue -> findIndexOfMaxValue head currentIndex (currentIndex + 1) tail
                    | _ -> findIndexOfMaxValue maxValue indexOfMaxValue (currentIndex + 1) tail
[<EntryPoint>]
let main argv =
    let indexOfMaxValue = findIndexOfMaxValue 0 0 0 list
    printfn "The index of the maximum value is %A." indexOfMaxValue
    //The index of the maximum value is 2.
    0
Out of interest, I made a timing script comparing my algorithm with the other ones provided:
open System.Diagnostics
let n = 5000
let random = System.Random 543252
let randomlists =
    [for i in [1..n] -> [ for i in [1..n] -> random.Next (0, n*n)]]
let stopWatch =
    let sw = Stopwatch ()
    sw.Start ()
    sw
let timeIt (name : string) (a : int list -> 'T) : unit =
    let t = stopWatch.ElapsedMilliseconds
    let v = a (randomlists.[0])
    for i = 1 to (n - 1) do
        a randomlists.[i] |> ignore
    let d = stopWatch.ElapsedMilliseconds - t
    printfn "%s, elapsed %d ms, result %A" name d v
let rec findIndexOfMaxValue (maxValue:int) indexOfMaxValue currentIndex aList =
    match aList with
    | [] -> indexOfMaxValue
    | head::tail -> match head with
                    | head when head > maxValue -> findIndexOfMaxValue head currentIndex (currentIndex + 1) tail
                    | _ -> findIndexOfMaxValue maxValue indexOfMaxValue (currentIndex + 1) tail
let findIndexOfMaxValueFoldAlg list =
    let _, maxValue, maxIndex =
        list |> List.fold (fun (index, maxSoFar, maxIndex) v ->
            if v > maxSoFar then (index+1, v, index+1)
            else (index+1, maxSoFar, maxIndex)) (-1, System.Int32.MinValue, -1)
    maxIndex
let findIndexOfMaxValueSimpleSeq list =
    list
    |> Seq.mapi (fun i v -> i, v)
    |> Seq.maxBy snd
    |> fst
let findIndexOfMaxValueSimpleList list =
    list
    |> List.mapi (fun i x -> i, x)
    |> List.maxBy snd
    |> fst
[<EntryPoint>]
let main argv =
    timeIt "recursiveOnly" (findIndexOfMaxValue 0 0 0)
    // This call is needed to produce the foldAlgorithm line in the results below
    timeIt "foldAlgorithm" findIndexOfMaxValueFoldAlg
    timeIt "simpleSeq" findIndexOfMaxValueSimpleSeq
    timeIt "simpleList" findIndexOfMaxValueSimpleList
    0
The results I get are:
recursiveOnly, elapsed 356ms, result 3562
foldAlgorithm, elapsed 1602ms, result 3562
simpleSeq, elapsed 4504ms, result 3562
simpleList, elapsed 4395ms, result 3562
I have these functions in my helper library:
module List =
    let maxIndexBy projection list =
        list
        |> List.mapi (fun i x -> i, projection x)
        |> List.maxBy snd
        |> fst
    let maxIndex list = maxIndexBy id list
Returns the index of the max element, optionally using a given projection function. You can write the same functions for the Seq and Array modules easily by replacing the "List" part and renaming the arguments.
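One caveat with the List.maxBy-based helpers is that List.maxBy throws on an empty list, and the recursive version seeded with 0 cannot find the maximum of an all-negative list. A possible variant that sidesteps both is sketched below; the name tryMaxIndex is made up for this example, and it relies on List.indexed to pair each element with its position.

```fsharp
// Hypothetical helper: index of the (first) maximum element, or None for an empty list.
let tryMaxIndex list =
    match list with
    | [] -> None
    | _ ->
        list
        |> List.indexed      // [(0, x0); (1, x1); ...]
        |> List.maxBy snd    // compare by element value
        |> fst
        |> Some

printfn "%A" (tryMaxIndex [5; 23; 29; 1])
printfn "%A" (tryMaxIndex [-3; -1; -2])
```

Returning an option pushes the decision about what an empty list means to the caller.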

What's wrong with this F# Code

let compareDiagonal p x y =
    System.Math.Abs((int)(x - (fst p))) <> System.Math.Abs((int)(y - (snd p)));;
let isAllowed p = function
    | [] -> true
    | list -> List.forall (fun (x, y) -> fst p <> x && snd p <> y && (compareDiagonal p x y)) list;;
let rec solve col list =
    let solCount : int = 0
    match col with
    | col when col < 8 ->
        for row in [0 .. 7] do
            solCount = solCount + if isAllowed (row, col) list then solve (col + 1) ((row, col) :: list) else 0
        solCount
    | _ -> 1;;
let solCount = solve 0 [];;
solCount;;
I am getting the error
solCount = solCount + if isAllowed (row, col) list then (solve (col + 1) ((row, col) :: list)) else 0
------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
stdin(335,13): warning FS0020: This expression should have type 'unit', but has type 'bool'. If assigning to a property use the syntax 'obj.Prop <- expr'.
Why am I not able to return the number?
There are two related problems.
By default, variables in F# are immutable. If you want a mutable variable, you have to declare it as such, like this:
let mutable solCount : int = 0
And then when you assign values to it instead of using = you have to use <- like this:
solCount <- solCount + if isAllowed (row, col) list then solve (col + 1) ((row, col) :: list) else 0
A complete example follows.
HOWEVER, this is not the correct functional way to do something like this. Instead of using a loop to add up values, use a recursive function to return the cumulative value as you go. Using F# the way functional programs are designed to be used will almost always yield better results, although it takes some getting used to.
Your original example with mutable, not the "functional way":
let compareDiagonal p x y =
    System.Math.Abs((int)(x - (fst p))) <> System.Math.Abs((int)(y - (snd p)));;
let isAllowed p = function
    | [] -> true
    | list -> List.forall (fun (x, y) -> fst p <> x && snd p <> y && (compareDiagonal p x y)) list;;
let rec solve col list =
    let mutable solCount : int = 0
    match col with
    | col when col < 8 ->
        for row in [0 .. 7] do
            solCount <- solCount + if isAllowed (row, col) list then solve (col + 1) ((row, col) :: list) else 0
        solCount
    | _ -> 1;;
let solCount = solve 0 [];;
solCount;;
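For contrast, a version without the mutable counter might look like the sketch below. It is one possible rewrite rather than the only functional formulation: the for loop is replaced by List.sumBy, which adds up the counts returned by the recursive calls for each row, and the helper is restated in a slightly shorter form so that the block stands alone.

```fsharp
// A queen at (row, col) is allowed if it shares no row, column or diagonal
// with any queen that has already been placed.
let isAllowed (row, col) placed =
    placed
    |> List.forall (fun (r, c) ->
        row <> r && col <> c && abs (row - r) <> abs (col - c))

// Count the solutions by summing the results of the recursive calls.
let rec solve col placed =
    if col >= 8 then 1
    else
        [0 .. 7]
        |> List.sumBy (fun row ->
            if isAllowed (row, col) placed then solve (col + 1) ((row, col) :: placed)
            else 0)

let solCount = solve 0 []
printfn "%d" solCount
```

Each call returns its own count, so nothing needs to be assigned; for the standard 8x8 board the total is 92.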

Help Needed Creating a Binary Tree Given Truth Table

First, in order to provide full disclosure, I want to point out that this is related to homework in a Machine Learning class. This question is not the homework question and instead is something I need to figure out in order to complete the bigger problem of creating an ID3 Decision Tree Algorithm.
I need to generate a tree similar to the following when given a truth table:
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
learnedTree is of type BinaryTree which I've defined as follows:
type BinaryTree =
| Leaf of int
| Node of int * string * BinaryTree * BinaryTree
ID3 algorithms take into account various equations to determine where to split the tree, and I've got all that figured out, I'm just having trouble creating the learned tree from my truth table. For example if I have the following table
A1 | A2 | A3 | Class
 1 |  0 |  0 | 1
 0 |  1 |  0 | 1
 0 |  0 |  0 | 0
 1 |  0 |  1 | 0
 0 |  0 |  0 | 0
 1 |  1 |  0 | 1
 0 |  1 |  1 | 0
And I decide to split on attribute A1 I would end up with the following:
(A1 = 1)              A1         (A1 = 0)
A2 | A3 | Class                  A2 | A3 | Class
 0 |  0 | 1                       1 |  0 | 1
 0 |  1 | 0                       0 |  0 | 0
 1 |  0 | 1                       0 |  0 | 0
                                  0 |  1 | 1
Then I would split the left side and split the right side, and continue the recursive pattern until the leaf nodes are pure and I end up with a tree similar to the following based on the splitting.
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
Here is what I've kind of "hacked" together thus far, but I think I might be way off:
let rec createTree (listToSplit : list<list<float>>) index =
    let leftSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
    let rightSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
    if leftSideSplit.Length > 0 then
        let pureCheck = isListPure leftSideSplit
        if pureCheck = 0 then
            printfn "%s" "Pure left node class 0"
            createTree leftSideSplit (index + 1)
        else if pureCheck = 1 then
            printfn "%s" "Pure left node class 1"
            createTree leftSideSplit (index + 1)
        else
            printfn "%s - %A" "Recursing Left" leftSideSplit
            createTree leftSideSplit (index + 1)
    else printfn "%s" "Pure left node class 0"
Should I be using pattern matching instead? Any tips/ideas/help? Thanks a bunch!
Edit: I've since posted an implementation of ID3 on my blog at:
http://blogs.msdn.com/chrsmith
Hey Jim, I've been wanting to write a blog post implementing ID3 in F# for a while, so thanks for giving me an excuse. While this code doesn't implement the algorithm fully (or correctly), it should be sufficient to get you started.
In general you have the right approach: representing each branch as a discriminated union case is good. And like Brian said, List.partition is definitely a handy function. The trick to making this work correctly is all in determining the optimal attribute/value pair to split on, and to do that you'll need to calculate information gain via entropy, etc.
type Attribute = string
type Value = string
type Record =
    {
        Weather : string
        Temperature : string
        PlayTennis : bool
    }
    override this.ToString() =
        sprintf
            "{Weather = %s, Temp = %s, PlayTennis = %b}"
            this.Weather
            this.Temperature
            this.PlayTennis
type Decision = Attribute * Value
type DecisionTreeNode =
    | Branch of Decision * DecisionTreeNode * DecisionTreeNode
    | Leaf of Record list
// ------------------------------------
// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maximize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data =
    // Just group by weather, then by temperature
    let uniqueWeathers =
        List.fold
            (fun acc item -> Set.add item.Weather acc)
            Set.empty
            data
    let uniqueTemperatures =
        List.fold
            (fun acc item -> Set.add item.Temperature acc)
            Set.empty
            data
    if uniqueWeathers.Count = 1 then
        let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
        let left, right =
            List.partition
                (fun item -> item.Temperature = uniqueTemperatures.MinimumElement)
                data
        (bestSplit, left, right)
    else
        let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
        let left, right =
            List.partition
                (fun item -> item.Weather = uniqueWeathers.MinimumElement)
                data
        (bestSplit, left, right)
let rec determineBranch data =
    if List.length data < 4 then
        Leaf(data)
    else
        // Use the entropy function to break the dataset on
        // the category / value that best splits the data
        let bestDecision, leftBranch, rightBranch = bestSplit data
        Branch(
            bestDecision,
            determineBranch leftBranch,
            determineBranch rightBranch)
// ------------------------------------
let rec printID3Result indent branch =
    let padding = new System.String(' ', indent)
    match branch with
    | Leaf(data) ->
        data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
    | Branch(decision, lhs, rhs) ->
        printfn "%sBranch predicate [%A]" padding decision
        printfn "%sWhere predicate is true:" padding
        printID3Result (indent + 4) lhs
        printfn "%sWhere predicate is false:" padding
        printID3Result (indent + 4) rhs
// ------------------------------------
let dataset =
    [
        { Weather = "windy"; Temperature = "hot"; PlayTennis = false }
        { Weather = "windy"; Temperature = "cool"; PlayTennis = false }
        { Weather = "nice"; Temperature = "cool"; PlayTennis = true }
        { Weather = "nice"; Temperature = "cold"; PlayTennis = true }
        { Weather = "humid"; Temperature = "hot"; PlayTennis = false }
    ]
printfn "Given input list:"
dataset |> List.iter (printfn "%A")
printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result
You can use List.partition instead of your two List.choose calls.
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html
(or now http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx )
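As a small illustration of that suggestion, the sketch below splits a list of rows in a single pass. The sample rows and the index value are made up for the example. Note one difference from the two List.choose calls: the second list receives every row whose attribute is not 1., not only the rows where it equals 0., which is equivalent for a binary truth table.

```fsharp
// Hypothetical sample data: two attribute columns followed by the class column.
let rows = [ [1.; 0.; 1.]; [0.; 1.; 0.]; [1.; 1.; 0.] ]
let index = 0

// One pass over the list: rows where the attribute is 1 go left, all others go right.
let leftSideSplit, rightSideSplit =
    rows |> List.partition (fun row -> List.item index row = 1.)

printfn "left: %A" leftSideSplit
printfn "right: %A" rightSideSplit
```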
It isn't clear to me that pattern matching will buy you much here; the input type (list of lists) and the processing (partitioning and 'pureness' check) don't really lend themselves to that.
And of course when you finally get to the 'end' (a pure list) you need to create a tree, and then presumably this function will create a Leaf when the input only has one 'side' and it's 'pure', but create a Node out of the left-side and right-side results for every other input. Maybe. I didn't quite grok the algorithm completely.
Hopefully that will help steer you a little bit. It may be useful to draw up a few smaller sample inputs and outputs to help work out the various cases of the function body.
Thanks Brian & Chris! I was actually able to figure this out and I ended up with the following. This calculates the information gain for determining the best place to split. I'm sure there are probably better ways for me to arrive at this solution especially around the chosen data structures, but this is a start. I plan to refine things later.
#light
open System
let trainList =
    [
        [1.;0.;0.;1.;];
        [0.;1.;0.;1.;];
        [0.;0.;0.;0.;];
        [1.;0.;1.;0.;];
        [0.;0.;0.;0.;];
        [1.;1.;0.;1.;];
        [0.;1.;1.;0.;];
        [1.;0.;0.;1.;];
        [0.;0.;0.;0.;];
        [1.;0.;0.;1.;];
    ]
type BinaryTree =
    | Leaf of int
    | Node of int * string * BinaryTree * BinaryTree
let entropyList nums =
    let sumOfnums =
        nums
        |> Seq.sum
    nums
    |> Seq.map (fun x -> if x=0.00 then x else (-((x/sumOfnums) * Math.Log(x/sumOfnums, 2.))))
    |> Seq.sum
let entropyBinaryList (dataListOfLists:list<list<float>>) =
    let classList =
        dataListOfLists
        |> List.map (fun x -> x.Item(x.Length - 1))
    let ListOfNo =
        classList
        |> List.choose (fun x -> if x = 0. then Some(x) else None)
    let ListOfYes =
        classList
        |> List.choose (fun x -> if x = 1. then Some(x) else None)
    let numberOfYes : float = float ListOfYes.Length
    let numberOfNo : float = float ListOfNo.Length
    let ListOfNumYesAndSumNo = [numberOfYes; numberOfNo]
    entropyList ListOfNumYesAndSumNo
let conditionalEntropy (dataListOfLists:list<list<float>>) attributeNumber =
    let NoAttributeList =
        dataListOfLists
        |> List.choose (fun x -> if x.Item(attributeNumber) = 0. then Some(x) else None)
    let YesAttributeList =
        dataListOfLists
        |> List.choose (fun x -> if x.Item(attributeNumber) = 1. then Some(x) else None)
    let numberOfYes : float = float YesAttributeList.Length
    let numberOfNo : float = float NoAttributeList.Length
    let noConditionalEntropy = (entropyBinaryList NoAttributeList) * (numberOfNo/(numberOfNo + numberOfYes))
    let yesConditionalEntropy = (entropyBinaryList YesAttributeList) * (numberOfYes/(numberOfNo + numberOfYes))
    [noConditionalEntropy; yesConditionalEntropy]
let findBestSplitIndex(listOfInstances : list<list<float>>) =
    let IGList =
        [0..(listOfInstances.Item(0).Length - 2)]
        |> List.mapi (fun i x -> (i, (entropyBinaryList listOfInstances) - (List.sum (conditionalEntropy listOfInstances x))))
    IGList
    |> List.maxBy snd
    |> fst
let isListPure (listToCheck : list<list<float>>) =
    let splitList = listToCheck |> List.choose (fun x -> if x.Item(x.Length - 1) = 1. then Some(x) else None)
    if splitList.Length = listToCheck.Length then 1
    else if splitList.Length = 0 then 0
    else -1
let rec createTree (listToSplit : list<list<float>>) =
    let pureCheck = isListPure listToSplit
    if pureCheck = 0 then
        printfn "%s" "Pure - Leaf(0)"
    else if pureCheck = 1 then
        printfn "%s" "Pure - Leaf(1)"
    else
        printfn "%A - is not pure" listToSplit
        if listToSplit.Length > 1 then // There are attributes we can split on
            // Chose best place to split list
            let splitIndex = findBestSplitIndex(listToSplit)
            printfn "spliting at index %A" splitIndex
            let leftSideSplit =
                listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 1. then Some(x) else None)
            let rightSideSplit =
                listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 0. then Some(x) else None)
            createTree leftSideSplit
            createTree rightSideSplit
        else
            printfn "%s" "Not Pure, but can't split choose based on heuristics - Leaf(0 or 1)"
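The createTree function above prints its decisions but does not yet return a BinaryTree. A possible next step is sketched below under loud assumptions: attributes are tried in the order given rather than chosen by information gain, the left branch holds the rows where the attribute is 1, and the names label, majority and buildTree are invented for this example.

```fsharp
type BinaryTree =
    | Leaf of int
    | Node of int * string * BinaryTree * BinaryTree

// The class label is the last column of each row.
let label (row: float list) = int (List.last row)

// Majority class, used when the rows are impure but no attribute is left.
let majority (rows: float list list) =
    let ones = rows |> List.filter (fun r -> label r = 1) |> List.length
    if 2 * ones >= List.length rows then 1 else 0

// Assumption: attrs is consumed in order; a real ID3 would pick the attribute
// with the highest information gain at every step.
let rec buildTree (attrs: int list) (rows: float list list) =
    let labels = rows |> List.map label |> List.distinct
    match labels, attrs with
    | [], _ -> Leaf 0                     // empty partition: arbitrary default
    | [c], _ -> Leaf c                    // pure: every row has class c
    | _, [] -> Leaf (majority rows)       // impure, nothing left to split on
    | _, a :: rest ->
        let ones, zeros = rows |> List.partition (fun r -> List.item a r = 1.)
        Node(a, sprintf "A%d" a, buildTree rest ones, buildTree rest zeros)

let sample = [ [1.; 0.; 1.]; [1.; 1.; 1.]; [0.; 0.; 0.]; [0.; 1.; 0.] ]
printfn "%A" (buildTree [0; 1] sample)
```

Because every branch of the match returns a BinaryTree, the function type-checks where a version mixing printfn and recursive calls would not.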
