F# solution for Store Credit - f#

I want to solve this exercise: http://code.google.com/codejam/contest/351101/dashboard#s=p0 using F#.
I am new to functional programming and F#, but I like the concept and the language a lot. I love the Code Jam exercise too: it looks so easy, yet it feels like a real-life problem. Could somebody point me towards a solution?
At the moment I have written this code, which is just plain imperative and looks ugly from the functional perspective:
(*
C - Credit
L - Items
I - List of Integers, where P is a single integer
What the data looks like inside the file:
N
[...
* Money
* Items in store
...]
*)
let lines = System.IO.File.ReadAllLines("../../../../data/A-small-practice.in")
let CBounds c = c >= 5 && c <= 1000
let PBounds p = p >= 1 && p <= 1000
let entries = int(lines.[0]) - 1
let mutable index = 1 (* First index is how many entries*)
let mutable case = 1
for i = 0 to entries do
    let index = (i*3) + 1
    let C = int(lines.[index])
    let L = int(lines.[index+1])
    let I = lines.[index+2]
    let items = I.Split([|' '|]) |> Array.map int
    // C must be the sum of some items
    // Ugly imperative way which contains duplicates
    let mutable nIndex = 0
    for n in items do
        nIndex <- nIndex + 1
        let mutable mIndex = nIndex
        for m in items.[nIndex..] do
            mIndex <- mIndex + 1
            if n + m = C then do
                printfn "Case #%A: %A %A" case nIndex mIndex
    case <- case + 1
I would like to find the items that add up to the C value, but not in the usual imperative way: I want a functional approach.

You don't specify how you would solve the problem, so it's hard to give advice.
Regarding reading the input, you can express it as a series of transformations on Seq. Higher-order functions from the Seq module are very handy:
let data =
    "../../../../data/A-small-practice.in"
    |> System.IO.File.ReadLines
    |> Seq.skip 1
    |> Seq.chunkBySize 3   // non-overlapping groups of three lines, one per test case (F# 4.0+)
    |> Seq.map (fun lines ->
        let C = int(lines.[0])
        let L = int(lines.[1])
        let items = lines.[2].Split([|' '|]) |> Array.map int
        (C, L, items))
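The grouping step matters here. Seq.windowed slides forward one line at a time and yields overlapping windows, while Seq.chunkBySize (F# 4.0 and later) yields consecutive, non-overlapping groups, which is what "one test case per three lines" needs. A small comparison on a few hard-coded lines in the question's input format (the values are illustrative, and the leading count line is assumed to be skipped already):

```fsharp
// Hard-coded lines in the question's input format (the count line is already skipped).
let lines = [ "100"; "3"; "5 75 25"; "200"; "7"; "150 24 79 50 88 345 3" ]

// Seq.windowed slides by one line: 6 lines give 4 overlapping windows,
// so every window after the first starts in the middle of a test case.
let windows = lines |> Seq.windowed 3 |> Seq.toList

// Seq.chunkBySize takes non-overlapping groups: one array of 3 lines per test case.
let chunks = lines |> Seq.chunkBySize 3 |> Seq.toList

printfn "%d windows, %d chunks" windows.Length chunks.Length
```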
UPDATE:
For the rest of your example, you could use a sequence expression. It is functional enough and makes nested computations easy to express:
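let results =
    data
    |> Seq.map (fun (c, _, items) ->
        seq {
            for j in 1 .. items.Length - 1 do
                for i in 0 .. j - 1 do
                    // Code Jam expects 1-based item positions
                    if items.[j] + items.[i] = c then yield (i + 1, j + 1)
        }
        |> Seq.head)  // each case has exactly one solution
results |> Seq.iteri (fun case (i, j) -> printfn "Case #%d: %d %d" (case + 1) i j)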
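If you would rather avoid the nested loops, the pair search for a single case can also be written with Array.indexed and Array.tryPick. This is a sketch: findPair is a hypothetical helper name, and it returns 1-based positions (smaller first), or None when no pair exists.

```fsharp
// Hypothetical helper: 1-based positions (smaller first) of the first pair of
// distinct items whose prices add up to `credit`, or None if there is no such pair.
let findPair (credit: int) (prices: int[]) =
    let indexed = Array.indexed prices   // [| (0, p0); (1, p1); ... |]
    indexed
    |> Array.tryPick (fun (i, a) ->
        indexed
        |> Array.skip (i + 1)            // only consider items after position i
        |> Array.tryFind (fun (_, b) -> a + b = credit)
        |> Option.map (fun (j, _) -> (i + 1, j + 1)))

match findPair 100 [| 5; 75; 25 |] with
| Some (i, j) -> printfn "Case #1: %d %d" i j
| None -> printfn "Case #1: no pair found"
```

Because Array.tryPick stops at the first Some, the search ends as soon as a matching pair is found, much like breaking out of the imperative loops.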

Related

Imperative to Functional

I have been doing a CodeWars exercise which can also be seen at dev.to.
The essence of it is:
There is a line for the self-checkout machines at the supermarket. Your challenge is to write a function that calculates the total amount of time required for the rest of the customers to check out!
INPUT
customers : an array of positive integers representing the line. Each integer represents a customer, and its value is the amount of time they require to check out.
n : a positive integer, the number of checkout tills.
RULES
There is only one line serving many machines, and
The order of the line never changes, and
The front person in the line (i.e. the first element in the array/list) proceeds to a machine as soon as it becomes free.
OUTPUT
The function should return an integer, the total time required.
The answer I came up with works - but it is highly imperative.
open System.Collections.Generic
open System.Linq
let getQueueTime (customerArray: int list) n =
    let mutable d = new Dictionary<string,int>()
    for i in 1..n do
        d.Add(sprintf "Line%d" <| i, 0)
    let getNextAvailableSupermarketLineName(d:Dictionary<string,int>) =
        let mutable lowestValue = -1
        let mutable lineName = ""
        for myLineName in d.Keys do
            let myValue = d.Item(myLineName)
            if lowestValue = -1 || myValue <= lowestValue then
                lowestValue <- myValue
                lineName <- myLineName
        lineName
    for x in customerArray do
        let lineName = getNextAvailableSupermarketLineName d
        let lineTotal = d.Item(lineName)
        d.Item(lineName) <- lineTotal + x
    d.Values.Max()
So my question is: is this OK F# code, or should it be written in a functional way? And if the latter, how? (I started off trying to do it functionally but didn't get anywhere.)
is this OK F# code or should it be written in a functional way?
That's a subjective question, so it can't be answered. I'm assuming, however, that since you're doing an exercise, it's in order to learn. Learning functional programming takes years for most people (it did for me), but F# is a great language because it enables you to learn gradually.
You can, however, simplify the algorithm. Think of a till as a number. The number represents the instant it's ready. At the beginning, you initialise them all to 0:
let tills = List.replicate n 0
where n is the number of tills. At the beginning, they're all ready at time 0. If, for example, n is 3, the tills are:
> List.replicate 3 0;;
val it : int list = [0; 0; 0]
Now you consider the next customer in the line. For each customer, you have to pick a till. You pick the one that is available first, i.e. with the lowest number. Then you need to 'update' the list of counters.
In order to do that, you'll need a function to 'update' a list at a particular index, which isn't part of the base library. You can define it yourself, however:
module List =
    let set idx v = List.mapi (fun i x -> if i = idx then v else x)
For example, if you want to 'update' the second element to 3, you can do it like this:
> List.replicate 3 0 |> List.set 1 3;;
val it : int list = [0; 3; 0]
Now you can write a function that updates the set of tills given their current state and a customer (represented by a duration, which is also a number).
let next tills customer =
    let earliestTime = List.min tills
    let idx = List.findIndex (fun c -> earliestTime = c) tills
    List.set idx (earliestTime + customer) tills
First, the next function finds the earliestTime in tills by using List.min. Then it finds the index of that value. Finally, it 'updates' that till by adding its current state to the customer duration.
Imagine that you have two tills and the customers [2;3;10]:
> List.replicate 2 0;;
val it : int list = [0; 0]
> List.replicate 2 0 |> fun tills -> next tills 2;;
val it : int list = [2; 0]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3;;
val it : int list = [2; 3]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3
|> fun tills -> next tills 10;;
val it : int list = [12; 3]
You'll notice that you can keep calling the next function for all the customers in the line. That's called a fold. This gives you the final state of the tills. The final step is to return the value of the till with the highest value, because that represents the time it finished. The overall function, then, is:
let queueTime line n =
    let next tills customer =
        let earliestTime = List.min tills
        let idx = List.findIndex (fun c -> earliestTime = c) tills
        List.set idx (earliestTime + customer) tills
    let tills = List.replicate n 0
    let finalState = List.fold next tills line
    List.max finalState
Here are some examples, taken from the original exercise:
> queueTime [5;3;4] 1;;
val it : int = 12
> queueTime [10;2;3;3] 2;;
val it : int = 10
> queueTime [2;3;10] 2;;
val it : int = 12
This solution is based entirely on immutable data, and all functions are pure, so that's a functional solution.
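Since F# 6, FSharp.Core itself provides List.updateAt (index, new value, list), so the hand-written List.set helper is no longer strictly necessary. A sketch of the same solution using it, assuming FSharp.Core 6.0 or later:

```fsharp
// Assumes FSharp.Core 6.0+ (List.updateAt was added in F# 6).
let queueTime line n =
    let next tills customer =
        let earliestTime = List.min tills
        let idx = List.findIndex ((=) earliestTime) tills
        // returns a new list with the element at idx replaced; `tills` is unchanged
        List.updateAt idx (earliestTime + customer) tills
    line |> List.fold next (List.replicate n 0) |> List.max

printfn "%d" (queueTime [10; 2; 3; 3] 2)
```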
Here is a version that resembles your version, with all the mutability removed:
let getQueueTime (customerArray: int list) n =
    let updateWith f key map =
        let v = Map.find key map
        map |> Map.add key (f v)
    let initialLines = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> Map.ofList
    let getNextAvailableSupermarketLineName(d:Map<string,int>) =
        let lowestLine = d |> Seq.minBy (fun l -> l.Value)
        lowestLine.Key
    let lines =
        customerArray
        |> List.fold (fun linesState x ->
            let lineName = getNextAvailableSupermarketLineName linesState
            linesState |> updateWith (fun l -> l + x) lineName) initialLines
    lines |> Seq.map (fun l -> l.Value) |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
Those loops with mutable "outer state" can be swapped for either recursive functions or folds/reduce; here I suspect recursive functions would be nicer.
I've swapped out Dictionary for the immutable Map, but it feels like more trouble than it's worth here.
Update: here is a compromise solution that I think reads well:
open System.Collections.Generic
let getQueueTime (customerArray: int list) n =
    // `dict` builds a read-only dictionary, so use a mutable Dictionary instead
    let d = Dictionary<string,int>()
    for i in 1..n do d.Add(sprintf "Line%d" i, 0)
    let getNextAvailableSupermarketLineName(d:IDictionary<string,int>) =
        let lowestLine = d |> Seq.minBy (fun l -> l.Value)
        lowestLine.Key
    customerArray
    |> List.iter (fun x ->
        let lineName = getNextAvailableSupermarketLineName d
        // add this customer's time (x) to the chosen line
        d.Item(lineName) <- d.Item(lineName) + x)
    d.Values |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
I believe there is a more natural functional solution if you approach it freshly, but I wanted to evolve your current solution.
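For comparison, the recursive approach mentioned earlier in this answer could look something like the sketch below. It is not a translation of the code above: instead of named lines it keeps a plain list of the times at which each till becomes free, and it assumes n >= 1 (List.max fails on an empty list).

```fsharp
// `tills` holds, for each till, the time at which it becomes free.
let queueTimeRec (customers: int list) n =
    let rec loop tills remaining =
        match remaining with
        | [] -> List.max tills                 // everyone is served: the last till to finish
        | c :: rest ->
            // the till that frees up first takes the next customer
            match List.sort tills with
            | earliest :: others -> loop ((earliest + c) :: others) rest
            | [] -> failwith "n must be at least 1"
    loop (List.replicate n 0) customers

printfn "%d" (queueTimeRec [2; 3; 10] 2)
```

Each recursive call is in tail position, so the state (the tills and the remaining line) is passed along as arguments rather than held in mutable variables.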
This is less an attempt at answering than an extended comment on Mark Seemann's otherwise excellent answer. If we do not restrict ourselves to standard library functions, the slightly cumbersome determination of the index with List.findIndex can be avoided. Instead, we may devise a function that replaces the first occurrence of a value in a list with a new value.
The implementation of our bespoke List.replace involves recursion, with an accumulator to hold the values before we encounter the first occurrence. When it is found, the accumulator needs to be reversed and also to have the new value and the tail of the original list appended. Both of these can be done in one operation: List.fold is fed the new value and the tail of the original list as its initial state, while the elements of the accumulator are prepended in the loop, thereby restoring their order.
module List =
    // Replace the first occurrence of a specific object in a list
    let replace oldValue newValue source =
        let rec aux acc = function
            | [] -> List.rev acc
            | x::xs when x = oldValue ->
                (newValue::xs, acc)
                ||> List.fold (fun xs x -> x::xs)
            | x::xs -> aux (x::acc) xs
        aux [] source
let queueTime customers n =
    (List.init n (fun _ -> 0), customers)
    ||> List.fold (fun xs customer ->
        let x = List.min xs
        List.replace x (x + customer) xs)
    |> List.max
queueTime [5;3;4] 1 // val it : int = 12
queueTime [10;2;3;3] 2 // val it : int = 10
queueTime [2;3;10] 2 // val it : int = 12

Is it possible to repeat an array?

I need to add 1 to each element in an array, and if it goes out of range, I need to start over.
let arr = [| 1; 2; 3 |]
for i = 0 to Array.length arr - 1 do
    arr.[i] <- arr.[i] + 1
    printfn "i %A" (arr.[i])
I want to add 5 points to the array, so that it iterates over the array giving one point to each element. The array would first become [| 2; 3; 4 |], and then it would iterate through the array again and end up as arr = [| 3; 4; 4 |].
Actually, you can calculate exactly how much you should add to each element of the array, so you can solve the problem in a single pass over the array.
let addPoints arr points =
    let len = arr |> Array.length
    let added = points / len
    let extraCount = points % len
    arr
    |> Array.mapi (fun i x ->
        if i < extraCount then x + added + 1
        else x + added)
addPoints [| 1; 2; 3 |] 5
|> printfn "%A" // [|3; 4; 4|]
Whether you mutate the array or not is up to you.
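If you do want to mutate, the same arithmetic works in place. A minimal sketch (addPointsInPlace is a hypothetical name):

```fsharp
// Adds `points` to `arr` in place, spread round-robin starting from the first element.
let addPointsInPlace (arr: int[]) points =
    let len = arr.Length
    for i in 0 .. len - 1 do
        // every element gets points / len; the first points % len elements get one extra
        arr.[i] <- arr.[i] + points / len + (if i < points % len then 1 else 0)

let a = [| 1; 2; 3 |]
addPointsInPlace a 5
printfn "%A" a // [|3; 4; 4|]
```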
Rather than mutating the array, a more idiomatic F# approach is to create a new array with the newly calculated results. You can use the built-in Array.map function to apply the same transformation to each element of the array. To increment all elements by one, you can write:
let arr = [| 1; 2; 3 |]
arr |> Array.map (fun v -> v + 1)
If you want to restrict the maximal value to 4, you'll need to do that in the body of the function, i.e. v + 1. To make it easier to do this repeatedly, it's helpful to define a function.
let step arr =
    arr |> Array.map (fun v -> min 4 (v + 1))
Here, step is a function you can call to do one step of the transformation. min 4 (v + 1) ensures that when v + 1 is more than 4, you get just 4 as the result. Now you can run step repeatedly using |>:
let arr1 = arr |> step
let arr2 = arr |> step |> step
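To apply step a given number of times without writing out the pipeline by hand, you can fold over a range. A sketch; repeat is a hypothetical helper, not a library function:

```fsharp
let step arr =
    arr |> Array.map (fun v -> min 4 (v + 1))

// Applies f to x, k times in a row (k <= 0 returns x unchanged).
let repeat k f x =
    List.fold (fun acc _ -> f acc) x [ 1 .. k ]

let arr2 = repeat 2 step [| 1; 2; 3 |]
printfn "%A" arr2 // [|3; 4; 4|]
```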
I agree with @TomasPetricek that the way to go is to create new arrays using map. However, if you must mutate the array, the following loop-based approach should work just fine:
let incArrayElements n (a : _ []) =
    let rec loop k i =
        if k > 0 then
            a.[i] <- a.[i] + 1
            let ii = i + 1
            (if ii >= a.Length then 0 else ii)
            |> loop (k - 1)
    if n > 0 then loop n 0
If required, this can also be easily modified to include a parameter for the starting index.
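For example, a variant with a starting-index parameter might look like the following sketch (incArrayElementsFrom is a hypothetical name; it wraps around with the modulo operator and assumes start is a valid index):

```fsharp
// Adds 1 to n consecutive elements of `a`, beginning at `start` and wrapping around.
let incArrayElementsFrom start n (a: int[]) =
    let rec loop k i =
        if k > 0 then
            a.[i] <- a.[i] + 1
            loop (k - 1) ((i + 1) % a.Length)  // next index, wrapping back to 0
    if n > 0 then loop n start

let arr = [| 1; 2; 3 |]
incArrayElementsFrom 2 5 arr
printfn "%A" arr // [|3; 3; 5|]
```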

F# Parallel.ForEach invalid method overload

Creating a Parallel.ForEach expression of this form:
let low = max 1 (k-m)
let high = min (k-1) n
let rangesize = (high+1-low)/(PROCS*3)
Parallel.ForEach(Partitioner.Create(low, high+1, rangesize), (fun j ->
    let i = k - j
    if x.[i-1] = y.[j-1] then
        a.[i] <- b.[i-1] + 1
    else
        a.[i] <- max c.[i] c.[i-1]
    )) |> ignore
causes the error: No overloads match for method 'ForEach'. However, I am using the Parallel.ForEach<TSource> Method (Partitioner<TSource>, Action<TSource>) overload, and it seems right to me. Am I missing something?
Edited: I am trying to obtain the same results as the code below (which does not use a Partitioner):
let low = max 1 (k-m)
let high = min (k-1) n
let rangesize = (high+1-low)/(PROCS*3)
let A = [| low .. high |]
Parallel.ForEach(A, fun (j:int) ->
    let i = k - j
    if x.[i-1] = y.[j-1] then
        a.[i] <- b.[i-1] + 1
    else
        a.[i] <- max c.[i] c.[i-1]
    ) |> ignore
Are you sure that you have opened all necessary namespaces, all the values you are using (low, high and PROCS) are defined and that your code does not accidentally redefine some of the names that you're using (like Partitioner)?
I created a very simple F# script with this code and it seems to be working fine (I refactored the code to create a partitioner called p, but that does not affect the behavior):
open System.Threading.Tasks
open System.Collections.Concurrent
let PROCS = 10
let low, high = 0, 100
let p = Partitioner.Create(low, high+1, (high+1-low)/(PROCS*3))
Parallel.ForEach(p, (fun j ->
    printfn "%A" j // Print the desired range (using %A as it is a tuple)
    )) |> ignore
It is important that the value j is actually a pair of type int * int, so if the body uses it in a wrong way (e.g. as an int), you will get the error. In that case, you can add a type annotation to j and you would get a more useful error elsewhere:
Parallel.ForEach(p, (fun (j:int * int) ->
    printfn "%d" j // Error here, because `j` is used as an int, but it is a pair!
    )) |> ignore
This means that if you want to perform something for all j values in the original range, you need to write something like this:
Parallel.ForEach(p, (fun (loJ, hiJ) ->
    for j in loJ .. hiJ - 1 do // Iterate over all js in this partition
        printfn "%d" j // process the current j
    )) |> ignore
As an aside, note that the last argument to Partitioner.Create needs the parentheses, (high+1-low)/(PROCS*3): you want to divide the total number of steps, not just the low value.
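Putting these pieces together, here is a small self-contained sketch that sums the integers in a range in parallel, processing one (loJ, hiJ) partition per call. The range, the summing task, and the use of Environment.ProcessorCount in place of PROCS are illustrative assumptions. The range size is clamped to at least 1 because Partitioner.Create rejects a range size of 0 or less.

```fsharp
open System
open System.Threading
open System.Threading.Tasks
open System.Collections.Concurrent

let low, high = 1, 1000
// Illustrative: about 3 ranges per core; clamp to 1 so a small range never gives a size of 0.
let rangeSize = max 1 ((high + 1 - low) / (Environment.ProcessorCount * 3))
let partitioner = Partitioner.Create(low, high + 1, rangeSize)

let total = ref 0L
Parallel.ForEach(partitioner, (fun (range: int * int) ->
    let loJ, hiJ = range                  // hiJ is exclusive
    let mutable localSum = 0L
    for j in loJ .. hiJ - 1 do
        localSum <- localSum + int64 j
    // publish this partition's result atomically
    Interlocked.Add(&total.contents, localSum) |> ignore)) |> ignore

printfn "Sum = %d" total.Value
```

Summing locally and publishing once per partition keeps the atomic operations down to one per range rather than one per element.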

How to refactor F# code to not use a mutable accumulator?

The following F# code gives the correct answer to Project Euler problem #7:
let isPrime num =
    let upperDivisor = int32(sqrt(float num)) // Is there a better way?
    let rec evaluateModulo a =
        if a = 1 then
            true
        else
            match num % a with
            | 0 -> false
            | _ -> evaluateModulo (a - 1)
    evaluateModulo upperDivisor
let mutable accumulator = 1 // Would like to avoid mutable values.
let mutable number = 2 // ""
while (accumulator <= 10001) do
    if (isPrime number) then
        accumulator <- accumulator + 1
    number <- number + 1
printfn "The 10001st prime number is %i." (number - 1) // Feels kludgy.
printfn ""
printfn "Hit any key to continue."
System.Console.ReadKey() |> ignore
I'd like to avoid the mutable values accumulator and number. I'd also like to refactor the while loop into a tail recursive function. Any tips?
Any ideas on how to remove the (number - 1) kludge which displays the result?
Any general comments about this code or suggestions on how to improve it?
Loops are nice, but it's more idiomatic to abstract loops away as much as possible.
let isPrime num =
    let upperDivisor = int32(sqrt(float num))
    match num with
    | 0 | 1 -> false
    | 2 -> true
    | _ -> seq { 2 .. upperDivisor } |> Seq.forall (fun x -> num % x <> 0)
let primes = Seq.initInfinite id |> Seq.filter isPrime
// Seq.item is zero-based, so the n-th prime is at index n - 1
let nthPrime n = Seq.item (n - 1) primes
printfn "The 10001st prime number is %i." (nthPrime 10001)
printfn ""
printfn "Hit any key to continue."
System.Console.ReadKey() |> ignore
Sequences are your friend :)
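Since the question also asks specifically about turning the while loop into a tail-recursive function and getting rid of the (number - 1) adjustment, here is a minimal sketch of that. The loop carries the count of primes found so far and the next candidate as parameters instead of mutable variables, and it returns the prime directly as soon as the count reaches n. The isPrime here is a compact variant rather than the question's, and nthPrime assumes n >= 1.

```fsharp
// Trial division up to the integer square root; numbers below 2 are not prime.
let isPrime n =
    n >= 2 && Seq.forall (fun d -> n % d <> 0) (seq { 2 .. int (sqrt (float n)) })

// Returns the n-th prime (1-based): nthPrime 1 = 2.
let nthPrime n =
    // count: primes found so far; candidate: next number to test
    let rec loop count candidate =
        if isPrime candidate then
            if count + 1 = n then candidate   // the n-th prime itself, no "- 1" fix-up needed
            else loop (count + 1) (candidate + 1)
        else loop count (candidate + 1)
    loop 0 2

printfn "The 10001st prime number is %i." (nthPrime 10001)
```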
You can refer to my F# for Project Euler Wiki:
I got this first version:
let isPrime n =
    if n=1 then false
    else
        let m = int(sqrt (float(n)))
        let mutable p = true
        for i in 2..m do
            if n%i =0 then p <- false
            // ~~ I want to break here!
        p
let rec nextPrime n =
    if isPrime n then n
    else nextPrime (n+1)
let problem7 =
    let mutable result = nextPrime 2
    for i in 2..10001 do
        result <- nextPrime (result+1)
    result
This version looks nicer, but it still does not break out of the loop early when the number is not prime. In the Seq module, the exists and forall functions support stopping early:
let isPrime n =
    if n<=1 then false
    else
        let m = int(sqrt (float(n)))
        {2..m} |> Seq.exists (fun i->n%i=0) |> not
        // or equivalently :
        // {2..m} |> Seq.forall (fun i->n%i<>0)
Notice that this version of isPrime is finally mathematically correct, because it also handles numbers below 2.
Or you can use a tail recursive function to do the while loop:
let isPrime n =
    // note: unlike the previous version, this returns true for n < 2
    let m = int(sqrt (float(n)))
    let rec loop i =
        if i>m then true
        else
            if n%i = 0 then false
            else loop (i+1)
    loop 2
A more functional version of problem7 is to use Seq.unfold to generate an infinite prime sequence and take nth element of this sequence:
let problem7b =
    let primes =
        2 |> Seq.unfold (fun p ->
            let next = nextPrime (p+1) in
            Some( p, next ) )
    Seq.item 10000 primes // zero-based index, so this is the 10001st prime
Here's my solution, which uses a tail-recursive loop pattern which always allows you to avoid mutables and gain break functionality: http://projecteulerfun.blogspot.com/2010/05/problem-7-what-is-10001st-prime-number.html
let problem7a =
    let isPrime n =
        let nsqrt = n |> float |> sqrt |> int
        let rec isPrime i =
            if i > nsqrt then true //break
            elif n % i = 0 then false //break
            //loop while neither of the above two conditions are true
            //pass your state (i+1) to the next call
            else isPrime (i+1)
        isPrime 2
    let nthPrime n =
        let rec nthPrime i p count =
            if count = n then p //break
            //loop while above condition not met
            //pass new values in for p and count, emulating state
            elif i |> isPrime then nthPrime (i+2) i (count+1)
            else nthPrime (i+2) p count
        nthPrime 1 1 0
    nthPrime 10001
Now, to specifically address some of the questions you had in your solution.
The above nthPrime function allows you to find primes at an arbitrary position. This is how it would look adapted to your approach of finding specifically the 10001st prime, using your variable names (the solution is tail-recursive and doesn't use mutables):
let prime10001 =
    let rec nthPrime i number accumulator =
        if accumulator = 10001 then number
        //i is prime, so number becomes i in our next call and accumulator is incremented
        elif i |> isPrime then nthPrime (i+2) i (accumulator+1)
        //i is not prime, so number and accumulator do not change, just advance i to the next odd
        else nthPrime (i+2) number accumulator
    // starting at 1 and stepping by 2 skips 2; the count still works because isPrime 1 returns true
    nthPrime 1 1 0
Yes, there is a better way to do square roots: write your own generic square root implementation (refer to this and this post for the G implementation):
///Finds the square root (integral or floating point) of n
///Does not work with BigRational
let inline sqrt_of (g:G<'a>) n =
    if g.zero = n then g.zero
    else
        let mutable s:'a = (n / g.two) + g.one
        let mutable t:'a = (s + (n / s)) / g.two
        while t < s do
            s <- t
            let step1:'a = n/s
            let step2:'a = s + step1
            t <- step2 / g.two
        s
let inline sqrtG n = sqrt_of (G_of n) n
let sqrtn = sqrt_of gn //this has suffix "n" because sqrt is not strictly integral type
let sqrtL = sqrt_of gL
let sqrtI = sqrt_of gI
let sqrtF = sqrt_of gF
let sqrtM = sqrt_of gM
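The G helpers above come from the linked posts and are not reproduced here. If you only need int, the same Newton iteration can be written without them and without mutation. A sketch; isqrt is a hypothetical name, and it assumes n >= 0:

```fsharp
// Integer square root (floor of sqrt n) by Newton's method, for n >= 0.
// Same update rule as the loop above: t = (s + n / s) / 2, repeated while it decreases.
let isqrt (n: int) =
    if n = 0 then 0
    else
        let rec iterate s =
            let t = (s + n / s) / 2
            if t < s then iterate t else s
        iterate (n / 2 + 1)

printfn "%d" (isqrt 104743)
```

It could stand in for int32(sqrt(float num)) in the question's isPrime, avoiding the round trip through floating point.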

Help Needed Creating a Binary Tree Given Truth Table

First, in order to provide full disclosure, I want to point out that this is related to homework in a Machine Learning class. This question is not the homework question and instead is something I need to figure out in order to complete the bigger problem of creating an ID3 Decision Tree Algorithm.
I need to generate a tree similar to the following when given a truth table:
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
learnedTree is of type BinaryTree which I've defined as follows:
type BinaryTree =
    | Leaf of int
    | Node of int * string * BinaryTree * BinaryTree
ID3 algorithms take into account various equations to determine where to split the tree, and I've got all that figured out, I'm just having trouble creating the learned tree from my truth table. For example if I have the following table
A1 | A2 | A3 | Class
1  | 0  | 0  | 1
0  | 1  | 0  | 1
0  | 0  | 0  | 0
1  | 0  | 1  | 0
0  | 0  | 0  | 0
1  | 1  | 0  | 1
0  | 1  | 1  | 0
And I decide to split on attribute A1 I would end up with the following:
(A1 = 1)
A2 | A3 | Class
0  | 0  | 1
0  | 1  | 0
1  | 0  | 1

(A1 = 0)
A2 | A3 | Class
1  | 0  | 1
0  | 0  | 0
0  | 0  | 0
1  | 1  | 0
Then I would split the left side and split the right side, and continue the recursive pattern until the leaf nodes are pure and I end up with a tree similar to the following based on the splitting.
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
Here is what I've kind of "hacked" together thus far, but I think I might be way off:
let rec createTree (listToSplit : list<list<float>>) index =
    let leftSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
    let rightSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
    if leftSideSplit.Length > 0 then
        let pureCheck = isListPure leftSideSplit
        if pureCheck = 0 then
            printfn "%s" "Pure left node class 0"
            createTree leftSideSplit (index + 1)
        else if pureCheck = 1 then
            printfn "%s" "Pure left node class 1"
            createTree leftSideSplit (index + 1)
        else
            printfn "%s - %A" "Recursing Left" leftSideSplit
            createTree leftSideSplit (index + 1)
    else printfn "%s" "Pure left node class 0"
Should I be using pattern matching instead? Any tips/ideas/help? Thanks a bunch!
Edit: I've since posted an implementation of ID3 on my blog at:
http://blogs.msdn.com/chrsmith
Hey Jim, I've been wanting to write a blog post implementing ID3 in F# for a while; thanks for giving me an excuse. While this code doesn't implement the algorithm fully (or correctly), it should be sufficient for getting you started.
In general you have the right approach - representing each branch as a discriminated union case is good. And like Brian said, List.partition is definitely a handy function. The trick to making this work correctly is all in determining the optimal attribute/value pair to split on - and to do that you'll need to calculate information gain via entropy, etc.
type Attribute = string
type Value = string
type Record =
    {
        Weather : string
        Temperature : string
        PlayTennis : bool
    }
    override this.ToString() =
        sprintf
            "{Weather = %s, Temp = %s, PlayTennis = %b}"
            this.Weather
            this.Temperature
            this.PlayTennis
type Decision = Attribute * Value
type DecisionTreeNode =
    | Branch of Decision * DecisionTreeNode * DecisionTreeNode
    | Leaf of Record list
// ------------------------------------
// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maximize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data =
    // Just group by weather, then by temperature
    let uniqueWeathers =
        List.fold
            (fun acc item -> Set.add item.Weather acc)
            Set.empty
            data
    let uniqueTemperatures =
        List.fold
            (fun acc item -> Set.add item.Temperature acc)
            Set.empty
            data
    if uniqueWeathers.Count = 1 then
        let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
        let left, right =
            List.partition
                (fun item -> item.Temperature = uniqueTemperatures.MinimumElement)
                data
        (bestSplit, left, right)
    else
        let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
        let left, right =
            List.partition
                (fun item -> item.Weather = uniqueWeathers.MinimumElement)
                data
        (bestSplit, left, right)
let rec determineBranch data =
    if List.length data < 4 then
        Leaf(data)
    else
        // Use the entropy function to break the dataset on
        // the category / value that best splits the data
        let bestDecision, leftBranch, rightBranch = bestSplit data
        Branch(
            bestDecision,
            determineBranch leftBranch,
            determineBranch rightBranch)
// ------------------------------------
let rec printID3Result indent branch =
    let padding = new System.String(' ', indent)
    match branch with
    | Leaf(data) ->
        data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
    | Branch(decision, lhs, rhs) ->
        printfn "%sBranch predicate [%A]" padding decision
        printfn "%sWhere predicate is true:" padding
        printID3Result (indent + 4) lhs
        printfn "%sWhere predicate is false:" padding
        printID3Result (indent + 4) rhs
// ------------------------------------
let dataset =
    [
        { Weather = "windy"; Temperature = "hot"; PlayTennis = false }
        { Weather = "windy"; Temperature = "cool"; PlayTennis = false }
        { Weather = "nice"; Temperature = "cool"; PlayTennis = true }
        { Weather = "nice"; Temperature = "cold"; PlayTennis = true }
        { Weather = "humid"; Temperature = "hot"; PlayTennis = false }
    ]
printfn "Given input list:"
dataset |> List.iter (printfn "%A")
printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result
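The bestSplit above deliberately skips the real selection step. As a rough sketch of what its comment refers to, the Shannon entropy (base 2) of the class labels and the information gain of splitting on one column could be computed as follows. The row layout (a list of 0/1 ints with the class in the last column, as in the question's table) is an assumption and not part of the code above.

```fsharp
// Shannon entropy (base 2) of a list of class labels.
let entropy (labels: int list) =
    let total = float (List.length labels)
    labels
    |> List.countBy id                       // (label, count) pairs
    |> List.sumBy (fun (_, count) ->
        let p = float count / total
        -p * System.Math.Log(p, 2.0))

// Information gain of splitting `rows` on column `index`;
// each row is a list of 0/1 values with the class in the last column.
let informationGain index (rows: int list list) =
    let classOf (row: int list) = List.last row
    let total = float (List.length rows)
    // weighted entropy of the subsets produced by the split
    let remainder =
        rows
        |> List.groupBy (fun row -> row.[index])
        |> List.sumBy (fun (_, subset) ->
            float (List.length subset) / total * entropy (List.map classOf subset))
    entropy (List.map classOf rows) - remainder

// The truth table from the question: A1; A2; A3; Class
let table =
    [ [1; 0; 0; 1]
      [0; 1; 0; 1]
      [0; 0; 0; 0]
      [1; 0; 1; 0]
      [0; 0; 0; 0]
      [1; 1; 0; 1]
      [0; 1; 1; 0] ]

for i in 0 .. 2 do
    printfn "Information gain for A%d: %.3f" (i + 1) (informationGain i table)
```

The attribute with the largest gain is the one ID3 would split on first.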
You can use List.partition instead of your two List.choose calls.
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html
(or now http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx )
It isn't clear to me that pattern matching will buy you much here; the input type (list of lists) and processing (partitioning and 'pureness' check) doesn't really lend itself to that.
And of course when you finally get the 'end' (a pure list) you need to create a tree, and then presumably this function will create a Leaf when the input only has one 'side' and it's 'pure', but create a Node out of the left-side and right-side results for every other input. Maybe. I didn't quite grok the algorithm completely.
Hopefully that will help steer you a little bit. May be useful to draw up a few smaller sample inputs and outputs to help work out the various cases of the function body.
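As a small illustration of the List.partition suggestion, here is a sketch that splits the question's truth table on one attribute and drops that column from both halves. splitOn is a hypothetical helper, and ints are used instead of floats for readability.

```fsharp
// The truth table from the question: A1; A2; A3; Class
let table =
    [ [1; 0; 0; 1]
      [0; 1; 0; 1]
      [0; 0; 0; 0]
      [1; 0; 1; 0]
      [0; 0; 0; 0]
      [1; 1; 0; 1]
      [0; 1; 1; 0] ]

// Splits rows on the attribute at `index` (1s first, then 0s),
// removing that attribute's column from every row of both halves.
let splitOn index rows =
    let dropAt i row =
        row |> List.indexed |> List.filter (fun (j, _) -> j <> i) |> List.map snd
    let ones, zeros = rows |> List.partition (fun (row: int list) -> row.[index] = 1)
    List.map (dropAt index) ones, List.map (dropAt index) zeros

let left, right = splitOn 0 table
printfn "A1 = 1: %A" left
printfn "A1 = 0: %A" right
```

A single List.partition call walks the list once and returns both halves, replacing the two List.choose calls in the question.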
Thanks Brian & Chris! I was actually able to figure this out and I ended up with the following. This calculates the information gain for determining the best place to split. I'm sure there are probably better ways for me to arrive at this solution especially around the chosen data structures, but this is a start. I plan to refine things later.
#light
open System
let trainList =
    [
        [1.;0.;0.;1.;];
        [0.;1.;0.;1.;];
        [0.;0.;0.;0.;];
        [1.;0.;1.;0.;];
        [0.;0.;0.;0.;];
        [1.;1.;0.;1.;];
        [0.;1.;1.;0.;];
        [1.;0.;0.;1.;];
        [0.;0.;0.;0.;];
        [1.;0.;0.;1.;];
    ]
type BinaryTree =
    | Leaf of int
    | Node of int * string * BinaryTree * BinaryTree
let entropyList nums =
    let sumOfnums =
        nums
        |> Seq.sum
    nums
    |> Seq.map (fun x -> if x=0.00 then x else (-((x/sumOfnums) * Math.Log(x/sumOfnums, 2.))))
    |> Seq.sum
let entropyBinaryList (dataListOfLists:list<list<float>>) =
    let classList =
        dataListOfLists
        |> List.map (fun x -> x.Item(x.Length - 1))
    let ListOfNo =
        classList
        |> List.choose (fun x -> if x = 0. then Some(x) else None)
    let ListOfYes =
        classList
        |> List.choose (fun x -> if x = 1. then Some(x) else None)
    let numberOfYes : float = float ListOfYes.Length
    let numberOfNo : float = float ListOfNo.Length
    let ListOfNumYesAndSumNo = [numberOfYes; numberOfNo]
    entropyList ListOfNumYesAndSumNo
let conditionalEntropy (dataListOfLists:list<list<float>>) attributeNumber =
    let NoAttributeList =
        dataListOfLists
        |> List.choose (fun x -> if x.Item(attributeNumber) = 0. then Some(x) else None)
    let YesAttributeList =
        dataListOfLists
        |> List.choose (fun x -> if x.Item(attributeNumber) = 1. then Some(x) else None)
    let numberOfYes : float = float YesAttributeList.Length
    let numberOfNo : float = float NoAttributeList.Length
    let noConditionalEntropy = (entropyBinaryList NoAttributeList) * (numberOfNo/(numberOfNo + numberOfYes))
    let yesConditionalEntropy = (entropyBinaryList YesAttributeList) * (numberOfYes/(numberOfNo + numberOfYes))
    [noConditionalEntropy; yesConditionalEntropy]
let findBestSplitIndex(listOfInstances : list<list<float>>) =
    let IGList =
        [0..(listOfInstances.Item(0).Length - 2)]
        |> List.mapi (fun i x -> (i, (entropyBinaryList listOfInstances) - (List.sum (conditionalEntropy listOfInstances x))))
    IGList
    |> List.maxBy snd
    |> fst
let isListPure (listToCheck : list<list<float>>) =
    let splitList = listToCheck |> List.choose (fun x -> if x.Item(x.Length - 1) = 1. then Some(x) else None)
    if splitList.Length = listToCheck.Length then 1
    else if splitList.Length = 0 then 0
    else -1
let rec createTree (listToSplit : list<list<float>>) =
    let pureCheck = isListPure listToSplit
    if pureCheck = 0 then
        printfn "%s" "Pure - Leaf(0)"
    else if pureCheck = 1 then
        printfn "%s" "Pure - Leaf(1)"
    else
        printfn "%A - is not pure" listToSplit
        if listToSplit.Length > 1 then // There are attributes we can split on
            // Choose the best place to split the list
            let splitIndex = findBestSplitIndex(listToSplit)
            printfn "splitting at index %A" splitIndex
            let leftSideSplit =
                listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 1. then Some(x) else None)
            let rightSideSplit =
                listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 0. then Some(x) else None)
            createTree leftSideSplit
            createTree rightSideSplit
        else
            printfn "%s" "Not Pure, but can't split choose based on heuristics - Leaf(0 or 1)"
