How do I reduce code duplication with nested 'if' statements? - f#

let's consider this code:
let getBuildDate (assembly: Assembly) : DateTime option =
let buildVersionMetadataPrefix = "+build"
let attribute = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>()
if attribute <> null && attribute.InformationalVersion <> null then
let value = attribute.InformationalVersion
let index = value.IndexOf(buildVersionMetadataPrefix)
if index > 0 then
let value = value.Substring(index + buildVersionMetadataPrefix.Length)
let success, timestamp = DateTime.TryParseExact(value, "yyyyMMddHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None)
if success then
Some timestamp
else
None
else
None
else
None
Is there a way to get rid of all the 'else None' statements to have only one?
On one side, I can imagine that for some people the code is more clear with all the None statements spelled out, but on the other side, coming from the C world, I see it as clutter that reduces readability.
There are many cases where you need a series of conditions to be met and all the failed cases go to one place.
If I have a list of conditions that depend on each others' success, how can I make a concise short exit without duplication.

Another approach might be to use the Option functions - each of these steps will effectively short circuit if the input from the previous step is None.
let getBuildDate (assembly: Assembly) : DateTime option =
let tryDate value =
match DateTime.TryParseExact(value, "yyyyMMddHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None) with
| true, date -> Some date
| false, _ -> None
let buildVersionMetadataPrefix = "+build"
let attribute = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>()
Option.ofObj attribute
|> Option.bind (fun attr -> Option.ofObj attr.InformationalVersion)
|> Option.map (fun infVer -> infVer, infVer.IndexOf buildVersionMetadataPrefix)
|> Option.filter (fun (_, index) -> index > 0)
|> Option.map (fun (infVer, index) -> infVer.Substring(index + buildVersionMetadataPrefix.Length))
|> Option.bind tryDate
Whether this is 'better' is arguable - and definitely a matter of opinion!

The other answers show how to do this using more sophisticated functional programming methods, like using computation expressions or option values. Those are definitely useful and make sense if this is something that you are doing in many places throughout your system.
However, if you just want a simple way to change the code so that the control flow is more clear (without making it more clever), I would negate the conditions. Previously, you had:
if something then
moreStuff()
Some result
else
None
You can rewrite this by returning None if not something. I think the F# coding convention in this case also allows you to remove the indentation, so it looks more like imperative early return:
if not something then None else
moreStuff()
Some result
With this, you can write your original function as follows - without any extra clever tricks:
let getBuildDate (assembly: Assembly) : DateTime option =
let buildVersionMetadataPrefix = "+build"
let attribute = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>()
if attribute = null || attribute.InformationalVersion = null then None else
let value = attribute.InformationalVersion
let index = value.IndexOf(buildVersionMetadataPrefix)
if index <= 0 then None else
let value = value.Substring(index + buildVersionMetadataPrefix.Length)
let success, timestamp = DateTime.TryParseExact(value, "yyyyMMddHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None)
if not success then None else
Some timestamp

A readable approach might be use a computation expression builder for Option.
type OptionBuilder() =
member _.Return v = Some v
member _.Zero () = None
member _.Bind(v, f) = Option.bind f v
member _.ReturnFrom o = o
let opt = OptionBuilder()
You can simulate an imperative style of if-then-return.
let condition num = num % 2 = 0
let result = opt {
if condition 2 then
if condition 4 then
if condition 6 then
return 10
}
Rewriting your example:
let getBuildDate (assembly: Assembly) : DateTime option = opt {
let buildVersionMetadataPrefix = "+build"
let attribute = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>()
if attribute <> null && attribute.InformationalVersion <> null then
let value = attribute.InformationalVersion
let index = value.IndexOf(buildVersionMetadataPrefix)
if index > 0 then
let value = value.Substring(index + buildVersionMetadataPrefix.Length)
let success, timestamp = DateTime.TryParseExact(value, "yyyyMMddHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None)
if success then
return timestamp
}
No more None.

open System
open System.Reflection
open System.Globalization
let inline guard cond next = if cond then next () else None
let getBuildDate (assembly: Assembly) : DateTime option =
let buildVersionMetadataPrefix = "+build"
let attribute = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>()
guard (attribute <> null && attribute.InformationalVersion <> null) <| fun _ ->
let value = attribute.InformationalVersion
let index = value.IndexOf(buildVersionMetadataPrefix)
guard (index > 0) <| fun _ ->
let value = value.Substring(index + buildVersionMetadataPrefix.Length)
let success, timestamp = DateTime.TryParseExact(value, "yyyyMMddHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None)
guard success <| fun _ ->
Some timestamp
If you can stomach the inelegance of having to write <| fun _ -> on every guard, this is an option worth considering.

Have you considered using Result<TSuccess, TError>. It is very structuring - making the code rigid and flat - and makes it possible to provide detailed error information for the step that possible fails. It's a little more code, but IMO more readable and maintainable:
let getBuildDate (assembly: Assembly) : Result<DateTime, string> =
let buildVersionMetadataPrefix = "+build"
let extractAttribute (assem: Assembly) =
match assem.GetCustomAttribute<AssemblyInformationalVersionAttribute>() with
| attrib when attrib <> null -> Ok attrib
| _ -> Error "No attribute found"
let extractDateString (attrib: AssemblyInformationalVersionAttribute) =
match attrib.InformationalVersion.IndexOf (buildVersionMetadataPrefix) with
| x when x > 0 -> Ok (attrib.InformationalVersion.Substring (x + buildVersionMetadataPrefix.Length))
| _ -> Error "Metadata prefix not found"
let toDateTime dateString =
match DateTime.TryParseExact(dateString, "yyyyMMddHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None) with
| true, timeStamp -> Ok timeStamp
| false, _ -> Error "Invalid date time format"
extractAttribute assembly
|> Result.bind extractDateString
|> Result.bind toDateTime
Usage
let optBuildDate = getBuildDate (Assembly.GetExecutingAssembly())
match optBuildDate with
| Ok date -> printfn "%A" date
| Error msg -> printfn "ERROR: %s" msg

There is an approach that I really love which is the use of an array in certain scenarios.
Example:
Instead of using something like:
if (grade >= 90) {
scale = "A";
} else if (grade >= 80) {
scale = "B";
} else if (grade >= 70) {
scale = "C";
} else if (grade >= 60) {
scale = "D";
} else {
scale = "F";
}
Use an array like:
function calculate(scores) {
var grade, scale;
let sum = 0;
for (let i = 0; i < scores.length; i++) {
sum += scores[i];
}
grade = sum / scores.length;
scale = {
[90 <= grade && grade <= 100]: "O",
[80 <= grade && grade < 90]: "E",
[70 <= grade && grade < 80]: "A",
[55 <= grade && grade < 70]: "P",
[40 <= grade && grade < 55]: "D",
[grade < 40]: "T"
};
console.log(scale.true);
}
In python could be like:
def calculate(scores: list) -> str:
grade = sum(scores) / len(scores)
print(grade)
scale = {90 <= grade <= 100: "O", 80 <=
grade < 90: "E", 70 <= grade < 80: "A",
55 <= grade < 70: "P", 40 <= grade < 55: "D",
grade < 40: "T"}
return scale.get(True)

Related

How can I add to a type that contains an integer inside a Result of a discriminated union?

I'm working on a single case discriminated union with a module to create instances of the type and return a Result of either Ok if the input was valid or Error otherwise. Here is what I have so far.
type ErrorMessage = string
type NonNegativeInt = private NonNegativeInt of int
module NonNegativeInt =
let create (inputInt:int) : Result<NonNegativeInt, ErrorMessage> =
if inputInt >= 0 then
Ok (NonNegativeInt inputInt)
else
Error ("inputInt must be >= 0")
let value (NonNegativeInt intVal) = intVal
I would like to add an integer to an instance of this type using the create function so it will block negatives. I've got the first test working this way.
[<Fact>]
member this.``NonNegativeInt test`` () =
let nonNegativeResult = NonNegativeInt.create 5
let newNonNegativeResult = match nonNegativeResult with
| Ok result ->
let intVal = NonNegativeInt.value result
let newIntVal = intVal + 1
NonNegativeInt.create newIntVal
| Error _ ->
nonNegativeResult
match newNonNegativeResult with
| Ok result ->
Assert.Equal(6, NonNegativeInt.value result)
| Error _ ->
Assert.Fail("Error creating new NonNegativeInt")
This is pretty much unusable this way. Is there a more concise way to accomplish this task without all the unwrapping, wrapping, and pattern matching? Is Result.bind the way to go?
Update 1 Trying Result.bind
This is a better, but still feels a bit clumsy. Maybe the NonNegativeInt module needs another function besides create and value to make this easier.
[<Fact>]
member this.``NonNegativeInt test2`` () =
let nni1 = NonNegativeInt.create 5
let nni2 = nni1
|> Result.bind (fun x -> NonNegativeInt.create ((NonNegativeInt.value x) + 1))
let expectedResult = NonNegativeInt.create 6
Assert.Equal(expectedResult, nni2)
Suggestion 1
You could use a computation builder to make the code cleaner:
type ResultBuilder() =
member _.Return(x) = Ok x
member _.ReturnFrom(res : Result<_, _>) = res
member _.Bind(res, f) = Result.bind f res
let result = ResultBuilder()
Then your example becomes:
let test () =
result {
let! nni = NonNegativeInt.create 5
return! NonNegativeInt.create (nni.Value + 1)
}
test () |> printfn "%A" // Ok NonNegativeInt 6
I also added a member to make it easier to access a NonNegativeInteger's value:
type NonNegativeInt =
private NonNegativeInt of int
with
member this.Value =
let (NonNegativeInt n) = this in n
Suggestion 2
Having an NNI type and then wrapping it in a Result is like wearing both a belt and suspenders. To simplify things further, you could get rid of the NNI type entirely, and just keep the validation logic:
module NonNegativeInt =
let create (inputInt:int) : Result<int, ErrorMessage> =
if inputInt >= 0 then
Ok inputInt
else
Error ("inputInt must be >= 0")
let test () =
result {
let! n = NonNegativeInt.create 5
return! NonNegativeInt.create (n + 1)
}
test () |> printfn "%A" // Ok 6
Suggestion 3
Alternatively, you could keep the NNI type and trust the caller to use it with valid values (without wrapping in a Result). This is what FsCheck does, for example:
///Represents an int >= 0
type NonNegativeInt = NonNegativeInt of int with
member x.Get = match x with NonNegativeInt r -> r
override x.ToString() = x.Get.ToString()
static member op_Explicit(NonNegativeInt i) = i

Why does this function using IndexOf always return 0?

I have this function that checks the number of occurrences of a pattern in a string. The problem is that is keeps returning 0 no matter the input. The most frustrating part is that it worked 2 min ago and I did not change anything.
let Counter (text : string) (pattern : string) =
let mutable count = 0
let mutable i = 0
while ((i = text.IndexOf(pattern, i)) <> false) do
i <- i + pattern.Length
count <- count + 1
count
The main problem is that it looks like you're trying to assign a new value to i inside the test in the while loop, but the = operator tests equality and does not perform assignment. The <- assignment operator has return type unit (it does not return the assigned value), so the fix can't be as simple as replacing the call to = with a call to <-.
The most straightforward fix is probably to break that test out into a separate inner function:
let counter (text : string) (pattern : string) =
let mutable i = 0
let moveNext() =
i <- text.IndexOf(pattern, i)
i
let mutable count = 0
while (moveNext() >= 0) do
i <- i + pattern.Length
count <- count + 1
count
However, note that this is not idiomatic F# code. Instead, I'd write it like this:
let counter (text : string) (pattern : string) =
let rec countFrom (i:int) total =
match text.IndexOf(pattern, i) with
| j when j >= 0 -> countFrom (j+pattern.Length) (total+1)
| _ -> total
countFrom 0 0

How to execute a function, that creates a lot of objects, in parallel?

I am using Array.Parallel.map on a function but find that it is not executing at anywhere near full processor capacity. I am assuming this is because the function creates a lot of objects when running List.map and List.map2. Would this be causing a synchronization issue and is there a more appropriate way of doing this? At the moment the only way I can think of getting around this is by running each process as a separate executable using something like xargs under Linux.
I put together the script below to demonstrate the problem. It is a very basic data categorizer which relies on a field having a certain value as a rule to determine if this will predict a category:
open System
type CategoryAssessment =
{ fieldIndex: int
value: int
ruleAssessments: list<int> }
let InitAssessment categorizeFields rules =
let ruleAssessments = List.init (List.length rules) (fun x -> 0)
List.map (fun categorizeField ->
let fieldIndex, categoryValue = categorizeField
{ CategoryAssessment.fieldIndex = fieldIndex;
value = categoryValue;
ruleAssessments = ruleAssessments })
categorizeFields
let AssessCategory ruleMatches (row : int[]) categoryAssessment =
let fieldIndex = categoryAssessment.fieldIndex
let categoryValue = categoryAssessment.value
let categoryMatch = categoryValue = row.[fieldIndex]
let newRuleAssessments =
List.map2 (fun ruleAssessment ruleMatch ->
if ruleMatch = categoryMatch then
ruleAssessment + 1
else
ruleAssessment)
categoryAssessment.ruleAssessments
ruleMatches
{ categoryAssessment with ruleAssessments = newRuleAssessments }
let MatchRule (row : int[]) rule =
let fieldIndex, eqVal = rule
row.[fieldIndex] = eqVal
let Assess categorizeFields rules input =
printfn "START - Assess"
let d =
Array.fold (fun categoryAssessment row ->
let ruleMatches = List.map (MatchRule row) rules
List.map (AssessCategory ruleMatches row) categoryAssessment)
(InitAssessment categorizeFields rules)
input
printfn "END - Assess"
d
let JoinAssessments assessments =
let numAssessments = Array.length assessments
Array.fold (fun accAssessment assessment ->
List.map2 (fun accCategory category ->
let newRuleAssessments =
List.map2 (+)
accCategory.ruleAssessments
category.ruleAssessments
{ accCategory with
ruleAssessments = newRuleAssessments })
accAssessment
assessment)
assessments.[0]
assessments.[1..(numAssessments-1)]
let numRecords = 10000
let numFields = 20
let numSplits = 10
let numRules = 10000
let inputs = Array.create numSplits
[| for i in 1 .. (numRecords / numSplits) ->
[| for j in 1 .. numFields ->
(i % 10) + j |] |]
let categorizeFields = [ (1, 6); (2, 3); (2, 4); (3, 2) ]
let rules = [ for i in 1 .. numRules -> (i % numFields, i) ]
let assessments =
Array.Parallel.map (Assess categorizeFields rules) inputs
|> JoinAssessments
printfn "Assessments: %A" assessments
0
After a fair bit of investigation, the ultimate answer to my question seems to be to find a way of not creating lots of objects. The easiest change to do this is moving to using arrays instead of lists. I have written up my findings more fully in an article: Beware of Immutable Lists for F# Parallel Processing.
The above program when altered as follows, runs better between threads and runs much quicker even on a single thread. Further improvements can be made by making the ruleAssessments field mutable as demonstrated in the referenced article.
open System
type CategoryAssessment =
{ fieldIndex: int
value: int
ruleAssessments: int[] }
let InitAssessment categorizeFields rules =
let ruleAssessments = Array.create (Array.length rules) 0
Array.map (fun categorizeField ->
let fieldIndex, categoryValue = categorizeField
{ CategoryAssessment.fieldIndex = fieldIndex;
value = categoryValue;
ruleAssessments = ruleAssessments })
categorizeFields
let AssessCategory ruleMatches (row : int[]) categoryAssessment =
let fieldIndex = categoryAssessment.fieldIndex
let categoryValue = categoryAssessment.value
let categoryMatch = categoryValue = row.[fieldIndex]
let newRuleAssessments =
Array.map2 (fun ruleAssessment ruleMatch ->
if ruleMatch = categoryMatch then
ruleAssessment + 1
else
ruleAssessment)
categoryAssessment.ruleAssessments
ruleMatches
{ categoryAssessment with ruleAssessments = newRuleAssessments }
let MatchRule (row : int[]) rule =
let fieldIndex, eqVal = rule
row.[fieldIndex] = eqVal
let Assess categorizeFields rules input =
printfn "START - Assess"
let d =
Array.fold (fun categoryAssessment row ->
let ruleMatches = Array.map (MatchRule row) rules
Array.map (AssessCategory ruleMatches row) categoryAssessment)
(InitAssessment categorizeFields rules)
input
printfn "END - Assess"
d
let JoinAssessments assessments =
let numAssessments = Array.length assessments
Array.fold (fun accAssessment assessment ->
Array.map2 (fun accCategory category ->
let newRuleAssessments =
Array.map2 (+)
accCategory.ruleAssessments
category.ruleAssessments
{ accCategory with
ruleAssessments = newRuleAssessments })
accAssessment
assessment)
assessments.[0]
assessments.[1..(numAssessments-1)]
let numRecords = 10000
let numFields = 20
let numSplits = 10
let numRules = 10000
let inputs = Array.create numSplits
[| for i in 1 .. (numRecords / numSplits) ->
[| for j in 1 .. numFields ->
(i % 10) + j |] |]
let categorizeFields = [| (1, 6); (2, 3); (2, 4); (3, 2) |]
let rules = [| for i in 1 .. numRules -> (i % numFields, i) |]
let assessments =
Array.Parallel.map (Assess categorizeFields rules) inputs
|> JoinAssessments
printfn "Assessments: %A" assessments
0
This is a version of your program that doesn't require mutability and uses nearly all of the 4 cpus on my iMac.
To pull it off, it's driven by assessing each rule in parallel, not by processing records. That also required the input array to be transposed making it be fields by records.
open System
type CategoryAssessment =
{ fieldIndex: int
value: int
ruleAssessments: list<int> }
let MatchRule rVal fVal =
rVal = fVal
let AssessRule cMatches (inputs:int[][]) (rIndex, rVal) =
// printfn "START - Assess" // uses more cpu than the code itself
let matches = inputs.[rIndex] |>
Array.map2 (fun cVal fVal -> (MatchRule rVal fVal) = cVal) cMatches
let assessment = matches |>
Array.map ( fun v -> if v then 1 else 0 ) |>
Array.sum
// printfn "END - Assess"
assessment
let Assess categorizeFields rules (inputs:int[][]) =
categorizeFields |> List.map (fun (catIndex, catValue) ->
let catMatches = inputs.[catIndex] |> Array.map( fun v -> v = catValue )
let assessments = rules |> Array.Parallel.map
(AssessRule catMatches inputs)
|> Array.toList
{ CategoryAssessment.fieldIndex = catIndex;
value = catValue;
ruleAssessments = assessments }
)
let numRecords = 10000
let numFields = 20
let numRules = 10000
let inputs = [| for j in 1 .. numFields ->
[| for i in 1 .. numRecords -> (i % 10) + j |] |]
let categorizeFields = [ (1, 6); (2, 3); (2, 4); (3, 2) ]
let rules = [| for i in 1 .. numRules -> (i % numFields, i) |]
let assessments = Assess categorizeFields rules inputs
printfn "Assessments: %A" assessments
Assessing by rule allowed the summing of a single integer across all records for a given rule, avoiding mutable state and extra memory allocations.
I used a lot of array iteration to get the speed up but didn't remove all the lists.
I fear I changed the functionality while refactoring or made assumptions that can't be applied to your actual problem, however I do hope it's a useful example.

Is if .. else .. an idiomatic way of writing things in F#?

What would be an F# idiomatic way of writing the following ? Or would you leave this as is ?
let input = 5
let result =
if input > 0 && input < 5 then
let a = CalculateA(input)
let b = CalculateB(input)
(a+b)/2
else
CalculateC(input)
For one if ... then ... else ... I'd probably leave it like that, if you had more cases I'd either use pattern match with a when guard:
let result =
match input with
| _ when input > 0 && input < 5 -> ...
| _ -> ...
or you might also want to look at active patterns: http://msdn.microsoft.com/en-us/library/dd233248.aspx
What would be an F# idiomatic way of writing the following ? Or would you leave this as is ?
There's nothing wrong with the way you've written it but here is another alternative (inspired by Huusom):
let input = 5
let result =
if input>0 && input<5 then [A; B] else [C]
|> Seq.averageBy (fun f -> f input)
This is minor stylistic change but I find this more readable:
let input = 5
let result =
if input > 0 && input < 5 then
(calculateA input + calculateB input) / 2
else
calculateC input
This is not really an answer because Robert is correct. But it looks like you are working with series of functions, so you could write it like this:
let Calculate input =
let calc = function | [f] -> f input | fl -> fl |> List.map ((|>) input) |> List.sum |> (fun s -> s / fl.Length)
if input > 0 && input < 5
then calc [CalculateA; CalculateB]
else calc [CalculateC]
You could decompose to something with this signature: ((int -> int) list) -> ((int -> int) list) -> (int -> bool) -> int -> int and then build your function by applying the first 3 parameters.

Help Needed Creating a Binary Tree Given Truth Table

First, in order to provide full disclosure, I want to point out that this is related to homework in a Machine Learning class. This question is not the homework question and instead is something I need to figure out in order to complete the bigger problem of creating an ID3 Decision Tree Algorithm.
I need to generate tree similar to the following when given a truth table
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
learnedTree is of type BinaryTree which I've defined as follows:
type BinaryTree =
| Leaf of int
| Node of int * string * BinaryTree * BinaryTree
ID3 algorithms take into account various equations to determine where to split the tree, and I've got all that figured out, I'm just having trouble creating the learned tree from my truth table. For example if I have the following table
A1 | A2 | A3 | Class
1 0 0 1
0 1 0 1
0 0 0 0
1 0 1 0
0 0 0 0
1 1 0 1
0 1 1 0
And I decide to split on attribute A1 I would end up with the following:
(A1 = 1) A1 (A1 = 0)
A2 | A3 | Class A2 | A3 | Class
0 0 1 1 0 1
0 1 0 0 0 0
1 0 1 0 0 0
0 1 1
Then I would split the left side and split the right side, and continue the recursive pattern until the leaf nodes are pure and I end up with a tree similar to the following based on the splitting.
let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))
Here is what I've kind of "hacked" together thus far, but I think I might be way off:
let rec createTree (listToSplit : list<list<float>>) index =
let leftSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
let rightSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
if leftSideSplit.Length > 0 then
let pureCheck = isListPure leftSideSplit
if pureCheck = 0 then
printfn "%s" "Pure left node class 0"
createTree leftSideSplit (index + 1)
else if pureCheck = 1 then
printfn "%s" "Pure left node class 1"
createTree leftSideSplit (index + 1)
else
printfn "%s - %A" "Recursing Left" leftSideSplit
createTree leftSideSplit (index + 1)
else printfn "%s" "Pure left node class 0"
Should I be using pattern matching instead? Any tips/ideas/help? Thanks a bunch!
Edit: I've since posted an implementation of ID3 on my blog at:
http://blogs.msdn.com/chrsmith
Hey Jim, I've been wanting to write a blog post implementing ID3 in F# for a while - thanks for giving me an execute. While this code doesn't implement the algorithm full (or correctly), it should be sufficient for getting you started.
In general you have the right approach - representing each branch as a discriminated union case is good. And like Brian said, List.partition is definitely a handy function. The trick to making this work correctly is all in determining the optimal attribute/value pair to split on - and to do that you'll need to calculate information gain via entropy, etc.
type Attribute = string
type Value = string
type Record =
{
Weather : string
Temperature : string
PlayTennis : bool
}
override this.ToString() =
sprintf
"{Weather = %s, Temp = %s, PlayTennis = %b}"
this.Weather
this.Temperature
this.PlayTennis
type Decision = Attribute * Value
type DecisionTreeNode =
| Branch of Decision * DecisionTreeNode * DecisionTreeNode
| Leaf of Record list
// ------------------------------------
// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maxamize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data =
// Just group by weather, then by temperature
let uniqueWeathers =
List.fold
(fun acc item -> Set.add item.Weather acc)
Set.empty
data
let uniqueTemperatures =
List.fold
(fun acc item -> Set.add item.Temperature acc)
Set.empty
data
if uniqueWeathers.Count = 1 then
let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
let left, right =
List.partition
(fun item -> item.Temperature = uniqueTemperatures.MinimumElement)
data
(bestSplit, left, right)
else
let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
let left, right =
List.partition
(fun item -> item.Weather = uniqueWeathers.MinimumElement)
data
(bestSplit, left, right)
let rec determineBranch data =
if List.length data < 4 then
Leaf(data)
else
// Use the entropy function to break the dataset on
// the category / value that best splits the data
let bestDecision, leftBranch, rightBranch = bestSplit data
Branch(
bestDecision,
determineBranch leftBranch,
determineBranch rightBranch)
// ------------------------------------
let rec printID3Result indent branch =
let padding = new System.String(' ', indent)
match branch with
| Leaf(data) ->
data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
| Branch(decision, lhs, rhs) ->
printfn "%sBranch predicate [%A]" padding decision
printfn "%sWhere predicate is true:" padding
printID3Result (indent + 4) lhs
printfn "%sWhere predicate is false:" padding
printID3Result (indent + 4) rhs
// ------------------------------------
let dataset =
[
{ Weather = "windy"; Temperature = "hot"; PlayTennis = false }
{ Weather = "windy"; Temperature = "cool"; PlayTennis = false }
{ Weather = "nice"; Temperature = "cool"; PlayTennis = true }
{ Weather = "nice"; Temperature = "cold"; PlayTennis = true }
{ Weather = "humid"; Temperature = "hot"; PlayTennis = false }
]
printfn "Given input list:"
dataset |> List.iter (printfn "%A")
printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result
You can use List.partition instead of your two List.choose calls.
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html
(or now http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx )
It isn't clear to me that pattern matching will buy you much here; the input type (list of lists) and processing (partitioning and 'pureness' check) doesn't really lend itself to that.
And of course when you finally get the 'end' (a pure list) you need to create a tree, and then presumably this function will create a Leaf when the input only has one 'side' and it's 'pure', but create a Node out of the left-side and right-side results for every other input. Maybe. I didn't quite grok the algorithm completely.
Hopefully that will help steer you a little bit. May be useful to draw up a few smaller sample inputs and outputs to help work out the various cases of the function body.
Thanks Brian & Chris! I was actually able to figure this out and I ended up with the following. This calculates the information gain for determining the best place to split. I'm sure there are probably better ways for me to arrive at this solution especially around the chosen data structures, but this is a start. I plan to refine things later.
#light
open System
let trainList =
[
[1.;0.;0.;1.;];
[0.;1.;0.;1.;];
[0.;0.;0.;0.;];
[1.;0.;1.;0.;];
[0.;0.;0.;0.;];
[1.;1.;0.;1.;];
[0.;1.;1.;0.;];
[1.;0.;0.;1.;];
[0.;0.;0.;0.;];
[1.;0.;0.;1.;];
]
type BinaryTree =
| Leaf of int
| Node of int * string * BinaryTree * BinaryTree
let entropyList nums =
let sumOfnums =
nums
|> Seq.sum
nums
|> Seq.map (fun x -> if x=0.00 then x else (-((x/sumOfnums) * Math.Log(x/sumOfnums, 2.))))
|> Seq.sum
let entropyBinaryList (dataListOfLists:list<list<float>>) =
let classList =
dataListOfLists
|> List.map (fun x -> x.Item(x.Length - 1))
let ListOfNo =
classList
|> List.choose (fun x -> if x = 0. then Some(x) else None)
let ListOfYes =
classList
|> List.choose (fun x -> if x = 1. then Some(x) else None)
let numberOfYes : float = float ListOfYes.Length
let numberOfNo : float = float ListOfNo.Length
let ListOfNumYesAndSumNo = [numberOfYes; numberOfNo]
entropyList ListOfNumYesAndSumNo
let conditionalEntropy (dataListOfLists:list<list<float>>) attributeNumber =
let NoAttributeList =
dataListOfLists
|> List.choose (fun x -> if x.Item(attributeNumber) = 0. then Some(x) else None)
let YesAttributeList =
dataListOfLists
|> List.choose (fun x -> if x.Item(attributeNumber) = 1. then Some(x) else None)
let numberOfYes : float = float YesAttributeList.Length
let numberOfNo : float = float NoAttributeList.Length
let noConditionalEntropy = (entropyBinaryList NoAttributeList) * (numberOfNo/(numberOfNo + numberOfYes))
let yesConditionalEntropy = (entropyBinaryList YesAttributeList) * (numberOfYes/(numberOfNo + numberOfYes))
[noConditionalEntropy; yesConditionalEntropy]
let findBestSplitIndex(listOfInstances : list<list<float>>) =
let IGList =
[0..(listOfInstances.Item(0).Length - 2)]
|> List.mapi (fun i x -> (i, (entropyBinaryList listOfInstances) - (List.sum (conditionalEntropy listOfInstances x))))
IGList
|> List.maxBy snd
|> fst
let isListPure (listToCheck : list<list<float>>) =
let splitList = listToCheck |> List.choose (fun x -> if x.Item(x.Length - 1) = 1. then Some(x) else None)
if splitList.Length = listToCheck.Length then 1
else if splitList.Length = 0 then 0
else -1
let rec createTree (listToSplit : list<list<float>>) =
let pureCheck = isListPure listToSplit
if pureCheck = 0 then
printfn "%s" "Pure - Leaf(0)"
else if pureCheck = 1 then
printfn "%s" "Pure - Leaf(1)"
else
printfn "%A - is not pure" listToSplit
if listToSplit.Length > 1 then // There are attributes we can split on
// Chose best place to split list
let splitIndex = findBestSplitIndex(listToSplit)
printfn "spliting at index %A" splitIndex
let leftSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 1. then Some(x) else None)
let rightSideSplit =
listToSplit |> List.choose (fun x -> if x.Item(splitIndex) = 0. then Some(x) else None)
createTree leftSideSplit
createTree rightSideSplit
else
printfn "%s" "Not Pure, but can't split choose based on heuristics - Leaf(0 or 1)"

Resources