F# Binary Search Tree

I am trying to implement BST in F#. Since I am starting my journey with F# I wanted to ask for help.
I have a simple test:
[<Fact>]
let ``Data is retained`` () =
    let treeData = create [4]
    treeData |> data |> should equal 4
    treeData |> left |> should equal None
    treeData |> right |> should equal None
Tree type which uses discriminated unions
type Tree<'T> =
    | Leaf
    | Node of value: 'T * left: Tree<'T> * right: Tree<'T>
a recursive function which inserts data nodes into the tree
let rec insert newValue (targetTree: Tree<'T>) =
    match targetTree with
    | Leaf -> Node(newValue, Leaf, Leaf)
    | Node (value, left, right) when newValue < value ->
        let left' = insert newValue left
        Node(value, left', right)
    | Node (value, left, right) when newValue > value ->
        let right' = insert newValue right
        Node(value, left, right')
    | _ -> targetTree
Now I have a problem with the create function. I have this:
let create items =
    List.fold insert Leaf items
and resulting error:
FS0001 Type mismatch. Expecting a
''a -> Tree<'a> -> 'a' but given a
''a -> Tree<'a> -> Tree<'a>' The types ''a' and 'Tree<'a>' cannot be unified.

The List.fold documentation shows its type signature as:
List.fold : ('State -> 'T -> 'State) -> 'State -> 'T list -> 'State
Let's unpack that. The first argument is a function of type 'State -> 'T -> 'State. That means it takes a state and an argument of type T, and returns a new state. Here, the state is your Tree type: starting at a basic Leaf, you're building up the tree step by step. Second argument to List.fold is the initial state (a Leaf in this case), and third argument is the list of items of type T to fold over.
Your second and third arguments are correct, but your first argument doesn't line up with the signature that List.fold is expecting. List.fold wants something of type 'State -> 'T -> 'State, which in your case would be Tree<'a> -> 'a -> Tree<'a>. That is, a function that takes the tree as its first parameter and a single item as its second parameter. But your insert function takes the parameters the other way around (the item as the first parameter, and the tree as the second parameter).
I'll pause here to note that your insert function is correct according to the style rules of idiomatic F#, and you should not change the order of its parameters. When writing functions that deal with collections, you always want to take the collection as the last parameter so that you can write something like tree |> insert 5. So I strongly suggest you don't change the order of the arguments your insert function takes.
So if you shouldn't change the order of arguments of your insert function, yet they're in the wrong order to use with List.fold, what do you do? Simple: you create an anonymous function with the arguments flipped around, so that you can use insert with List.fold:
let create items =
    List.fold (fun tree item -> insert item tree) Leaf items
Now we'll go one step further and generalize this. It's actually pretty common in F# programming to find that your two-parameter function has the parameters the right way around for most things, but the wrong way around for one particular use case. To solve that problem, sometimes it's useful to create a general-purpose function called flip:
let flip f = fun a b -> f b a
Then you could just write your create function like this:
let create items =
    List.fold (flip insert) Leaf items
Sometimes the use of flip can make code more confusing rather than less confusing, so I don't recommend using it all the time. (This is also why there isn't a flip function in the F# standard library: because it's not always the best solution. And because it's trivial to write yourself, its lack in the standard library is not a big deal). But sometimes using flip makes code simpler, and I think this is one of those cases.
P.S. The flip function could also have been written like this:
let flip f a b = f b a
This definition is identical to the let flip f = fun a b -> f b a definition I used in the main example. Do you know why?
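To check the whole thing end to end, here is a minimal sketch that puts the pieces from the question and this answer into one script. The toSortedList helper is not from the question; it is a hypothetical in-order traversal added only to make the result easy to inspect.

```fsharp
type Tree<'T> =
    | Leaf
    | Node of value: 'T * left: Tree<'T> * right: Tree<'T>

// insert keeps the tree as the LAST parameter, so `tree |> insert 5` works.
let rec insert newValue (targetTree: Tree<'T>) =
    match targetTree with
    | Leaf -> Node(newValue, Leaf, Leaf)
    | Node (value, left, right) when newValue < value ->
        Node(value, insert newValue left, right)
    | Node (value, left, right) when newValue > value ->
        Node(value, left, insert newValue right)
    | _ -> targetTree // duplicate value: tree is returned unchanged

let flip f a b = f b a

// Both versions hand List.fold a function shaped Tree<'a> -> 'a -> Tree<'a>.
let createWithLambda items =
    List.fold (fun tree item -> insert item tree) Leaf items

let createWithFlip items =
    List.fold (flip insert) Leaf items

// Hypothetical helper (not in the question): in-order traversal.
let rec toSortedList tree =
    match tree with
    | Leaf -> []
    | Node (value, left, right) ->
        toSortedList left @ [ value ] @ toSortedList right

let fromLambda = createWithLambda [ 4; 2; 6; 2 ] |> toSortedList
let fromFlip = createWithFlip [ 4; 2; 6; 2 ] |> toSortedList
```

Both fromLambda and fromFlip should come out as [2; 4; 6], since the duplicate 2 falls through to the last match case and is dropped.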

Related

'Anonymous type variables are not permitted in this declaration' error when adding parameters to discriminated union cases in F#

So I have some (I'm assuming rather unusual) code which is for building Function Trees. Here it is right now:
type FunctionTree<'Function> =
    | BranchNode of seq<FunctionTree<'Function>>
    | Leaf of (a:'Function -> unit) with
        member __.Execute() = do a
The expression a:'Function -> unit is what makes the compiler throw a fit, giving me the error 'Anonymous type variables are not permitted in this declaration' and I have no idea why. I've tried adding a variable to the BranchNode, adding (yucky) double parentheses around the expression but nothing seems to have worked.
Answer to the compiler error question
This does not compile...
Leaf of (a:'Function -> unit)
...because a field name can label a field of a DU case, but it cannot label the argument inside a function type that the case holds. In contrast, this compiles...
Leaf of a: ('Function -> unit)
...because the field name a names the whole field, whose type is ('Function -> unit).
Additional discussion about the code
However, there is another issue. The member Execute that you are adding is not being added to the Leaf node, as your code implies. It is being added to the entire function tree. Consequently, you will not have access to the label a inside your implementation of Execute. Think of it like this...
type FunctionTree<'Function> =
    | BranchNode of seq<FunctionTree<'Function>>
    | Leaf of a: ('Function -> unit)
    with member __.Execute() = do a
... with the member shifted to the left to clarify that it applies to the entire union, not just the leaf case. That explains why the above code now has a different compiler error... a is not defined. The field name a is used to clarify the instantiation of a Leaf case. The field name a is not available elsewhere.
let leaf = Leaf(a = myFunc)
Consequently, the label a is not available to your Execute member. You would need to do something like this...
with member x.Execute(input) =
        match x with
        | BranchNode(b) -> b |> Seq.iter(fun n -> n.Execute(input))
        | Leaf(f) -> f(input) |> ignore
Notice in the above code that the x value is a FunctionTree.
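Putting the fragments above together, a complete version might look like the sketch below. The explicit unit return annotation and the `seen` buffer are my own additions for illustration (the buffer just records what each leaf received), so treat this as one possible arrangement rather than the only way to write it.

```fsharp
type FunctionTree<'Function> =
    | BranchNode of seq<FunctionTree<'Function>>
    | Leaf of a: ('Function -> unit)
    // The member belongs to the whole union, so it has to match on itself
    // to reach the function stored in a Leaf.
    member x.Execute(input: 'Function) : unit =
        match x with
        | BranchNode children -> children |> Seq.iter (fun n -> n.Execute(input))
        | Leaf f -> f input

// Hypothetical test data: each leaf appends to this buffer.
let seen = ResizeArray<int>()

let tree =
    BranchNode [
        Leaf(a = fun n -> seen.Add n)
        Leaf(a = fun n -> seen.Add(n * 2))
    ]

tree.Execute 21
let recorded = List.ofSeq seen
```

After the call, recorded should hold 21 followed by 42, one entry per leaf in order.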
Alternative implementation
We could continue going. However, I think the following may implement what you are aiming for:
type FunctionTree<'T> =
    | BranchNode of seq<FunctionTree<'T>>
    | LeafNode of ('T -> unit)
let rec evaluate input tree =
    match tree with
    | LeafNode(leaf) -> leaf(input)
    | BranchNode(branch) -> branch |> Seq.iter (evaluate input)
BranchNode([
    LeafNode(printfn "%d")
    LeafNode(printfn "%A")
])
|> evaluate 42

summing elements from a user defined datatype

Upon covering the predefined datatypes in f# (i.e lists) and how to sum elements of a list or a sequence, I'm trying to learn how I can work with user defined datatypes. Say I create a data type, call it list1:
type list1 =
    | A
    | B of int * list1
Where:
A stands for an empty list
B builds a new list by adding an int in front of another list
so 1,2,3,4, will be represented with the list1 value:
B(1, B(2, B(3, B(4, A))))
From the wikibook I learned that with a list I can sum the elements by doing:
List.sum [1; 2; 3; 4]
But how do I go about summing the elements of a user defined datatype? Any hints would be greatly appreciated.
Edit: I'm able to take advantage of the match operator:
let rec sumit (l: list1) : int =
    match l with
    | (B(x1, A)) -> x1
    | (B(x1, B(x2, A))) -> (x1+x2)
sumit (B(3, B(4, A)))
I get:
val it : int = 7
How can I make it so that if I have more than 2 ints it still sums the elements (i.e. (B(3, B(4, B(5, A)))) gets 12)?
One good general approach to questions like this is to write out your algorithm in word form or pseudocode form, then once you've figured out your algorithm, convert it to F#. In this case where you want to sum the lists, that would look like this:
The first step in figuring out an algorithm is to carefully define the specifications of the problem. I want an algorithm to sum my custom list type. What exactly does that mean? Or, to be more specific, what exactly does that mean for the two different kinds of values (A and B) that my custom list type can have? Well, let's look at them one at a time. If a list is of type A, then that represents an empty list, so I need to decide what the sum of an empty list should be. The most sensible value for the sum of an empty list is 0, so the rule is "If the list is of type A, then the sum is 0". Now, if the list is of type B, then what does the sum of that list mean? Well, the sum of a list of type B would be its int value, plus the sum of the sublist.
So now we have a "sum" rule for each of the two types that list1 can have. If A, the sum is 0. If B, the sum is (value + sum of sublist). And that rule translates almost verbatim into F# code!
let rec sum (lst : list1) =
    match lst with
    | A -> 0
    | B (value, sublist) -> value + sum sublist
A couple things I want to note about this code. First, one thing you may or may not have seen before (since you seem to be an F# beginner) is the rec keyword. This is required when you're writing a recursive function: due to internal details in how the F# parser is implemented, if a function is going to call itself, you have to declare that ahead of time when you declare the function's name and parameters. Second, this is not the best way to write a sum function, because it is not actually tail-recursive, which means that it might throw a StackOverflowException if you try to sum a really, really long list. At this point in your learning F# you maybe shouldn't worry about that just yet, but eventually you will learn a useful technique for turning a non-tail-recursive function into a tail-recursive one. It involves adding an extra parameter usually called an "accumulator" (and sometimes spelled acc for short), and a properly tail-recursive version of the above sum function would have looked like this:
let sum (lst : list1) =
    let rec tailRecursiveSum (acc : int) (lst : list1) =
        match lst with
        | A -> acc
        | B (value, sublist) -> tailRecursiveSum (acc + value) sublist
    tailRecursiveSum 0 lst
If you're already at the point where you can understand this, great! If you're not at that point yet, bookmark this answer and come back to it once you've studied tail recursion, because this technique (turning a non-tail-recursive function into a tail-recursive one with the use of an inner function and an accumulator parameter) is a very valuable one that has all sorts of applications in F# programming.
Besides tail-recursion, generic programming may be a concept of importance for the functional learner. Why go to the trouble of creating a custom data type, if it only can hold integer values?
The sum of all elements of a list can be abstracted as the repeated application of the addition operator to all elements of the list and an accumulator primed with an initial state. This can be generalized as a functional fold:
type 'a list1 = A | B of 'a * 'a list1
let fold folder (state : 'State) list =
    let rec loop s = function
        | A -> s
        | B(x : 'T, xs) -> loop (folder s x) xs
    loop state list
// val fold :
//   folder:('State -> 'T -> 'State) -> state:'State -> list:'T list1 -> 'State
B(1, B(2, B(3, B(4, A))))
|> fold (+) 0
// val it : int = 10
Making the sum function generic as well needs a little black magic called statically resolved type parameters. The signature isn't pretty; it essentially tells you that the types involved must support the (+) operator for the code to compile.
let inline sum xs = fold (+) Unchecked.defaultof<_> xs
// val inline sum :
// xs: ^a list1 -> ^b
// when ( ^b or ^a) : (static member ( + ) : ^b * ^a -> ^b)
B(1, B(2, B(3, B(4, A))))
|> sum
// val it : int = 10
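One caveat with Unchecked.defaultof<_> as the seed: it happens to be 0 for numeric types, but it is null for reference types. As a hedged alternative, the sketch below seeds the fold with LanguagePrimitives.GenericZero from FSharp.Core, which asks the compiler for a type that actually defines a zero. The type and fold are repeated so the block runs on its own; the float list is just an extra example I added.

```fsharp
type 'a list1 = A | B of 'a * 'a list1

let fold folder (state: 'State) list =
    let rec loop s = function
        | A -> s
        | B (x: 'T, xs) -> loop (folder s x) xs
    loop state list

// Variant of the generic sum: the seed is the type's own zero
// rather than its default value.
let inline sum xs = fold (+) LanguagePrimitives.GenericZero xs

let intTotal = sum (B(1, B(2, B(3, B(4, A)))))
let floatTotal = sum (B(1.5, B(2.5, A)))
```

Here intTotal is 10 and floatTotal is 4.0. The trade-off is a stricter constraint: the element type now needs a static Zero member in addition to (+).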

Free Monad in F# with generic output type

I am trying to apply the free monad pattern as described in F# for fun and profit to implement data access (for Microsoft Azure Table Storage)
Example
Let's assume we have three database tables and three dao's Foo, Bar, Baz:
Foo          Bar          Baz
key | col    key | col    key | col
---------    ---------    ---------
foo | 1      bar | 2          |
I want to select Foo with key="foo" and Bar with key="bar" to insert a Baz with key="baz" and col=3
Select<Foo> ("foo", fun foo -> Done foo)
    >>= (fun foo -> Select<Bar> ("bar", fun bar -> Done bar)
        >>= (fun bar -> Insert<Baz> ((Baz ("baz", foo.col + bar.col), fun () -> Done ()))))
Within the interpreter function
Select results in a function call that takes a key : string and returns an obj
Insert results in a function call that takes an obj and returns unit
Problem
I defined two operations Select and Insert in addition to Done to terminate the computation:
type StoreOp<'T> =
    | Select of string * ('T -> StoreOp<'T>)
    | Insert of 'T * (unit -> StoreOp<'T>)
    | Done of 'T
In order to chain StoreOp's I am trying to implement the correct bind function:
let rec bindOp (f : 'T1 -> StoreOp<'T2>) (op : StoreOp<'T1>) : StoreOp<'T2> =
    match op with
    | Select (k, next) ->
        Select (k, fun v -> bindOp f (next v))
    | Insert (v, next) ->
        Insert (v, fun () -> bindOp f (next ()))
    | Done t ->
        f t
let (>>=) = bindOp
However, the F# compiler correctly warns me that:
The type variable 'T1 has been constrained to be type 'T2
For this implementation of bindOp the type is fixed throughout the computation, so instead of:
Foo > Bar > unit
all I can express is:
Foo > Foo > Foo
How should I modify the definition of StoreOp and/or bindOp to work with different types throughout the computation?
As Fyodor mentioned in the comments, the problem is with the type declaration. If you wanted to make it compile at the price of sacrificing type safety, you could use obj in two places - this at least shows where the problem is:
type StoreOp<'T> =
    | Select of string * (obj -> StoreOp<'T>)
    | Insert of obj * (unit -> StoreOp<'T>)
    | Done of 'T
I'm not entirely sure what the two operations are supposed to model - but I guess Select means you are reading something (with string key?) and Insert means that you are storing some value (and then continue with unit). So, here, the data you are storing/reading would be obj.
There are ways of making this type safe, but I think you'd get a better answer if you explained what you are trying to achieve by using the monadic structure.
Without knowing more, I think using free monads will only make your code very messy and difficult to understand. F# is a functional-first language, which means that you can write data transformations in a nice functional style using immutable data types and use imperative programming to load your data and store your results. If you are working with table storage, why not just write the normal imperative code to read data from table storage, pass the results to a pure functional transformation and then store the results?
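For what it's worth, here is a rough sketch of how the obj-based declaration above lets the result type change between steps. Everything beyond StoreOp and bindOp is an assumption of mine for illustration: the select and insert helpers, the in-memory run interpreter (a dictionary standing in for table storage), and the use of plain ints instead of Foo/Bar/Baz records. Note that (>>=) is defined with the operation on the left, which is the order the question's pipeline needs.

```fsharp
open System.Collections.Generic

type StoreOp<'T> =
    | Select of string * (obj -> StoreOp<'T>)
    | Insert of obj * (unit -> StoreOp<'T>)
    | Done of 'T

let rec bindOp (f: 'T1 -> StoreOp<'T2>) (op: StoreOp<'T1>) : StoreOp<'T2> =
    match op with
    | Select (k, next) -> Select (k, fun v -> bindOp f (next v))
    | Insert (v, next) -> Insert (v, fun () -> bindOp f (next ()))
    | Done t -> f t

let (>>=) op f = bindOp f op

// Hypothetical helpers: the unsafe cast lives in one place only.
let select<'T> (key: string) : StoreOp<'T> =
    Select (key, fun v -> Done (unbox<'T> v))

let insert (value: 'T) : StoreOp<unit> =
    Insert (box value, fun () -> Done ())

// Hypothetical interpreter over an in-memory table, for demonstration only.
let rec run (table: IDictionary<string, obj>) (inserted: ResizeArray<obj>) (op: StoreOp<'T>) : 'T =
    match op with
    | Select (key, next) -> run table inserted (next table.[key])
    | Insert (value, next) ->
        inserted.Add value
        run table inserted (next ())
    | Done result -> result

// int -> int -> unit: the result type now changes along the chain.
let program : StoreOp<unit> =
    select<int> "foo" >>= (fun foo ->
        select<int> "bar" >>= (fun bar ->
            insert (foo + bar)))

let table = dict [ "foo", box 1; "bar", box 2 ]
let inserted = ResizeArray<obj>()
run table inserted program
let stored = unbox<int> inserted.[0]
```

With the table above, stored ends up as 3. The cost is exactly the one mentioned earlier: a wrong type argument to select only fails at run time, when unbox throws.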

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    let rec mapRec f state list results =
        match list with
        | [] -> (List.rev results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some x -> mapRec f newState tail (result :: results)
            | None -> ([], None)
    mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) ->
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a System.Collections.Generic.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.
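To see the early exit actually happen, here is a small self-contained sketch. The visited counter and the 1 .. 1000000 input are hypothetical additions of mine, there only to count how many input elements get pulled through the pipeline.

```fsharp
// Counts how many input elements are actually consumed.
let visited = ref 0

let input =
    seq { 1 .. 1000000 }
    |> Seq.map (fun v ->
        visited.Value <- visited.Value + 1
        v)

let partialSums =
    input
    |> Seq.scan (fun sum v -> v + sum) 0
    |> Seq.takeWhile (fun sum -> sum < 1000)
    |> Seq.toList

let lastSum = List.last partialSums
let consumed = visited.Value
```

The running sums are 0, 1, 3, 6, ... so lastSum is 990 (the sum of 1 to 44), and consumed stays in the dozens rather than anywhere near a million, because Seq.takeWhile stops pulling from Seq.scan once a sum reaches 1000.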

Why is the signature of foldBack so much different from fold in F#?

There are at least 2 things I don't understand about it:
refactoring from left side to right side folding requires a lot of changes not only in signature but in every place depended on the folder function
there is no way to chain it with regard to the list without flipping the parameters
List.foldBack : ('T -> 'State -> 'State) -> 'T list -> 'State -> 'State
List.fold : ('State -> 'T -> 'State) -> 'State -> 'T list -> 'State
Is there any good reason why someone would reverse all the parameters in the signature of foldBack compared to fold?
It's just a useful mnemonic to help the programmer remember how the list is iterated. Imagine your list is laid out with the beginning on the left and the end on the right. fold starts with an initial state on the left and accumulates state going to right. foldBack does the opposite, it starts with an initial state on the right and goes back over the list to the left.
This is definitely showing F#'s OCaml heritage as some other functional languages (Haskell, Scala, ML) keep the list as the last argument to allow for the more common partial application scenarios.
If I really needed a version of foldBack that looked exactly like fold, I would define my own helper function:
module List =
    let foldBack' f acc lst =
        let flip f a b = f b a
        List.foldBack (flip f) lst acc
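As a quick illustration of the direction each fold walks, the sketch below runs the same folder through List.fold and through the helper. I've pulled foldBack' out as a plain top-level function (rather than extending the List module) only to keep the snippet short; string concatenation is used because, unlike addition, it makes the visiting order visible.

```fsharp
let flip f a b = f b a

// Same shape as List.fold: folder, initial state, then the list.
let foldBack' f acc lst = List.foldBack (flip f) lst acc

let append (acc: string) (x: int) = acc + string x

// Starts on the left: visits 1, then 2, then 3.
let leftToRight = List.fold append "" [ 1; 2; 3 ]

// Starts on the right: visits 3, then 2, then 1.
let rightToLeft = foldBack' append "" [ 1; 2; 3 ]
```

leftToRight is "123" and rightToLeft is "321", so the helper keeps fold's parameter order while still iterating from the end of the list.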
It's a relic of F#'s beginnings in OCaml. The F# signatures of List.fold and List.foldBack match those in the OCaml documentation (where the functions are called List.fold_left and List.fold_right, respectively).
