Why is the signature of foldBack so much different from fold in F#?

There are at least two things I don't understand about it:
refactoring from left-side to right-side folding requires a lot of changes, not only in the signature but in every place that depends on the folder function
there is no way to chain it with regard to the list without flipping the parameters
List.foldBack : ('T -> 'State -> 'State) -> 'T list -> 'State -> 'State
List.fold : ('State -> 'T -> 'State) -> 'State -> 'T list -> 'State
Any good reason for why would someone put all parameters in reverse in the signature of foldBack compared to fold?

It's just a useful mnemonic to help the programmer remember how the list is iterated. Imagine your list is laid out with the beginning on the left and the end on the right. fold starts with an initial state on the left and accumulates state going to the right. foldBack does the opposite: it starts with an initial state on the right and goes back over the list to the left.
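To make the mnemonic concrete, here is a small sketch that builds a string showing how each fold nests its calls. The names leftToRight and rightToLeft and the placeholder state "s" are made up for the illustration.

```fsharp
// fold: the state starts on the left and the list comes after it.
let leftToRight =
    List.fold (fun acc x -> sprintf "(%s + %d)" acc x) "s" [1; 2; 3]
// leftToRight = "(((s + 1) + 2) + 3)"

// foldBack: the list comes first and the state starts on the right.
let rightToLeft =
    List.foldBack (fun x acc -> sprintf "(%d + %s)" x acc) [1; 2; 3] "s"
// rightToLeft = "(1 + (2 + (3 + s)))"
```

In both cases the position of the state in the argument list matches the side of the list where the accumulation begins.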
This is definitely showing F#'s OCaml heritage as some other functional languages (Haskell, Scala, ML) keep the list as the last argument to allow for the more common partial application scenarios.
If I really needed a version of foldBack that looked exactly like fold, I would define my own helper function:
module List =
    let foldBack' f acc lst =
        let flip f a b = f b a
        List.foldBack (flip f) lst acc

It's a relic of F#'s beginnings in OCaml. You can see that the F# function signatures for List.fold and List.foldBack are the same in the OCaml documentation (where they are called List.fold_left and List.fold_right, respectively).

Related

F#: What to call a combination of map and fold, or of map and reduce?

A simple example, inspired by this question:
module SimpleExample =
    let fooFold projection folder state source =
        source |> List.map projection |> List.fold folder state
    // val fooFold :
    //   projection:('a -> 'b) ->
    //     folder:('c -> 'b -> 'c) -> state:'c -> source:'a list -> 'c
    let fooReduce projection reducer source =
        source |> List.map projection |> List.reduce reducer
    // val fooReduce :
    //   projection:('a -> 'b) -> reducer:('b -> 'b -> 'b) -> source:'a list -> 'b
    let game = [0, 5; 10, 15]
    let minX, maxX = fooReduce fst min game, fooReduce fst max game
    let minY, maxY = fooReduce snd min game, fooReduce snd max game
What would be a natural name for the functions fooFold and fooReduce in this example? Alas, mapFold and mapReduce are already taken.
mapFold is part of the F# library and does a fold operation over the input to return a tuple of 'result list * 'state, similar to scan, but without the initial state and the need to provide the tuple as part of the state yourself. Its signature is:
val mapFold : ('State -> 'T -> 'Result * 'State) -> 'State -> 'T list
-> 'Result list * 'State
Since the projection can easily be integrated into the folder, the fooFold function is only included for illustration purposes.
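As a quick illustration of both points, the sketch below first calls List.mapFold and then shows the projection folded into the folder. The name fooFoldFused is hypothetical and only used here for comparison with fooFold.

```fsharp
// List.mapFold threads a state through the list and also collects a mapped list.
// Here each element is doubled while the state accumulates the sum of the inputs.
let doubled, total =
    List.mapFold (fun acc x -> (x * 2, acc + x)) 0 [1; 2; 3]
// doubled = [2; 4; 6], total = 6

// fooFold as in the question: List.map followed by List.fold.
let fooFold projection folder state source =
    source |> List.map projection |> List.fold folder state

// The same result from a single List.fold whose folder applies the projection itself.
let fooFoldFused projection folder state source =
    List.fold (fun acc x -> folder acc (projection x)) state source

let viaMapThenFold = fooFold fst max 0 [0, 5; 10, 15]
let viaFusedFold = fooFoldFused fst max 0 [0, 5; 10, 15]
// both are 10
```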
And MapReduce:
MapReduce is an algorithm for processing huge datasets on certain
kinds of distributable problems using a large number of nodes
Now for a more complex example, where the fold/reduce is not directly applied to the input, but to the groupings following a selection of the keys.
The example has been borrowed from a Python library, where it is called - perhaps misleadingly - reduceby.
module ComplexExample =
    let fooFold keySelection folder state source =
        source |> Seq.groupBy keySelection
               |> Seq.map (fun (k, xs) ->
                    k, Seq.fold folder state xs)
    // val fooFold :
    //   keySelection:('a -> 'b) ->
    //     folder:('c -> 'a -> 'c) -> state:'c -> source:seq<'a> -> seq<'b * 'c>
    //     when 'b : equality
    let fooReduce keySelection projection reducer source =
        source |> Seq.groupBy keySelection
               |> Seq.map (fun (k, xs) ->
                    k, xs |> Seq.map projection |> Seq.reduce reducer)
    // val fooReduce :
    //   keySelection:('a -> 'b) ->
    //     projection:('a -> 'c) ->
    //       reducer:('c -> 'c -> 'c) -> source:seq<'a> -> seq<'b * 'c>
    //       when 'b : equality
    type Project = { name : string; state : string; cost : decimal }
    let projects =
        [ { name = "build roads"; state = "CA"; cost = 1000000M }
          { name = "fight crime"; state = "IL"; cost = 100000M }
          { name = "help farmers"; state = "IL"; cost = 2000000M }
          { name = "help farmers"; state = "CA"; cost = 200000M } ]
    fooFold (fun x -> x.state) (fun acc x -> acc + x.cost) 0M projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
    fooReduce (fun x -> x.state) (fun x -> x.cost) (+) projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
What would be the natural name for the functions fooFold and fooReduce here?
I'd probably call the first two mapAndFold and mapAndReduce (though I agree that mapFold and mapReduce would be good names if they were not already taken). Alternatively, I'd go with mapThenFold (etc.), which is perhaps more explicit, but it reads a bit cumbersome.
For the more complex ones, reduceBy and foldBy sound good. The issue is that this would not work if you also wanted versions of those functions that do not do the mapping operation. If you wanted that, you'd probably need mapAndFoldBy and mapAndReduceBy (as well as just foldBy and reduceBy). This gets a bit ugly, but I'm afraid that's the best you can do.
More generally, the issue when comparing names with Python is that Python allows overloading whereas F# let-bound functions do not. You therefore need a unique name for every function that would otherwise be an overload, which means coming up with a consistent naming scheme that does not make the names unbearably long.
(I experienced this when coming up with names for the functions in the Deedle library, which is somewhat inspired by Pandas. See, for example, the aggregation functions in Deedle: there is a pattern in the naming to deal with the fact that each function needs a unique name.)
I have a different opinion than Thomas.
First, I think that not having overloads is a good thing, and giving every operation a unique name is also a good thing. I would also say that giving long names to rarely used functions is even more important and should not be avoided.
Writing longer names is usually not a problem, as we programmers usually use an IDE with auto-completion. But reading and understanding is different. Knowing what a function does because of a long descriptive name is better than a short name.
A long descriptive function name gets more important the less often a function is used. It helps with reading and understanding the code. A short and less descriptive function name that is rarely used causes confusion. The confusion would only increase if it were just an overload of another function name.
Yes, naming things can be hard; that's the reason why it's important and shouldn't be avoided.
As for what you describe, I would have named them mapFold and mapReduce, as those names describe exactly what they do.
There is already a mapFold in F#, and in my opinion, the F# devs fucked up either with the naming, the arguments or the output of the function. But anyhow, they just fucked up.
I would have expected mapFold to do map and then fold. Actually it does, but it also returns the intermediate list that is created along the way, something I would not expect it to return. And I would also expect to pass it two functions instead of one.
That brings us to Thomas's suggestion of naming it mapAndFold or mapThenFold. I would expect different behaviour from those two functions. mapThenFold tells exactly what it does: map, and then fold over the result. I think the "then" is not important; that's also why I would name it mapFold or mapReduce. Writing it this way already suggests a "then". But mapAndFold or mapAndReduce does not say anything about the order of execution. It just says it does two things, or somehow returns this AND that.
With that in mind, I would say that the F# library should either have named its mapFold mapAndFold, or changed the return value to just the result of the fold (taking two function arguments instead of one). But hey, it's fucked up now, and we cannot change it anymore.
As for mapReduce, I think you are a little bit mistaken. The mapReduce algorithm is named that way because it just does map and then reduce. And that's it.
But functional programming, with its stateless and more descriptive operations, sometimes has additional benefits. Technically a map is less powerful than a for loop or a fold, as it just describes how values are changed, independent of their order or position in the list. But because of this limitation, you can run it in parallel, even on a big computer cluster. And that's all the MapReduce algorithm you cite does.
But that doesn't mean a mapReduce must always run its operation on a big cluster or in parallel. In my opinion you could just name it mapReduce and that's fine. Everybody will know what it does, and I think nobody expects it to suddenly run on a cluster.
In general I think the mapFold that F# provides is silly, here are 4 examples how I think it should have been provided.
let double x = x * 2
let add x y = x + y
mapFold double add 0 [1..10] // 110
mapAndFold double add 0 [1..10] // [2;4;6;8;10;12;14;16;18;20] * 110
mapReduce double add [1..10] // Some (110)
mapAndReduce double add [1..10] // Some ([2;4;6;8;10;12;14;16;18;20] * 110)
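The four calls above use functions that the answer does not define. A minimal sketch of what they could look like, assuming list inputs and option results for the reduce variants, is given here. The module name Proposed is made up so that nothing in the F# core library is shadowed.

```fsharp
module Proposed =
    // Map each element, then fold over the mapped values; return only the fold result.
    let mapFold mapper folder state xs =
        List.fold (fun acc x -> folder acc (mapper x)) state xs

    // Return both the mapped list and the fold result.
    let mapAndFold mapper folder state xs =
        let mapped = List.map mapper xs
        mapped, List.fold folder state mapped

    // Like mapFold, but seeded with the first mapped element; None for an empty list.
    let mapReduce mapper reducer xs =
        match xs with
        | [] -> None
        | x :: rest ->
            Some (List.fold (fun acc y -> reducer acc (mapper y)) (mapper x) rest)

    // Return both the mapped list and the reduce result; None for an empty list.
    let mapAndReduce mapper reducer xs =
        match List.map mapper xs with
        | [] -> None
        | mapped -> Some (mapped, List.reduce reducer mapped)

let double x = x * 2
let add x y = x + y

let r1 = Proposed.mapFold double add 0 [1..10]      // 110
let r2 = Proposed.mapAndFold double add 0 [1..10]   // ([2; 4; ...; 20], 110)
let r3 = Proposed.mapReduce double add [1..10]      // Some 110
let r4 = Proposed.mapAndReduce double add [1..10]   // Some ([2; 4; ...; 20], 110)
```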
Well, mapFold doesn't work that way, so you have the following options:
1. Implement mapReduce the way you have it, and ignore the inconsistency with mapFold.
2. Provide mapAndReduce and mapReduce.
3. Make your mapReduce return the same crap as the default implementation of mapFold does, and provide mapThenReduce.
4. Like (3), but also add mapThenFold.
Option 4 is the most compatible with what already exists in F# and with what users will expect. But that doesn't mean you must do it that way.
Personally, I would just:
Implement mapReduce returning the result of map and then reduce.
Not care about a mapAndReduce version that returns a list and the result.
Provide a mapThenFold expecting two function arguments and returning just the result of the fold.
As a general note: implementing mapReduce just by calling map and then reduce is somewhat pointless. I would expect it to have a more low-level implementation that does both things while traversing the data structure only once. If not, I can just call map and then reduce anyway.
So an implementation should look like:
let mapReduce mapper reducer xs =
    let rec loop state xs =
        match xs with
        | [] -> state
        | x::xs -> loop (reducer state (mapper x)) xs
    match xs with
    | [] -> ValueNone
    | [x] -> ValueSome (mapper x)
    | x::xs -> ValueSome (loop (mapper x) xs)
let double x = x * 2
let add x y = x + y
let some110 = mapReduce double add [1..10]

F# Binary Search Tree

I am trying to implement BST in F#. Since I am starting my journey with F# I wanted to ask for help.
I have a simple test:
[<Fact>]
let ``Data is retained`` () =
    let treeData = create [4]
    treeData |> data |> should equal 4
    treeData |> left |> should equal None
    treeData |> right |> should equal None
The Tree type, which uses a discriminated union:
type Tree<'T> =
    | Leaf
    | Node of value: 'T * left: Tree<'T> * right: Tree<'T>
A recursive function which inserts data nodes into the tree:
let rec insert newValue (targetTree: Tree<'T>) =
    match targetTree with
    | Leaf -> Node(newValue, Leaf, Leaf)
    | Node (value, left, right) when newValue < value ->
        let left' = insert newValue left
        Node(value, left', right)
    | Node (value, left, right) when newValue > value ->
        let right' = insert newValue right
        Node(value, left, right')
    | _ -> targetTree
Now I have problems with the create function. I have this:
let create items =
    List.fold insert Leaf items
and resulting error:
FS0001 Type mismatch. Expecting a
''a -> Tree<'a> -> 'a' but given a
''a -> Tree<'a> -> Tree<'a>' The types ''a' and 'Tree<'a>' cannot be unified.
The List.fold documentation shows its type signature as:
List.fold : ('State -> 'T -> 'State) -> 'State -> 'T list -> 'State
Let's unpack that. The first argument is a function of type 'State -> 'T -> 'State. That means it takes a state and an argument of type T, and returns a new state. Here, the state is your Tree type: starting at a basic Leaf, you're building up the tree step by step. Second argument to List.fold is the initial state (a Leaf in this case), and third argument is the list of items of type T to fold over.
Your second and third arguments are correct, but your first argument doesn't line up with the signature that List.fold is expecting. List.fold wants something of type 'State -> 'T -> 'State, which in your case would be Tree<'a> -> 'a -> Tree<'a>. That is, a function that takes the tree as its first parameter and a single item as its second parameter. But your insert function takes the parameters the other way around (the item as the first parameter, and the tree as the second parameter).
I'll pause here to note that your insert function is correct according to the style rules of idiomatic F#, and you should not change the order of its parameters. When writing functions that deal with collections, you always want to take the collection as the last parameter so that you can write something like tree |> insert 5. So I strongly suggest you don't change the order of the arguments your insert function takes.
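To show why the collection-last order pays off, here is a small self-contained sketch that repeats the Tree type and a condensed insert from the question and then pipes a tree through several inserts. The names tree, addSeven and bigger are made up for the example.

```fsharp
type Tree<'T> =
    | Leaf
    | Node of value: 'T * left: Tree<'T> * right: Tree<'T>

// Same behaviour as the insert in the question, written slightly more compactly.
let rec insert newValue (targetTree: Tree<'T>) =
    match targetTree with
    | Leaf -> Node(newValue, Leaf, Leaf)
    | Node (value, left, right) when newValue < value ->
        Node(value, insert newValue left, right)
    | Node (value, left, right) when newValue > value ->
        Node(value, left, insert newValue right)
    | _ -> targetTree

// Because the tree is the last parameter, inserts chain naturally with |>.
let tree = Leaf |> insert 4 |> insert 2 |> insert 6
// tree = Node (4, Node (2, Leaf, Leaf), Node (6, Leaf, Leaf))

// Partial application also works: addSeven is a Tree<int> -> Tree<int> function.
let addSeven = insert 7
let bigger = tree |> addSeven
```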
So if you shouldn't change the order of arguments of your insert function, yet they're in the wrong order to use with List.fold, what do you do? Simple: you create an anonymous function with the arguments flipped around, so that you can use insert with List.fold:
let create items =
    List.fold (fun tree item -> insert item tree) Leaf items
Now we'll go one step further and generalize this. It's actually pretty common in F# programming to find that your two-parameter function has the parameters the right way around for most things, but the wrong way around for one particular use case. To solve that problem, sometimes it's useful to create a general-purpose function called flip:
let flip f = fun a b -> f b a
Then you could just write your create function like this:
let create items =
    List.fold (flip insert) Leaf items
Sometimes the use of flip can make code more confusing rather than less confusing, so I don't recommend using it all the time. (This is also why there isn't a flip function in the F# standard library: because it's not always the best solution. And because it's trivial to write yourself, its lack in the standard library is not a big deal). But sometimes using flip makes code simpler, and I think this is one of those cases.
P.S. The flip function could also have been written like this:
let flip f a b = f b a
This definition is identical to the let flip f = fun a b -> f b a definition I used in the main example. Do you know why?

Make Fish in F#

The Kleisli composition operator >=>, also known as the "fish" in Haskell circles, may come in handy in many situations where composition of specialized functions is needed. It works kind of like the >> operator, but instead of composing simple functions 'a -> 'b it confers some special properties on them possibly best expressed as 'a -> m<'b>, where m is either a monad-like type or some property of the function's return value.
Evidence of this practice in the wider F# community can be found e.g. in Scott Wlaschin's Railway oriented programming (part 2) as composition of functions returning the Result<'TSuccess,'TFailure> type.
Reasoning that where there's a bind, there must be also fish, I try to parametrize the canonical Kleisli operator's definition let (>=>) f g a = f a >>= g with the bind function itself:
let mkFish bind f g a = bind g (f a)
This works wonderfully with the caveat that generally one shouldn't unleash special operators on user-facing code. I can compose functions returning options...
module Option =
    let (>=>) f = mkFish Option.bind f
    let odd i = if i % 2 = 0 then None else Some i
    let small i = if abs i > 10 then None else Some i
    [0; -1; 9; -99] |> List.choose (odd >=> small)
    // val it : int list = [-1; 9]
... or I can devise a function application to the two topmost values of a stack and push the result back without having to reference the data structure I'm operating on explicitly:
module Stack =
    let (>=>) f = mkFish (<||) f
    type 'a Stack = Stack of 'a list
    let pop = function
        | Stack[] -> failwith "Empty Stack"
        | Stack(x::xs) -> x, Stack xs
    let push x (Stack xs) = Stack(x::xs)
    let apply2 f =
        pop >=> fun x ->
        pop >=> fun y ->
        push (f x y)
But what bothers me is that the signature val mkFish : bind:('a -> 'b -> 'c) -> f:('d -> 'b) -> g:'a -> a:'d -> 'c makes no sense. Type variables are in confusing order, it's overly general ('a should be a function), and I'm not seeing a natural way to annotate it.
How can I abstract here in the absence of formal functors and monads, not having to define the Kleisli operator explicitly for each type?
You can't do it in a natural way without Higher Kinds.
The signature of fish should be something like:
let (>=>) (f:'T -> #Monad<'U>) (g:'U -> #Monad<'V>) (x:'T) : #Monad<'V> = bind (f x) g
which is unrepresentable in the current .NET type system, but you can replace #Monad with your specific monad, e.g. Async, and use its corresponding bind function in the implementation.
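As a sketch of what fixing the operator to one specific type looks like, here it is specialised to Result and Result.bind. Result is used instead of Async only because it is easy to run and inspect; parseInt, ensurePositive and validate are made-up helpers for the illustration.

```fsharp
// Kleisli composition fixed to Result: run f, and feed a successful value into g.
let (>=>) (f: 'a -> Result<'b, 'e>) (g: 'b -> Result<'c, 'e>) (x: 'a) : Result<'c, 'e> =
    Result.bind g (f x)

// Hypothetical validation steps, each returning a Result.
let parseInt (s: string) =
    match System.Int32.TryParse s with
    | true, n -> Ok n
    | _ -> Error (sprintf "not an integer: %s" s)

let ensurePositive n =
    if n > 0 then Ok n else Error "not positive"

// The composed function stops at the first Error.
let validate = parseInt >=> ensurePositive

let good = validate "42"     // Ok 42
let negative = validate "-3" // Error "not positive"
let garbage = validate "x"   // Error "not an integer: x"
```

Every type annotation here names Result explicitly, which is exactly the part that cannot be abstracted over without higher kinds.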
Having said that, if you really want to use a generic fish operator you can use F#+ which has it already defined by using static constraints. If you look at the 5th code sample here you will see it in action over different types.
Of course you can also define your own, but there are a lot of things to code in order to make it behave properly in the most common scenarios. You can grab the code from the library or, if you want, I can write a small (but limited) code sample.
The generic fish is defined in this line.
I think in general you really feel the lack of generic functions when using operators because, as you discovered, you need to open and close modules. With functions you can just prefix them with the module name. You can do that with operators as well (something like Option.(>=>)), but then it defeats the whole purpose of using operators; I mean, it's no longer an operator.

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    let rec mapRec f state list results =
        match list with
        | [] -> (List.rev results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some x -> mapRec f newState tail (result :: results)
            | None -> ([], None)
    mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) ->
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generic.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I don't think the micro-optimizations done there are a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around 190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.
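A runnable version of the snippet above, as a sketch: the input is assumed to be an infinite sequence of the natural numbers (the name naturals is made up), which only terminates because the pipeline is lazy.

```fsharp
// Hypothetical input: 1, 2, 3, ... without end.
let naturals = Seq.initInfinite (fun i -> i + 1)

let runningSums =
    naturals
    |> Seq.scan (fun sum v -> v + sum) 0      // 0, 1, 3, 6, 10, ...
    |> Seq.takeWhile (fun sum -> sum < 1000)  // stop at the first sum of 1000 or more
    |> Seq.toList

// Seq.scan emits the initial state first, so the list starts with 0.
// The last sum below 1000 is 1 + 2 + ... + 44 = 990.
let lastSum = List.last runningSums
```

If the whole input were evaluated eagerly, as List.scan would do on a list, this would never finish; with Seq only as many elements are consumed as are needed to see the first sum that fails the condition.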

Does this pipe tuple operator already exist* somewhere?

I'm aware of (||>), which has type ('a * 'b) -> ('a -> 'b -> 'c) -> 'c
But I've been finding this quite useful, and wondered if I was reinventing the wheel:
// ('a * 'a) -> ('a -> 'b) -> ('b * 'b)
let inline (|>>) (a,b) f = (f a, f b)
(*It can happen, I only discovered the ceil function half an hour ago!)
No, it doesn't.
However, you will encounter its variant very often if you use FParsec. Here is the type signature in FParsec documentation:
val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u>
I think the library has a very well-designed set of operators which can be generalized for other purposes as well. The list of FParsec operators can be found here.
I did a bit of digging; the |>> operator doesn't seem to have a built-in Haskell counterpart, although it is easy to define using Control.Arrow.
The operator you described is essentially the map function for a two-element tuple. The map function, in general has a signature (for some F<'a> which could be seq<'a> or many other types in F# libraries):
map : ('a -> 'b) -> F<'a> -> F<'b>
So, if you define F<'a> as a two element tuple, then your function is actually just map (if you flip the arguments):
type F<'a> = 'a * 'a
let map f (a, b) = (f a, f b)
The operation is not built-in anywhere in the F# library, but it is useful to realize that it actually matches a pattern that is quite common in F# libraries elsewhere (list, seq, array, etc.)
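To make the correspondence concrete, here is a short sketch that puts the tuple map next to the operator from the question and the built-in (||>). The value names are made up for the example.

```fsharp
// map for a two-element tuple whose components have the same type
let map f (a, b) = (f a, f b)

// the operator from the question: the same function with its arguments flipped
let inline (|>>) (a, b) f = (f a, f b)

let viaMap = map ceil (1.2, 3.7)          // (2.0, 4.0)
let viaOperator = (1.2, 3.7) |>> ceil     // (2.0, 4.0)

// the operator chains like an ordinary pipeline
let asInts = (1.2, 3.7) |>> ceil |>> int  // (2, 4)

// for contrast, the built-in ||> passes both components to one two-argument function
let total = (1, 2) ||> (+)                // 3
```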
Looking at the Haskell answer referenced by @pad: in principle, Haskell makes it possible to define the same function for all types that support such an operation using type classes (so you would write just fmap instead of Seq.map or your TwoElementTuple.map). In practice it does not work for this case for various technical reasons, so Haskellers need to call it differently.
In F#, it is not easily possible to define a single map function for different types, but you can still think of your function as a map for two-element tuples (even if you find it easier to give it a symbolic operator name rather than the name map).

Resources