I just bumped into a case where it would be useful to have fold/foldi methods on an Array2D and I wondered, if there is a reason, why Array2D does not have them.
As my array2d is quite huge, I would not want to transform it first to some other format.
Is it simply a rare use case or are there technical reasons, why those methods were not added? Or is there a way to achieve the same without touching the data in the array (as in move it)?
I think having this function in the standard Array2D module would be quite useful. You can open an issue for the Visual F# repository and help us add it :-).
In addition to what #scrwtp wrote, you can also use more direct mutable implementation. For basic functions like this, I think using mutation is fine and it is going to be a bit faster:
let foldi (folder: int -> int -> 'S -> 'T -> 'S) (state: 'S) (array: 'T[,]) =
let mutable state = state
for x in 0 .. Array2D.length1 array - 1 do
for y in 0 .. Array2D.length2 array - 1 do
state <- folder x y state (array.[x, y])
state
I don't think there's any particular reason why Array2D doesn't come with those functions in standard, since it does have map/mapi. Probably the use cases where you want to process a multidimensional array are better served by using a jagged array anyway, so there's little incentive to add them.
There's no reason why you can't define them yourself.
Here's an example of a foldi:
let foldi (folder: int -> int -> 'S -> 'T -> 'S) (state: 'S) (array: 'T[,]) =
seq {
for x in 0 .. Array2D.length1 array - 1 do
for y in 0 .. Array2D.length2 array - 1 do
yield (x, y, array.[x, y])
}
|> Seq.fold (fun acc (x, y, e) -> folder x y acc e) state
For a regular fold and a more in-depth explanation how it works, you may look here.
I generally use Seq.cast if I need to use folding functions, but this works only for fold (not for foldi).
I am adding this solution, if one needs, for example, scan:
let array2dFold folder (state:'State) (source:'T[,]) =
source
|> ( Seq.cast<'T> >> Seq.fold folder state )
Since it does not rely on mutation it is highly likely that it is slower than previous solutions.
Related
A simple example, inspired by this question:
module SimpleExample =
let fooFold projection folder state source =
source |> List.map projection |> List.fold folder state
// val fooFold :
// projection:('a -> 'b) ->
// folder:('c -> 'b -> 'c) -> state:'c -> source:'a list -> 'c
let fooReduce projection reducer source =
source |> List.map projection |> List.reduce reducer
// val fooReduce :
// projection:('a -> 'b) -> reducer:('b -> 'b -> 'b) -> source:'a list -> 'b
let game = [0, 5; 10, 15]
let minX, maxX = fooReduce fst min game, fooReduce fst max game
let minY, maxY = fooReduce snd min game, fooReduce snd max game
What would be a natural name for the functions fooFold and fooReduce in this example? Alas, mapFold and mapReduce are already taken.
mapFold is part of the F# library and does a fold operation over the input to return a tuple of 'result list * 'state, similar to scan, but without the initial state and the need to provide the tuple as part of the state yourself. Its signature is:
val mapFold : ('State -> 'T -> 'Result * 'State) -> 'State -> 'T list
-> 'Result list * 'State
Since the projection can easily be integrated into the folder, the fooFold function is only included for illustration purposes.
And MapReduce:
MapReduce is an algorithm for processing huge datasets on certain
kinds of distributable problems using a large number of nodes
Now for a more complex example, where the fold/reduce is not directly applied to the input, but to the groupings following a selection of the keys.
The example has been borrowed from a Python library, where it is called - perhaps misleadingly - reduceby.
module ComplexExample =
let fooFold keySelection folder state source =
source |> Seq.groupBy keySelection
|> Seq.map (fun (k, xs) ->
k, Seq.fold folder state xs)
// val fooFold :
// keySelection:('a -> 'b) ->
// folder:('c -> 'a -> 'c) -> state:'c -> source:seq<'a> -> seq<'b * 'c>
// when 'b : equality
let fooReduce keySelection projection reducer source =
source |> Seq.groupBy keySelection
|> Seq.map (fun (k, xs) ->
k, xs |> Seq.map projection |> Seq.reduce reducer)
// val fooReduce :
// keySelection:('a -> 'b) ->
// projection:('a -> 'c) ->
// reducer:('c -> 'c -> 'c) -> source:seq<'a> -> seq<'b * 'c>
// when 'b : equality
type Project = { name : string; state : string; cost : decimal }
let projects =
[ { name = "build roads"; state = "CA"; cost = 1000000M }
{ name = "fight crime"; state = "IL"; cost = 100000M }
{ name = "help farmers"; state = "IL"; cost = 2000000M }
{ name = "help farmers"; state = "CA"; cost = 200000M } ]
fooFold (fun x -> x.state) (fun acc x -> acc + x.cost) 0M projects
// val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
fooReduce (fun x -> x.state) (fun x -> x.cost) (+) projects
// val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
What would be the natural name for the functions fooFold and fooReduce here?
I'd probably call the first two mapAndFold and mapAndReduce (though I agree that mapFold and mapReduce would be good names if they were not already taken). Alternatively, I'd go with mapThenFold (etc.), which is perhaps more explicit, but it reads a bit cumbersome.
For the more complex ones, reduceBy and foldBy sound good. The issue is that this would not work if you also wanted a version of those functions that do not do the mapping operation. If you wanted that, you'd probably need mapAndFoldBy and mapAndReduceBy (as well as just foldBy and reduceBy). This gets a bit ugly, but I'm afraid that's the best you can do.
More generally, the issue when comparing names with Python is that Python allows overloading whereas F# functions do not. This means that you need to have a unique name for functions that would have multiple overloads. This means that you just need to come up with a consistent naming scheme that will not make the names unbearably long.
(I experienced this when coming up with names for the functions in the Deedle library, which is somewhat inspired by Pandas. You can see for example the aggregation functions in Deedle for an example - there is a pattern in the naming to deal with the fact that each function needs a unique name.)
I have a different opinion as Thomas.
First; I think that not having overloads is a good thing, and giving every operation unique names is also
something good. I also would say that giving long names to functions rarely used is even more important
and should not be avoided.
Writing longer names is usally never a problem as we as programers usually use an IDE with auto-completion.
But reading and understanding is different. Knowing what a functions does because of a long descriptive name
is better then a short name.
A long descriptive function name gets more important the less often a function is used. It helps reading and
understanding the code. A short and less descriptive function name that is rarely used causes confusion. The
confusion would just increase if it even would be just an overload of another function name.
Yes; naming things can be hard, that's the reason why its important and shoudn't be avoided.
To what you describe. I would have name it mapFold and mapReduce. As those exactly describe what they do.
There is already a mapFold in F#, and in my opinion, the F# devs fucked up either with the naming, arguments or the
output of the function. But anyhow, they just fucked up.
I usually would have expected mapFold to do map and then fold. Actually it does, but it also returns the intermediate
list that is created on the run. Something I would not expect it to return. And i would also expect it to pass two
functions instead of one.
When we get to Thomas suggestion on naming it mapAndFold or mapThenFold. Then i would expect different behaviour
for those two functions. mapThenFold exactly tells what it does. map and then fold on it. I think the then is
not important. That's also why I would name it mapFold or mapReduce. Writing it this way already suggest a then.
But mapAndFold or mapAndReduce does not tell something about the order of execution. It just says it does two things
or somehow returns this AND that.
With that in mind, i would say that the F# library should have named its mapFold either mapAndFold, changed the return
value to just return the fold (and have two arguments instead of one). But hey, its fucked up now, we cannot change it anymore.
As for mapReduce, I think you are a little bit mistaken. The mapReduce algorithm is named that way, because it just does
map and then reduce. And that's it.
But functional programming with its stateless and more descriptive operations sometimes have additional benefits. Technically
a map is less powerful compared to a for/fold as it just describes how values are changed, without that the order matters
or the position in a list. But because of this limitation, you can run it in parallel, even on a big computer cluster. And that's all
what mapReduce Algorithm you cite do.
But that doesn't mean a mapReduce must always run its operation on a big cluster or in parallel. In my opinion you could
just name it mapReduce and that's fine. Everybody will know what it does and I think nobody expect it to suddenly run on
cluster.
In general I think the mapFold that F# provides is silly, here are 4 examples how I think it should have been provided.
let double x = x * 2
let add x y = x + y
mapFold double add 0 [1..10] // 110
mapAndFold double add 0 [1..10] // [2;4;6;8;10;12;14;16;18;20] * 110
mapReduce double add [1..10] // Some (110)
mapAndReduce double add [1..10] // Some ([2;4;6;8;10;12;14;16;18;20] * 110)
Well mapFold doesn't work that way, so you have the following options.
Implement mapReduce the way you have it. And ignore the in-consistency with mapFold.
Provide mapAndReduce and mapReduce.
Make your mapReduce return the same crap as the default implementation of mapFold does and provide mapThenReduce.
Like (3) but also add mapThenFold.
Option 4 has the most compatibility and expectation of what already exists in F#. But that doesn't mean you must do it that way.
In my opinion I would just:
implement mapReduce returning the result of map and then reduce.
I wouldn't care about a mapAndReduce version that returns a list and the result.
Provide a mapThenFold expecting two function arguments returning the result just of fold.
As a general notice: Implementing mapReduce just by calling map and then reduce is somewhat pointless. I would
expect it to have a more low-level implementation that does both things by just traversing the data-structure once.
If not, i just can call map and then reduce anyway.
So an implementation should look like:
let mapReduce mapper reducer xs =
let rec loop state xs =
match xs with
| [] -> state
| x::xs -> loop (reducer state (mapper x)) xs
match xs with
| [] -> ValueNone
| [x] -> ValueSome (mapper x)
| x::xs -> ValueSome (loop (mapper x) xs)
let double x = x * 2
let add x y = x + y
let some110 = mapReduce double add [1..10]
I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was myfirst attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
let rec mapRec f state list results =
match list with
| [] -> (List.rev results, state)
| item :: tail ->
let (result, newState) = f state item
match newState with
| Some x -> mapRec f newState tail (result :: results)
| None -> ([], None)
mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
match x with
| [] -> []
| [h] -> [f h]
| (h::t) ->
let cons = freshConsNoTail (f h)
mapToFreshConsTail cons f t
cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not particularly good example. F# list functions are used almost everywhere so it might be a good trade-off, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.
I need my state to be passed along while being able to chain functions with the maybe workflow. Is there a way for 2 workflows to share the same context? If no, what is the way of doing it?
UPDATE:
Well, I have a state that represents a segment of available ID's for the entities that I am going to create in the database. So once an ID is acquired the state has to be transformed to a newer state with the next available ID and thrown away so that nobody can use it again. I don't want to mutate the state for the sake of being idiomatic. The State monad looks like a way to go as it hides the transformation and passes the state along. Once the state workflow is in place I cannot use the Maybe workflow which is something I use everywhere.
As stated in the previous answer, one way to combine workflows in F# (Monads in Haskell) is by using a technique called Monad Transformers.
In F# this is really tricky, here is a project that deals with that technique.
It's possible to write the example of the previous answer by automatically combining State and Maybe (option), using that library:
#r #"c:\packages\FSharpPlus-1.0.0\lib\net45\FSharpPlus.dll"
open FSharpPlus
open FSharpPlus.Data
// Stateful computation
let computation =
monad {
let! x = get
let! y = OptionT (result (Some 10))
do! put (x + y)
let! x = get
return x
}
printfn "Result: %A" (State.eval (OptionT.run computation) 1)
So this is the other alternative, instead of creating your custom workflow, use a generic workflow that will be automatically inferred (a-la Haskell).
In F# you cannot easily mix different types of computation expressions as you would do in Haskell by using Monad Transformers or similar techniques. You could however build your own Monad, embedding state threading and optional values, as in:
type StateMaybe<'T> =
MyState -> option<'T> * MyState
// Runs the computation given an initial value and ignores the state result.
let evalState (sm: StateMaybe<'T>) = sm >> fst
// Computation expression for SateMaybe monad.
type StateMaybeBuilder() =
member this.Return<'T> (x: 'T) : StateMaybe<'T> = fun s -> (Some x, s)
member this.Bind(sm: StateMaybe<'T>, f: 'T -> StateMaybe<'S>) = fun s ->
let mx,s' = sm s
match mx with
| Some x -> f x s'
| None -> (None, s)
// Computation expression builder.
let maybeState = new StateMaybeBuilder()
// Lifts an optional value to a StateMaybe.
let liftOption<'T> (x: Option<'T>) : StateMaybe<'T> = fun s -> (x,s)
// Gets the current state.
let get : StateMaybe<MyState> = fun s -> (Some s,s)
// Puts a new state.
let put (x: MyState) : StateMaybe<unit> = fun _ -> (Some (), x)
Here's an example computation:
// Stateful computation
let computation =
maybeState {
let! x = get
let! y = liftOption (Some 10)
do! put (x + y)
let! x = get
return x
}
printfn "Result: %A" (evalState computation 1)
StateMaybe may be generalized further by making the type of the state component generic.
Others already gave you a direct answer to your question. However, I think that the way the question is stated leads to a solution that is not very idiomatic from the F# perspective - this might work for you as long as you are the only person working on the code, but I would recommend against doing that.
Even with the added details, the question is still fairly general, but here are two suggestions:
There is nothing wrong with reasonably used mutable state in F#. For example, it is perfectly fine to create a function that generates IDs and pass it along:
let createGenerator() =
let currentID = ref 0
(fun () -> incr currentID; !currentID)
Do you really need to generate the IDs while you are building the entities? It sounds like you could just generate a list of entities without ID and then use Seq.zip to zip the final list of entities with list of IDs.
As for the maybe computation, are you using it to handle regular, valid states, or to handle exceptional states? (It sounds like the first, which is the right way of doing things - but if you need to handle truly exceptional states, then you might want to use ordinary .NET exceptions).
for i in a..b do
res <- res * myarray.[i]
res
Do I have to use like
Array.fold (*) 1 (Array.sub myarray a (b - a + 1))
, which I believe is rather slow and not that concise?
Don't know if you'll find it any better, but you could do:
Seq.fold (fun r i -> r * myarray.[i]) 1 {a .. b}
Daniel's solution is pretty neat and I think it should be nearly as efficient as the for loop, because it does not need to clone the array.
If you wanted a more concise solution, then you can use indexer instead of Array.sub, which does need to clone some part of the array, but it looks quite neat:
myarray.[a .. b] |> Seq.fold (*) 1
This clones a part of the array because the slicing operation returns an array. You could define your own slicing operation that returns the elements as seq<'T> (and thus does not clone the whole array):
module Array =
let slice a b (arr:'T[]) =
seq { for i in a .. b -> arr.[i] }
Using this function, you could write:
myarray |> Array.slice a b |> Seq.fold (*) 1
I believe this more directly expresses the functionality that you're trying to implement. As always with performance - you should measure the performance to see if you need to make such optimizations or if the first version is fast enough for your purpose.
If you're concerned with speed then I'd shy away from using seq unless you're prototyping. Either stick with the for loop or rewrite as a recursive function. The example you gave is simplistic and sometimes more complex problems are better represented as recursion.
let rec rangeProduct a b total (array : _[]) =
if a <= b then
rangeProduct (a + 1) b (total * array.[a]) array
else
total
let res = myArray |> rangeProduct a b res
There is no overhead here, it's as fast as possible, there is no mutation, and it's functional.
Ok, only just in F# and this is how I understand it now :
Some problems are recursive in nature (building or reading out a treestructure to name just one) and then you use recursion. In these cases you preferably use tail-recursion to give the stack a break
Some languagues are pure functional, so you have to use recursion in stead of while-loops, even if the problem is not recursive in nature
So my question : since F# also support the imperative paradigm, would you use tail recursion in F# for problems that aren't naturally recursive ones? Especially since I have read the compiler recongnizes tail recursion and just transforms it in a while loop anyway?
If so : why ?
The best answer is 'neither'. :)
There's some ugliness associated with both while loops and tail recursion.
While loops require mutability and effects, and though I have nothing against using these in moderation, especially when encapsulated in the context of a local function, you do sometimes feel like you're cluttering/uglifying your program when you start introducing effects merely to loop.
Tail recursion often has the disadvantage of requiring an extra accumulator parameter or continuation-passing style. This clutters the program with extra boilerplate to massage the startup conditions of the function.
The best answer is to use neither while loops nor recursion. Higher-order functions and lambdas are your saviors here, especially maps and folds. Why fool around with messy control structures for looping when you can encapsulate those in reusable libraries and then just state the essence of your computation simply and declaratively?
If you get in the habit of often calling map/fold rather than using loops/recursion, as well as providing a fold function along with any new tree-structured data type you introduce, you'll go far. :)
For those interested in learning more about folds in F#, why not check out my first three blog posts in a series on the topic?
In order of preference and general programming style, I will write code as follows:
Map/fold if its available
let x = [1 .. 10] |> List.map ((*) 2)
Its just convenient and easy to use.
Non-tail recursive function
> let rec map f = function
| x::xs -> f x::map f xs
| [] -> [];;
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
Most algorithms are easiest to read and express without tail-recursion. This works particularly well when you don't need to recurse too deeply, making it suitable for many sorting algorithms and most operations on balanced data structures.
Remember, log2(1,000,000,000,000,000) ~= 50, so log(n) operation without tail-recursion isn't scary at all.
Tail-recursive with accumulator
> let rev l =
let rec loop acc = function
| [] -> acc
| x::xs -> loop (x::acc) xs
loop [] l
let map f l =
let rec loop acc = function
| [] -> rev acc
| x::xs -> loop (f x::acc) xs
loop [] l;;
val rev : 'a list -> 'a list
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
It works, but the code is clumsy and elegance of the algorithm is slightly obscured. The example above isn't too bad to read, but once you get into tree-like data structures, it really starts to become a nightmare.
Tail-recursive with continuation passing
> let rec map cont f = function
| [] -> cont []
| x::xs -> map (fun l -> cont <| f x::l) f xs;;
val map : ('a list -> 'b) -> ('c -> 'a) -> 'c list -> 'b
> [1 .. 10] |> map id ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
Whenever I see code like this, I say to myself "now that's a neat trick!". At the cost of readability, it maintains the shape of the non-recursive function, and found it really interesting for tail-recursive inserts into binary trees.
Its probably my monad-phobia speaking here, or maybe my inherent lack of familiarity with Lisp's call/cc, but I think those occasions when CSP actually simplifies algorithms are few and far between. Counter-examples are welcome in the comments.
While loops / for loops
It occurs to me that, aside from sequence comprehensions, I've never used while or for loops in my F# code. In any case...
> let map f l =
let l' = ref l
let acc = ref []
while not <| List.isEmpty !l' do
acc := (!l' |> List.hd |> f)::!acc
l' := !l' |> List.tl
!acc |> List.rev;;
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
Its practically a parody of imperative programming. You might be able to maintain a little sanity by declaring let mutable l' = l instead, but any non-trivial function will require the use of ref.
Honestly, any problem that you can solve with a loop is already a naturally recursive one, as you can translate both into (usually conditional) jumps in the end.
I believe you should stick with tail calls in almost all cases where you must write an explicit loop. It is just more versatile:
A while loop restricts you to one loop body, while a tail call can allow you to switch between many different states while the "loop" is running.
A while loop restricts you to one condition to check for termination, with the tail recursion you can have an arbitrarily complicated match expression waiting to shunt the control flow off somewhere else.
Your tail calls all return useful values and can produce useful type errors. A while loop does not return anything and relies on side effects to do its work.
While loops are not first class while functions with tail calls (or while loops in them) are. Recursive bindings in local scope can be inspected for their type.
A tail recursive function can easily be broken apart into parts that use tail calls to call each other in the needed sequence. This may make things easier to read, and will help if you find you want to start in the middle of a loop. This is not true of a while loop.
All in all while loops in F# are only worthwhile if you really are going to be working with mutable state, inside a function body, doing the same thing repeatedly until a certain condition is met. If the loop is generally useful or very complicated, you may want to factor it out into some other top level binding. If the data types are themselves immutable (a lot of .NET value types are), you may gain very little from using mutable references to them anyway.
I'd say that you should only resort to while loops for niche cases where a while loop is perfect for the job, and is relatively short. In many imperative programming languages, while loops are often twisted into unnatural roles like driving stuff repeatedly over a case statement. Avoid those kinds of things, and see if you can use tail calls or, even better, a higher order function, to achieve the same ends.
Many problems have a recursive nature, but having thought imperatively for a long time often prevents us from seeing this.
In general I would use a functional technique wherever possible in a functional language - Loops are never functional since they exclusively rely on side-effects. So when dealing with imperative code or algorithms, using loops is adequate, but in functional context, they're aren't considered very nice.
Functional technique doesn't only mean recursion but also using appropriate higher-order functions.
So when summing a list, neither a for-loop nor a recursive function but a fold is the solution for having comprehensible code without reinventing the wheel.
for problems that aren't naturally recursive ones
..
just transforms it in a while loop anyway
You answered this yourself.
Use recursion for recursive problems and loop for things that aren't functional in nature.
Just always think: Which feels more natural, which is more readable.