How to write efficient list/seq functions in F#? (mapFoldWhile) - f#

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was myfirst attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
let rec mapRec f state list results =
match list with
| [] -> (List.rev results, state)
| item :: tail ->
let (result, newState) = f state item
match newState with
| Some x -> mapRec f newState tail (result :: results)
| None -> ([], None)
mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
match x with
| [] -> []
| [h] -> [f h]
| (h::t) ->
let cons = freshConsNoTail (f h)
mapToFreshConsTail cons f t
cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?

In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not particularly good example. F# list functions are used almost everywhere so it might be a good trade-off, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.

Related

How to split F# result type list into lists of inner type

I have a list/sequence as follows Result<DataEntry, exn> []. This list is populated by calling multiple API endpoints in parallel based on some user inputs.
I don't care if some of the calls fail as long as at least 1 succeeds. I then need to perform multiple operations on the success list.
My question is how to partition the Result list into exn [] and DataEntry [] lists. I tried the following:
// allData is Result<DataEntry, exn> []
let filterOutErrors (input: Result<DataEntry, exn>) =
match input with
| Ok v -> true
| _ -> false
let values, err = allData |> Array.partition filterOutErrors
This in principle meets the requirement since values contains all the success cases but understandably the compiler can't infer the types so both values and err contains Result<DataEntry, exn>.
Is there any way to split a list of result Result<Success, Err> such that you end up with separate lists of the inner type?
Is there any way to split a list of result Result<Success, Err> such that you end up with separate lists of the inner type?
Remember that Seq / List / Array are foldable, so you can use fold to convert a Seq / List / Array of 'Ts into any other type 'S. Here you want to go from []Result<DataEntry, exn> to, e.g., the tuple list<DataEntry> * list<exn>. We can define the following folder function, that takes an initial state s of type list<'a> * list<'b> and a Result Result<'a, 'b> and returns your tuple of lists list<'a> * list<'b>:
let listFolder s r =
match r with
| Ok data -> (data :: (fst s), snd s)
| Error err -> (fst s, err :: (snd s))
then you can fold over your array as follows:
let (values, err) = Seq.fold listFolder ([], []) allData
You can extract the good and the bad like this.
let values =
allData
|> Array.choose (fun r ->
match r with
| Result.Ok ok -> Some ok
| Result.Error _ -> None)
let err =
allData
|> Array.choose (fun r ->
match r with
| Result.Ok _ -> None
| Result.Error error -> Some error)
You seem confused about whether you have arrays or lists. The F# code you use, in the snippet and in your question text, all points to use of arrays, in spite of you several times mentioning lists.
It has recently been recommended that we use array instead of the [] symbol in types, since there are inconsistencies in the way F# uses the symbol [] to mean list in some places, and array in other places. There is also the symbol [||] for arrays, which may add more confusion.
So that would be recommending Result<DataEntry,exn> array in this case.
The answer from Víctor G. Adán is functional, but it's a downside that the API requires you to pass in two empty lists, exposing the internal implementation.
You could wrap this into a "starter" function, but then the code grows, requires nested functions or using modules and the intention is obscured.
The answer from Bent Tranberg, while more readable requires two passes of the data, and it seems inefficient to map into Option type just to be able to filter on it using .Choose.
I propose KISS'ing it with some good old mutation.
open System.Collections.Generic
let splitByOkAndErrors xs =
let oks = List<'T>()
let errors = List<'V>()
for x in xs do
match x with
| Ok v -> oks.Add v
| Error e -> errors.Add e
(oks |> seq, errors |> seq)
I know I know, mutation, yuck right? I believe you should not shy away from that even in F#, use the right tool for every situation: the mutation is kept local to the function, so it's still pure. The API is clean just taking in the list of Result to split, there is no concepts like folding, recursive calls, list cons pattern matching etc. to understand, and the function won't reverse the input list, you also have the option to return array or seq, that is, you are not confined to a linked list that can only be appended to in O(1) in the head - which in my experience seldom fits well into business case, win win win in my book.
I general, I hope to see F# grow into a more multi-paradigm programming language in the community's mind. It's nice to see these functional solutions, but I fear they scare some people away unnecessarily, as F# is already multi-paradigm!

F#: What to call a combination of map and fold, or of map and reduce?

A simple example, inspired by this question:
module SimpleExample =
let fooFold projection folder state source =
source |> List.map projection |> List.fold folder state
// val fooFold :
// projection:('a -> 'b) ->
// folder:('c -> 'b -> 'c) -> state:'c -> source:'a list -> 'c
let fooReduce projection reducer source =
source |> List.map projection |> List.reduce reducer
// val fooReduce :
// projection:('a -> 'b) -> reducer:('b -> 'b -> 'b) -> source:'a list -> 'b
let game = [0, 5; 10, 15]
let minX, maxX = fooReduce fst min game, fooReduce fst max game
let minY, maxY = fooReduce snd min game, fooReduce snd max game
What would be a natural name for the functions fooFold and fooReduce in this example? Alas, mapFold and mapReduce are already taken.
mapFold is part of the F# library and does a fold operation over the input to return a tuple of 'result list * 'state, similar to scan, but without the initial state and the need to provide the tuple as part of the state yourself. Its signature is:
val mapFold : ('State -> 'T -> 'Result * 'State) -> 'State -> 'T list
-> 'Result list * 'State
Since the projection can easily be integrated into the folder, the fooFold function is only included for illustration purposes.
And MapReduce:
MapReduce is an algorithm for processing huge datasets on certain
kinds of distributable problems using a large number of nodes
Now for a more complex example, where the fold/reduce is not directly applied to the input, but to the groupings following a selection of the keys.
The example has been borrowed from a Python library, where it is called - perhaps misleadingly - reduceby.
module ComplexExample =
let fooFold keySelection folder state source =
source |> Seq.groupBy keySelection
|> Seq.map (fun (k, xs) ->
k, Seq.fold folder state xs)
// val fooFold :
// keySelection:('a -> 'b) ->
// folder:('c -> 'a -> 'c) -> state:'c -> source:seq<'a> -> seq<'b * 'c>
// when 'b : equality
let fooReduce keySelection projection reducer source =
source |> Seq.groupBy keySelection
|> Seq.map (fun (k, xs) ->
k, xs |> Seq.map projection |> Seq.reduce reducer)
// val fooReduce :
// keySelection:('a -> 'b) ->
// projection:('a -> 'c) ->
// reducer:('c -> 'c -> 'c) -> source:seq<'a> -> seq<'b * 'c>
// when 'b : equality
type Project = { name : string; state : string; cost : decimal }
let projects =
[ { name = "build roads"; state = "CA"; cost = 1000000M }
{ name = "fight crime"; state = "IL"; cost = 100000M }
{ name = "help farmers"; state = "IL"; cost = 2000000M }
{ name = "help farmers"; state = "CA"; cost = 200000M } ]
fooFold (fun x -> x.state) (fun acc x -> acc + x.cost) 0M projects
// val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
fooReduce (fun x -> x.state) (fun x -> x.cost) (+) projects
// val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]
What would be the natural name for the functions fooFold and fooReduce here?
I'd probably call the first two mapAndFold and mapAndReduce (though I agree that mapFold and mapReduce would be good names if they were not already taken). Alternatively, I'd go with mapThenFold (etc.), which is perhaps more explicit, but it reads a bit cumbersome.
For the more complex ones, reduceBy and foldBy sound good. The issue is that this would not work if you also wanted a version of those functions that do not do the mapping operation. If you wanted that, you'd probably need mapAndFoldBy and mapAndReduceBy (as well as just foldBy and reduceBy). This gets a bit ugly, but I'm afraid that's the best you can do.
More generally, the issue when comparing names with Python is that Python allows overloading whereas F# functions do not. This means that you need to have a unique name for functions that would have multiple overloads. This means that you just need to come up with a consistent naming scheme that will not make the names unbearably long.
(I experienced this when coming up with names for the functions in the Deedle library, which is somewhat inspired by Pandas. You can see for example the aggregation functions in Deedle for an example - there is a pattern in the naming to deal with the fact that each function needs a unique name.)
I have a different opinion as Thomas.
First; I think that not having overloads is a good thing, and giving every operation unique names is also
something good. I also would say that giving long names to functions rarely used is even more important
and should not be avoided.
Writing longer names is usally never a problem as we as programers usually use an IDE with auto-completion.
But reading and understanding is different. Knowing what a functions does because of a long descriptive name
is better then a short name.
A long descriptive function name gets more important the less often a function is used. It helps reading and
understanding the code. A short and less descriptive function name that is rarely used causes confusion. The
confusion would just increase if it even would be just an overload of another function name.
Yes; naming things can be hard, that's the reason why its important and shoudn't be avoided.
To what you describe. I would have name it mapFold and mapReduce. As those exactly describe what they do.
There is already a mapFold in F#, and in my opinion, the F# devs fucked up either with the naming, arguments or the
output of the function. But anyhow, they just fucked up.
I usually would have expected mapFold to do map and then fold. Actually it does, but it also returns the intermediate
list that is created on the run. Something I would not expect it to return. And i would also expect it to pass two
functions instead of one.
When we get to Thomas suggestion on naming it mapAndFold or mapThenFold. Then i would expect different behaviour
for those two functions. mapThenFold exactly tells what it does. map and then fold on it. I think the then is
not important. That's also why I would name it mapFold or mapReduce. Writing it this way already suggest a then.
But mapAndFold or mapAndReduce does not tell something about the order of execution. It just says it does two things
or somehow returns this AND that.
With that in mind, i would say that the F# library should have named its mapFold either mapAndFold, changed the return
value to just return the fold (and have two arguments instead of one). But hey, its fucked up now, we cannot change it anymore.
As for mapReduce, I think you are a little bit mistaken. The mapReduce algorithm is named that way, because it just does
map and then reduce. And that's it.
But functional programming with its stateless and more descriptive operations sometimes have additional benefits. Technically
a map is less powerful compared to a for/fold as it just describes how values are changed, without that the order matters
or the position in a list. But because of this limitation, you can run it in parallel, even on a big computer cluster. And that's all
what mapReduce Algorithm you cite do.
But that doesn't mean a mapReduce must always run its operation on a big cluster or in parallel. In my opinion you could
just name it mapReduce and that's fine. Everybody will know what it does and I think nobody expect it to suddenly run on
cluster.
In general I think the mapFold that F# provides is silly, here are 4 examples how I think it should have been provided.
let double x = x * 2
let add x y = x + y
mapFold double add 0 [1..10] // 110
mapAndFold double add 0 [1..10] // [2;4;6;8;10;12;14;16;18;20] * 110
mapReduce double add [1..10] // Some (110)
mapAndReduce double add [1..10] // Some ([2;4;6;8;10;12;14;16;18;20] * 110)
Well mapFold doesn't work that way, so you have the following options.
Implement mapReduce the way you have it. And ignore the in-consistency with mapFold.
Provide mapAndReduce and mapReduce.
Make your mapReduce return the same crap as the default implementation of mapFold does and provide mapThenReduce.
Like (3) but also add mapThenFold.
Option 4 has the most compatibility and expectation of what already exists in F#. But that doesn't mean you must do it that way.
In my opinion I would just:
implement mapReduce returning the result of map and then reduce.
I wouldn't care about a mapAndReduce version that returns a list and the result.
Provide a mapThenFold expecting two function arguments returning the result just of fold.
As a general notice: Implementing mapReduce just by calling map and then reduce is somewhat pointless. I would
expect it to have a more low-level implementation that does both things by just traversing the data-structure once.
If not, i just can call map and then reduce anyway.
So an implementation should look like:
let mapReduce mapper reducer xs =
let rec loop state xs =
match xs with
| [] -> state
| x::xs -> loop (reducer state (mapper x)) xs
match xs with
| [] -> ValueNone
| [x] -> ValueSome (mapper x)
| x::xs -> ValueSome (loop (mapper x) xs)
let double x = x * 2
let add x y = x + y
let some110 = mapReduce double add [1..10]

Avoid mutation in this example in F#

Coming from an OO background, I am having trouble wrapping my head around how to solve simple issues with FP when trying to avoid mutation.
let mutable run = true
let player1List = ["he"; "ho"; "ha"]
let addValue lst value =
value :: lst
while run do
let input = Console.ReadLine()
addValue player1List input |> printfn "%A"
if player1List.Length > 5 then
run <- false
printfn "all done" // daz never gunna happen
I know it is ok to use mutation in certain cases, but I am trying to train myself to avoid mutation as the default. With that said, can someone please show me an example of the above w/o using mutation in F#?
The final result should be that player1List continues to grow until the length of items are 6, then exit and print 'all done'
The easiest way is to use recursion
open System
let rec makelist l =
match l |> List.length with
|6 -> printfn "all done"; l
| _ -> makelist ((Console.ReadLine())::l)
makelist []
I also removed some the addValue function as it is far more idiomatic to just use :: in typical F# code.
Your original code also has a common problem for new F# coders that you use run = false when you wanted run <- false. In F#, = is always for comparison. The compiler does actually warn about this.
As others already explained, you can rewrite imperative loops using recursion. This is useful because it is an approach that always works and is quite fundamental to functional programming.
Alternatively, F# provides a rich set of library functions for working with collections, which can actually nicely express the logic that you need. So, you could write something like:
let player1List = ["he"; "ho"; "ha"]
let player2List = Seq.initInfinite (fun _ -> Console.ReadLine())
let listOf6 = Seq.append player1List list2 |> Seq.take 6 |> List.ofSeq
The idea here is that you create an infinite lazy sequence that reads inputs from the console, append it at the end of your initial player1List and then take first 6 elements.
Depending on what your actual logic is, you might do this a bit differently, but the nice thing is that this is probably closer to the logic that you want to implement...
In F#, we use recursion to do loop. However, if you know how many times you need to iterate, you could use F# List.fold like this to hide the recursion implementation.
[1..6] |> List.fold (fun acc _ -> Console.ReadLine()::acc) []
I would remove the pipe from match for readability but use it in the last expression to avoid extra brackets:
open System
let rec makelist l =
match List.length l with
| 6 -> printfn "all done"; l
| _ -> Console.ReadLine()::l |> makelist
makelist []

Return item at position x in a list

I was reading this post While or Tail Recursion in F#, what to use when? were several people say that the 'functional way' of doing things is by using maps/folds and higher order functions instead of recursing and looping.
I have this function that returns the item at position x in a list:
let rec getPos l c = if c = 0 then List.head l else getPos (List.tail l) (c - 1)
how can it be converted to be more functional?
This is a primitive list function (also known as List.nth).
It is okay to use recursion, especially when creating the basic building blocks. Although it would be nicer with pattern matching instead of if-else, like this:
let rec getPos l c =
match l with
| h::_ when c = 0 -> h
| _::t -> getPos t (c-1)
| [] -> failwith "list too short"
It is possible to express this function with List.fold, however the result is less clear than the recursive version.
I'm not sure what you mean by more functional.
Are you rolling this yourself as a learning exercise?
If not, you could just try this:
> let mylist = [1;2;3;4];;
> let n = 2;;
> mylist.[n];;
Your definition is already pretty functional since it uses a tail-recursive function instead of an imperative loop construct. However, it also looks like something a Scheme programmer might have written because you're using head and tail.
I suspect you're really asking how to write it in a more idiomatic ML style. The answer is to use pattern matching:
let rec getPos list n =
match list with
| hd::tl ->
if n = 0 then hd
else getPos tl (n - 1)
| [] -> failWith "Index out of range."
The recursion on the structure of the list is now revealed in the code. You also get a warning if the pattern matching is non-exhaustive so you're forced to deal with the index too big error.
You're right that functional programming also encourages the use of combinators like map or fold (so called points-free style). But too much of it just leads to unreadable code. I don't think it's warranted in this case.
Of course, Benjol is right, in practice you would just write mylist.[n].
If you'd like to use high-order functions for this, you could do:
let nth n = Seq.take (n+1) >> Seq.fold (fun _ x -> Some x) None
let nth n = Seq.take (n+1) >> Seq.reduce (fun _ x -> x)
But the idea is really to have basic constructions and combine them build whatever you want. Getting the nth element of a sequence is clearly a basic block that you should use. If you want the nth item, as Benjol mentioned, do myList.[n].
For building basic constructions, there's nothing wrong to use recursion or mutable loops (and often, you have to do it this way).
Not as a practical solution, but as an exercise, here is one of the ways to express nth via foldr or, in F# terms, List.foldBack:
let myNth n xs =
let step e f = function |0 -> e |n -> f (n-1)
let error _ = failwith "List is too short"
List.foldBack step xs error n

cons operator (::) in F#

The :: operator in F# always prepends elements to the list. Is there an operator that appends to the list? I'm guessing that using # operator
[1; 2; 3] # [4]
would be less efficient, than appending one element.
As others said, there is no such operator, because it wouldn't make much sense. I actually think that this is a good thing, because it makes it easier to realize that the operation will not be efficient. In practice, you shouldn't need the operator - there is usually a better way to write the same thing.
Typical scenario: I think that the typical scenario where you could think that you need to append elements to the end is so common that it may be useful to describe it.
Adding elements to the end seems necessary when you're writing a tail-recursive version of a function using the accumulator parameter. For example a (inefficient) implementation of filter function for lists would look like this:
let filter f l =
let rec filterUtil acc l =
match l with
| [] -> acc
| x::xs when f x -> filterUtil (acc # [x]) xs
| x::xs -> filterUtil acc xs
filterUtil [] l
In each step, we need to append one element to the accumulator (which stores elements to be returned as the result). This code can be easily modified to use the :: operator instead of appending elements to the end of the acc list:
let filter f l =
let rec filterUtil acc l =
match l with
| [] -> List.rev acc // (1)
| x::xs when f x -> filterUtil (x::acc) xs // (2)
| x::xs -> filterUtil acc xs
filterUtil [] l
In (2), we're now adding elements to the front of the accumulator and when the function is about to return the result, we reverse the list (1), which is a lot more efficient than appending elements one by one.
Lists in F# are singly-linked and immutable. This means consing onto the front is O(1) (create an element and have it point to an existing list), whereas snocing onto the back is O(N) (as the entire list must be replicated; you can't change the existing final pointer, you must create a whole new list).
If you do need to "append one element to the back", then e.g.
l # [42]
is the way to do it, but this is a code smell.
The cost of appending two standard lists is proportional to the length of the list on the left. In particular, the cost of
xs # [x]
is proportional to the length of xs—it is not a constant cost.
If you want a list-like abstraction with a constant-time append, you can use John Hughes's function representation, which I'll call hlist. I'll try to use OCaml syntax, which I hope is close enough to F#:
type 'a hlist = 'a list -> 'a list (* a John Hughes list *)
let empty : 'a hlist = let id xs = xs in id
let append xs ys = fun tail -> xs (ys tail)
let singleton x = fun tail -> x :: tail
let cons x xs = append (singleton x) xs
let snoc xs x = append xs (singleton x)
let to_list : 'a hlist -> 'a list = fun xs -> xs []
The idea is that you represent a list functionally as a function from "the rest of the elements" to "the final list". This works great if you are going to build up the whole list before you look at any of the elements. Otherwise you'll have to deal with the linear cost of append or use another data structure entirely.
I'm guessing that using # operator [...] would be less efficient, than appending one element.
If it is, it will be a negligible difference. Both appending a single item and concatenating a list to the end are O(n) operations. As a matter of fact I can't think of a single thing that # has to do, which a single-item append function wouldn't.
Maybe you want to use another data structure. We have double-ended queues (or short "Deques") in fsharpx. You can read more about them at http://jackfoxy.com/double-ended-queues-for-fsharp
The efficiency (or lack of) comes from iterating through the list to find the final element. So declaring a new list with [4] is going to be negligible for all but the most trivial scenarios.
Try using a double-ended queue instead of list. I recently added 4 versions of deques (Okasaki's spelling) to FSharpx.Core (Available through NuGet. Source code at FSharpx.Core.Datastructures). See my article about using dequeus Double-ended queues for F#
I've suggested to the F# team the cons operator, ::, and the active pattern discriminator be made available for other data structures with a head/tail signature.3

Resources