Itertools for F# - f#

I'm used to Python's itertools for doing functional things with iterators (F#: sequences) and wondered if there were equivalents in F# or a commonly used library since they're so handy.
The top tools for me are:
product : cartesian product, equivalent to a nested for-loop
combinations
permutations
takewhile
dropwhile
chain : chain multiple iterators together into a new longer iterator
repeat* : repeat(5) -> 5, 5, 5...
count* : count(10) -> 10, 11, 12...
cycle* : cycle([1,2,3]) -> 1,2,3,1,2...
* I suppose these 3 would yield monads in F#? How do you make them infinite?
I'm prompted to ask because I saw this question on permutations in F# and was surprised it was not part of a library or built into the language.

I don't know if there's a commonly used library that contains functions like product, combinations and permutations, but the others you've mentioned are already in Seq and List modules or can be implemented without much trouble, and there are also useful methods in System.Linq.Enumerable.
takewhile -> Seq.takeWhile
dropwhile -> Seq.skipWhile
chain -> Seq.concat
repeat(5) -> Seq.initInfinite (fun _ -> 5)
count(10) -> Seq.initInfinite ((+) 10)
cycle([1, 2, 3]) -> Seq.concat <| Seq.initInfinite (fun _ -> [1; 2; 3])
You also might want to check out the excellent FSharpx library -- it contains a lot of useful functions to work with collections and whatnot.
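
As a minimal sketch of the mappings above, the infinite helpers can be wrapped in small functions. The names repeat, count and cycle are hypothetical (they are not part of FSharp.Core); only Seq.initInfinite, Seq.take and sequence expressions are assumed from the core library. Because sequences are lazy, nothing is computed until elements are requested.

```fsharp
// Hypothetical helpers mirroring Python's itertools; not part of FSharp.Core.
let repeat x = Seq.initInfinite (fun _ -> x)
let count start = Seq.initInfinite (fun i -> start + i)

// Note: if xs is empty this loops forever without yielding anything.
let cycle (xs: seq<'a>) = seq { while true do yield! xs }

// Take a finite prefix before forcing an infinite sequence.
let fives = repeat 5 |> Seq.take 3 |> List.ofSeq
let fromTen = count 10 |> Seq.take 3 |> List.ofSeq
let cycled = cycle [1; 2; 3] |> Seq.take 5 |> List.ofSeq
```

Here fives is [5; 5; 5], fromTen is [10; 11; 12] and cycled is [1; 2; 3; 1; 2].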

For a cartesian product in F# (avoiding a nested for loop ^_^), you can use List.allPairs list1 list2:
https://fsharp.github.io/fsharp-core-docs/reference/fsharp-collections-listmodule.html#allPairs
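
For illustration, here is List.allPairs next to naive recursive versions of combinations and permutations. The two recursive functions are my own sketch (hypothetical names, not library functions), are not optimised, and build whole lists eagerly.

```fsharp
// Cartesian product of two lists (List.allPairs is in FSharp.Core).
let pairs = List.allPairs [1; 2] ['a'; 'b']

// All k-element combinations, preserving the input order (hypothetical helper).
let rec combinations k list =
    match k, list with
    | 0, _ -> [ [] ]
    | _, [] -> []
    | k, x :: xs ->
        List.map (fun c -> x :: c) (combinations (k - 1) xs) @ combinations k xs

// All permutations: pick each element in turn, then permute the rest (hypothetical helper).
let rec permutations list =
    match list with
    | [] -> [ [] ]
    | _ ->
        list
        |> List.indexed
        |> List.collect (fun (i, x) ->
            let rest =
                list
                |> List.indexed
                |> List.filter (fun (j, _) -> j <> i)
                |> List.map snd
            permutations rest |> List.map (fun p -> x :: p))
```

For example, combinations 2 [1; 2; 3] gives [[1; 2]; [1; 3]; [2; 3]], and permutations [1; 2; 3] gives six lists starting with [1; 2; 3].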

Related

How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    let rec mapRec f state list results =
        match list with
        | [] -> (List.rev results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some x -> mapRec f newState tail (result :: results)
            | None -> ([], None)
    mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:
let map f x =
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) ->
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons
The consNoTail stuff is also in this file:
// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a System.Collections.Generic.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.
Running your function with the following:
let l = [ 1 .. 1000000 ]
#time
mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around 190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)
Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.
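
To make the snippet above runnable, here is a sketch with a made-up input: the infinite sequence 1, 2, 3, ... is my assumption, chosen to show that the pipeline terminates even though the source never ends.

```fsharp
// Hypothetical input: 1, 2, 3, ... (infinite, so eager evaluation would never finish).
let input = Seq.initInfinite (fun i -> i + 1)

let partialSums =
    input
    |> Seq.scan (fun sum v -> v + sum) 0        // 0, 1, 3, 6, 10, ...
    |> Seq.takeWhile (fun sum -> sum < 1000)    // stop pulling from input once a sum reaches 1000
    |> List.ofSeq
```

The last element of partialSums is 990 (the sum of 1 through 44), and no further input elements are consumed after the first sum that fails the test.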

Why is the signature of foldBack so much different from fold in F#?

There are at least 2 things I don't understand about it:
refactoring from left side to right side folding requires a lot of changes not only in signature but in every place depended on the folder function
there is no way to chain it with regard to the list without flipping the parameters
List.foldBack : ('T -> 'State -> 'State) -> 'T list -> 'State -> 'State
List.fold : ('State -> 'T -> 'State) -> 'State -> 'T list -> 'State
Any good reason for why would someone put all parameters in reverse in the signature of foldBack compared to fold?
It's just a useful mnemonic to help the programmer remember how the list is iterated. Imagine your list is laid out with the beginning on the left and the end on the right. fold starts with an initial state on the left and accumulates state going to right. foldBack does the opposite, it starts with an initial state on the right and goes back over the list to the left.
This is definitely showing F#'s OCaml heritage as some other functional languages (Haskell, Scala, ML) keep the list as the last argument to allow for the more common partial application scenarios.
If I really needed a version of foldBack that looked exactly like fold, I would define my own helper function:
module List =
    let foldBack' f acc lst =
        let flip f a b = f b a
        List.foldBack (flip f) lst acc
It's a relic of F#'s beginnings in OCaml. You can see that the F# function signatures for List.fold and List.foldBack are the same in the OCaml documentation (where they are called List.fold_left and List.fold_right, respectively).
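
A small sketch may make the mnemonic concrete. It builds a string that records the order of evaluation. foldBackLikeFold is a hypothetical name for a helper with fold-style argument order, equivalent in spirit to the foldBack' shown above.

```fsharp
let xs = [1; 2; 3]

// fold: state starts on the left and moves right.
let left = List.fold (fun acc x -> sprintf "(%s+%d)" acc x) "0" xs

// foldBack: state starts on the right and moves left.
let right = List.foldBack (fun x acc -> sprintf "(%d+%s)" x acc) xs "0"

// Hypothetical helper with fold-style argument order, so the list can be piped in.
let foldBackLikeFold f acc lst = List.foldBack (fun x s -> f s x) lst acc

let rightPiped = xs |> foldBackLikeFold (fun acc x -> sprintf "(%d+%s)" x acc) "0"
```

left is "(((0+1)+2)+3)", while right and rightPiped are both "(1+(2+(3+0)))".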

'MaxDegreeOfParallelism' for Array.Parallel?

Is it possible to set 'MaxDegreeOfParallelism' (that is maximum number of threads to use) for Array.Parallel module since under the hood it uses Parallel.For?
According to this post, it seems that there is no way to limit the number of threads globally in the final version of Parallel Extensions. An alternative to what brian suggests would be to use PLINQ (which works with parallel sequences) instead of functions that work with arrays.
This can be done using the PSeq module from F# PowerPack. It provides functions such as PSeq.map, PSeq.filter and many others that work with parallel sequences (and can be nicely composed using pipelining). For parallel sequences, you can use the WithDegreeOfParallelism extension method to specify the behavior.
You could implement a wrapper function for it:
[EDIT: It is already there!]
let withDegreeOfParallelism n (pq:ParallelQuery<_>) =
    pq.WithDegreeOfParallelism(n)
And then write:
let res =
    data
    |> PSeq.map (fun n -> ...)
    |> PSeq.withDegreeOfParallelism ParallelOptions.MaxDegreeOfParallelism
    |> Array.ofSeq
This may have different performance, because it is implemented differently than the functions in the Array.Parallel module, but this certainly depends on your scenario.
No, I don't think so.
You can always create your own versions of any of the methods in the Array.Parallel module, using the source code from array.fs (in the CTP release) as a starter.
Assuming I want, say, at most 10 threads, I've been replacing:
myArray
|> Array.Parallel.iter (fun item -> doWork item)
with
let maxPara = 10
myArray
|> Array.splitInto maxPara
|> Array.Parallel.iter (fun items -> items |> Array.iter (fun item -> doWork item))
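
Another option, sketched here as an assumption rather than anything the Array.Parallel module offers, is to call Parallel.For directly with a ParallelOptions value. iterWithMaxParallelism is a hypothetical helper name. Parallel.For, ParallelOptions.MaxDegreeOfParallelism and Interlocked.Add are standard .NET APIs.

```fsharp
open System
open System.Threading
open System.Threading.Tasks

// Hypothetical helper: like Array.Parallel.iter, but with a cap on concurrency.
let iterWithMaxParallelism (maxDegree: int) (action: 'T -> unit) (items: 'T[]) =
    let options = ParallelOptions(MaxDegreeOfParallelism = maxDegree)
    Parallel.For(0, items.Length, options, Action<int>(fun i -> action items.[i]))
    |> ignore

// Usage: sum 1..100 with at most 4 concurrent workers.
let total = ref 0
[| 1 .. 100 |]
|> iterWithMaxParallelism 4 (fun x -> Interlocked.Add(total, x) |> ignore)
```

After the call, total.Value is 5050. The cap is an upper bound only. The scheduler may still use fewer threads.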

What is the name of |> in F# and what does it do?

A real F# noob question, but what is |> called and what does it do?
It's called the forward pipe operator. It pipes the result of one function to another.
The Forward pipe operator is simply defined as:
let (|>) x f = f x
And has a type signature:
'a -> ('a -> 'b) -> 'b
Which resolves to: given a generic type 'a, and a function which takes an 'a and returns a 'b, then return the application of the function on the input.
You can read more detail about how it works in an article here.
I usually refer to |> as the pipelining operator, but I'm not sure whether the official name is pipe operator or pipelining operator (though it probably doesn't really matter as the names are similar enough to avoid confusion :-)).
@LBushkin already gave a great answer, so I'll just add a couple of observations that may also be interesting. Obviously, the pipelining operator got its name because it can be used for creating a pipeline that processes some data in several steps. The typical use is when working with lists:
[0 .. 10]
|> List.filter (fun n -> n % 3 = 0) // Get numbers divisible by three
|> List.map (fun n -> n * n) // Calculate squares of such numbers
This gives the result [0; 9; 36; 81]. Also, the operator is left-associative which means that the expression input |> f |> g is interpreted as (input |> f) |> g, which makes it possible to sequence multiple operations using |>.
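
As a quick sketch of what the operator buys you, the pipeline above can be compared with the equivalent nested function application. The names square, piped and nested are only for illustration.

```fsharp
let square n = n * n

// Reads top to bottom, in the order the steps run.
let piped =
    [0 .. 10]
    |> List.filter (fun n -> n % 3 = 0)
    |> List.map square

// The same computation without |>: it reads inside out.
let nested = List.map square (List.filter (fun n -> n % 3 = 0) [0 .. 10])
```

Both piped and nested evaluate to [0; 9; 36; 81].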
Finally, I find it quite interesting that the pipelining operator in many cases corresponds to method chaining from object-oriented languages. For example, the previous list processing example would look like this in C#:
Enumerable.Range(0, 11)
.Where(n => n % 3 == 0) // Get numbers divisible by three
.Select(n => n * n) // Calculate squared of such numbers
This may give you some idea about when the operator can be used if you're coming from an object-oriented background (although it is used in many other situations in F#).
As far as F# itself is concerned, the name is op_PipeRight (although no human would call it that). I pronounce it "pipe", like the unix shell pipe.
The spec is useful for figuring out these kinds of things. Section 4.1 has the operator names.
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/spec.html
Don't forget to check out the library reference docs:
http://msdn.microsoft.com/en-us/library/ee353754(v=VS.100).aspx
which list the operators.

While or Tail Recursion in F#, what to use when?

OK, I've only just started with F#, and this is how I understand it now:
Some problems are recursive in nature (building or reading out a tree structure, to name just one) and then you use recursion. In these cases you preferably use tail recursion to give the stack a break.
Some languages are purely functional, so you have to use recursion instead of while loops, even if the problem is not recursive in nature.
So my question: since F# also supports the imperative paradigm, would you use tail recursion in F# for problems that aren't naturally recursive? Especially since I have read that the compiler recognizes tail recursion and just transforms it into a while loop anyway?
If so: why?
The best answer is 'neither'. :)
There's some ugliness associated with both while loops and tail recursion.
While loops require mutability and effects, and though I have nothing against using these in moderation, especially when encapsulated in the context of a local function, you do sometimes feel like you're cluttering/uglifying your program when you start introducing effects merely to loop.
Tail recursion often has the disadvantage of requiring an extra accumulator parameter or continuation-passing style. This clutters the program with extra boilerplate to massage the startup conditions of the function.
The best answer is to use neither while loops nor recursion. Higher-order functions and lambdas are your saviors here, especially maps and folds. Why fool around with messy control structures for looping when you can encapsulate those in reusable libraries and then just state the essence of your computation simply and declaratively?
If you get in the habit of often calling map/fold rather than using loops/recursion, as well as providing a fold function along with any new tree-structured data type you introduce, you'll go far. :)
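
To illustrate the advice about shipping a fold with each tree type, here is a minimal sketch. The Tree type and foldTree are hypothetical examples of mine, and this fold is not tail-recursive, so it assumes reasonably shallow trees.

```fsharp
// A hypothetical binary tree type.
type Tree<'a> =
    | Leaf
    | Node of Tree<'a> * 'a * Tree<'a>

// The fold replaces Leaf with `leaf` and every Node with a call to `node`.
let rec foldTree (leaf: 'r) (node: 'r -> 'a -> 'r -> 'r) (tree: Tree<'a>) : 'r =
    match tree with
    | Leaf -> leaf
    | Node (l, v, r) -> node (foldTree leaf node l) v (foldTree leaf node r)

// Each operation is now a one-liner with no explicit recursion.
let sumTree t = foldTree 0 (fun l v r -> l + v + r) t
let depth t = foldTree 0 (fun l _ r -> 1 + max l r) t
let toList t = foldTree [] (fun l v r -> l @ [v] @ r) t

let sample = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
```

With this sample, sumTree sample is 6, depth sample is 2 and toList sample is [1; 2; 3].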
For those interested in learning more about folds in F#, why not check out my first three blog posts in a series on the topic?
In order of preference and general programming style, I will write code as follows:
Map/fold if it's available
let x = [1 .. 10] |> List.map ((*) 2)
It's just convenient and easy to use.
Non-tail recursive function
> let rec map f = function
      | x::xs -> f x::map f xs
      | [] -> [];;
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
Most algorithms are easiest to read and express without tail-recursion. This works particularly well when you don't need to recurse too deeply, making it suitable for many sorting algorithms and most operations on balanced data structures.
Remember, log2(1,000,000,000,000,000) ~= 50, so log(n) operation without tail-recursion isn't scary at all.
Tail-recursive with accumulator
> let rev l =
      let rec loop acc = function
          | [] -> acc
          | x::xs -> loop (x::acc) xs
      loop [] l
  let map f l =
      let rec loop acc = function
          | [] -> rev acc
          | x::xs -> loop (f x::acc) xs
      loop [] l;;
val rev : 'a list -> 'a list
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
It works, but the code is clumsy and the elegance of the algorithm is slightly obscured. The example above isn't too bad to read, but once you get into tree-like data structures, it really starts to become a nightmare.
Tail-recursive with continuation passing
> let rec map cont f = function
      | [] -> cont []
      | x::xs -> map (fun l -> cont <| f x::l) f xs;;
val map : ('a list -> 'b) -> ('c -> 'a) -> 'c list -> 'b
> [1 .. 10] |> map id ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
Whenever I see code like this, I say to myself "now that's a neat trick!". At the cost of readability, it maintains the shape of the non-tail-recursive function, and I found it really interesting for tail-recursive inserts into binary trees.
It's probably my monad-phobia speaking here, or maybe my inherent lack of familiarity with Lisp's call/cc, but I think those occasions when CPS actually simplifies algorithms are few and far between. Counter-examples are welcome in the comments.
While loops / for loops
It occurs to me that, aside from sequence comprehensions, I've never used while or for loops in my F# code. In any case...
> let map f l =
      let l' = ref l
      let acc = ref []
      while not <| List.isEmpty !l' do
          acc := (!l' |> List.head |> f)::!acc
          l' := !l' |> List.tail
      !acc |> List.rev;;
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
It's practically a parody of imperative programming. You might be able to maintain a little sanity by declaring let mutable l' = l instead, but any non-trivial function will require the use of ref.
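
For comparison, here is a sketch of the let mutable variant mentioned above. mapLoop is a hypothetical name, and this assumes the mutable locals are not captured by a closure, which is the case where ref cells are still needed.

```fsharp
// While-loop map using mutable locals instead of ref cells.
let mapLoop f l =
    let mutable rest = l
    let mutable acc = []
    while not (List.isEmpty rest) do
        acc <- f (List.head rest) :: acc   // prepend the mapped head
        rest <- List.tail rest             // advance to the remaining items
    List.rev acc

let doubled = [1 .. 10] |> mapLoop ((*) 2)
```

doubled is [2; 4; 6; 8; 10; 12; 14; 16; 18; 20], the same result as the versions above.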
Honestly, any problem that you can solve with a loop is already a naturally recursive one, as you can translate both into (usually conditional) jumps in the end.
I believe you should stick with tail calls in almost all cases where you must write an explicit loop. It is just more versatile:
A while loop restricts you to one loop body, while a tail call can allow you to switch between many different states while the "loop" is running.
A while loop restricts you to one condition to check for termination, with the tail recursion you can have an arbitrarily complicated match expression waiting to shunt the control flow off somewhere else.
Your tail calls all return useful values and can produce useful type errors. A while loop does not return anything and relies on side effects to do its work.
While loops are not first class while functions with tail calls (or while loops in them) are. Recursive bindings in local scope can be inspected for their type.
A tail recursive function can easily be broken apart into parts that use tail calls to call each other in the needed sequence. This may make things easier to read, and will help if you find you want to start in the middle of a loop. This is not true of a while loop.
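
The point about switching between states can be sketched with two mutually recursive local functions. The word counter below is a made-up example of mine, not from the answer. Each function is one state, and every call is in tail position.

```fsharp
// Count whitespace-separated words with a two-state machine.
let countWords (text: string) =
    // State 1: between words.
    let rec inSpace i count =
        if i >= text.Length then count
        elif System.Char.IsWhiteSpace text.[i] then inSpace (i + 1) count
        else inWord (i + 1) (count + 1)     // a new word starts here
    // State 2: inside a word.
    and inWord i count =
        if i >= text.Length then count
        elif System.Char.IsWhiteSpace text.[i] then inSpace (i + 1) count
        else inWord (i + 1) count
    inSpace 0 0
```

For example, countWords "the quick  brown fox" returns 4. Expressing the same thing with a single while loop would need an explicit state flag.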
All in all while loops in F# are only worthwhile if you really are going to be working with mutable state, inside a function body, doing the same thing repeatedly until a certain condition is met. If the loop is generally useful or very complicated, you may want to factor it out into some other top level binding. If the data types are themselves immutable (a lot of .NET value types are), you may gain very little from using mutable references to them anyway.
I'd say that you should only resort to while loops for niche cases where a while loop is perfect for the job, and is relatively short. In many imperative programming languages, while loops are often twisted into unnatural roles like driving stuff repeatedly over a case statement. Avoid those kinds of things, and see if you can use tail calls or, even better, a higher order function, to achieve the same ends.
Many problems have a recursive nature, but having thought imperatively for a long time often prevents us from seeing this.
In general I would use a functional technique wherever possible in a functional language. Loops are never functional, since they rely exclusively on side effects. So when dealing with imperative code or algorithms, using loops is adequate, but in a functional context they aren't considered very nice.
Functional technique doesn't only mean recursion but also using appropriate higher-order functions.
So when summing a list, neither a for-loop nor a recursive function but a fold is the solution for having comprehensible code without reinventing the wheel.
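
As a sketch of the three options for summing a list (function names are only for illustration):

```fsharp
// Higher-order function: states the computation directly.
let sumFold xs = List.fold (+) 0 xs

// Explicit tail recursion with an accumulator.
let sumRec xs =
    let rec loop acc = function
        | [] -> acc
        | x :: rest -> loop (acc + x) rest
    loop 0 xs

// Imperative loop with mutable state.
let sumLoop (xs: int list) =
    let mutable total = 0
    for x in xs do
        total <- total + x
    total
```

All three return 55 for [1 .. 10]. Only the fold version avoids both the accumulator plumbing and the mutable variable.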
for problems that aren't naturally recursive ones
..
just transforms it in a while loop anyway
You answered this yourself.
Use recursion for recursive problems and loop for things that aren't functional in nature.
Just always think: Which feels more natural, which is more readable.
