F#: generated IL code for seq{} vs other computational workflows

When I compare the IL code that F# generates for seq{} expressions with that for user-defined computation workflows, it's quite obvious that seq{} is implemented very differently: it generates a state machine similar to the one C# uses for its iterator methods. User-defined workflows, on the other hand, use the corresponding builder object as you'd expect.
So I am wondering - why the difference?
Is this for historical reasons, e.g. "seq was there before workflows"?
Or, is there significant performance to be gained?
Some other reason?

This is an optimization performed by the F# compiler. As far as I know, it was actually implemented later - the F# compiler first had list comprehensions, then a general-purpose version of computation expressions (also used for seq { ... }), but that was less efficient, so the optimization was added in some later version.
The main reason is that this removes many allocations and indirections. Let's say you have something like:
seq { for i in input do
        yield i
        yield i * 10 }
When using computation expressions, this gets translated to something like:
seq.Delay(fun () ->
    seq.For(input, fun i ->
        seq.Combine(seq.Yield(i), seq.Delay(fun () -> seq.Yield(i * 10)))))
There are a couple of function allocations here, and the For loop always needs to invoke the lambda function. The optimization turns this into a state machine (similar to the C# state machine), so the MoveNext() operation on the generated enumerator just mutates some state of the class and then returns...
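To get an intuition for what the compiler emits, here is a hand-written approximation of that state machine (a sketch only, not the actual compiler output, which also handles Current-before-MoveNext and other edge cases):

open System.Collections
open System.Collections.Generic

// Roughly what seq { for i in input do yield i; yield i * 10 } compiles to:
// MoveNext() just mutates a few fields and returns - no per-step closure allocations.
type ExampleEnumerator(input: seq<int>) =
    let en = input.GetEnumerator()
    let mutable state = 0       // 0 = advance the source and yield i; 1 = yield i * 10
    let mutable i = 0
    let mutable current = 0
    interface IEnumerator<int> with
        member x.Current = current
    interface IEnumerator with
        member x.Current = box current
        member x.MoveNext() =
            if state = 0 then
                if en.MoveNext() then
                    i <- en.Current
                    current <- i
                    state <- 1
                    true
                else false
            else
                current <- i * 10
                state <- 0
                true
        member x.Reset() = failwith "not supported"
    interface System.IDisposable with
        member x.Dispose() = en.Dispose()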
You can easily compare the performance by defining a custom computation builder for sequences:
type MSeqBuilder() =
    member x.For(en, f) = Seq.collect f en
    member x.Yield(v) = Seq.singleton v
    member x.Delay(f) = Seq.delay f
    member x.Combine(a, b) = Seq.concat [a; b]

let mseq = MSeqBuilder()
let input = [| 1 .. 100 |]
Now we can test this (using #time in F# interactive):
for i in 0 .. 10000 do
    mseq { for x in input do
             yield x
             yield x * 10 }
    |> Seq.length |> ignore
On my computer, this takes 2.644sec when using the custom mseq builder but only 0.065sec when using the built-in optimized seq expression. So the optimization makes sequence expressions significantly more efficient.
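For reference, the built-in version used for the second timing is the same loop with seq in place of mseq:

for i in 0 .. 10000 do
    seq { for x in input do
            yield x
            yield x * 10 }
    |> Seq.length |> ignore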

Historically, computation expressions ("workflows") were a generalization of sequence expressions: http://blogs.msdn.com/b/dsyme/archive/2007/09/22/some-details-on-f-computation-expressions-aka-monadic-or-workflow-syntax.aspx.
But, the answer is certainly that there is significant performance to be gained. I can't turn up any solid links (though there is a mention of "optimizations related to 'when' filters in sequence expressions" in http://blogs.msdn.com/b/dsyme/archive/2007/11/30/full-release-notes-for-f-1-9-3-7.aspx), but I do recall that this was an optimization that made its way in at some point in time. I'd like to say that the benefit is self-evident: sequence expressions are a "core" language feature and deserving of any optimizations that can be made.
Similarly, you'll see that certain tail-recursive functions will be optimized into loops, rather than tail calls.
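For example, a directly tail-recursive function like the following sketch compiles down to a plain loop in the IL rather than a sequence of tail calls (easy to confirm by inspecting the generated code):

let rec sumTo acc n =
    if n = 0 then acc
    else sumTo (acc + n) (n - 1)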

Related

This prime factorization code works for small numbers but fails with an OutOfMemoryException for large numbers?

I'm trying to get the prime factors of a large number.
let factors (x:int64) =
    [1L..x]
    |> Seq.filter (fun n -> x % n = 0L)

let isPrime (x:int64) =
    factors x
    |> Seq.length = 2

let primeFactors (x:int64) =
    factors x
    |> Seq.filter isPrime
This works for, say, 13195 but fails with an OutOfMemoryException for 600851475143. Why?
Sorry if I'm missing something obvious - it's only my third day on F# and I didn't know what a prime factor was until this morning.
The expression [1L..x] creates a list, which in your example gets too large to be stored in memory.
Sequences, in contrast, are lazy, so if used with care you can avoid computing the whole intermediate list. Your code already uses sequences, but as said before it begins with a list; to avoid building the list you can use curly brackets: {1L..x}
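For example, the lazy version of factors only changes the brackets:

let factors (x:int64) =
    {1L..x}
    |> Seq.filter (fun n -> x % n = 0L)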
Using sequence expressions is another option:
let factors (x:int64) = seq {
    for i = 1L to x do
        if x % i = 0L then yield i }
Having solved the OutOfMemoryException problem, note that your prime test is still very slow. You can optimise it as suggested in the comments by returning false immediately after finding a divisor between 1 and the number's square root. Further optimisations can be achieved by dividing the number by the prime factors as you find them and by using a sieve for the primes; you can also have a look at some efficient algorithms here.
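A minimal sketch of those two suggestions (the square-root cutoff, and dividing factors out as they are found); the names mirror the question's functions, but the bodies are illustrative:

let isPrime (x:int64) =
    // only check divisors up to sqrt x; Seq.forall stops at the first divisor found
    if x < 2L then false
    else {2L .. int64 (sqrt (float x))} |> Seq.forall (fun n -> x % n <> 0L)

let primeFactors (x:int64) =
    // divide each factor out as it is found, so only primes are ever collected
    let rec loop x d acc =
        if x = 1L then List.rev acc
        elif x % d = 0L then loop (x / d) d (d :: acc)
        else loop x (d + 1L) acc
    loop x 2L []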
The expression [...] creates a list of the items specified. In F#, a list can be defined something like this:
type List<'t> =
    | Empty
    | Item of 't * List<'t>
As an example, [1..5] would become a structure looking like this:
Item(1, Item(2, Item(3, Item(4, Item(5, Empty)))))
As you can see, this is not a problem for small numbers of items, but for larger numbers of items it will eventually use up all the available memory and cause an OutOfMemoryException. As Gustavo mentioned, to avoid this you can use a sequence, which will create each item on demand rather than all at the beginning. This reduces the number of things in memory at one time and thus avoids an OutOfMemoryException.
Since you're already using the Seq module instead of the List module (i.e. Seq.filter vs List.filter etc), you can simply use a sequence instead of a list which would look like this: {1L..x}.

Choosing between continuation passing style and memoization

While writing up examples for memoization and continuation passing style (CPS) functions in a functional language, I ended up using the Fibonacci example for both. However, Fibonacci doesn't really benefit from CPS, as the loop still has to run exponentially often, while with memoization it's only O(n) the first time and O(1) every following time.
Combining CPS and memoization has a slight benefit for Fibonacci, but are there examples around where CPS is the best approach (preventing you from running out of stack and improving performance) and where memoization is not a solution?
Or: is there any guideline for when to choose one over the other or both?
On CPS
While CPS is useful as an intermediate language in a compiler, on the source-language level it is mostly a device to (1) encode sophisticated control flow (not really performance-related) and (2) transform a non-tail call that consumes stack space into a continuation-allocating tail call that consumes heap space. For example, when you write (code untested):
let rec fib = function
    | 0 | 1 -> 1
    | n -> fib (n-1) + fib (n-2)

let rec fib_cps n k = match n with
    | 0 | 1 -> k 1
    | n -> fib_cps (n-1) (fun a -> fib_cps (n-2) (fun b -> k (a+b)))
The previous non-tail-call fib (n-2), which allocated a new stack frame, is turned into the tail-call fib (n-2) (fun b -> k (a+b)) which allocates the closure fun b -> k (a+b) (on the heap) to pass it as argument.
This does not asymptotically reduce the memory usage of your program (some further domain-specific optimizations might, but that's another story). You're just trading stack space for heap space, which is interesting on systems where stack space is severely limited by the OS (not the case with some implementations of ML such as SML/NJ, which track their call stack on the heap instead of using the system stack), and potentially performance-degrading because of the additional GC costs and potential locality decrease.
CPS transformation is unlikely to improve performance much (though details of your implementation and runtime system might make it so); it is a generally applicable transformation that lets you avoid the dreaded stack overflow of recursive functions with a deep call stack.
On Memoization
Memoization is useful to introduce sharing of subcalls of recursive functions. A recursive function typically solves a "problem" ("compute fibonacci of n", etc.) by decomposing it into several strictly simpler "sub-problems" (the recursive subcalls), with some base cases for which the problem is solvable right away.
For any recursive function (or recursive formulation of a problem), you can observe the structure of the subproblem space. Which simpler instances of Fib(k) will Fib(n) need to return its result? Which simpler instances will those instances in turn need?
In the general case, this space of subproblems is a graph (generally acyclic for termination purposes): there are some nodes that have several parents, that is, several distinct problems for which they are subproblems. For example, Fib(n-2) is a subproblem of both Fib(n) and Fib(n-1). The amount of node sharing in this graph depends on the particular problem / recursive function. In the case of Fibonacci, all nodes are shared between two parents, so there is a lot of sharing.
A direct recursive call without memoization will not be able to observe this sharing. The structure of the calls of a recursive function is a tree, not a graph, and shared subproblems such as Fib(n-2) will be fully visited several times (as many times as there are paths from the starting node to the subproblem node in the graph). Memoization introduces sharing by letting some subcalls return directly with "we've already computed this node and here is the result". For problems that have a lot of sharing, this can result in a dramatic reduction of (useless) computation: Fibonacci moves from exponential to linear complexity when memoization is introduced. Note that there are other ways to write the function, without using memoization but with a different subcall structure, to get linear complexity:
let rec fib_pair = function
    | 0 -> (1,1)
    | n -> let (u,v) = fib_pair (n-1) in (v,u+v)
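For contrast, here is a minimal memoization sketch (my own illustration, using a Dictionary as the sharing table):

open System.Collections.Generic

let fib_memo =
    let cache = Dictionary<int, int>()
    let rec fib n =
        match cache.TryGetValue n with
        | true, v -> v               // shared subproblem: return the cached result
        | _ ->
            let v = if n <= 1 then 1 else fib (n-1) + fib (n-2)
            cache.[n] <- v
            v
    fib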
The technique of using some form of sharing (usually through large tables storing the results) to avoid useless duplication of subcomputations is well known in the algorithmic community; it is called dynamic programming. When you recognize that a problem is amenable to this treatment (you notice the sharing among the subproblems), it can provide large performance benefits.
Does a comparison make sense?
The two seem mostly independent of each other.
There are a lot of problems where memoization is not applicable because the subproblem graph structure does not have any sharing (it is a tree). Conversely, CPS transformation is applicable to all recursive functions, but does not by itself lead to performance benefits (other than potential constant factors due to the particular implementation and runtime system you're using, though they're likely to make the code slower rather than faster).
Improving performance by inspecting non-tail contexts
There is an optimization technique related to CPS that can improve the performance of recursive functions. It consists in looking at the computations "left to be done" after the recursive call (what would be turned into a function in straightforward CPS style) and finding an alternate, more efficient representation for them that does not result in systematic closure allocation. Consider for example:
let rec length = function
    | [] -> 0
    | _::t -> 1 + length t

let rec length_cps li k = match li with
    | [] -> k 0
    | _::t -> length_cps t (fun a -> k (a + 1))
You can notice that the context of the non-recursive call, namely [_ + 1], has a simple structure: it adds an integer. Instead of representing this as a function fun a -> k (a+1), you may just store the integer to be added (corresponding to several applications of this function), making k an integer instead of a function.
let rec length_acc li k = match li with
    | [] -> k + 0
    | _::t -> length_acc t (k + 1)
This function runs in constant, rather than linear, space. By turning the representation of the tail contexts from functions to integers, we have eliminated the allocation step that made memory usage linear.
Close inspection of the order in which the additions are performed reveals that they are now performed in a different direction: we are adding the 1's corresponding to the beginning of the list first, while the CPS version was adding them last. This order reversal is valid because _ + 1 is an associative operation (if you have several nested contexts, foo + 1 + 1 + 1, it is valid to start computing them either from the inside, ((foo+1)+1)+1, or the outside, foo+(1+(1+1))). The optimization above can be used for all such "associative" contexts around a non-tail call.
There are certainly other optimizations available based on the same idea (I'm no expert on such optimizations), which is to look at the structure of the continuations involved and represent them in a more efficient form than functions allocated on the heap.
This is related to the "defunctionalization" transformation, which changes the representation of continuations from functions to data structures without changing the memory consumption (a defunctionalized program will allocate a data node where a closure would have been allocated in the original program), but allows the tail-recursive CPS version to be expressed in a first-order language (without first-class functions) and can be more efficient on systems where data structures and pattern matching are more efficient than closure allocation and indirect calls.
type length_cont =
    | Linit
    | Lcons of length_cont

let rec length_cps_defun li k = match li with
    | [] -> length_cont_eval 0 k
    | _::t -> length_cps_defun t (Lcons k)
and length_cont_eval acc = function
    | Linit -> acc
    | Lcons k -> length_cont_eval (acc+1) k

let length li = length_cps_defun li Linit

type fib_cont =
    | Finit
    | Fminus1 of int * fib_cont
    | Fminus2 of fib_cont * int

let rec fib_cps_defun n k = match n with
    | 0 | 1 -> fib_cont_eval 1 k
    | n -> fib_cps_defun (n-1) (Fminus1 (n, k))
and fib_cont_eval acc = function
    | Finit -> acc
    | Fminus1 (n, k) -> fib_cps_defun (n-2) (Fminus2 (k, acc))
    | Fminus2 (k, acc') -> fib_cont_eval (acc+acc') k

let fib n = fib_cps_defun n Finit
One benefit of CPS is error handling: if you need to fail, you just call your failure continuation.
I think the biggest case is when you are not talking about pure calculations, where memoization is great. If you are instead talking about IO or other operations, the benefits of CPS are there, but memoization doesn't work.
As to an instance where CPS and memoization are both applicable and CPS is better, I am not sure, since I consider them different pieces of functionality.
Finally, CPS is somewhat diminished in F#, since tail recursion already makes the whole stack-overflow issue a non-issue.

Is F#'s implementation of monads unique with respect to the amount of keywords available to it?

I only know F#. I haven't learned the other functional programming languages. All the examples that I have seen for monads only describe the bind and unit methods. F# has lots of keywords (e.g. let!, do!, etc.) that allow you to do different things within the same computational expression. This seemingly gives you more power than your basic bind and unit methods. Is this unique to F# or is it common across functional programming languages?
Yes, I think that the F# syntax for computation expressions is unique in that it provides direct syntactic support for different types of computations. It can be used for working with monoids, usual monads and also MonadPlus computations from Haskell.
I wrote about these in the introduction of my Master's thesis. I believe it is quite a readable part, so you can go to page 27 and read it there. Anyway, I'll copy the examples here:
A monoid is used just for concatenating values using some "+" operation (Combine). You can use it, for example, for building strings (this is inefficient, but it demonstrates the idea):
type StringMonoid() =
    member x.Combine(s1, s2) = System.String.Concat(s1, s2)
    member x.Zero() = ""
    member x.Yield(s) = s
    member x.Delay(f) = f()   // needed by the compiler's translation of Combine

let str = new StringMonoid()
let hello = str { yield "Hello "
                  yield "world!" }
Monads are the familiar example that uses the bind and return operations of computation expressions. For example, the maybe monad represents computations that can fail at any point:
type MaybeMonad() =
    member x.Bind(m, f) =
        match m with Some(v) -> f v | None -> None
    member x.Return(v) = Some(v)

let maybe = new MaybeMonad()

let rec productNameByID() = maybe {
    let! id = tryReadNumber()
    let! prod = db.TryFindProduct(id)
    return prod.Name }
Additive monads (aka MonadPlus in Haskell) are a combination of the two. They are a bit like monadic computations that can produce multiple values. A common example is the list (or sequence), which can implement both bind and combine:
type ListMonadPlus() =
    member x.Zero() = []
    member x.Yield(v) = [v]
    member x.Combine(a, b) = a @ b
    member x.Delay(f) = f()   // needed by the compiler's translation of Combine
    member x.Bind(l, f) = l |> List.map f |> List.concat

let list = new ListMonadPlus()

let cities = list {
    yield "York"
    yield "Orleans" }

let moreCities = list {
    let! n = cities
    yield n
    yield "New " + n }
// Creates: [ "York"; "New York"; "Orleans"; "New Orleans" ]
There are some additional keywords that do not directly correspond to any theoretical idea. The use keyword deals with resources, and for and while can be used to implement looping. Sequence/list comprehensions actually use for instead of let!, because that makes much more sense from the syntactic point of view (and for usually takes some sequence, although it may be e.g. asynchronous).
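As a rough illustration, the use keyword maps onto a builder member named Using (the member name is fixed by the compiler's translation rules; the body below is a sketch, extending the maybe builder defined above):

type MaybeMonad with
    member x.Using(resource: #System.IDisposable, f) =
        try f resource                  // run the rest of the computation...
        finally resource.Dispose()      // ...and dispose the resource even on failure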
Monads are defined in terms of bind and unit operations (only). There are other structures which are defined by other operations (e.g. in Haskell, the MonadPlus typeclass has zero and plus operations - these correspond to Zero and Combine in F# computation expressions). As far as I know, F#'s computation builders are unique in terms of providing nice syntax for the wide range of operations that they support, but most of the operations are unrelated to monads.
F# binding forms ending in ! denote computation expressions, including let!, use!, do!, yield! and return!.
let! pat = expr in comp-expr -- binding computation
do! expr in comp-expr -- sequential computation
use! pat = expr in comp-expr -- auto cleanup computation
yield! expr -- yield computation
return! expr -- return computation
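To make the forms above concrete, here is roughly how a let! block desugars, using the maybe builder and tryReadNumber function from the earlier answer (the doubled names are hypothetical; this is a sketch of the standard translation, not actual compiler output):

let doubled = maybe {
    let! id = tryReadNumber ()
    return id * 2 }

// ...which the compiler translates to approximately:
let doubled' = maybe.Bind(tryReadNumber (), fun id -> maybe.Return(id * 2))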
Computation expressions are used "for sequences and other non-standard interpretations of the F# expression syntax". These syntax forms offer ways to overload that syntax, for example, to encode monadic computations, or monoidal computations, and appear to be similar to e.g. the do-notation of Haskell, and corresponding (non-magic) bindings forms in that language.
So I would say that they support some overloading of syntax to support other interpretations of the expression syntax of the language, and this they have in common with many languages, including Haskell and OCaml. It is certainly a powerful and useful language feature.
References: The F# 2.0 Language Specification.
(Recalled from memory; I may be off.)
While unit and bind are the typical basis for monads, map and join form a different basis that I've seen in academic papers. This is kinda like how LINQ works in C# and VB, where the various from syntax desugars into Select or SelectMany, which are similar to map and join. LINQ also has some 'extra' keywords, a little like F# though more ad hoc (and mostly suited to querying enumerations/databases).
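For lists, the two bases are interdefinable; here is a small sketch (mine, not from the answer) recovering bind from map and join:

// join flattens one level of nesting; bind is map followed by join
let join (xss: 'a list list) : 'a list = List.concat xss
let bind f xs = xs |> List.map f |> join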
I don't know offhand of other functional languages like F# that effectively "lift" most of the control flow and other syntax into monads (well, "computation expressions", which may or may not be monads).

Seq.iter vs for - what difference?

I can do
for event in linq.Deltas do
or I can do
linq.Deltas |> Seq.iter(fun event ->
So I'm not sure whether these are the same. If they are not the same, I want to know the difference. I don't know which to use: iter or for.
added - so if it is a matter of choice, I prefer to use iter at the top level and for inside closures
added later - it looks like iter is map + ignore: a way to get away from the imperative ignore word. So it looks like the functional way...
As others mentioned, there are some differences (iter supports non-generic IEnumerator and you can mutate mutable values in for). These are sometimes important differences, but most of the times you can freely choose which one to use.
I generally prefer for (if there is a language construct, why not use it?). The cases where iter looks nicer are when you have a function that you need to call (e.g. using partial application):
// I would write this:
strings |> Seq.iter (printfn "%s")
// instead of:
for s in strings do printfn "%s" s
Similarly, using iter is nicer if you use it at the end of some processing pipeline:
// I would write this:
inputs |> Seq.filter (fun x -> x > 0)
       |> Seq.iter (fun x -> foo x)
// instead of:
let filtered = inputs |> Seq.filter (fun x -> x > 0)
for x in filtered do foo x
You can modify mutable variables from the body of a for loop. You can't do that from a closure, which implies you can't do that using iter. (Note: I'm talking about mutable variables declared outside of the for / iter. Local mutable variables are accessible.)
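A small sketch of the difference (assuming a recent compiler; the exact error text varies by version):

let mutable total = 0
for x in [1..5] do
    total <- total + x                   // fine: a for-loop body is not a closure

// [1..5] |> Seq.iter (fun x -> total <- total + x)
// ...is rejected: the lambda would capture the mutable local. Use a ref cell instead:
let total2 = ref 0
[1..5] |> Seq.iter (fun x -> total2.Value <- total2.Value + x)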
Considering that the point of iter is to perform some side effect, the difference can be important.
I personally seldom use iter, as I find for to be clearer.
For most situations they are the same. I would prefer the first form; it looks clearer to me.
The difference is that a for ... in loop supports plain IEnumerable objects, while Seq.iter requires that your collection (linq.Deltas) implements IEnumerable<T>.
E.g. the MatchCollection class in .NET regular expressions implements IEnumerable but not IEnumerable<T>, so you cannot use Seq.map or Seq.iter on it directly. But you can use a for ... in loop.
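A sketch of that situation (on the .NET Framework of the time, MatchCollection implemented only the non-generic IEnumerable; newer runtimes have since added the generic interface):

open System.Text.RegularExpressions

let ms = Regex.Matches("a1b2c3", "[0-9]")
for m in ms do printfn "%O" m                 // works: for accepts plain IEnumerable
// Seq.iter needs a typed sequence, so cast first:
ms |> Seq.cast<Match> |> Seq.iter (fun m -> printfn "%s" m.Value)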
It is a matter of programming style: imperative versus functional. Keep in mind that F# is not a pure functional programming language.
Generally, use Seq.iter if it is part of some large pipeline processing, as that makes the code much clearer; for the ordinary case I think the imperative way is clearer. Sometimes it is personal preference; sometimes there are other issues, such as performance.
for in F# is a form of list comprehension - the bread and butter of functional programming - while Seq.iter is a 'for side effects only' imperative construct, not a sign of functional code. Here is what you can do with for:
let pairsTo n = seq {
    for i in [1..n] do
        for j in [i..n] do
            if j % i <> 0 then yield (i,j) }

printf "%A" (pairsTo 10 |> Seq.toList)

How do I know if a function is tail recursive in F#

I wrote the following function:
let str2lst str =
    let rec f s acc =
        match s with
        | "" -> acc
        | _ -> f (s.Substring 1) (s.[0]::acc)
    f str []
How can I know whether the F# compiler turned it into a loop? Is there a way to find out without using Reflector (I have no experience with Reflector and I don't know C#)?
Edit: Also, is it possible to write a tail-recursive function without using an inner function, or is an inner function necessary for the loop to reside in?
Also, is there a function in the F# standard library to run a given function a number of times, each time giving it the last output as input? Let's say I have a string; I want to run a function over the string, then run it again over the resultant string, and so on...
Unfortunately there is no trivial way.
It is not too hard to read the source code and use the types and determine whether something is a tail call by inspection (is it 'the last thing', and not in a 'try' block), but people second-guess themselves and make mistakes. There's no simple automated way (other than e.g. inspecting the generated code).
Of course, you can just try your function on a large piece of test data and see if it blows up or not.
The F# compiler will generate .tail IL instructions for all tail calls (unless the compiler flag to turn them off is used - useful when you want to keep stack frames for debugging), with the exception that directly tail-recursive functions will be optimized into loops. (EDIT: I think nowadays the F# compiler also fails to emit .tail in cases where it can prove there are no recursive loops through this call site; this is an optimization, given that the .tail opcode is a little slower on many platforms.)
'tailcall' is a reserved keyword, with the idea that a future version of F# may allow you to write e.g.
tailcall func args
and then get a warning/error if it's not a tail call.
Only functions that are not naturally tail-recursive (and thus need an extra accumulator parameter) will 'force' you into the 'inner function' idiom.
Here's a code sample of what you asked:
let rec nTimes n f x =
if n = 0 then
x
else
nTimes (n-1) f (f x)
let r = nTimes 3 (fun s -> s ^ " is a rose") "A rose"
printfn "%s" r
I like the rule of thumb Paul Graham formulates in On Lisp: if there is work left to do, e.g. manipulating the recursive call output, then the call is not tail recursive.
