I have the following code to perform the Sieve of Eratosthenes in F#:
let sieveOutPrime p numbers =
    numbers
    |> Seq.filter (fun n -> n % p <> 0)
let primesLessThan n =
    let removeFirstPrime = function
        | s when Seq.isEmpty s -> None
        | s -> Some(Seq.head s, sieveOutPrime (Seq.head s) (Seq.tail s))
    let remainingPrimes =
        seq {3..2..n}
        |> Seq.unfold removeFirstPrime
    seq { yield 2; yield! remainingPrimes }
This is excruciatingly slow when the input to primesLessThan is remotely large: primesLessThan 1000000 |> Seq.skip 1000;; takes nearly a minute for me, though primesLessThan 1000000 by itself is naturally very fast, because it's just a sequence.
I did some playing around, and I think that the culprit has to be that Seq.tail (in my removeFirstPrime) is doing something intensive. According to the docs, it's generating a completely new sequence, which I could imagine being slow.
If this were Python and the sequence object were a generator, it would be trivial to ensure that nothing expensive happens at this point: just yield from the sequence, and we've cheaply dropped its first element.
LazyList in F# doesn't seem to have the unfold method (or, for that matter, the filter method); otherwise I think LazyList would be the thing I wanted.
How can I make this implementation fast by preventing unnecessary duplications/recomputations? Ideally primesLessThan n |> Seq.skip 1000 would take the same amount of time regardless of how large n was.
Recursive solutions and sequences don't go well together (compare the answers here, it's very much the same pattern you are using). You might want to inspect the generated code, but I'd just consider this a rule of thumb.
LazyList (as defined in FSharpx) does of course come with unfold and filter defined; it would have been quite bizarre if it didn't. In F#, this sort of functionality is typically provided in separate modules rather than as instance members on the type itself, a convention that does seem to confuse most documentation systems.
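For illustration, here is roughly what the same sieve could look like on top of FSharpx's LazyList. This is a sketch only: it assumes FSharpx.Collections is referenced and that LazyList.filter, LazyList.unfold and the Cons/Nil active patterns have the Seq-like shapes described above.
// Sketch: the question's sieve rewritten against FSharpx.Collections.LazyList.
// LazyList memoizes cells once forced, so the tail is not recomputed the way
// repeated Seq.tail traversals are. Treat this as illustrative, not tested code.
open FSharpx.Collections

let sieveOutPrime p numbers =
    numbers |> LazyList.filter (fun n -> n % p <> 0)

let primesLessThan n =
    let removeFirstPrime xs =
        match xs with
        | LazyList.Nil -> None
        | LazyList.Cons (p, rest) -> Some (p, sieveOutPrime p rest)
    let remainingPrimes =
        LazyList.ofSeq (seq { 3 .. 2 .. n })
        |> LazyList.unfold removeFirstPrime
    seq { yield 2; yield! remainingPrimes }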
As you probably know, Seq is a lazily evaluated collection. The sieve algorithm is about filtering out non-primes from a sequence so that you don't have to consider them again.
However, when you combine the sieve with a lazily evaluated collection, you end up doing the filtering of the same non-primes over and over again.
You see much better performance if you switch from Seq to Array or List, because the non-lazy aspect of those collections means that you only filter non-primes once.
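As a rough sketch of what that looks like with eagerly evaluated lists (illustrative code, not the benchmark implementation discussed below): each List.filter runs immediately, so every composite is removed exactly once.
// Illustrative only: the same sieve over eager lists, so each composite is
// filtered out exactly once instead of being re-tested on every traversal.
let primesLessThanList n =
    if n < 2 then []
    else
        let rec sieve acc remaining =
            match remaining with
            | [] -> List.rev acc
            | p :: rest -> sieve (p :: acc) (rest |> List.filter (fun m -> m % p <> 0))
        2 :: sieve [] [3 .. 2 .. n]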
One way to improve performance in your code is to introduce caching.
let removeFirstPrime s =
    let s = s |> Seq.cache
    match s with
    | s when Seq.isEmpty s -> None
    | s -> Some(Seq.head s, sieveOutPrime (Seq.head s) (Seq.tail s))
I implemented a LazyList that works a lot like Seq and that allows me to count the number of evaluations:
For all primes up to 2000.
Without caching: 14753706 evaluations
With caching: 97260 evaluations
Of course, if you really need performance, you use a mutable array implementation.
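For reference, a minimal sketch of such a mutable-array sieve (illustrative; this is not the benchmark code whose metrics follow):
// Illustrative sketch: classic mutable sieve of Eratosthenes.
// isComposite.[i] = true means i has been crossed off.
let primesUpTo n =
    if n < 2 then [||]
    else
        let isComposite = Array.zeroCreate (n + 1)
        for p in 2 .. int (sqrt (float n)) do
            if not isComposite.[p] then
                // cross off multiples of p, starting at p*p
                let mutable m = p * p
                while m <= n do
                    isComposite.[m] <- true
                    m <- m + p
        [| for i in 2 .. n do if not isComposite.[i] then yield i |]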
PS. Performance metrics
Running 'seq' ...
it took 271 ms with cc (16, 4, 0), the result is: 1013507
Running 'list' ...
it took 14 ms with cc (16, 0, 0), the result is: 1013507
Running 'array' ...
it took 14 ms with cc (10, 0, 0), the result is: 1013507
Running 'mutable' ...
it took 0 ms with cc (0, 0, 0), the result is: 1013507
This is Seq with caching. Seq in F# has rather high overhead; there are interesting lazy alternatives to Seq, like Nessos Streams.
List and Array run roughly similarly, but because of the more compact memory layout the GC metrics are better for Array (10 cc0 collections for Array vs 16 cc0 collections for List). Seq has worse GC metrics in that it forced 4 cc1 collections.
The mutable implementation of the sieve algorithm has better memory and performance metrics by a large margin.
Related
I'm trying to get the prime factors for a large number..
let factors (x:int64) =
    [1L..x]
    |> Seq.filter (fun n -> x % n = 0L)

let isPrime (x:int64) =
    factors x
    |> Seq.length = 2

let primeFactors (x:int64) =
    factors x
    |> Seq.filter isPrime
This works for say 13195 but fails with an OutOfMemoryException for 600851475143?
Sorry if I'm missing something obvious; it's only my third day on F#, and I didn't know what a prime factor was until this morning.
The expression [1L..x] creates a list, which in your example gets too large to be stored in memory.
Sequences, in contrast, are lazy, so if used with care you can avoid computing the whole intermediate list. Your code already uses sequences, but as said before it begins with a list; to avoid creating a list in the first place, you can use curly brackets: {1L..x}
Using sequence expressions is another option:
let factors (x:int64) = seq {
    for i in 1L .. x do
        if x % i = 0L then yield i }
Having solved the OutOfMemoryException problem, your prime function is still very slow. You can optimise it, as suggested in the comments, by returning false immediately after finding a divisor between 1 and its square root. Further optimisations can be achieved by dividing the number by the prime factors as you find them and by using a sieve for the primes; you can also have a look at some efficient algorithms here.
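A rough sketch of those two optimisations (trial division only up to the square root, and dividing out each factor as it is found); the shapes below are illustrative and replace the filter-based approach, they are not the poster's code:
// Illustrative sketch of the optimisations described above.
let isPrime (x: int64) =
    if x < 2L then false
    else
        let limit = int64 (sqrt (float x))
        // stop as soon as any divisor in 2 .. sqrt x is found
        seq { 2L .. limit } |> Seq.forall (fun d -> x % d <> 0L)

let primeFactors (x: int64) =
    // divide the number by each factor as it is found, so it shrinks quickly
    let rec loop n d acc =
        if n = 1L then List.rev acc
        elif n % d = 0L then loop (n / d) d (d :: acc)
        elif d * d > n then List.rev (n :: acc)   // the remainder is prime
        else loop n (d + 1L) acc
    loop x 2L []
With this shape, primeFactors 600851475143L should return [71; 839; 1471; 6857] almost immediately.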
The expression [...] creates a list of the items specified. In F#, a List can be defined something like this:
type List<'t> =
    | Empty
    | Item of 't * List<'t>
As an example, [1..5] would become a structure looking like this:
Item(1, Item(2, Item(3, Item(4, Item(5, Empty)))))
As you can see, this will not be a problem for small numbers of items, but for larger numbers of items this will eventually use up all the available memory and cause an OutOfMemoryException. As Gustavo mentioned, to avoid this you can use a sequence, which will create each item on demand rather than all at the beginning. This reduces the number of things in memory at one time and thus avoids an OutOfMemoryException.
Since you're already using the Seq module instead of the List module (i.e. Seq.filter vs List.filter etc), you can simply use a sequence instead of a list which would look like this: {1L..x}.
When I compare the IL code that F# generates for seq{} expressions vs that for user-defined computational workflows, it's quite obvious that seq{} is implemented very differently: it generates a state machine similar to the one C# uses for its iterator methods. User-defined workflows, on the other hand, use the corresponding builder object as you'd expect.
So I am wondering - why the difference?
Is this for historical reasons, e.g. "seq was there before workflows"?
Or, is there significant performance to be gained?
Some other reason?
This is an optimization performed by the F# compiler. As far as I know, it was actually implemented later - the F# compiler first had list comprehensions, then a general-purpose version of computation expressions (also used for seq { ... }), but that was less efficient, so the optimization was added in some later version.
The main reason is that this removes many allocations and indirections. Let's say you have something like:
seq { for i in input do
        yield i
        yield i * 10 }
When using computation expressions, this gets translated to something like:
seq.Delay(fun () -> seq.For(input, fun i ->
    seq.Combine(seq.Yield(i), seq.Delay(fun () -> seq.Yield(i * 10)))))
There are a couple of function allocations, and the For loop always needs to invoke the lambda function. The optimization turns this into a state machine (similar to the C# state machine), so the MoveNext() operation on the generated enumerator just mutates some state of the class and then returns...
You can easily compare the performance by defining a custom computation builder for sequences:
type MSeqBuilder() =
    member x.For(en, f) = Seq.collect f en
    member x.Yield(v) = Seq.singleton v
    member x.Delay(f) = Seq.delay f
    member x.Combine(a, b) = Seq.concat [a; b]
let mseq = MSeqBuilder()
let input = [| 1 .. 100 |]
Now we can test this (using #time in F# interactive):
for i in 0 .. 10000 do
    mseq { for x in input do
            yield x
            yield x * 10 }
    |> Seq.length |> ignore
On my computer, this takes 2.644sec when using the custom mseq builder but only 0.065sec when using the built-in optimized seq expression. So the optimization makes sequence expressions significantly more efficient.
Historically, computations expressions ("workflows") were a generalization of sequence expressions: http://blogs.msdn.com/b/dsyme/archive/2007/09/22/some-details-on-f-computation-expressions-aka-monadic-or-workflow-syntax.aspx.
But, the answer is certainly that there is significant performance to be gained. I can't turn up any solid links (though there is a mention of "optimizations related to 'when' filters in sequence expressions" in http://blogs.msdn.com/b/dsyme/archive/2007/11/30/full-release-notes-for-f-1-9-3-7.aspx), but I do recall that this was an optimization that made its way in at some point in time. I'd like to say that the benefit is self-evident: sequence expressions are a "core" language feature and deserving of any optimizations that can be made.
Similarly, you'll see that certain tail-recursive functions will be optimized into loops rather than tail calls.
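For example, an accumulator-style function like the following sketch (illustrative, not from the answer) is typically emitted as a branch back to the top of the loop rather than as an actual call:
// Illustrative: the recursive call is in tail position, so the compiler can
// turn it into a jump (effectively a while loop) instead of growing the stack.
let sumTo n =
    let rec loop acc i =
        if i > n then acc
        else loop (acc + i) (i + 1)   // tail call
    loop 0 1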
While writing up examples for memoization and continuation passing style (CPS) functions in a functional language, I ended up using the Fibonacci example for both. However, Fibonacci doesn't really benefit from CPS, as the loop still has to run exponentially often, while with memoization it's only O(n) the first time and O(1) every following time.
Combining both CPS and memoization has a slight benefit for Fibonacci, but are there examples around where CPS is the best way that prevents you from running out of stack and improves performance and where memoization is not a solution?
Or: is there any guideline for when to choose one over the other or both?
On CPS
While CPS is useful as an intermediate language in a compiler, on the source language level it is mostly a device to (1) encode sophisticated control flow (not really performance-related) and (2) transform a non-tail-call consuming stack space into a continuation-allocating tail-call consuming heap space. For example when you write (code untested)
let rec fib = function
    | 0 | 1 -> 1
    | n -> fib (n-1) + fib (n-2)

let rec fib_cps n k = match n with
    | 0 | 1 -> k 1
    | n -> fib_cps (n-1) (fun a -> fib_cps (n-2) (fun b -> k (a+b)))
The previous non-tail call fib (n-2), which allocated a new stack frame, is turned into the tail call fib_cps (n-2) (fun b -> k (a+b)), which allocates the closure fun b -> k (a+b) (on the heap) to pass it as an argument.
This does not asymptotically reduce the memory usage of your program (some further domain-specific optimizations might, but that's another story). You're just trading stack space for heap space, which is interesting on systems where stack space is severely limited by the OS (not the case with some implementations of ML such as SML/NJ, which track their call stack on the heap instead of using the system stack), and potentially performance-degrading because of the additional GC costs and potential locality decrease.
CPS transformation is unlikely to improve performance much (though details of your implementation and runtime system might make it so); it is a generally applicable transformation that lets you avoid the dreaded stack overflow of recursive functions with a deep call stack.
On Memoization
Memoization is useful to introduce sharing of subcalls of recursive functions. A recursive function typically solves a "problem" ("compute fibonacci of n", etc.) by decomposing it into several strictly simpler "sub-problems" (the recursive subcalls), with some base cases for which the problem is solvable right away.
For any recursive function (or recursive formulation of a problem), you can observe the structure of the subproblem space. Which simpler instances of Fib(k) will Fib(n) need to return its result? Which simpler instances will those instances in turn need?
In the general case, this space of subproblems is a graph (generally acyclic, for termination purposes): there are some nodes that have several parents, that is, several distinct problems of which they are subproblems. For example, Fib(n-2) is a subproblem both of Fib(n) and Fib(n-1). The amount of node sharing in this graph depends on the particular problem / recursive function. In the case of Fibonacci, all nodes are shared between two parents, so there is a lot of sharing.
A direct recursive call without memoization will not be able to observe this sharing. The structure of the calls of a recursive function is a tree, not a graph, and shared subproblems such as Fib(n-2) will be fully visited several times (as many times as there are paths from the starting node to the subproblem node in the graph). Memoization introduces sharing by letting some subcalls return directly with "we've already computed this node and here is the result". For problems that have a lot of sharing, this can result in a dramatic reduction of (useless) computation: Fibonacci moves from exponential to linear complexity when memoization is introduced. Note that there are other ways to write the function, without using memoization but with a different subcall structure, to get linear complexity:
let rec fib_pair = function
    | 0 -> (1,1)
    | n -> let (u,v) = fib_pair (n-1) in (v,u+v)
The technique of using some form of sharing (usually through large tables storing the results) to avoid useless duplication of subcomputations is well known in the algorithmic community; it is called dynamic programming. When you recognize that a problem is amenable to this treatment (you notice the sharing among the subproblems), this can provide large performance benefits.
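As a concrete illustration of that table-based sharing, here is a minimal memoized Fibonacci sketch in F# (the Dictionary-based cache is my own illustrative choice, not part of the original answer):
// Illustrative memoization sketch: a Dictionary stores every Fib(k) already
// computed, so each shared subproblem is evaluated only once.
open System.Collections.Generic

let fibMemo =
    let cache = Dictionary<int, int64>()
    let rec fib n =
        match cache.TryGetValue n with
        | true, v -> v
        | _ ->
            let v = if n <= 1 then 1L else fib (n - 1) + fib (n - 2)
            cache.[n] <- v
            v
    fib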
Does a comparison make sense?
The two seem mostly independent of each other.
There are a lot of problems where memoization is not applicable, because the subproblem graph structure does not have any sharing (it is a tree). On the contrary, CPS transformation is applicable for all recursive functions, but does not by itself lead to performance benefits (other than potential constant factors due to the particular implementation and runtime system you're using, though they're likely to make the code slower rather than faster).
Improving performance by inspecting non-tail contexts
There are optimization techniques related to CPS that can improve the performance of recursive functions. They consist in looking at the computations "left to be done" after the recursive calls (what would be turned into a function in straightforward CPS style) and finding an alternate, more efficient representation for them that does not result in systematic closure allocation. Consider for example:
let rec length = function
    | [] -> 0
    | _::t -> 1 + length t

let rec length_cps li k = match li with
    | [] -> k 0
    | _::t -> length_cps t (fun a -> k (a + 1))
You can notice that the context of the recursive call, namely [_ + 1], has a simple structure: it adds an integer. Instead of representing this as a function fun a -> k (a+1), you may just store the integer to be added corresponding to several applications of this function, making k an integer instead of a function.
let rec length_acc li k = match li with
    | [] -> k + 0
    | _::t -> length_acc t (k + 1)
This function runs in constant, rather than linear, space. By turning the representation of the tail contexts from functions to integers, we have eliminated the allocation step that made memory usage linear.
Close inspection of the order in which the additions are performed will reveal that they are now performed in a different direction: we are adding the 1's corresponding to the beginning of the list first, while the CPS version was adding them last. This order reversal is valid because _ + 1 is an associative operation (if you have several nested contexts, foo + 1 + 1 + 1, it is valid to start computing them either from the inside, ((foo+1)+1)+1, or the outside, foo+(1+(1+1))). The optimization above can be used for all such "associative" contexts around a non-tail call.
There are certainly other optimizations available based on the same idea (I'm no expert on such optimizations), which is to look at the structure of the continuations involved and represent them under a more efficient form than functions allocated on the heap.
This is related to the transformation of "defunctionalization", which changes the representation of continuations from functions to data structures without changing the memory consumption (a defunctionalized program will allocate a data node where a closure would have been allocated in the original program), but allows you to express the tail-recursive CPS version in a first-order language (without first-class functions) and can be more efficient on systems where data structures and pattern matching are more efficient than closure allocation and indirect calls.
type length_cont =
    | Linit
    | Lcons of length_cont

let rec length_cps_defun li k = match li with
    | [] -> length_cont_eval 0 k
    | _::t -> length_cps_defun t (Lcons k)
and length_cont_eval acc = function
    | Linit -> acc
    | Lcons k -> length_cont_eval (acc+1) k

let length li = length_cps_defun li Linit
type fib_cont =
    | Finit
    | Fminus1 of int * fib_cont
    | Fminus2 of fib_cont * int

let rec fib_cps_defun n k = match n with
    | 0 | 1 -> fib_cont_eval 1 k
    | n -> fib_cps_defun (n-1) (Fminus1 (n, k))
and fib_cont_eval acc = function
    | Finit -> acc
    | Fminus1 (n, k) -> fib_cps_defun (n-2) (Fminus2 (k, acc))
    | Fminus2 (k, acc') -> fib_cont_eval (acc+acc') k

let fib n = fib_cps_defun n Finit
One benefit of CPS is error handling. If you need to fail you just call your failure method.
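A tiny sketch of that idea (the function and names here are illustrative): the caller passes both a success continuation and a failure continuation, and the failing path simply calls the latter.
// Illustrative: success/failure continuations instead of exceptions.
let tryDivide x y onSuccess onFailure =
    if y = 0 then onFailure "division by zero"
    else onSuccess (x / y)

// Usage:
// tryDivide 10 2 (printfn "result: %d") (printfn "error: %s")
// tryDivide 10 0 (printfn "result: %d") (printfn "error: %s")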
I think the biggest case for CPS is when you are not talking about calculations, where memoization is great. If you are instead talking about IO or other operations, the benefits of CPS are there, but memoization doesn't work.
As to an instance where CPS and memoization are both applicable and CPS is better, I am not sure since I consider them different pieces of functionality.
Finally CPS is a bit diminished in F# since tail recursion makes the whole stack overflow thing a non-issue already.
Just naively using Seq.length may not be good enough, as it will blow up on infinite sequences.
Getting more fancy and using something like ss |> Seq.truncate n |> Seq.length will work, but behind the scenes it would involve traversing the argument sequence twice, element by element, via IEnumerator's MoveNext().
The best approach I was able to come up with so far is:
let hasAtLeast n (ss: seq<_>) =
    let mutable result = true
    use e = ss.GetEnumerator()
    for _ in 1 .. n do result <- e.MoveNext()
    result
This involves only a single traversal of the sequence (more accurately, performing e.MoveNext() n times) and correctly handles the boundary cases of empty and infinite sequences. I could throw in a few small improvements, like special-casing lists, arrays, and ICollections, or cutting the traversal short, but I wonder if there is a more effective approach to the problem that I may be missing?
Thank you for your help.
EDIT: Having five implementation variants of the hasAtLeast function on hand (two of my own, two suggested by Daniel, and one suggested by Ankur), I arranged a benchmark marathon between them. The results are a tie across all implementations, which proves that Guvante is right: the simplest composition of existing algorithms is best, and there is no point in overengineering here.
Factoring in readability as well, I'd use either my own pure F#-based version:
let hasAtLeast n (ss: seq<_>) =
    Seq.length (Seq.truncate n ss) >= n
or the fully equivalent LINQ-based one suggested by Ankur, which capitalizes on .NET integration:
let hasAtLeast n (ss: seq<_>) =
    ss.Take(n).Count() >= n
Here's a short, functional solution:
let hasAtLeast n items =
    items
    |> Seq.mapi (fun i x -> (i + 1), x)
    |> Seq.exists (fun (i, _) -> i = n)
Example:
let items = Seq.initInfinite id
items |> hasAtLeast 10000
And here's an optimally efficient one:
let hasAtLeast n (items:seq<_>) =
    use e = items.GetEnumerator()
    let rec loop n =
        if n = 0 then true
        elif e.MoveNext() then loop (n - 1)
        else false
    loop n
Functional programming breaks workloads up into small, very generic tasks that each do one simple thing. Determining whether there are at least n items in a sequence is not a simple task.
You already found both solutions to this "problem": composing existing algorithms, which works for the majority of cases, and creating your own algorithm to solve the issue.
However, I have to wonder whether your first solution wouldn't work just fine. MoveNext() is certainly called only n times on the original enumerator, Current is never called, and even if MoveNext() is called on some wrapper class, the performance implications are likely tiny unless n is huge.
EDIT:
I was curious, so I wrote a simple program to test the timing of the two methods. The truncate method was quicker both for a simple infinite sequence and for one that had a Sleep(1). It looks like I was right that your correction sounded like overengineering.
I think some clarification is needed to explain what is happening in those methods. Seq.truncate takes a sequence and returns a sequence. Other than saving the value of n it doesn't do anything until enumeration. During enumeration it counts and stops after n values. Seq.length takes an enumeration and counts, returning the count when it ends. So the sequence is only enumerated once, and the amount of overhead is a couple of method calls and two counters.
Using LINQ this would be as simple as:
open System.Linq   // Take and Count here are the LINQ extension methods

let hasAtLeast n (ss: seq<_>) =
    ss.Take(n).Count() >= n
The Seq.take function blows up if there are not enough elements, which is why LINQ's Take is used instead.
Example usage, showing that it traverses the seq only once and only up to the required number of elements:
seq { for i = 0 to 5 do
        printfn "Generating %d" i
        yield i }
|> hasAtLeast 4 |> printfn "%A"
OK, I've only just started with F#, and this is how I understand it now:
Some problems are recursive in nature (building or reading out a tree structure, to name just one), and then you use recursion. In these cases you preferably use tail recursion to give the stack a break.
Some languages are purely functional, so you have to use recursion instead of while loops, even if the problem is not recursive in nature.
So my question: since F# also supports the imperative paradigm, would you use tail recursion in F# for problems that aren't naturally recursive ones? Especially since I have read that the compiler recognizes tail recursion and just transforms it into a while loop anyway?
If so: why?
The best answer is 'neither'. :)
There's some ugliness associated with both while loops and tail recursion.
While loops require mutability and effects, and though I have nothing against using these in moderation, especially when encapsulated in the context of a local function, you do sometimes feel like you're cluttering/uglifying your program when you start introducing effects merely to loop.
Tail recursion often has the disadvantage of requiring an extra accumulator parameter or continuation-passing style. This clutters the program with extra boilerplate to massage the startup conditions of the function.
The best answer is to use neither while loops nor recursion. Higher-order functions and lambdas are your saviors here, especially maps and folds. Why fool around with messy control structures for looping when you can encapsulate those in reusable libraries and then just state the essence of your computation simply and declaratively?
If you get in the habit of often calling map/fold rather than using loops/recursion, as well as providing a fold function along with any new tree-structured data type you introduce, you'll go far. :)
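As an illustration of that habit, here is a small sketch (the Tree type and the folds over it are hypothetical examples, not taken from the blog posts mentioned below):
// Illustrative: ship a fold together with your tree type, then express
// consumers declaratively instead of writing recursion by hand each time.
type Tree<'a> =
    | Leaf
    | Node of Tree<'a> * 'a * Tree<'a>

let rec foldTree leafValue nodeFolder tree =
    match tree with
    | Leaf -> leafValue
    | Node (l, v, r) ->
        nodeFolder (foldTree leafValue nodeFolder l) v (foldTree leafValue nodeFolder r)

// Consumers just state the essence of the computation:
let sumTree t = foldTree 0 (fun l v r -> l + v + r) t
let depth t   = foldTree 0 (fun l _ r -> 1 + max l r) t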
For those interested in learning more about folds in F#, why not check out my first three blog posts in a series on the topic?
In order of preference and general programming style, I will write code as follows:
Map/fold if it's available
let x = [1 .. 10] |> List.map ((*) 2)
It's just convenient and easy to use.
Non-tail recursive function
> let rec map f = function
      | x::xs -> f x::map f xs
      | [] -> [];;
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
Most algorithms are easiest to read and express without tail-recursion. This works particularly well when you don't need to recurse too deeply, making it suitable for many sorting algorithms and most operations on balanced data structures.
Remember, log2(1,000,000,000,000,000) ~= 50, so a log(n) operation without tail recursion isn't scary at all.
Tail-recursive with accumulator
> let rev l =
      let rec loop acc = function
          | [] -> acc
          | x::xs -> loop (x::acc) xs
      loop [] l
  let map f l =
      let rec loop acc = function
          | [] -> rev acc
          | x::xs -> loop (f x::acc) xs
      loop [] l;;
val rev : 'a list -> 'a list
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
It works, but the code is clumsy and the elegance of the algorithm is slightly obscured. The example above isn't too bad to read, but once you get into tree-like data structures, it really starts to become a nightmare.
Tail-recursive with continuation passing
> let rec map cont f = function
      | [] -> cont []
      | x::xs -> map (fun l -> cont <| f x::l) f xs;;
val map : ('a list -> 'b) -> ('c -> 'a) -> 'c list -> 'b
> [1 .. 10] |> map id ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
Whenever I see code like this, I say to myself "now that's a neat trick!". At the cost of readability, it maintains the shape of the non-tail-recursive function, and I've found it really interesting for tail-recursive inserts into binary trees.
It's probably my monad-phobia speaking here, or maybe my inherent lack of familiarity with Lisp's call/cc, but I think the occasions when CPS actually simplifies algorithms are few and far between. Counter-examples are welcome in the comments.
While loops / for loops
It occurs to me that, aside from sequence comprehensions, I've never used while or for loops in my F# code. In any case...
> let map f l =
      let l' = ref l
      let acc = ref []
      while not <| List.isEmpty !l' do
          acc := (!l' |> List.head |> f)::!acc
          l' := !l' |> List.tail
      !acc |> List.rev;;
val map : ('a -> 'b) -> 'a list -> 'b list
> [1 .. 10] |> map ((*) 2);;
val it : int list = [2; 4; 6; 8; 10; 12; 14; 16; 18; 20]
It's practically a parody of imperative programming. You might be able to maintain a little sanity by declaring let mutable l' = l instead, but any non-trivial function will require the use of ref.
Honestly, any problem that you can solve with a loop is already a naturally recursive one, as you can translate both into (usually conditional) jumps in the end.
I believe you should stick with tail calls in almost all cases where you must write an explicit loop. It is just more versatile:
A while loop restricts you to one loop body, while a tail call can allow you to switch between many different states while the "loop" is running (see the sketch after this list).
A while loop restricts you to one condition to check for termination, with the tail recursion you can have an arbitrarily complicated match expression waiting to shunt the control flow off somewhere else.
Your tail calls all return useful values and can produce useful type errors. A while loop does not return anything and relies on side effects to do its work.
While loops are not first class, whereas functions with tail calls (or with while loops in them) are. Recursive bindings in local scope can be inspected for their type.
A tail recursive function can easily be broken apart into parts that use tail calls to call each other in the needed sequence. This may make things easier to read, and will help if you find you want to start in the middle of a loop. This is not true of a while loop.
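Here is the sketch referred to in the first point: two mutually tail-recursive functions acting as the two "states" of a small run-length counter (illustrative, not from the original answer).
// Illustrative: each function is one state; tail calls switch between states
// without growing the stack, something a single while loop body cannot
// express as directly.
let countRuns (xs: int list) =
    let rec inRun current count acc rest =
        match rest with
        | y :: t when y = current -> inRun current (count + 1) acc t
        | _ -> betweenRuns ((current, count) :: acc) rest
    and betweenRuns acc rest =
        match rest with
        | [] -> List.rev acc
        | y :: t -> inRun y 1 acc t
    betweenRuns [] xs

// countRuns [1; 1; 2; 2; 2; 3]  // = [(1, 2); (2, 3); (3, 1)]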
All in all, while loops in F# are only worthwhile if you really are going to be working with mutable state, inside a function body, doing the same thing repeatedly until a certain condition is met. If the loop is generally useful or very complicated, you may want to factor it out into some other top-level binding. If the data types are themselves immutable (a lot of .NET value types are), you may gain very little from using mutable references to them anyway.
I'd say that you should only resort to while loops for niche cases where a while loop is perfect for the job, and is relatively short. In many imperative programming languages, while loops are often twisted into unnatural roles like driving stuff repeatedly over a case statement. Avoid those kinds of things, and see if you can use tail calls or, even better, a higher order function, to achieve the same ends.
Many problems have a recursive nature, but having thought imperatively for a long time often prevents us from seeing this.
In general I would use a functional technique wherever possible in a functional language - loops are never functional, since they exclusively rely on side effects. So when dealing with imperative code or algorithms, using loops is adequate, but in a functional context they aren't considered very nice.
A functional technique doesn't only mean recursion; it also means using appropriate higher-order functions.
So when summing a list, neither a for-loop nor a recursive function but a fold is the solution for having comprehensible code without reinventing the wheel.
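For instance (a trivial sketch):
// Illustrative: a fold states "combine the elements with +" and nothing more.
let sum xs = List.fold (+) 0 xs
// sum [1; 2; 3; 4]  // = 10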
for problems that aren't naturally recursive ones
..
just transforms it into a while loop anyway
You answered this yourself.
Use recursion for recursive problems and loops for things that aren't functional in nature.
Just always think: which feels more natural, which is more readable?