List.unfold/Array.unfold causes memory blowout - f#

Friends, the following code runs fine when using Seq.unfold. However, List.unfold or Array.unfold (as shown below) causes the program to never terminate. I'm mostly just curious as to why that is. However, I am biased in general towards only using Arrays. Can anyone explain what is the reason for this behavior and if possible how to work within the confines of Arrays for a problem with this general structure.
open MathNet.Numerics.LinearAlgebra
open MathNet.Numerics.Distributions
let randn() = Normal.Sample(0., 1.)
let N = 100
let y = DenseVector.init N (fun _ -> 10. + sqrt(1.) * randn())
let SIM =
Array.unfold (fun (c1_, c2_) ->
let D = 1./(1. / 100. + float(N) / c2_)
let c1 = D *(0. / 100. + y.Sum() / c2_) + sqrt(D) * randn()
let a1 = (3. + float(N) / 2.)
let a2 = (0.5 + ((y-c1).PointwisePower(2.)).Sum() / 2.)
let c2 = InverseGamma.Sample(a1, a2)
Some((c1_, c2_), (c1, c2))
) (0., 1.)
|> Array.take (100000)
let result = SIM |> Array.map (fun (i, j) -> i)

I think the problem is caused by your generator since it never terminates the unfold algorithm by returning None.
Your code is working with Seq because sequences are evaluated lazily and Seq.unfold execute the generator only when you try to read a value from the sequence. The fact that the generator never terminates is not a problem because sequences can be infinite.
On the other hand, lists and arrays are not lazily evaluated and the generator is run until it returns None. With your generator, you end up with an "infinite loop".

Related

What's the meaning of the `in` keyword in this example (F#)

I've been trying to get my head round various bits of F# (I'm coming from more of a C# background), and parsers interest me, so I jumped at this blog post about F# parser combinators:
http://santialbo.com/blog/2013/03/24/introduction-to-parser-combinators
One of the samples here was this:
/// If the stream starts with c, returns Success, otherwise returns Failure
let CharParser (c: char) : Parser<char> =
let p stream =
match stream with
| x::xs when x = c -> Success(x, xs)
| _ -> Failure
in p //what does this mean?
However, one of the things that confused me about this code was the in p statement. I looked up the in keyword in the MSDN docs:
http://msdn.microsoft.com/en-us/library/dd233249.aspx
I also spotted this earlier question:
Meaning of keyword "in" in F#
Neither of those seemed to be the same usage. The only thing that seems to fit is that this is a pipelining construct.
The let x = ... in expr allows you to declare a binding for some variable x which can then be used in expr.
In this case p is a function which takes an argument stream and then returns either Success or Failure depending on the result of the match, and this function is returned by the CharParser function.
The F# light syntax automatically nests let .. in bindings, so for example
let x = 1
let y = x + 2
y * z
is the same as
let x = 1 in
let y = x + 2 in
y * z
Therefore, the in is not needed here and the function could have been written simply as
let CharParser (c: char) : Parser<char> =
let p stream =
match stream with
| x::xs when x = c -> Success(x, xs)
| _ -> Failure
p
The answer from Lee explains the problem. In F#, the in keyword is heritage from earlier functional languages that inspired F# and required it - namely from ML and OCaml.
It might be worth adding that there is just one situation in F# where you still need in - that is, when you want to write let followed by an expression on a single line. For example:
let a = 10
if (let x = a * a in x = 100) then printfn "Ok"
This is a bit funky coding style and I would not normally use it, but you do need in if you want to write it like this. You can always split that to multiple lines though:
let a = 10
if ( let x = a * a
x = 100 ) then printfn "Ok"

F# Parallel.ForEach invalid method overload

Creating a Parallel.ForEach expression of this form:
let low = max 1 (k-m)
let high = min (k-1) n
let rangesize = (high+1-low)/(PROCS*3)
Parallel.ForEach(Partitioner.Create(low, high+1, rangesize), (fun j ->
let i = k - j
if x.[i-1] = y.[j-1] then
a.[i] <- b.[i-1] + 1
else
a.[i] <- max c.[i] c.[i-1]
)) |> ignore
Causes me to receive the error: No overloads match for method 'ForEach'. However I am using the Parallel.ForEach<TSource> Method (Partitioner<TSource>, Action<TSource>) and it seems right to me. Am I missing something?
Edited: I am trying to obtain the same results as the code below (that does not use a Partitioner):
let low = max 1 (k-m)
let high = min (k-1) n
let rangesize = (high+1-low)/(PROCS*3)
let A = [| low .. high |]
Parallel.ForEach(A, fun (j:int) ->
let i = k - j
if x.[i-1] = y.[j-1] then
a.[i] <- b.[i-1] + 1
else
a.[i] <- max c.[i] c.[i-1]
) |> ignore
Are you sure that you have opened all necessary namespaces, all the values you are using (low, high and PROCS) are defined and that your code does not accidentally redefine some of the names that you're using (like Partitioner)?
I created a very simple F# script with this code and it seems to be working fine (I refactored the code to create a partitioner called p, but that does not affect the behavior):
open System.Threading.Tasks
open System.Collections.Concurrent
let PROCS = 10
let low, high = 0, 100
let p = Partitioner.Create(low, high+1, high+1-low/(PROCS*3))
Parallel.ForEach(p, (fun j ->
printfn "%A" j // Print the desired range (using %A as it is a tuple)
)) |> ignore
It is important that the value j is actually a pair of type int * int, so if the body uses it in a wrong way (e.g. as an int), you will get the error. In that case, you can add a type annotation to j and you would get a more useful error elsewhere:
Parallel.ForEach(p, (fun (j:int * int) ->
printfn "%d" j // Error here, because `j` is used as an int, but it is a pair!
)) |> ignore
This means that if you want to perform something for all j values in the original range, you need to write something like this:
Parallel.ForEach(p, (fun (loJ, hiJ) ->
for j in loJ .. hiJ - 1 do // Iterate over all js in this partition
printfn "%d" j // process the current j
)) |> ignore
Aside, I guess that the last argument to Partitioner.Create should actually be (high+1-low)/(PROCS*3) - you probably want to divide the total number of steps, not just the low value.

When Generating Primes in F#, why is the "Sieve of Erosthenes" so slow in this particular implementatIon?

IE,
What am I doing wrong here? Does it have to to with lists, sequences and arrays and the way the limitations work?
So here is the setup: I'm trying to generate some primes. I see that there are a billion text files of a billion primes. The question isn't why...the question is how are the guys using python calculating all of the primes below 1,000,000 in milliseconds on this post...and what am I doing wrong with the following F# code?
let sieve_primes2 top_number =
let numbers = [ for i in 2 .. top_number do yield i ]
let sieve (n:int list) =
match n with
| [x] -> x,[]
| hd :: tl -> hd, List.choose(fun x -> if x%hd = 0 then None else Some(x)) tl
| _ -> failwith "Pernicious list error."
let rec sieve_prime (p:int list) (n:int list) =
match (sieve n) with
| i,[] -> i::p
| i,n' -> sieve_prime (i::p) n'
sieve_prime [1;0] numbers
With the timer on in FSI, I get 4.33 seconds worth of CPU for 100000... after that, it all just blows up.
Your sieve function is slow because you tried to filter out composite numbers up to top_number. With Sieve of Eratosthenes, you only need to do so until sqrt(top_number) and remaining numbers are inherently prime. Suppose we havetop_number = 1,000,000, your function does 78498 rounds of filtering (the number of primes until 1,000,000) while the original sieve only does so 168 times (the number of primes until 1,000).
You can avoid generating even numbers except 2 which cannot be prime from the beginning. Moreover, sieve and sieve_prime can be merged into a recursive function. And you could use lightweight List.filter instead of List.choose.
Incorporating above suggestions:
let sieve_primes top_number =
let numbers = [ yield 2
for i in 3..2..top_number -> i ]
let rec sieve ns =
match ns with
| [] -> []
| x::xs when x*x > top_number -> ns
| x::xs -> x::sieve (List.filter(fun y -> y%x <> 0) xs)
sieve numbers
In my machine, the updated version is very fast and it completes within 0.6s for top_number = 1,000,000.
Based on my code here: stackoverflow.com/a/8371684/124259
Gets the first 1 million primes in 22 milliseconds in fsi - a significant part is probably compiling the code at this point.
#time "on"
let limit = 1000000
//returns an array of all the primes up to limit
let table =
let table = Array.create limit true //use bools in the table to save on memory
let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
let mutable curfactor = 1;
while curfactor < tlimit-2 do
curfactor <- curfactor+2
if table.[curfactor] then //simple optimisation
let mutable v = curfactor*2
while v < limit do
table.[v] <- false
v <- v + curfactor
let out = Array.create (100000) 0 //this needs to be greater than pi(limit)
let mutable idx = 1
out.[0]<-2
let mutable curx=1
while curx < limit-2 do
curx <- curx + 2
if table.[curx] then
out.[idx]<-curx
idx <- idx+1
out
There have been several good answers both as to general trial division algorithm using lists (#pad) and in choice of an array for a sieving data structure using the Sieve of Eratosthenes (SoE) (#John Palmer and #Jon Harrop). However, #pad's list algorithm isn't particularly fast and will "blow up" for larger sieving ranges and #John Palmer's array solution is somewhat more complex, uses more memory than necessary, and uses external mutable state so is not different than if the program were written in an imperative language such as C#.
EDIT_ADD: I've edited the below code (old code with line comments) modifying the sequence expression to avoid some function calls so as to reflect more of an "iterator style" and while it saved 20% of the speed it still doesn't come close to that of a true C# iterator which is about the same speed as the "roll your own enumerator" final F# code. I've modified the timing information below accordingly. END_EDIT
The following true SoE program only uses 64 KBytes of memory to sieve primes up to a million (due to only considering odd numbers and using the packed bit BitArray) and still is almost as fast as #John Palmer's program at about 40 milliseconds to sieve to one million on a i7 2700K (3.5 GHz), with only a few lines of code:
open System.Collections
let primesSoE top_number=
let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
let SQRTLMT = (int(sqrt (double top_number))-3)/2
let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
seq { for i = -1 to BFLMT do if i<0 then yield 2u
elif buf.[i] then yield uint32(3+i+i) }
// seq { yield 2u; yield! seq { 0..BFLMT } |> Seq.filter (fun i->buf.[i])
// |> Seq.map (fun i->uint32 (i+i+3)) }
primesSOE 1000000u |> Seq.length;;
Almost all of the elapsed time is spent in the last two lines enumerating the found primes due to the inefficiency of the sequence run time library as well as the cost of enumerating itself at about 28 clock cycles per function call and return with about 16 function calls per iteration. This could be reduced to only a few function calls per iteration by rolling our own iterator, but the code is not as concise; note that in the following code there is no mutable state exposed other than the contents of the sieving array and the reference variable necessary for the iterator implementation using object expressions:
open System
open System.Collections
open System.Collections.Generic
let primesSoE top_number=
let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
let SQRTLMT = (int(sqrt (double top_number))-3)/2
let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
let nmrtr() =
let i = ref -2
let rec nxti() = i:=!i+1;if !i<=BFLMT && not buf.[!i] then nxti() else !i<=BFLMT
let inline curr() = if !i<0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
else let v = uint32 !i in v+v+3u
{ new IEnumerator<_> with
member this.Current = curr()
interface IEnumerator with
member this.Current = box (curr())
member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
interface IDisposable with
member this.Dispose() = () }
{ new IEnumerable<_> with
member this.GetEnumerator() = nmrtr()
interface IEnumerable with
member this.GetEnumerator() = nmrtr() :> IEnumerator }
primesSOE 1000000u |> Seq.length;;
The above code takes about 8.5 milliseconds to sieve the primes to a million on the same machine due to greatly reducing the number of function calls per iteration to about three from about 16. This is about the same speed as C# code written in the same style. It's too bad that F#'s iterator style as I used in the first example doesn't automatically generate the IEnumerable boiler plate code as C# iterators do, but I guess that is the intention of sequences - just that they are so damned inefficient as to speed performance due to being implemented as sequence computation expressions.
Now, less than half of the time is expended in enumerating the prime results for a much better use of CPU time.
What am I doing wrong here?
You've implemented a different algorithm that goes through each possible value and uses % to determine if it needs to be removed. What you're supposed to be doing is stepping through with a fixed increment removing multiples. That would be asymptotically.
You cannot step through lists efficiently because they don't support random access so use arrays.

Dynamic programming in F#

What is the most elegant way to implement dynamic programming algorithms that solve problems with overlapping subproblems? In imperative programming one would usually create an array indexed (at least in one dimension) by the size of the problem, and then the algorithm would start from the simplest problems and work towards more complicated once, using the results already computed.
The simplest example I can think of is computing the Nth Fibonacci number:
int Fibonacci(int N)
{
var F = new int[N+1];
F[0]=1;
F[1]=1;
for(int i=2; i<=N; i++)
{
F[i]=F[i-1]+F[i-2];
}
return F[N];
}
I know you can implement the same thing in F#, but I am looking for a nice functional solution (which is O(N) as well obviously).
One technique that is quite useful for dynamic programming is called memoization. For more details, see for example blog post by Don Syme or introduction by Matthew Podwysocki.
The idea is that you write (a naive) recursive function and then add cache that stores previous results. This lets you write the function in a usual functional style, but get the performance of algorithm implemented using dynamic programming.
For example, a naive (inefficient) function for calculating Fibonacci number looks like this:
let rec fibs n =
if n < 1 then 1 else
(fibs (n - 1)) + (fibs (n - 2))
This is inefficient, because when you call fibs 3, it will call fibs 1 three times (and many more times if you call, for example, fibs 6). The idea behind memoization is that we write a cache that stores the result of fib 1 and fib 2, and so on, so repeated calls will just pick the pre-calculated value from the cache.
A generic function that does the memoization can be written like this:
open System.Collections.Generic
let memoize(f) =
// Create (mutable) cache that is used for storing results of
// for function arguments that were already calculated.
let cache = new Dictionary<_, _>()
(fun x ->
// The returned function first performs a cache lookup
let succ, v = cache.TryGetValue(x)
if succ then v else
// If value was not found, calculate & cache it
let v = f(x)
cache.Add(x, v)
v)
To write more efficient Fibonacci function, we can now call memoize and give it the function that performs the calculation as an argument:
let rec fibs = memoize (fun n ->
if n < 1 then 1 else
(fibs (n - 1)) + (fibs (n - 2)))
Note that this is a recursive value - the body of the function calls the memoized fibs function.
Tomas's answer is a good general approach. In more specific circumstances, there may be other techniques that work well - for example, in your Fibonacci case you really only need a finite amount of state (the previous 2 numbers), not all of the previously calculated values. Therefore you can do something like this:
let fibs = Seq.unfold (fun (i,j) -> Some(i,(j,i+j))) (1,1)
let fib n = Seq.nth n fibs
You could also do this more directly (without using Seq.unfold):
let fib =
let rec loop i j = function
| 0 -> i
| n -> loop j (i+j) (n-1)
loop 1 1
let fibs =
(1I,1I)
|> Seq.unfold (fun (n0, n1) -> Some (n0 , (n1, n0 + n1)))
|> Seq.cache
Taking inspiration from Tomas' answer here, and in an attempt to resolve the warning in my comment on said answer, I propose the following updated solution.
open System.Collections.Generic
let fib n =
let cache = new Dictionary<_, _>()
let memoize f c =
let succ, v = cache.TryGetValue c
if succ then v else
let v = f c
cache.Add(c, v)
v
let rec inner n =
match n with
| 1
| 2 -> bigint n
| n ->
memoize inner (n - 1) + memoize inner (n - 2)
inner n
This solution internalizes the memoization, and while doing so, allows the definitions of fib and inner to be functions, instead of fib being a recursive object, which allows the compiler to (I think) properly reason about the viability of the function calls.
I also return a bigint instead of an int, as int quickly overflows with even a small of n.
Edit: I should mention, however, that this solution still runs into stack overflow exceptions with sufficiently large values of n.

When creating an intermediary value should I store it?

I am trying to learn F# so I paid a visit to Project Euler and I am currently working on Problem 3.
The prime factors of 13195 are 5, 7,
13 and 29.
What is the largest prime
factor of the number 600851475143?
Some things to consider:
My first priority is to learn good functional habits.
My second priority is I would like it to be fast and efficient.
Within the following code I have marked the section this question is regarding.
let isPrime(n:int64) =
let rec check(i:int64) =
i > n / 2L or (n % i <> 0L && check(i + 1L))
check(2L)
let greatestPrimeFactor(n:int64) =
let nextPrime(prime:int64):int64 =
seq { for i = prime + 1L to System.Int64.MaxValue do if isPrime(i) then yield i }
|> Seq.skipWhile(fun v -> n % v <> 0L)
|> Seq.hd
let rec findNextPrimeFactor(number:int64, prime:int64):int64 =
if number = 1L then prime else
//************* No variable
(fun p -> findNextPrimeFactor(number / p, p))(nextPrime(prime))
//*************
//************* Variable
let p = nextPrime(prime)
findNextPrimeFactor(number / p, p)
//*************
findNextPrimeFactor(n, 2L)
Update
Based off some of the feedback I have refactored the code to be 10 times faster.
module Problem3
module private Internal =
let execute(number:int64):int64 =
let rec isPrime(value:int64, current:int64) =
current > value / 2L or (value % current <> 0L && isPrime(value, current + 1L))
let rec nextPrime(prime:int64):int64 =
if number % prime = 0L && isPrime(prime, 2L) then prime else nextPrime(prime + 1L)
let rec greatestPrimeFactor(current:int64, prime:int64):int64 =
if current = 1L then prime else nextPrime(prime + 1L) |> fun p -> greatestPrimeFactor(current / p, p)
greatestPrimeFactor(number, 2L)
let execute() = Internal.execute(600851475143L)
Update
I would like to thank everyone for there advice. This latest version is a compilation of all the advice I received.
module Problem3
module private Internal =
let largestPrimeFactor number =
let rec isPrime value current =
current > value / 2L || (value % current <> 0L && isPrime value (current + 1L))
let rec nextPrime value =
if number % value = 0L && isPrime value 2L then value else nextPrime (value + 1L)
let rec find current prime =
match current / prime with
| 1L -> prime
| current -> nextPrime (prime + 1L) |> find current
find number (nextPrime 2L)
let execute() = Internal.largestPrimeFactor 600851475143L
Functional programming becomes easier and more automatic with practice, so don't sweat it if you don't get it absolutely right on the first try.
With that in mind, let's take your sample code:
let rec findNextPrimeFactor(number:int64, prime:int64):int64 =
if number = 1L then prime else
//************* No variable
(fun p -> findNextPrimeFactor(number / p, p))(nextPrime(prime))
//*************
//************* Variable
let p = nextPrime(prime)
findNextPrimeFactor(number / p, p)
//*************
Your no variable version is just weird, don't use it. I like your version with the explicit let binding.
Another way to write it would be:
nextPrime(prime) |> fun p -> findNextPrimeFactor(number / p, p)
Its ok and occasionally useful to write it like this, but still comes across as a little weird. Most of the time, we use |> to curry values without needing to name our variables (in "pointfree" style). Try to anticipate how your function will be used, and if possible, re-write it so you can use it with the pipe operator without explicit declared variables. For example:
let rec findNextPrimeFactor number prime =
match number / prime with
| 1L -> prime
| number' -> nextPrime(prime) |> findNextPrimeFactor number'
No more named args :)
Ok, now that we have that out of the way, let's look at your isPrime function:
let isPrime(n:int64) =
let rec check(i:int64) =
i > n / 2L or (n % i <> 0L && check(i + 1L))
check(2L)
You've probably heard to use recursion instead of loops, and that much is right. But, wherever possible, you should abstract away recursion with folds, maps, or higher order functions. Two reasons for this:
its a little more readable, and
improperly written recursion will result in a stack overflow. For example, your function is not tail recursive, so it'll blow up on large values of n.
I'd rewrite isPrime like this:
let isPrime n = seq { 2L .. n / 2L } |> Seq.exists (fun i -> n % i = 0L) |> not
Most of the time, if you can abstract away your explicit looping, then you're just applying transformations to your input sequence until you get your results:
let maxFactor n =
seq { 2L .. n - 1L } // test inputs
|> Seq.filter isPrime // primes
|> Seq.filter (fun x -> n % x = 0L) // factors
|> Seq.max // result
We don't even have intermediate variables in this version. Coolness!
My second priority is I would like it
to be fast and efficient.
Most of the time, F# is going to be pretty comparable with C# in terms of speed, or it's going to be "fast enough". If you find your code takes a long time to execute, it probably means you're using the wrong data structure or a bad algorithm. For a concrete example, read the comments on this question.
So, the code I've written is "elegant" in the sense that its concise, gives the correct results, and doesn't rely on any trickery. Unfortunately, its not very fast. For start:
it uses trial division to create a sequence of primes, when the Sieve of Eratosthenes would be much faster. [Edit: I wrote a somewhat naive version of this sieve which didn't work for numbers larger than Int32.MaxValue, so I've removed the code.]
read Wikipedia's article on the prime counting function, it'll give you pointers on calculating the first n primes as well as estimating the upper and lower bounds for the nth prime.
[Edit: I included some code with a somewhat naive implementation of a sieve of eratosthenes. It only works for inputs less than int32.MaxValue, so it probably isn't suitable for project euler.]
Concerning "good functional habit" or rather good practice I see three minor things. Using the yield in your sequence is a little harder to read than just filter. Unnecessary type annotations in a type inferred language leads to difficult refactoring and makes the code harder to read. Don't go overboard and try to remove every type annotation though if you're finding it difficult. Lastly making a lambda function which only takes a value to use as a temp variable reduces readability.
As far as personal style goes I prefer more spaces and only using tupled arguments when the data makes sense being grouped together.
I'd write your original code like this.
let isPrime n =
let rec check i =
i > n / 2L || (n % i <> 0L && check (i + 1L))
check 2L
let greatestPrimeFactor n =
let nextPrime prime =
seq {prime + 1L .. System.Int64.MaxValue}
|> Seq.filter isPrime
|> Seq.skipWhile (fun v -> n % v <> 0L)
|> Seq.head
let rec findNextPrimeFactor number prime =
if number = 1L then
prime
else
let p = nextPrime(prime)
findNextPrimeFactor (number / p) p
findNextPrimeFactor n 2L
Your updated code is optimal for your approach. You would have to use a different algorithm like Yin Zhu answer to go faster. I wrote a test to check to see if F# makes the "check" function tail recursive and it does.
the variable p is actually a name binding, not a variable. Using name binding is not a bad style. And it is more readable. The lazy style of nextPrime is good, and it actually prime-test each number only once during the whole program.
My Solution
let problem3 =
let num = 600851475143L
let rec findMax (n:int64) (i:int64) =
if n=i || n<i then
n
elif n%i=0L then
findMax (n/i) i
else
findMax n (i+1L)
findMax num 2L
I basically divides num from 2, 3, 4.. and don't consider any prime numbers. Because if we divides all 2 from num, then we won't be able to divide it by 4,8, etc.
on this number, my solution is quicker:
> greatestPrimeFactor 600851475143L;;
Real: 00:00:01.110, CPU: 00:00:00.702, GC gen0: 1, gen1: 1, gen2: 0
val it : int64 = 6857L
>
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
val problem3 : int64 = 6857L
I think that the code with the temporary binding is significantly easier to read. It's pretty unusual to create an anonymous function and then immediately apply it to a value as you do in the other case. If you really want to avoid using a temporary value, I think that the most idiomatic way to do that in F# would be to use the (|>) operator to pipe the value into the anonymous function, but I still think that this isn't quite as readable.

Resources