As part of testing, I have to load massive CSV files (200+ MB each), and I'm trying to cut down the load time since it adds up.
Since the CSV files contain only floating-point numbers, I tried to see if I could speed up the parsing.
One important point: I do NOT need precision beyond 5 digits after the decimal point. The code I post here parses all digits anyway to give a fair comparison to .NET's parser, but any shortcut that preserves that precision is good.
The code I came up with:
open System
open System.Diagnostics

[<EntryPoint>]
let main _ =
    let r = Random()
    let numbers =
        seq {
            for _ in 0 .. 10000000 do
                let n = (1000. * r.NextDouble()) - 500.
                yield (n, sprintf "%.8f" n)
        }
        |> Seq.toList

    let parseString (number: string) : double =
        let len = number.Length
        let rec parse (index: int) (nSign: double) (signMultiplier: double) (nAccumulator: int64) : float =
            if index < len then
                match number.[index] with
                | x when x >= '0' && x <= '9' -> parse (index + 1) (signMultiplier * nSign) signMultiplier (nAccumulator * 10L + int64 x - int64 '0')
                | x when x = '.' -> parse (index + 1) nSign 0.1 nAccumulator
                | x when x = '-' -> parse (index + 1) -nSign signMultiplier nAccumulator
                | _ -> parse (index + 1) nSign signMultiplier nAccumulator
            else
                double nAccumulator * nSign
        parse 0 +1. 1. 0L

    let benchmark name (func: string -> double) =
        let allowedError = 0.00001
        printf "checking %s - " name
        let sw = Stopwatch()
        sw.Start()
        numbers
        |> List.iter (fun (num, str) ->
            let parsed = func str
            let delta = num - parsed
            if abs delta > allowedError then
                failwithf "(%f, %s) not matching %f" num str parsed
        )
        sw.Stop()
        printfn "%i ms" sw.ElapsedMilliseconds

    benchmark ".net parser" (fun x -> double x)
    benchmark "parser1" (fun x -> parseString x)
    0
Is there anything obvious I missed to speed this up?
Additionally, there is something I do not understand: I tried to make the test numbers an array instead of a list and suddenly the processing took longer, while I'd expect the opposite. If anyone has some insights as to why it is happening, I'd be happy to know. You can just change the toList to toArray and replace the List with Array in the benchmark loop to test this.
Edit:
to load the data, here is the code I use:
open System.IO

// load a csv file
let loadCSV (stream: Stream) =
    seq {
        use s = new StreamReader (stream)
        while not s.EndOfStream do
            yield s.ReadLine ()
    }
    |> Seq.map (fun line -> line.Split(",") |> Array.toList)
    |> Seq.toList
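One possible shortcut, given that only 5 decimal places matter, is to accumulate the digits in an int64 and divide once at the end. The sketch below is only an illustration under stated assumptions: the name parseFixed is hypothetical, the input is assumed to be plain decimal text such as "-123.45678" (no exponent, no thousands separators), and no timing claim is made for it.

```fsharp
// Hypothetical helper (not from the original post): parses plain decimal
// strings by accumulating digits in an int64 and dividing once at the end.
// Assumes no exponent notation and few enough digits to fit in an int64.
let parseFixed (s: string) : float =
    let mutable acc = 0L
    let mutable fracDigits = -1      // stays -1 until a '.' has been seen
    let mutable negative = false
    for c in s do
        if c >= '0' && c <= '9' then
            acc <- acc * 10L + int64 (int c - int '0')
            if fracDigits >= 0 then fracDigits <- fracDigits + 1
        elif c = '.' then fracDigits <- 0
        elif c = '-' then negative <- true
        // any other character is skipped, as in the recursive version
    let scale = if fracDigits > 0 then pown 10.0 fracDigits else 1.0
    let v = float acc / scale
    if negative then -v else v
```

It can be dropped into the benchmark the same way as the recursive version, for example benchmark "parseFixed" parseFixed.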
I need to implement a simple dynamic programming algorithm in 2D in F#. For simple 1D cases Seq.unfold seems to be the way to go, see e.g. https://stackoverflow.com/a/7986083/5363
Is there a nice (and efficient) way to achieve a similar result in 2D, e.g. rewrite the following pseudo-code in functional style:
let alpha =
    let result = Array2D.zeroCreate T N   // T time steps (rows) by N states (columns)
    for i in 0 .. N-1 do
        result.[0, i] <- (initialPi i) * (b i observations.[0])
    for t in 1 .. T-1 do
        for i in 0 .. N-1 do
            let s = row (t-1) result |> Seq.mapi (fun j alpha_t_j -> alpha_t_j * initialA.[i, j]) |> Seq.sum
            result.[t, i] <- s * (b i observations.[t])
    result
Assume that all the missing functions and arrays are defined above.
EDIT: Having actually read the code, here is a version that is at least functional. It has a slightly different return type (an array of row arrays rather than a 2D array), although you could fix that with a conversion.
let alpha =
    let rec build prev idx max =
        match idx with
        | 0 ->
            // first row: initial distribution times emission probability
            let r = Array.init N (fun i -> (initialPi i) * (b i observations.[0]))
            r :: build r 1 max
        | t when t = max -> []
        | t ->
            // each later row depends only on the previous row
            let r =
                Array.init N (fun i ->
                    let s = prev |> Seq.mapi (fun j alpha_t_j -> alpha_t_j * initialA.[i, j]) |> Seq.sum
                    s * (b i observations.[t]))
            r :: build r (t + 1) max
    build [||] 0 T |> List.toArray
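Since each row depends only on the previous one, the same recurrence can also be written as a scan over the observations. The following is a minimal sketch, not the original poster's code: the function name forward and its parameters (pi, a, b, obs) are stand-ins for initialPi, initialA, b and observations from the question, and the element types are assumed to be float and int.

```fsharp
// Sketch under assumptions: pi, a, b and obs stand in for initialPi, initialA,
// b and observations from the question; obs must be non-empty.
let forward (pi: float[]) (a: float[,]) (b: int -> int -> float) (obs: int[]) : float[][] =
    let n = pi.Length
    // Row 0: initial distribution weighted by the first observation.
    let first = Array.init n (fun i -> pi.[i] * b i obs.[0])
    // One DP step: build row t from row t-1 and observation o.
    let step (prev: float[]) (o: int) =
        Array.init n (fun i ->
            let s = prev |> Array.mapi (fun j p -> p * a.[i, j]) |> Array.sum
            s * b i o)
    // Array.scan threads the previous row through as state and keeps every row.
    Array.scan step first obs.[1..]
```

The result is an array of rows (one per time step); array2D can convert it to a 2D array if that shape is required.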
My code (below) falls over with a stack overflow exception. I'm assuming F# isn't like Haskell and doesn't play well with recursive lists. What's the correct way of dealing with recursive lists like this in F#? Should I pass it an int so it has a determined size?
let rec collatz num =
    match num with
    | x when x % 2 = 0 -> num :: collatz (x/2)
    | x -> num :: collatz ((x * 3) + 1)

let smallList = collatz(4) |> Seq.take(4)
For an infinite list like this, you want to return a sequence. Sequences are lazy; lists are not.
let rec collatz num =
    seq {
        yield num
        match num with
        | x when x % 2 = 0 -> yield! collatz (x/2)
        | x -> yield! collatz ((x * 3) + 1)
    }

let smallList =
    collatz 4
    |> Seq.take 4
    |> Seq.toList //[4; 2; 1; 4]
let collatz num =
    let next x = if x % 2 = 0 then x / 2 else x * 3 + 1
    (num, next num)
    |> Seq.unfold (fun (n, x) -> Some (n, (x, next x)))
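Both sequence versions above are infinite, so the caller has to cut them off with Seq.take. If a finite list is wanted, Seq.unfold can also end the sequence itself by returning None. A small sketch, assuming a positive starting value; the name collatzUntilOne and the use of 0 as an internal "finished" marker are choices made for this illustration only:

```fsharp
// Sketch: a Collatz sequence that stops after reaching 1.
// Assumes start >= 1; the state 0 is used only as an internal "finished" marker.
let collatzUntilOne (start: int) : seq<int> =
    start
    |> Seq.unfold (fun n ->
        if n = 0 then None                 // marker reached: end the sequence
        elif n = 1 then Some (1, 0)        // emit the final 1, then stop
        elif n % 2 = 0 then Some (n, n / 2)
        else Some (n, 3 * n + 1))
```

For example, collatzUntilOne 6 |> Seq.toList gives [6; 3; 10; 5; 16; 8; 4; 2; 1].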
Is it possible to combine memoization and tail-recursion somehow? I'm learning F# at the moment and understand both concepts but can't seem to combine them.
Suppose I have the following memoize function (from Real-World Functional Programming):
let memoize f =
    let cache = new Dictionary<_, _>()
    (fun x ->
        match cache.TryGetValue(x) with
        | true, y -> y
        | _ ->
            let v = f(x)
            cache.Add(x, v)
            v)
and the following factorial function:
let rec factorial(x) = if (x = 0) then 1 else x * factorial(x - 1)
Memoizing factorial isn't too difficult and making it tail-recursive isn't either:
let rec memoizedFactorial =
    memoize (fun x -> if (x = 0) then 1 else x * memoizedFactorial(x - 1))

let tailRecursiveFactorial(x) =
    let rec factorialUtil(x, res) =
        if (x = 0) then res
        else
            let newRes = x * res
            factorialUtil(x - 1, newRes)
    factorialUtil(x, 1)
But can you combine memoization and tail-recursion? I made some attempts but can't seem to get it working. Or is this simply not possible?
As always, continuations yield an elegant tailcall solution:
open System.Collections.Generic

let cache = Dictionary<_,_>()  // TODO move inside
let memoizedTRFactorial =
    let rec fac n k =  // must make tailcalls to k
        match cache.TryGetValue(n) with
        | true, r -> k r
        | _ ->
            if n=0 then
                k 1
            else
                fac (n-1) (fun r1 ->
                    printfn "multiplying by %d" n  //***
                    let r = r1 * n
                    cache.Add(n,r)
                    k r)
    fun n -> fac n id

printfn "---"
let r = memoizedTRFactorial 4
printfn "%d" r
for KeyValue(k,v) in cache do
    printfn "%d: %d" k v

printfn "---"
let r2 = memoizedTRFactorial 5
printfn "%d" r2

printfn "---"
// comment out *** line, then run this
//let r3 = memoizedTRFactorial 100000
//printfn "%d" r3
There are two kinds of tests. First, this demos that calling F(4) caches F(4), F(3), F(2), F(1) as you would like.
Then, comment out the *** printf and uncomment the final test (and compile in Release mode) to show that it does not StackOverflow (it uses tailcalls correctly).
Perhaps I'll generalize out 'memoize' and demonstrate it on 'fib' next...
EDIT
Ok, here's the next step, I think, decoupling memoization from factorial:
open System.Collections.Generic

let cache = Dictionary<_,_>()  // TODO move inside
let memoize fGuts n =
    let rec newFunc n k =  // must make tailcalls to k
        match cache.TryGetValue(n) with
        | true, r -> k r
        | _ ->
            fGuts n (fun r ->
                        cache.Add(n,r)
                        k r) newFunc
    newFunc n id

let TRFactorialGuts n k memoGuts =
    if n=0 then
        k 1
    else
        memoGuts (n-1) (fun r1 ->
            printfn "multiplying by %d" n  //***
            let r = r1 * n
            k r)

let memoizedTRFactorial = memoize TRFactorialGuts

printfn "---"
let r = memoizedTRFactorial 4
printfn "%d" r
for KeyValue(k,v) in cache do
    printfn "%d: %d" k v

printfn "---"
let r2 = memoizedTRFactorial 5
printfn "%d" r2

printfn "---"
// comment out *** line, then run this
//let r3 = memoizedTRFactorial 100000
//printfn "%d" r3
EDIT
Ok, here's a fully generalized version that seems to work.
open System.Collections.Generic

let memoize fGuts =
    let cache = Dictionary<_,_>()
    let rec newFunc n k =  // must make tailcalls to k
        match cache.TryGetValue(n) with
        | true, r -> k r
        | _ ->
            fGuts n (fun r ->
                        cache.Add(n,r)
                        k r) newFunc
    cache, (fun n -> newFunc n id)

let TRFactorialGuts n k memoGuts =
    if n=0 then
        k 1
    else
        memoGuts (n-1) (fun r1 ->
            printfn "multiplying by %d" n  //***
            let r = r1 * n
            k r)

let facCache,memoizedTRFactorial = memoize TRFactorialGuts

printfn "---"
let r = memoizedTRFactorial 4
printfn "%d" r
for KeyValue(k,v) in facCache do
    printfn "%d: %d" k v

printfn "---"
let r2 = memoizedTRFactorial 5
printfn "%d" r2

printfn "---"
// comment out *** line, then run this
//let r3 = memoizedTRFactorial 100000
//printfn "%d" r3

let TRFibGuts n k memoGuts =
    if n=0 || n=1 then
        k 1
    else
        memoGuts (n-1) (fun r1 ->
            memoGuts (n-2) (fun r2 ->
                printfn "adding %d+%d" r1 r2 //%%%
                let r = r1+r2
                k r))

let fibCache, memoizedTRFib = memoize TRFibGuts

printfn "---"
let r5 = memoizedTRFib 4
printfn "%d" r5
for KeyValue(k,v) in fibCache do
    printfn "%d: %d" k v

printfn "---"
let r6 = memoizedTRFib 5
printfn "%d" r6

printfn "---"
// comment out %%% line, then run this
//let r7 = memoizedTRFib 100000
//printfn "%d" r7
The predicament of memoizing tail-recursive functions is, of course, that when a tail-recursive function
let f x =
    ......
    f x1
calls itself, it is not allowed to do anything with the result of the recursive call, including putting it into a cache. Tricky; so what can we do?
The critical insight here is that since the recursive function is not allowed to do anything with the result of the recursive call, the result for all arguments to recursive calls will be the same! Therefore, if the recursive call trace is this
f x0 -> f x1 -> f x2 -> f x3 -> ... -> f xN -> res
then for all x in x0,x1,...,xN the result of f x will be the same, namely res. So the last invocation of the recursive function, the non-recursive call, knows the results for all the previous values, and it is in a position to cache them. The only thing you need to do is to pass a list of visited values to it. Here is what it might look like for factorial:
let cache = Dictionary<_,_>()
let rec fact0 l ((n,res) as arg) =
    let commitToCache r =
        l |> List.iter (fun a -> cache.Add(a,r))
    match cache.TryGetValue(arg) with
    | true, cachedResult -> commitToCache cachedResult; cachedResult
    | false, _ ->
        if n = 1 then
            commitToCache res
            cache.Add(arg, res)
            res
        else
            fact0 (arg::l) (n-1, n*res)
let fact n = fact0 [] (n,1)
But wait! Look: the l parameter of fact0 contains all the arguments to recursive calls to fact0, just like the stack would in a non-tail-recursive version! That is exactly right. Any non-tail-recursive algorithm can be converted to a tail-recursive one by moving the "list of stack frames" from the stack to the heap and converting the "postprocessing" of the recursive call result into a walk over that data structure.
Pragmatic note: The factorial example above illustrates a general technique. It is quite useless as is: for the factorial function it is quite enough to cache the top-level fact n result, because the calculation of fact n for a particular n only hits a unique series of (n,res) pairs of arguments to fact0. If (n,1) is not cached yet, then none of the pairs fact0 is going to be called on are.
Note that in this example, when we went from the non-tail-recursive factorial to a tail-recursive factorial, we exploited the fact that multiplication is associative and commutative: the tail-recursive factorial executes a different set of multiplications than the non-tail-recursive one.
In fact, a general technique exists for going from a non-tail-recursive to a tail-recursive algorithm, which yields an algorithm equivalent to a T. This technique is called "continuation-passing transformation". Going that route, you can take a non-tail-recursive memoizing factorial and get a tail-recursive memoizing factorial by a pretty much mechanical transformation. See Brian's answer for an exposition of this method.
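To make the continuation-passing transformation concrete without the memoization layer, here is a minimal sketch of plain factorial rewritten that way. The name factorialCps and the choice of bigint (to avoid integer overflow) are assumptions of this example, not part of the original answers.

```fsharp
// Sketch: factorial in continuation-passing style.
// Instead of multiplying after the recursive call returns, the pending
// multiplication is packaged into the continuation k, so the recursive call
// itself is in tail position. The multiplications run in the same order as
// in the naive recursive version.
let factorialCps (n: int) : bigint =
    let rec go n (k: bigint -> bigint) =
        if n = 0 then k 1I
        else go (n - 1) (fun r -> k (bigint n * r))
    go n id
```

The chain of closures built up in k plays the role of the stack frames, which now live on the heap.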
I'm not sure if there's a simpler way to do this, but one approach would be to create a memoizing y-combinator:
let memoY f =
    let cache = Dictionary<_,_>()
    let rec fn x =
        match cache.TryGetValue(x) with
        | true,y -> y
        | _ ->
            let v = f fn x
            cache.Add(x,v)
            v
    fn
Then, you can use this combinator in lieu of "let rec", with the first argument representing the function to call recursively:
let tailRecFact =
    let factHelper fact (x, res) =
        printfn "%i,%i" x res
        if x = 0 then res
        else fact (x-1, x*res)
    let memoized = memoY factHelper
    fun x -> memoized (x,1)
EDIT
As Mitya pointed out, memoY doesn't preserve the tail recursive properties of the memoee. Here's a revised combinator which uses exceptions and mutable state to memoize any recursive function without overflowing the stack (even if the original function is not itself tail recursive!):
let memoY f =
    let cache = Dictionary<_,_>()
    fun x ->
        let l = ResizeArray([x])
        while l.Count <> 0 do
            let v = l.[l.Count - 1]
            if cache.ContainsKey(v) then l.RemoveAt(l.Count - 1)
            else
                try
                    cache.[v] <- f (fun x ->
                        if cache.ContainsKey(x) then cache.[x]
                        else
                            l.Add(x)
                            failwith "Need to recurse") v
                with _ -> ()
        cache.[x]
Unfortunately, the machinery which is inserted into each recursive call is somewhat heavy, so performance on un-memoized inputs requiring deep recursion can be a bit slow. However, compared to some other solutions, this has the benefit that it requires fairly minimal changes to the natural expression of recursive functions:
let fib = memoY (fun fib n ->
    printfn "%i" n;
    if n <= 1 then n
    else (fib (n-1)) + (fib (n-2)))

let _ = fib 5000
EDIT
I'll expand a bit on how this compares to other solutions. This technique takes advantage of the fact that exceptions provide a side channel: a function of type 'a -> 'b doesn't actually need to return a value of type 'b, but can instead exit via an exception. We wouldn't need to use exceptions if the return type explicitly contained an additional value indicating failure. Of course, we could use the 'b option as the return type of the function for this purpose. This would lead to the following memoizing combinator:
let memoO f =
    let cache = Dictionary<_,_>()
    fun x ->
        let l = ResizeArray([x])
        while l.Count <> 0 do
            let v = l.[l.Count - 1]
            if cache.ContainsKey v then l.RemoveAt(l.Count - 1)
            else
                match f(fun x -> if cache.ContainsKey x then Some(cache.[x]) else l.Add(x); None) v with
                | Some(r) -> cache.[v] <- r;
                | None -> ()
        cache.[x]
Previously, our memoization process looked like:
fun fib n ->
    printfn "%i" n;
    if n <= 1 then n
    else (fib (n-1)) + (fib (n-2))
|> memoY
Now, we need to incorporate the fact that fib should return an int option instead of an int. Given a suitable workflow for option types, this could be written as follows:
fun fib n -> option {
    printfn "%i" n
    if n <= 1 then return n
    else
        let! x = fib (n-1)
        let! y = fib (n-2)
        return x + y
} |> memoO
However, if we're willing to change the return type of the first parameter (from int to int option in this case), we may as well go all the way and just use continuations in the return type instead, as in Brian's solution. Here's a variation on his definitions:
let memoC f =
    let cache = Dictionary<_,_>()
    let rec fn n k =
        match cache.TryGetValue(n) with
        | true, r -> k r
        | _ ->
            f fn n (fun r ->
                cache.Add(n,r)
                k r)
    fun n -> fn n id
And again, if we have a suitable computation expression for building CPS functions, we can define our recursive function like this:
fun fib n -> cps {
    printfn "%i" n
    if n <= 1 then return n
    else
        let! x = fib (n-1)
        let! y = fib (n-2)
        return x + y
} |> memoC
This is exactly the same as what Brian has done, but I find the syntax here is easier to follow. To make this work, all we need are the following two definitions:
type CpsBuilder() =
    member this.Return x k = k x
    member this.Bind(m,f) k = m (fun a -> f a k)

let cps = CpsBuilder()
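For convenience, the pieces above can be assembled into one self-contained sketch. This only restates the definitions from this answer in a single place. The changes are assumptions of the example: the printfn is dropped, the result type is bigint to avoid overflow, and the function is named fibC.

```fsharp
open System.Collections.Generic

// Builder for continuation-passing computations (same two members as above).
type CpsBuilder() =
    member this.Return x k = k x
    member this.Bind(m, f) k = m (fun a -> f a k)

let cps = CpsBuilder()

// Memoizing combinator: results are stored as they are passed to the continuation.
let memoC f =
    let cache = Dictionary<_,_>()
    let rec fn n k =
        match cache.TryGetValue(n) with
        | true, r -> k r
        | _ ->
            f fn n (fun r ->
                cache.Add(n, r)
                k r)
    fun n -> fn n id

// fibC is a name chosen for this sketch; fib 0 = 0 and fib 1 = 1 as in the text.
let fibC =
    memoC (fun fib n -> cps {
        if n <= 1 then return bigint n
        else
            let! x = fib (n - 1)
            let! y = fib (n - 2)
            return x + y })
```

With these definitions, fibC 10 evaluates to 55 and each argument is computed only once.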
I wrote a test to visualize the memoization. Each dot is a recursive call.
......720 // factorial 6
......720 // factorial 6
.....120 // factorial 5
......720 // memoizedFactorial 6
720 // memoizedFactorial 6
120 // memoizedFactorial 5
......720 // tailRecFact 6
720 // tailRecFact 6
.....120 // tailRecFact 5
......720 // tailRecursiveMemoizedFactorial 6
720 // tailRecursiveMemoizedFactorial 6
.....120 // tailRecursiveMemoizedFactorial 5
kvb's solution returns the same results as straight memoization, like this function:
let tailRecursiveMemoizedFactorial =
    memoize
        (fun x ->
            let rec factorialUtil x res =
                if x = 0 then
                    res
                else
                    printf "."
                    let newRes = x * res
                    factorialUtil (x - 1) newRes
            factorialUtil x 1
        )
Test source code.
open System.Collections.Generic

let memoize f =
    let cache = new Dictionary<_, _>()
    (fun x ->
        match cache.TryGetValue(x) with
        | true, y -> y
        | _ ->
            let v = f(x)
            cache.Add(x, v)
            v)

let rec factorial(x) =
    if (x = 0) then
        1
    else
        printf "."
        x * factorial(x - 1)

let rec memoizedFactorial =
    memoize (
        fun x ->
            if (x = 0) then
                1
            else
                printf "."
                x * memoizedFactorial(x - 1))

let memoY f =
    let cache = Dictionary<_,_>()
    let rec fn x =
        match cache.TryGetValue(x) with
        | true,y -> y
        | _ ->
            let v = f fn x
            cache.Add(x,v)
            v
    fn

let tailRecFact =
    let factHelper fact (x, res) =
        if x = 0 then
            res
        else
            printf "."
            fact (x-1, x*res)
    let memoized = memoY factHelper
    fun x -> memoized (x,1)

let tailRecursiveMemoizedFactorial =
    memoize
        (fun x ->
            let rec factorialUtil x res =
                if x = 0 then
                    res
                else
                    printf "."
                    let newRes = x * res
                    factorialUtil (x - 1) newRes
            factorialUtil x 1
        )

factorial 6 |> printfn "%A"
factorial 6 |> printfn "%A"
factorial 5 |> printfn "%A\n"

memoizedFactorial 6 |> printfn "%A"
memoizedFactorial 6 |> printfn "%A"
memoizedFactorial 5 |> printfn "%A\n"

tailRecFact 6 |> printfn "%A"
tailRecFact 6 |> printfn "%A"
tailRecFact 5 |> printfn "%A\n"

tailRecursiveMemoizedFactorial 6 |> printfn "%A"
tailRecursiveMemoizedFactorial 6 |> printfn "%A"
tailRecursiveMemoizedFactorial 5 |> printfn "%A\n"

System.Console.ReadLine() |> ignore
That should work if mutual tail recursion through y does not create stack frames:
let rec y f x = f (y f) x

let memoize (d:System.Collections.Generic.Dictionary<_,_>) f n =
    if d.ContainsKey n then d.[n]
    else
        d.Add(n, f n)
        d.[n]

let rec factorialucps factorial' n cont =
    if n = 0I then cont(1I) else factorial' (n-1I) (fun k -> cont (n*k))

let factorialdpcps =
    let d = System.Collections.Generic.Dictionary<_, _>()
    fun n -> y (factorialucps >> fun f n -> memoize d f n ) n id

factorialdpcps 15I //1307674368000
The following F# code gives the correct answer to Project Euler problem #7:
let isPrime num =
    let upperDivisor = int32(sqrt(float num)) // Is there a better way?
    let rec evaluateModulo a =
        if a = 1 then
            true
        else
            match num % a with
            | 0 -> false
            | _ -> evaluateModulo (a - 1)
    evaluateModulo upperDivisor

let mutable accumulator = 1 // Would like to avoid mutable values.
let mutable number = 2 // ""
while (accumulator <= 10001) do
    if (isPrime number) then
        accumulator <- accumulator + 1
    number <- number + 1

printfn "The 10001st prime number is %i." (number - 1) // Feels kludgy.
printfn ""
printfn "Hit any key to continue."
System.Console.ReadKey() |> ignore
I'd like to avoid the mutable values accumulator and number. I'd also like to refactor the while loop into a tail recursive function. Any tips?
Any ideas on how to remove the (number - 1) kludge which displays the result?
Any general comments about this code or suggestions on how to improve it?
Loops are nice, but it's more idiomatic to abstract away loops as much as possible.
let isPrime num =
    let upperDivisor = int32(sqrt(float num))
    match num with
    | 0 | 1 -> false
    | 2 -> true
    | n -> seq { 2 .. upperDivisor } |> Seq.forall (fun x -> num % x <> 0)

let primes = Seq.initInfinite id |> Seq.filter isPrime
// Seq.nth is zero-based, so the nth prime sits at index n - 1
let nthPrime n = Seq.nth (n - 1) primes

printfn "The 10001st prime number is %i." (nthPrime 10001)
printfn ""
printfn "Hit any key to continue."
System.Console.ReadKey() |> ignore
Sequences are your friend :)
You can refer to my F# for Project Euler Wiki:
I got this first version:
let isPrime n =
    if n=1 then false
    else
        let m = int(sqrt (float(n)))
        let mutable p = true
        for i in 2..m do
            if n%i =0 then p <- false
            // ~~ I want to break here!
        p

let rec nextPrime n =
    if isPrime n then n
    else nextPrime (n+1)

let problem7 =
    let mutable result = nextPrime 2
    for i in 2..10001 do
        result <- nextPrime (result+1)
    result
This version looks nicer, but it still does not break out of the loop early when the number is not prime. In the Seq module, the exists and forall functions support early termination:
let isPrime n =
    if n<=1 then false
    else
        let m = int(sqrt (float(n)))
        {2..m} |> Seq.exists (fun i->n%i=0) |> not
        // or equivalently :
        // {2..m} |> Seq.forall (fun i->n%i<>0)
Notice that this version of isPrime is finally mathematically correct, because it also rejects numbers below 2.
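To see the early stop in action, one can count how many candidate divisors are actually tested. This is a small illustrative sketch; the helper name divisorChecks and the counting via a ref cell are additions for this example only.

```fsharp
// Sketch: returns whether n has a divisor in 2..sqrt(n), together with the
// number of candidates that Seq.exists actually tested before stopping.
let divisorChecks (n: int) : bool * int =
    let calls = ref 0
    let m = int (sqrt (float n))
    let composite =
        seq { 2 .. m }
        |> Seq.exists (fun i ->
            calls.Value <- calls.Value + 1
            n % i = 0)
    composite, calls.Value
```

For an even number such as 1000000 the search stops after a single test, while for the prime 97 all eight candidates from 2 to 9 are tested.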
Or you can use a tail recursive function to do the while loop:
let isPrime n =
    let m = int(sqrt (float(n)))
    let rec loop i =
        if i>m then true
        else
            if n%i = 0 then false
            else loop (i+1)
    loop 2
A more functional version of problem7 is to use Seq.unfold to generate an infinite prime sequence and take nth element of this sequence:
let problem7b =
    let primes =
        2 |> Seq.unfold (fun p ->
            let next = nextPrime (p+1) in
            Some( p, next ) )
    Seq.nth 10000 primes
Here's my solution, which uses a tail-recursive loop pattern which always allows you to avoid mutables and gain break functionality: http://projecteulerfun.blogspot.com/2010/05/problem-7-what-is-10001st-prime-number.html
let problem7a =
    let isPrime n =
        let nsqrt = n |> float |> sqrt |> int
        let rec isPrime i =
            if i > nsqrt then true //break
            elif n % i = 0 then false //break
            //loop while neither of the above two conditions are true
            //pass your state (i+1) to the next call
            else isPrime (i+1)
        isPrime 2
    let nthPrime n =
        let rec nthPrime i p count =
            if count = n then p //break
            //loop while above condition not met
            //pass new values in for p and count, emulating state
            elif i |> isPrime then nthPrime (i+2) i (count+1)
            else nthPrime (i+2) p count
        nthPrime 1 1 0
    nthPrime 10001
Now, to specifically address some of the questions you had in your solution.
The above nthPrime function allows you to find primes at an arbitrary position. This is how it would look adapted to your approach of finding specifically the 10001st prime, using your variable names (the solution is tail-recursive and doesn't use mutables):
let prime10001 =
    let rec nthPrime i number accumulator =
        if accumulator = 10001 then number
        //i is prime, so number becomes i in our next call and accumulator is incremented
        elif i |> isPrime then nthPrime (i+2) i (accumulator+1)
        //i is not prime, so number and accumulator do not change, just advance i to the next odd
        else nthPrime (i+2) number accumulator
    nthPrime 1 1 0
Yes, there is a better way to do square roots: write your own generic square root implementation (reference this and this post for G implementation):
///Finds the square root (integral or floating point) of n
///Does not work with BigRational
let inline sqrt_of (g:G<'a>) n =
    if g.zero = n then g.zero
    else
        let mutable s:'a = (n / g.two) + g.one
        let mutable t:'a = (s + (n / s)) / g.two
        while t < s do
            s <- t
            let step1:'a = n/s
            let step2:'a = s + step1
            t <- step2 / g.two
        s

let inline sqrtG n = sqrt_of (G_of n) n
let sqrtn = sqrt_of gn //this has suffix "n" because sqrt is not strictly integral type
let sqrtL = sqrt_of gL
let sqrtI = sqrt_of gI
let sqrtF = sqrt_of gF
let sqrtM = sqrt_of gM
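The generic version depends on the G record from the linked posts, which is not shown here. As a self-contained illustration of the same Newton-style iteration, here is a sketch specialized to int64. The name isqrt and the argument check are assumptions of this example.

```fsharp
// Sketch: integer square root (floor of the real square root) for int64,
// using the same iteration as sqrt_of above but without the generic G record.
let isqrt (n: int64) : int64 =
    if n < 0L then invalidArg "n" "must be non-negative"
    elif n = 0L then 0L
    else
        let mutable s = n / 2L + 1L          // initial guess, never below the root
        let mutable t = (s + n / s) / 2L     // one Newton step
        while t < s do
            s <- t
            t <- (s + n / s) / 2L
        s
```

Used in isPrime, this would replace the round trip through float, for example let upperDivisor = int (isqrt (int64 num)).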