I am wanting to be able to sum multiple fields of a sequence. I don't know how to do this without iterating over the sequence multiple times. My perception is that this is inefficient. Is there a clever way to do this that I am not seeing?
type item = {
Name : string
DemandQty : decimal
Weight : decimal
UnitCost : decimal
}
let items =
{1..10}
|> Seq.map (fun x ->
{
Name = string x
DemandQty = decimal x
Weight = decimal x
UnitCost = decimal x
})
// This Works for a single value
let costSum =
items
|> Seq.sumBy (fun x -> x.DemandQty * x.UnitCost)
// This is similar to what I would like to do
let costSum, weightSum =
items
|> Seq.sumBy (fun x -> x.DemandQty * x.UnitCost, x.DemandQty * x.Weight)
Ideally I can sum multiple values that I am calculating by mapping over the sequence. Is my thinking off here?
I would also be interested in knowing what the performance impact is going to be if I have to iterate through the sequence multiple times to calculate all of the sums I am wanting. My perception is that this would be inefficient.
A straightforward way of summing is folding the sequence with a sum function:
let costSum, weightSum =
items
|> Seq.fold
(fun (costSum, weightSum) x -> (costSum + x.DemandQty * x.UnitCost, weightSum + x.DemandQty * x.Weight))
(0m, 0m)
As for the performance impact of iterating over the sequence multiple time is that it depends. The work of iterating over the sequence is repeated. So at the face of it is less efficient. However for shorter sequences when the number of repeated iterations is constant the performance impact might be negligible. Also computational complexity theory states that constants are ignorable when the number of elements increases.
In short, if it matters, benchmark on expected inputs. If it doesn't have a large enough impact go with the solution that offers best clarity.
Related
I am really new to F# and am struggling to adapt to the whole functional programming mindset. Basically I am trying to figure out how to iterate through a list of numbers and add them up while the sum is less than a certain number (200 for instance).
At the moment I have something along the lines of the following:
let nums = [20 .. 50]
let mutable total = 0
let addUp (num, (total : byref<int>)) =
if (num + total < 200) then
total <- total + num
for num in nums do
addUp (num, &total)
This does get the job done, but I feel like there must be a more appropriate way to do this while still remaining within the style of functional programming, and without out having to rely on a mutable. I still feel like I'm approaching things from an imperative mindset.
I was thinking I could somehow use List.fold to do this but after some experimentation I was unable to figure out how to utilize it in conjunction with the whole while < 200 thing. Any help would be hugely appreciated and I apologize in advance if this is a commonly asked and answered question (I couldn't find anything through my own search).
For educational purposes, the answer above is fine as it lets you see what exactly happens within the fold or reduce.
For a real project, it is better to use existing library functions. You can even separate summation and bound checking. Consider this:
let addUp threshold xs =
xs
|> Seq.scan (+) 0 // line 1
|> Seq.takeWhile (fun x -> x < threshold) // line 2
|> Seq.last // line 3
// Usage:
let nums = Seq.initInfinite ( (+) 20) // line 4
nums
|> addUp 200
|> printfn "%d" // prints "188"
A bit of explanation:
line 1 is a contraction of Seq.scan (fun state x -> state + x) 0 so it actually returns a sequence of intermediate sums (20, 41, 63, ...);
in line 2, we take only the elements that match or filtering predicate;
in line 3, we simply take the last element (that matched the filtering above);
in line 4, again, it's a contraction of Seq.initInfinite (fun x -> 20+x). I took my liberty to make your data an infinite sequence (20, 21, 22, ...), but it still works.
Note, the code looks like three calls, but the sequence is evaluated only once.
Note, in line 2, don't try contraction like above. The reason is that (<) threshold evaluates to fun x -> threshold < x which is wrong. If you. however, use (>) threshold, it reads counter-intuitive and confusing.
Note, there's no error checking in the function. Calling it with an empty source sequence will crash at Seq.last.
First, let's try to avoid mutable variable and one way to do it is to create a recursive function to iterate through the list with the latest total.
let rec addUp total nums =
match nums with
| [] -> total
| num::tl ->
let newTotal = num + total
if newTotal < 200 then
addUp newTotal tl
else
total
addUp 0 nums
We create a recursive function addUp which accepts total and the list of numbers. The function extracts the first number of the list and adds it to the total if the new total is less than 200. Since we might still have more numbers on the list, we call addUp again with the new total and the rest of the numbers. Otherwise, we stop and return the new total.
We can also use List.fold which makes the code cleaner, but the whole list will be iterated. Here is the code:
List.fold (fun total num ->
let newTotal = num + total
if newTotal < 200 then
newTotal
else
total
) 0 nums
Since the output is the same type as the member of the input, we can use List.reduce and get rid of initial state (0).
List.reduce (fun total num ->
let newTotal = num + total
if newTotal < 200 then
newTotal
else
total
) nums
Hope this helps.
Is there a quick and simple way to convert an entire list of strings into floats or integers
and add them together similar to this in F#?
foreach(string s in list)
{
sum += int.Parse(s);
}
If you want to aim for minimal number of characters, then you can simplify the solution posted by Ganesh to something like this:
let sum = list |> Seq.sumBy int
This does pretty much the same thing - the int function is a generic conversion that converts anything to an integer (and it works on strings too). The sumBy function is a combination of map and sum that first projects all elements to a numeric value and then sums the results.
Something like this should have the same effect:
let sum = list |> Seq.map System.Int32.Parse |> Seq.sum
F# doesn't seem to support referring to the method on int so I had to use System.Int32 instead.
In F# the type seq is an alias for the .NET IEnumerable, so this code works on arrays, lists etc.
Note the use of Parse in "point-free" style - a function without its argument can be used directly as an argument to another function that expects that type. In this case Seq.map has this type:
('a -> 'b) -> seq<'a> -> seq<'b>
And since System.Int32.Parse has type string -> int, Seq.map System.Int32.Parse has type seq<string> -> seq<int>.
Technically, there are at least 3 different approaches:
1) The Seq.sum or sumBy approach described in the other answers is the canonical way of getting the sum in F#:
let sum = Seq.sumBy int list
2) For instructional purposes, it may be interesting to see how closely one can simulate C# behavior in F#; for instance, using a reference cell type:
let inline (+=) x y = x := !x + y
let sum = ref 0
for s in list do sum += int s
3) Same idea as 2), but using a byref pointer type:
let inline (+=) (x:_ byref) y = x <- x + y
let mutable sum = 0
for s in list do &sum += int s
IE,
What am I doing wrong here? Does it have to to with lists, sequences and arrays and the way the limitations work?
So here is the setup: I'm trying to generate some primes. I see that there are a billion text files of a billion primes. The question isn't why...the question is how are the guys using python calculating all of the primes below 1,000,000 in milliseconds on this post...and what am I doing wrong with the following F# code?
let sieve_primes2 top_number =
let numbers = [ for i in 2 .. top_number do yield i ]
let sieve (n:int list) =
match n with
| [x] -> x,[]
| hd :: tl -> hd, List.choose(fun x -> if x%hd = 0 then None else Some(x)) tl
| _ -> failwith "Pernicious list error."
let rec sieve_prime (p:int list) (n:int list) =
match (sieve n) with
| i,[] -> i::p
| i,n' -> sieve_prime (i::p) n'
sieve_prime [1;0] numbers
With the timer on in FSI, I get 4.33 seconds worth of CPU for 100000... after that, it all just blows up.
Your sieve function is slow because you tried to filter out composite numbers up to top_number. With Sieve of Eratosthenes, you only need to do so until sqrt(top_number) and remaining numbers are inherently prime. Suppose we havetop_number = 1,000,000, your function does 78498 rounds of filtering (the number of primes until 1,000,000) while the original sieve only does so 168 times (the number of primes until 1,000).
You can avoid generating even numbers except 2 which cannot be prime from the beginning. Moreover, sieve and sieve_prime can be merged into a recursive function. And you could use lightweight List.filter instead of List.choose.
Incorporating above suggestions:
let sieve_primes top_number =
let numbers = [ yield 2
for i in 3..2..top_number -> i ]
let rec sieve ns =
match ns with
| [] -> []
| x::xs when x*x > top_number -> ns
| x::xs -> x::sieve (List.filter(fun y -> y%x <> 0) xs)
sieve numbers
In my machine, the updated version is very fast and it completes within 0.6s for top_number = 1,000,000.
Based on my code here: stackoverflow.com/a/8371684/124259
Gets the first 1 million primes in 22 milliseconds in fsi - a significant part is probably compiling the code at this point.
#time "on"
let limit = 1000000
//returns an array of all the primes up to limit
let table =
let table = Array.create limit true //use bools in the table to save on memory
let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
let mutable curfactor = 1;
while curfactor < tlimit-2 do
curfactor <- curfactor+2
if table.[curfactor] then //simple optimisation
let mutable v = curfactor*2
while v < limit do
table.[v] <- false
v <- v + curfactor
let out = Array.create (100000) 0 //this needs to be greater than pi(limit)
let mutable idx = 1
out.[0]<-2
let mutable curx=1
while curx < limit-2 do
curx <- curx + 2
if table.[curx] then
out.[idx]<-curx
idx <- idx+1
out
There have been several good answers both as to general trial division algorithm using lists (#pad) and in choice of an array for a sieving data structure using the Sieve of Eratosthenes (SoE) (#John Palmer and #Jon Harrop). However, #pad's list algorithm isn't particularly fast and will "blow up" for larger sieving ranges and #John Palmer's array solution is somewhat more complex, uses more memory than necessary, and uses external mutable state so is not different than if the program were written in an imperative language such as C#.
EDIT_ADD: I've edited the below code (old code with line comments) modifying the sequence expression to avoid some function calls so as to reflect more of an "iterator style" and while it saved 20% of the speed it still doesn't come close to that of a true C# iterator which is about the same speed as the "roll your own enumerator" final F# code. I've modified the timing information below accordingly. END_EDIT
The following true SoE program only uses 64 KBytes of memory to sieve primes up to a million (due to only considering odd numbers and using the packed bit BitArray) and still is almost as fast as #John Palmer's program at about 40 milliseconds to sieve to one million on a i7 2700K (3.5 GHz), with only a few lines of code:
open System.Collections
let primesSoE top_number=
let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
let SQRTLMT = (int(sqrt (double top_number))-3)/2
let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
seq { for i = -1 to BFLMT do if i<0 then yield 2u
elif buf.[i] then yield uint32(3+i+i) }
// seq { yield 2u; yield! seq { 0..BFLMT } |> Seq.filter (fun i->buf.[i])
// |> Seq.map (fun i->uint32 (i+i+3)) }
primesSOE 1000000u |> Seq.length;;
Almost all of the elapsed time is spent in the last two lines enumerating the found primes due to the inefficiency of the sequence run time library as well as the cost of enumerating itself at about 28 clock cycles per function call and return with about 16 function calls per iteration. This could be reduced to only a few function calls per iteration by rolling our own iterator, but the code is not as concise; note that in the following code there is no mutable state exposed other than the contents of the sieving array and the reference variable necessary for the iterator implementation using object expressions:
open System
open System.Collections
open System.Collections.Generic
let primesSoE top_number=
let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
let SQRTLMT = (int(sqrt (double top_number))-3)/2
let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
let nmrtr() =
let i = ref -2
let rec nxti() = i:=!i+1;if !i<=BFLMT && not buf.[!i] then nxti() else !i<=BFLMT
let inline curr() = if !i<0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
else let v = uint32 !i in v+v+3u
{ new IEnumerator<_> with
member this.Current = curr()
interface IEnumerator with
member this.Current = box (curr())
member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
interface IDisposable with
member this.Dispose() = () }
{ new IEnumerable<_> with
member this.GetEnumerator() = nmrtr()
interface IEnumerable with
member this.GetEnumerator() = nmrtr() :> IEnumerator }
primesSOE 1000000u |> Seq.length;;
The above code takes about 8.5 milliseconds to sieve the primes to a million on the same machine due to greatly reducing the number of function calls per iteration to about three from about 16. This is about the same speed as C# code written in the same style. It's too bad that F#'s iterator style as I used in the first example doesn't automatically generate the IEnumerable boiler plate code as C# iterators do, but I guess that is the intention of sequences - just that they are so damned inefficient as to speed performance due to being implemented as sequence computation expressions.
Now, less than half of the time is expended in enumerating the prime results for a much better use of CPU time.
What am I doing wrong here?
You've implemented a different algorithm that goes through each possible value and uses % to determine if it needs to be removed. What you're supposed to be doing is stepping through with a fixed increment removing multiples. That would be asymptotically.
You cannot step through lists efficiently because they don't support random access so use arrays.
What is the most elegant way to implement dynamic programming algorithms that solve problems with overlapping subproblems? In imperative programming one would usually create an array indexed (at least in one dimension) by the size of the problem, and then the algorithm would start from the simplest problems and work towards more complicated once, using the results already computed.
The simplest example I can think of is computing the Nth Fibonacci number:
int Fibonacci(int N)
{
var F = new int[N+1];
F[0]=1;
F[1]=1;
for(int i=2; i<=N; i++)
{
F[i]=F[i-1]+F[i-2];
}
return F[N];
}
I know you can implement the same thing in F#, but I am looking for a nice functional solution (which is O(N) as well obviously).
One technique that is quite useful for dynamic programming is called memoization. For more details, see for example blog post by Don Syme or introduction by Matthew Podwysocki.
The idea is that you write (a naive) recursive function and then add cache that stores previous results. This lets you write the function in a usual functional style, but get the performance of algorithm implemented using dynamic programming.
For example, a naive (inefficient) function for calculating Fibonacci number looks like this:
let rec fibs n =
if n < 1 then 1 else
(fibs (n - 1)) + (fibs (n - 2))
This is inefficient, because when you call fibs 3, it will call fibs 1 three times (and many more times if you call, for example, fibs 6). The idea behind memoization is that we write a cache that stores the result of fib 1 and fib 2, and so on, so repeated calls will just pick the pre-calculated value from the cache.
A generic function that does the memoization can be written like this:
open System.Collections.Generic
let memoize(f) =
// Create (mutable) cache that is used for storing results of
// for function arguments that were already calculated.
let cache = new Dictionary<_, _>()
(fun x ->
// The returned function first performs a cache lookup
let succ, v = cache.TryGetValue(x)
if succ then v else
// If value was not found, calculate & cache it
let v = f(x)
cache.Add(x, v)
v)
To write more efficient Fibonacci function, we can now call memoize and give it the function that performs the calculation as an argument:
let rec fibs = memoize (fun n ->
if n < 1 then 1 else
(fibs (n - 1)) + (fibs (n - 2)))
Note that this is a recursive value - the body of the function calls the memoized fibs function.
Tomas's answer is a good general approach. In more specific circumstances, there may be other techniques that work well - for example, in your Fibonacci case you really only need a finite amount of state (the previous 2 numbers), not all of the previously calculated values. Therefore you can do something like this:
let fibs = Seq.unfold (fun (i,j) -> Some(i,(j,i+j))) (1,1)
let fib n = Seq.nth n fibs
You could also do this more directly (without using Seq.unfold):
let fib =
let rec loop i j = function
| 0 -> i
| n -> loop j (i+j) (n-1)
loop 1 1
let fibs =
(1I,1I)
|> Seq.unfold (fun (n0, n1) -> Some (n0 , (n1, n0 + n1)))
|> Seq.cache
Taking inspiration from Tomas' answer here, and in an attempt to resolve the warning in my comment on said answer, I propose the following updated solution.
open System.Collections.Generic
let fib n =
let cache = new Dictionary<_, _>()
let memoize f c =
let succ, v = cache.TryGetValue c
if succ then v else
let v = f c
cache.Add(c, v)
v
let rec inner n =
match n with
| 1
| 2 -> bigint n
| n ->
memoize inner (n - 1) + memoize inner (n - 2)
inner n
This solution internalizes the memoization, and while doing so, allows the definitions of fib and inner to be functions, instead of fib being a recursive object, which allows the compiler to (I think) properly reason about the viability of the function calls.
I also return a bigint instead of an int, as int quickly overflows with even a small of n.
Edit: I should mention, however, that this solution still runs into stack overflow exceptions with sufficiently large values of n.
I am trying to learn F# so I paid a visit to Project Euler and I am currently working on Problem 3.
The prime factors of 13195 are 5, 7,
13 and 29.
What is the largest prime
factor of the number 600851475143?
Some things to consider:
My first priority is to learn good functional habits.
My second priority is I would like it to be fast and efficient.
Within the following code I have marked the section this question is regarding.
let isPrime(n:int64) =
let rec check(i:int64) =
i > n / 2L or (n % i <> 0L && check(i + 1L))
check(2L)
let greatestPrimeFactor(n:int64) =
let nextPrime(prime:int64):int64 =
seq { for i = prime + 1L to System.Int64.MaxValue do if isPrime(i) then yield i }
|> Seq.skipWhile(fun v -> n % v <> 0L)
|> Seq.hd
let rec findNextPrimeFactor(number:int64, prime:int64):int64 =
if number = 1L then prime else
//************* No variable
(fun p -> findNextPrimeFactor(number / p, p))(nextPrime(prime))
//*************
//************* Variable
let p = nextPrime(prime)
findNextPrimeFactor(number / p, p)
//*************
findNextPrimeFactor(n, 2L)
Update
Based off some of the feedback I have refactored the code to be 10 times faster.
module Problem3
module private Internal =
let execute(number:int64):int64 =
let rec isPrime(value:int64, current:int64) =
current > value / 2L or (value % current <> 0L && isPrime(value, current + 1L))
let rec nextPrime(prime:int64):int64 =
if number % prime = 0L && isPrime(prime, 2L) then prime else nextPrime(prime + 1L)
let rec greatestPrimeFactor(current:int64, prime:int64):int64 =
if current = 1L then prime else nextPrime(prime + 1L) |> fun p -> greatestPrimeFactor(current / p, p)
greatestPrimeFactor(number, 2L)
let execute() = Internal.execute(600851475143L)
Update
I would like to thank everyone for there advice. This latest version is a compilation of all the advice I received.
module Problem3
module private Internal =
let largestPrimeFactor number =
let rec isPrime value current =
current > value / 2L || (value % current <> 0L && isPrime value (current + 1L))
let rec nextPrime value =
if number % value = 0L && isPrime value 2L then value else nextPrime (value + 1L)
let rec find current prime =
match current / prime with
| 1L -> prime
| current -> nextPrime (prime + 1L) |> find current
find number (nextPrime 2L)
let execute() = Internal.largestPrimeFactor 600851475143L
Functional programming becomes easier and more automatic with practice, so don't sweat it if you don't get it absolutely right on the first try.
With that in mind, let's take your sample code:
let rec findNextPrimeFactor(number:int64, prime:int64):int64 =
if number = 1L then prime else
//************* No variable
(fun p -> findNextPrimeFactor(number / p, p))(nextPrime(prime))
//*************
//************* Variable
let p = nextPrime(prime)
findNextPrimeFactor(number / p, p)
//*************
Your no variable version is just weird, don't use it. I like your version with the explicit let binding.
Another way to write it would be:
nextPrime(prime) |> fun p -> findNextPrimeFactor(number / p, p)
Its ok and occasionally useful to write it like this, but still comes across as a little weird. Most of the time, we use |> to curry values without needing to name our variables (in "pointfree" style). Try to anticipate how your function will be used, and if possible, re-write it so you can use it with the pipe operator without explicit declared variables. For example:
let rec findNextPrimeFactor number prime =
match number / prime with
| 1L -> prime
| number' -> nextPrime(prime) |> findNextPrimeFactor number'
No more named args :)
Ok, now that we have that out of the way, let's look at your isPrime function:
let isPrime(n:int64) =
let rec check(i:int64) =
i > n / 2L or (n % i <> 0L && check(i + 1L))
check(2L)
You've probably heard to use recursion instead of loops, and that much is right. But, wherever possible, you should abstract away recursion with folds, maps, or higher order functions. Two reasons for this:
its a little more readable, and
improperly written recursion will result in a stack overflow. For example, your function is not tail recursive, so it'll blow up on large values of n.
I'd rewrite isPrime like this:
let isPrime n = seq { 2L .. n / 2L } |> Seq.exists (fun i -> n % i = 0L) |> not
Most of the time, if you can abstract away your explicit looping, then you're just applying transformations to your input sequence until you get your results:
let maxFactor n =
seq { 2L .. n - 1L } // test inputs
|> Seq.filter isPrime // primes
|> Seq.filter (fun x -> n % x = 0L) // factors
|> Seq.max // result
We don't even have intermediate variables in this version. Coolness!
My second priority is I would like it
to be fast and efficient.
Most of the time, F# is going to be pretty comparable with C# in terms of speed, or it's going to be "fast enough". If you find your code takes a long time to execute, it probably means you're using the wrong data structure or a bad algorithm. For a concrete example, read the comments on this question.
So, the code I've written is "elegant" in the sense that its concise, gives the correct results, and doesn't rely on any trickery. Unfortunately, its not very fast. For start:
it uses trial division to create a sequence of primes, when the Sieve of Eratosthenes would be much faster. [Edit: I wrote a somewhat naive version of this sieve which didn't work for numbers larger than Int32.MaxValue, so I've removed the code.]
read Wikipedia's article on the prime counting function, it'll give you pointers on calculating the first n primes as well as estimating the upper and lower bounds for the nth prime.
[Edit: I included some code with a somewhat naive implementation of a sieve of eratosthenes. It only works for inputs less than int32.MaxValue, so it probably isn't suitable for project euler.]
Concerning "good functional habit" or rather good practice I see three minor things. Using the yield in your sequence is a little harder to read than just filter. Unnecessary type annotations in a type inferred language leads to difficult refactoring and makes the code harder to read. Don't go overboard and try to remove every type annotation though if you're finding it difficult. Lastly making a lambda function which only takes a value to use as a temp variable reduces readability.
As far as personal style goes I prefer more spaces and only using tupled arguments when the data makes sense being grouped together.
I'd write your original code like this.
let isPrime n =
let rec check i =
i > n / 2L || (n % i <> 0L && check (i + 1L))
check 2L
let greatestPrimeFactor n =
let nextPrime prime =
seq {prime + 1L .. System.Int64.MaxValue}
|> Seq.filter isPrime
|> Seq.skipWhile (fun v -> n % v <> 0L)
|> Seq.head
let rec findNextPrimeFactor number prime =
if number = 1L then
prime
else
let p = nextPrime(prime)
findNextPrimeFactor (number / p) p
findNextPrimeFactor n 2L
Your updated code is optimal for your approach. You would have to use a different algorithm like Yin Zhu answer to go faster. I wrote a test to check to see if F# makes the "check" function tail recursive and it does.
the variable p is actually a name binding, not a variable. Using name binding is not a bad style. And it is more readable. The lazy style of nextPrime is good, and it actually prime-test each number only once during the whole program.
My Solution
let problem3 =
let num = 600851475143L
let rec findMax (n:int64) (i:int64) =
if n=i || n<i then
n
elif n%i=0L then
findMax (n/i) i
else
findMax n (i+1L)
findMax num 2L
I basically divides num from 2, 3, 4.. and don't consider any prime numbers. Because if we divides all 2 from num, then we won't be able to divide it by 4,8, etc.
on this number, my solution is quicker:
> greatestPrimeFactor 600851475143L;;
Real: 00:00:01.110, CPU: 00:00:00.702, GC gen0: 1, gen1: 1, gen2: 0
val it : int64 = 6857L
>
Real: 00:00:00.001, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
val problem3 : int64 = 6857L
I think that the code with the temporary binding is significantly easier to read. It's pretty unusual to create an anonymous function and then immediately apply it to a value as you do in the other case. If you really want to avoid using a temporary value, I think that the most idiomatic way to do that in F# would be to use the (|>) operator to pipe the value into the anonymous function, but I still think that this isn't quite as readable.