Problems Creating an Infinite Lazy List - f#

I completed the seventh Euler problem* in F# but am not entirely happy with my
implementation. In the function primes I create a sequence that I estimated would contain the 10,001st prime number. When I tried using Seq.initInfinite to lazily generate the candidate primes my code just hung before throwing an out of memory exception.
Could someone advise me on replacing the literal sequence with a lazily-generated sequence which is short-circuited once the desired prime is found?
let isPrime n =
let bound = int (sqrt (float n))
seq {2 .. bound} |> Seq.forall (fun x -> n % x <> 0)
let primeAsync n =
async { return (n, isPrime n)}
let primes =
{1..1000000}
|> Seq.map primeAsync
|> Async.Parallel
|> Async.RunSynchronously
|> Array.filter snd
|> Array.map fst
|> Array.mapi (fun i el -> (i, el))
|> Array.find (fun (fst, snd) -> fst = 10001)
primes
*"By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13. What is the 10,001st prime number?"

I think the problem is/was that Async.RunSynchronously isn't lazy and tried to evaluate the whole infinite sequence. Although there are better solutions for this, your algorithm is fast enough, so you don't even need parallelization; this works perfectly:
open System
let isPrime n =
let bound = n |> float |> sqrt |> int
seq {2 .. bound} |> Seq.forall (fun x -> n % x <> 0)
let prime =
Seq.initInfinite ((+) 2)
|> Seq.filter isPrime
|> Seq.skip 10000
|> Seq.head

The sequence gets 'reified' as soon as you feed it to Async.Parallel. If you want to minimise memory consumption, run the computation serially or split it into lazy chunks, the elements in each chunk to be run in parallel.

Related

Is it correct that Deedle/Series is slow compared to a list?

I am working on a data "intensive" app and I am not sure if I should use Series./DataFrame. It seems very interesting but it looks also way slower than the equivalent done with a List ... but I may not use the Series properly when I filter.
Please let me know what you think.
Thanks
type TSPoint<'a> =
{
Date : System.DateTime
Value : 'a
}
type TimeSerie<'a> = TSPoint<'a> list
let sd = System.DateTime(1950, 2, 1)
let tsd =[1..100000] |> List.map (fun x -> sd.AddDays(float x))
// creating a List of TSPoint
let tsList = tsd |> List.map (fun x -> {Date = x ; Value = 1.})
// creating the same as a serie
let tsSeries = Series(tsd , [1..100000] |> List.map (fun _ -> 1.))
// function to "randomise" the list of dates
let shuffleG xs = xs |> List.sortBy (fun _ -> Guid.NewGuid())
// new date list to search within out tsList and tsSeries
let d = tsd |> shuffleG |> List.take 1000
// Filter
d |> List.map (fun x -> (tsList |> List.filter (fun y -> y.Date = x)))
d |> List.map (fun x -> (tsSeries |> Series.filter (fun key _ -> key = x)))
Here is what I get:
List -> Real: 00:00:04.780, CPU: 00:00:04.508, GC gen0: 917, gen1: 2, gen2: 1
Series -> Real: 00:00:54.386, CPU: 00:00:49.311, GC gen0: 944, gen1: 7, gen2: 3
In general, Deedle series and data frames do have some extra overhead over writing hand-crafted code using whatever is the most efficient data structure for a given problem. The overhead is small for some operations and larger for some operations, so it depends on what you want to do and how you use Deedle.
If you use Deedle in a way in which it was intended to be used, then you'll get a good performance, but if you run a large number of operations that are not particularly efficient, you may get a bad performance.
In your particular case, you are running Series.filter on 1000 series and creating a new series (which is what happens behind the scenes here) does have some overhead.
However, what your code really does is that you are using Series.filter to find a value with a specific key. Deedle provides a key-based lookup operation for this (and it's one of the things it has been optimized for).
If you rewrite the code as follows, you'll get much better performance with Deedle than with list:
d |> List.map (fun x -> tsSeries.[x])
// 0.001 seconds
d |> List.map (fun x -> (tsSeries |> Series.filter (fun key _ -> key = x)))
// 3.46 seconds
d |> List.map (fun x -> (tsList |> List.filter (fun y -> y.Date = x)))
// 40.5 seconds

F#, loop control until n-2

I am currently learning functional programming and F#, and I want to do a loop control until n-2. For example:
Given a list of doubles, find the pairwise average,
e.g. pairwiseAverage [1.0; 2.0; 3.0; 4.0; 5.0] will give [1.5; 2.5; 3.5; 4.5]
After doing some experimenting and searching, I have a few ways to do it:
Method 1:
let pairwiseAverage (data: List<double>) =
[for j in 0 .. data.Length-2 do
yield (data.[j]+data.[j+1])/2.0]
Method 2:
let pairwiseAverage (data: List<double>) =
let averageWithNone acc next =
match acc with
| (_,None) -> ([],Some(next))
| (result,Some prev) -> ((prev+next)/2.0)::result,Some(next))
let resultTuple = List.fold averageWithNone ([],None) data
match resultTuple with
| (x,_) -> List.rev x
Method 3:
let pairwiseAverage (data: List<double>) =
// Get elements from 1 .. n-1
let after = List.tail data
// Get elements from 0 .. n-2
let before =
data |> List.rev
|> List.tail
|> List.rev
List.map2 (fun x y -> (x+y)/2.0) before after
I just like to know if there are other ways to approach this problem. Thank you.
Using only built-ins:
list |> Seq.windowed 2 |> Seq.map Array.average
Seq.windowed n gives you sliding windows of n elements each.
One simple other way is to use Seq.pairwise
something like
list |> Seq.pairwise |> Seq.map (fun (a,b) -> (a+b)/2.0)
The approaches suggested above are appropriate for short windows, like the one in the question. For windows with a length greater than 2 one cannot use pairwise. The answer by hlo generalizes to wider windows and is a clean and fast approach if window length is not too large. For very wide windows the code below runs faster, as it only adds one number and subtracts another one from the value obtained for the previous window. Notice that Seq.map2 (and Seq.map) automatically deal with sequences of different lengths.
let movingAverage (n: int) (xs: float List) =
let init = xs |> (Seq.take n) |> Seq.sum
let additions = Seq.map2 (fun x y -> x - y) (Seq.skip n xs) xs
Seq.fold (fun m x -> ((List.head m) + x)::m) [init] additions
|> List.rev
|> List.map (fun (x: float) -> x/(float n))
xs = [1.0..1000000.0]
movingAverage 1000 xs
// Real: 00:00:00.265, CPU: 00:00:00.265, GC gen0: 10, gen1: 10, gen2: 0
For comparison, the function above performs the calculation above about 60 times faster than the windowed equivalent:
let windowedAverage (n: int) (xs: float List) =
xs
|> Seq.windowed n
|> Seq.map Array.average
|> Seq.toList
windowedAverage 1000 xs
// Real: 00:00:15.634, CPU: 00:00:15.500, GC gen0: 74, gen1: 74, gen2: 71
I tried to eliminate List.rev using foldBack but did not succeed.
A point-free approach:
let pairwiseAverage = List.pairwise >> List.map ((<||) (+) >> (*) 0.5)
Online Demo
Usually not a better way, but another way regardless... ;-]

why does Seq.isEmpty say not enough elements?

nums is indeed seq of int when I mouse over. Any idea what's going on?
This function line is intended to be the equivalent of C#'s DefaultIfEmpty Linq function.
The general idea is take a space delimited line of strings and write out which ones occur count number of times.
code:
open System
[<EntryPoint>]
let main argv =
let tests = Console.ReadLine() |> int
for i in [0..tests] do
let (length, count) = Console.ReadLine()
|> (fun s -> s.Split [|' '|])
|> (fun split -> Int32.Parse(split.[0]), Int32.Parse(split.[1]))
Console.ReadLine()
|> (fun s -> s.Split [|' '|])
|> Seq.map int
|> Seq.take length
|> Seq.groupBy (fun x -> x)
|> Seq.map (fun (key, group) -> key, Seq.sum group)
|> Seq.where (fun (_, countx) -> countx = count)
|> Seq.map (fun (n, _) -> n)
|> (fun nums -> if Seq.isEmpty nums then "-1" else String.Join(" ", nums))
|> Console.WriteLine
0 // return an integer exit code
Sample input:
3
9 2
4 5 2 5 4 3 1 3 4
So, sequences in F# use lazy evaluation. That means that when you use functions such as map, where, take etc, the results are not evaluated immediately.
The results are only evaluated when the sequence is actually enumerated. When you call Seq.isEmpty you trigger a call to MoveNext() which results in the first element of the result sequence being evaluated - in your case this results in a large chain of functions being evaluated.
In this case, the InvalidOperationException is actually being triggered by Seq.take which throws if the sequence doesn't have sufficient elements. This might surprise you coming from C# where Enumerable.Take will take up to the requested number of elements but could take fewer if you reach the end of the sequence.
If you want this behaviour in F#, you need to replace Seq.take with Seq.truncate.

computing prime factors using same code produces different results?

I am basically trying to compute the factors of a BigInteger that are a prime, I have two simple factorization functions, they both look like they should produce the same result in the way I used them here down below but this is not the case here, can someone explain what is happening?
let lookupTable = new HashSet<int>(primes)
let isPrime x = lookupTable.Contains x
let factors (n:bigint) =
Seq.filter (fun x -> n % x = 0I) [1I..n]
let factors' (n:bigint) =
Seq.filter (fun x -> n % x = 0I) [1I..bigint(sqrt(float(n)))]
600851475143I
|> fun n -> bigint(sqrt(float(n)))
|> factors
|> Seq.map int
|> Seq.filter isPrime
|> Seq.max // produces 137
600851475143I
|> factors'
|> Seq.map int
|> Seq.filter isPrime
|> Seq.max // produces 6857 (the right answer)
Your functions are not equivalent. In the first function, the list of candidates goes to n, and the filter function also uses n for remainder calculation. The second function, however, also uses n for remainder calculation, but the candidates list goes to sqrt(n) instead.
To make the second function equivalent, you need to reformulate it like this:
let factors' (n:bigint) =
let k = bigint(sqrt(float(n)))
Seq.filter (fun x -> k % x = 0I) [1I..k]
Update, to clarify this somewhat:
In the above code, notice how k is used in two places: to produce the initial list of candidates and to calculate remainder within the filter function? This is precisely the change I made to your code: my code uses k in both places, but your code uses k in one place, but n in the other.
This is how your original function would look with k:
let factors' (n:bigint) =
let k = bigint(sqrt(float(n)))
Seq.filter (fun x -> n % x = 0I) [1I..k]
Notice how it uses k in one place, but n in the other.

How do I do in F# what would be called compression in APL?

In APL one can use a bit vector to select out elements of another vector; this is called compression. For example 1 0 1/3 5 7 would yield 3 7.
Is there a accepted term for this in functional programming in general and F# in particular?
Here is my F# program:
let list1 = [|"Bob"; "Mary"; "Sue"|]
let list2 = [|1; 0; 1|]
[<EntryPoint>]
let main argv =
0 // return an integer exit code
What I would like to do is compute a new string[] which would be [|"Bob"; Sue"|]
How would one do this in F#?
Array.zip list1 list2 // [|("Bob",1); ("Mary",0); ("Sue",1)|]
|> Array.filter (fun (_,x) -> x = 1) // [|("Bob", 1); ("Sue", 1)|]
|> Array.map fst // [|"Bob"; "Sue"|]
The pipe operator |> does function application syntactically reversed, i.e., x |> f is equivalent to f x. As mentioned in another answer, replace Array with Seq to avoid the construction of intermediate arrays.
I expect you'll find many APL primitives missing from F#. For lists and sequences, many can be constructed by stringing together primitives from the Seq, Array, or List modules, like the above. For reference, here is an overview of the Seq module.
I think the easiest is to use an array sequence expression, something like this:
let compress bits values =
[|
for i = 0 to bits.Length - 1 do
if bits.[i] = 1 then
yield values.[i]
|]
If you only want to use combinators, this is what I would do:
Seq.zip bits values
|> Seq.choose (fun (bit, value) ->
if bit = 1 then Some value else None)
|> Array.ofSeq
I use Seq functions instead of Array in order to avoid building intermediary arrays, but it would be correct too.
One might say this is more idiomatic:
Seq.map2 (fun l1 l2 -> if l2 = 1 then Some(l1) else None) list1 list2
|> Seq.choose id
|> Seq.toArray
EDIT (for the pipe lovers)
(list1, list2)
||> Seq.map2 (fun l1 l2 -> if l2 = 1 then Some(l1) else None)
|> Seq.choose id
|> Seq.toArray
Søren Debois' solution is good but, as he pointed out, but we can do better. Let's define a function, based on Søren's code:
let compressArray vals idx =
Array.zip vals idx
|> Array.filter (fun (_, x) -> x = 1)
|> Array.map fst
compressArray ends up creating a new array in each of the 3 lines. This can take some time, if the input arrays are long (1.4 seconds for 10M values in my quick test).
We can save some time by working on sequences and creating an array at the end only:
let compressSeq vals idx =
Seq.zip vals idx
|> Seq.filter (fun (_, x) -> x = 1)
|> Seq.map fst
This function is generic and will work on arrays, lists, etc. To generate an array as output:
compressSeq sq idx |> Seq.toArray
The latter saves about 40% of computation time (0.8s in my test).
As ildjarn commented, the function argument to filter can be rewritten to snd >> (=) 1, although that causes a slight performance drop (< 10%), probably because of the extra function call that is generated.

Resources