What's the style for immutable set and map in F# - f#

I have just solved problem23 in Project Euler, in which I need a set to store all abundant numbers. F# has a immutable set, I can use Set.empty.Add(i) to create a new set containing number i. But I don't know how to use immutable set to do more complicated things.
For example, in the following code, I need to see if a number 'x' could be written as the sum of two numbers in a set. I resort to a sorted array and array's binary search algorithm to get the job done.
Please also comment on my style of the following program. Thanks!
let problem23 =
let factorSum x =
let mutable sum = 0
for i=1 to x/2 do
if x%i=0 then
sum <- sum + i
sum
let isAbundant x = x < (factorSum x)
let abuns = {1..28123} |> Seq.filter isAbundant |> Seq.toArray
let inAbuns x = Array.BinarySearch(abuns, x) >= 0
let sumable x =
abuns |> Seq.exists (fun a -> inAbuns (x-a))
{1..28123} |> Seq.filter (fun x -> not (sumable x)) |> Seq.sum
the updated version:
let problem23b =
let factorSum x =
{1..x/2} |> Seq.filter (fun i->x%i=0) |> Seq.sum
let isAbundant x = x < (factorSum x)
let abuns = Set( {1..28123} |> Seq.filter isAbundant )
let inAbuns x = Set.contains x abuns
let sumable x =
abuns |> Seq.exists (fun a -> inAbuns (x-a))
{1..28123} |> Seq.filter (fun x -> not (sumable x)) |> Seq.sum
This version runs in about 27 seconds, while the first 23 seconds(I've run several times). So an immutable red-black tree actually does not have much speed down compared to a sorted array with binary search. The total number of elements in the set/array is 6965.

Your style looks fine to me. The different steps in the algorithm are clear, which is the most important part of making something work. This is also the tactic I use for solving Project Euler problems. First make it work, and then make it fast.
As already remarked, replacing Array.BinarySearch by Set.contains makes the code even more readable. I find that in almost all PE solutions I've written, I only use arrays for lookups. I've found that using sequences and lists as data structures is more natural within F#. Once you get used to them, that is.
I don't think using mutability inside a function is necessarily bad. I've optimized problem 155 from almost 3 minutes down to 7 seconds with some aggressive mutability optimizations. In general though, I'd save that as an optimization step and start out writing it using folds/filters etc. In the example case of problem 155, I did start out using immutable function composition, because it made testing and most importantly, understanding, my approach easy.
Picking the wrong algorithm is much more detrimental to a solution than using a somewhat slower immutable approach first. A good algorithm is still fast even if it's slower than the mutable version (couch hello captain obvious! cough).
Edit: let's look at your version
Your problem23b() took 31 seconds on my PC.
Optimization 1: use new algorithm.
//useful optimization: if m divides n, (n/m) divides n also
//you now only have to check m up to sqrt(n)
let factorSum2 n =
let rec aux acc m =
match m with
| m when m*m = n -> acc + m
| m when m*m > n -> acc
| m -> aux (acc + (if n%m=0 then m + n/m else 0)) (m+1)
aux 1 2
This is still very much in functional style, but using this updated factorSum in your code, the execution time went from 31 seconds to 8 seconds.
Everything's still in immutable style, but let's see what happens when an array lookup is used instead of a set:
Optimization 2: use an array for lookup:
let absums() =
//create abundant numbers as an array for (very) fast lookup
let abnums = [|1..28128|] |> Array.filter (fun n -> factorSum2 n > n)
//create a second lookup:
//a boolean array where arr.[x] = true means x is a sum of two abundant numbers
let arr = Array.zeroCreate 28124
for x in abnums do
for y in abnums do
if x+y<=28123 then arr.[x+y] <- true
arr
let euler023() =
absums() //the array lookup
|> Seq.mapi (fun i isAbsum -> if isAbsum then 0 else i) //mapi: i is the position in the sequence
|> Seq.sum
//I always write a test once I've solved a problem.
//In this way, I can easily see if changes to the code breaks stuff.
let test() = euler023() = 4179871
Execution time: 0.22 seconds (!).
This is what I like so much about F#, it still allows you to use mutable constructs to tinker under the hood of your algorithm. But I still only do this after I've made something more elegant work first.

You can easily create a Set from a given sequence of values.
let abuns = Set (seq {1..28123} |> Seq.filter isAbundant)
inAbuns would therefore be rewritten to
let inAbuns x = abuns |> Set.mem x
Seq.exists would be changed to Set.exists
But the array implementation is fine too ...
Note that there is no need to use mutable values in factorSum, apart from the fact that it's incorrect since you compute the number of divisors instead of their sum:
let factorSum x = seq { 1..x/2 } |> Seq.filter (fun i -> x % i = 0) |> Seq.sum

Here is a simple functional solution that is shorter than your original and over 100× faster:
let problem23 =
let rec isAbundant i t x =
if i > x/2 then x < t else
if x % i = 0 then isAbundant (i+1) (t+i) x else
isAbundant (i+1) t x
let xs = Array.Parallel.init 28124 (isAbundant 1 0)
let ys = Array.mapi (fun i b -> if b then Some i else None) xs |> Array.choose id
let f x a = x-a < 0 || not xs.[x-a]
Array.init 28124 (fun x -> if Array.forall (f x) ys then x else 0)
|> Seq.sum
The first trick is to record which numbers are abundant in an array indexed by the number itself rather than using a search structure. The second trick is to notice that all the time is spent generating that array and, therefore, to do it in parallel.

Related

Trying to filter out values in a sequence that are not in another sequence

I am trying to filter out values from a sequence, that are not in another sequence. I was pretty sure my code worked, but it is taking a long time to run on my computer and because of this I am not sure, so I am here to see what the community thinks.
Code is below:
let statezip =
StateCsv.GetSample().Rows
|> Seq.map (fun row -> row.State)
|> Seq.distinct
type State = State of string
let unwrapstate (State s) = s
let neededstates (row:StateCsv) = Seq.contains (unwrapstate row.State) statezip
I am filtering by the neededstates function. Is there something wrong with the way I am doing this?
let datafilter =
StateCsv1.GetSample().Rows
|> Seq.map (fun row -> row.State,row.Income,row.Family)
|> Seq.filter neededstates
|> List.ofSeq
I believe that it should filter the sequence by the values that are true, since neededstates function is a bool. StateCsv and StateCsv1 have the same exact structure, although from different years.
Evaluation of contains on sequences and lists can be slow. For a case where you want to check for the existence of an element in a collection, the F# Set type is ideal. You can convert your sequences to sets using Set.ofSeq, and then run the logic over the sets instead. The following example uses the numbers from 1 to 10000 and then uses both sequences and sets to filter the result to only the odd numbers by checking that the values are not in a collection of even numbers.
Using Sequences:
let numberSeq = {0..10000}
let evenNumberSeq = seq { for n in numberSeq do if (n % 2 = 0) then yield n }
#time
numberSeq |> Seq.filter (fun n -> evenNumberSeq |> Seq.contains n |> not) |> Seq.toList
#time
This runs in about 1.9 seconds for me.
Using sets:
let numberSet = numberSeq |> Set.ofSeq
let evenNumberSet = evenNumberSeq |> Set.ofSeq
#time
numberSet |> Set.filter (fun n -> evenNumberSet |> Set.contains n |> not)
#time
This runs in only 0.005 seconds. Hopefully you can materialize your sequences to sets before performing your contains operation, thereby getting this level of speedup.

computing prime factors using same code produces different results?

I am basically trying to compute the factors of a BigInteger that are a prime, I have two simple factorization functions, they both look like they should produce the same result in the way I used them here down below but this is not the case here, can someone explain what is happening?
let lookupTable = new HashSet<int>(primes)
let isPrime x = lookupTable.Contains x
let factors (n:bigint) =
Seq.filter (fun x -> n % x = 0I) [1I..n]
let factors' (n:bigint) =
Seq.filter (fun x -> n % x = 0I) [1I..bigint(sqrt(float(n)))]
600851475143I
|> fun n -> bigint(sqrt(float(n)))
|> factors
|> Seq.map int
|> Seq.filter isPrime
|> Seq.max // produces 137
600851475143I
|> factors'
|> Seq.map int
|> Seq.filter isPrime
|> Seq.max // produces 6857 (the right answer)
Your functions are not equivalent. In the first function, the list of candidates goes to n, and the filter function also uses n for remainder calculation. The second function, however, also uses n for remainder calculation, but the candidates list goes to sqrt(n) instead.
To make the second function equivalent, you need to reformulate it like this:
let factors' (n:bigint) =
let k = bigint(sqrt(float(n)))
Seq.filter (fun x -> k % x = 0I) [1I..k]
Update, to clarify this somewhat:
In the above code, notice how k is used in two places: to produce the initial list of candidates and to calculate remainder within the filter function? This is precisely the change I made to your code: my code uses k in both places, but your code uses k in one place, but n in the other.
This is how your original function would look with k:
let factors' (n:bigint) =
let k = bigint(sqrt(float(n)))
Seq.filter (fun x -> n % x = 0I) [1I..k]
Notice how it uses k in one place, but n in the other.

How do I do in F# what would be called compression in APL?

In APL one can use a bit vector to select out elements of another vector; this is called compression. For example 1 0 1/3 5 7 would yield 3 7.
Is there a accepted term for this in functional programming in general and F# in particular?
Here is my F# program:
let list1 = [|"Bob"; "Mary"; "Sue"|]
let list2 = [|1; 0; 1|]
[<EntryPoint>]
let main argv =
0 // return an integer exit code
What I would like to do is compute a new string[] which would be [|"Bob"; Sue"|]
How would one do this in F#?
Array.zip list1 list2 // [|("Bob",1); ("Mary",0); ("Sue",1)|]
|> Array.filter (fun (_,x) -> x = 1) // [|("Bob", 1); ("Sue", 1)|]
|> Array.map fst // [|"Bob"; "Sue"|]
The pipe operator |> does function application syntactically reversed, i.e., x |> f is equivalent to f x. As mentioned in another answer, replace Array with Seq to avoid the construction of intermediate arrays.
I expect you'll find many APL primitives missing from F#. For lists and sequences, many can be constructed by stringing together primitives from the Seq, Array, or List modules, like the above. For reference, here is an overview of the Seq module.
I think the easiest is to use an array sequence expression, something like this:
let compress bits values =
[|
for i = 0 to bits.Length - 1 do
if bits.[i] = 1 then
yield values.[i]
|]
If you only want to use combinators, this is what I would do:
Seq.zip bits values
|> Seq.choose (fun (bit, value) ->
if bit = 1 then Some value else None)
|> Array.ofSeq
I use Seq functions instead of Array in order to avoid building intermediary arrays, but it would be correct too.
One might say this is more idiomatic:
Seq.map2 (fun l1 l2 -> if l2 = 1 then Some(l1) else None) list1 list2
|> Seq.choose id
|> Seq.toArray
EDIT (for the pipe lovers)
(list1, list2)
||> Seq.map2 (fun l1 l2 -> if l2 = 1 then Some(l1) else None)
|> Seq.choose id
|> Seq.toArray
Søren Debois' solution is good but, as he pointed out, but we can do better. Let's define a function, based on Søren's code:
let compressArray vals idx =
Array.zip vals idx
|> Array.filter (fun (_, x) -> x = 1)
|> Array.map fst
compressArray ends up creating a new array in each of the 3 lines. This can take some time, if the input arrays are long (1.4 seconds for 10M values in my quick test).
We can save some time by working on sequences and creating an array at the end only:
let compressSeq vals idx =
Seq.zip vals idx
|> Seq.filter (fun (_, x) -> x = 1)
|> Seq.map fst
This function is generic and will work on arrays, lists, etc. To generate an array as output:
compressSeq sq idx |> Seq.toArray
The latter saves about 40% of computation time (0.8s in my test).
As ildjarn commented, the function argument to filter can be rewritten to snd >> (=) 1, although that causes a slight performance drop (< 10%), probably because of the extra function call that is generated.

FP - Condense and 'nice' code

Writing code in F# in most cases results in very condense an intuitive work. This piece of code looks somehow imperative and inconvenient to me.
times is an array of float values
Lines inside the file times.csv always look like that:
Mai 06 2011 05:43:45 nachm.,00:22.99
Mai 04 2011 08:59:12 nachm.,00:22.73
Mai 04 2011 08:58:27 nachm.,00:19.38
Mai 04 2011 08:57:54 nachm.,00:18.00
average generates an average of the values, dropping the lowest and highest time
getAllSubsetsOfLengthN creates a sequence of all consecutive subsets of length n. Is there a 'nicer' solution to that? Or does already exist something like that inside the F# core?
bestAverageOfN finds the lowest average of all the subsets
let times =
File.ReadAllLines "times.csv"
|> Array.map (fun l -> float (l.Substring((l.LastIndexOf ':') + 1)))
let average set =
(Array.sum set - Array.min set - Array.max set) / float (set.Length - 2)
let getAllSubsetsOfLengthN n (set:float list) =
seq { for i in [0 .. set.Length - n] -> set
|> Seq.skip i
|> Seq.take n }
let bestAverageOfN n =
times
|> Array.toList
|> getAllSubsetsOfLengthN n
|> Seq.map (fun t -> t
|> Seq.toArray
|> average)
|> Seq.min
What I am looking for are nicer, shorter or easier solutions. Every useful post will be upvoted, of course :)
I guess, getAllSubsetsOfLengthN can be replaced with Seq.windowed
so bestAverageOfN will look like:
let bestAverageOfN n =
times
|> Seq.windowed n
|> Seq.map average
|> Seq.min
Without much thinking, there are some basic functional refactorings you can make. For example, in the calculation of bestAverageOfN, you can use function composition:
let bestAverageOfN n =
times
|> Array.toList
|> getAllSubsetsOfLengthN n
|> Seq.map (Seq.toArray >> average)
|> Seq.min
Other than this and the suggestion by desco, I don't think there is anything I would change. If you don't use your special average function anywhere in the code, you could write it inline as a lambda function, but that really depends on your personal preferences.
Just for the sake of generality, I would probably make times an argument of bestAverageOfN:
let bestAverageOfN n times =
times
|> Seq.windowed n
|> Seq.map (fun set ->
(Array.sum set - Array.min set - Array.max set) / float (set.Length - 2))
|> Seq.min
Since you mentioned regex for parsing your input, I thought I'd show you such a solution. It may well be overkill, but it is also a more functional solution since regular expressions are declarative while substring stuff is more imperative. Regex is also nice since it is easier to grow if the structure of your input changes, index substring stuff can get messy, and I try to avoid it completely.
First a couple active patterns,
open System.Text.RegularExpressions
let (|Groups|_|) pattern input =
let m = Regex.Match(input, pattern)
if m.Success then
Some([for g in m.Groups -> g.Value] |> List.tail)
else
None
open System
let (|Float|_|) input =
match Double.TryParse(input) with
| true, value -> Some(value)
| _ -> None
Adopting #ildjarn's times implementation:
let times =
File.ReadAllLines "times.csv"
|> Array.map (function Groups #",.*?:(.*)$" [Float(value)] -> value)
Since bestAversageOfN has already been covered, here's an alternative implementation of times:
let times =
File.ReadAllLines "times.csv"
|> Array.map (fun l -> l.LastIndexOf ':' |> (+) 1 |> l.Substring |> float)

Get a random subset from a set in F#

I am trying to think of an elegant way of getting a random subset from a set in F#
Any thoughts on this?
Perhaps this would work: say we have a set of 2x elements and we need to pick a subset of y elements. Then if we could generate an x sized bit random number that contains exactly y 2n powers we effectively have a random mask with y holes in it. We could keep generating new random numbers until we get the first one satisfying this constraint but is there a better way?
If you don't want to convert to an array you could do something like this. This is O(n*m) where m is the size of the set.
open System
let rnd = Random(0);
let set = Array.init 10 (fun i -> i) |> Set.of_array
let randomSubSet n set =
seq {
let i = set |> Set.to_seq |> Seq.nth (rnd.Next(set.Count))
yield i
yield! set |> Set.remove i
}
|> Seq.take n
|> Set.of_seq
let result = set |> randomSubSet 3
for x in result do
printfn "%A" x
Agree with #JohannesRossel. There's an F# shuffle-an-array algorithm here you can modify suitably. Convert the Set into an array, and then loop until you've selected enough random elements for the new subset.
Not having a really good grasp of F# and what might be considered elegant there, you could just do a shuffle on the list of elements and select the first y. A Fisher-Yates shuffle even helps you in this respect as you also only need to shuffle y elements.
rnd must be out of subset function.
let rnd = new Random()
let rec subset xs =
let removeAt n xs = ( Seq.nth (n-1) xs, Seq.append (Seq.take (n-1) xs) (Seq.skip n xs) )
match xs with
| [] -> []
| _ -> let (rem, left) = removeAt (rnd.Next( List.length xs ) + 1) xs
let next = subset (List.of_seq left)
if rnd.Next(2) = 0 then rem :: next else next
Do you mean a random subset of any size?
For the case of a random subset of a specific size, there's a very elegant answer here:
Select N random elements from a List<T> in C#
Here it is in pseudocode:
RandomKSubset(list, k):
n = len(list)
needed = k
result = {}
for i = 0 to n:
if rand() < needed / (n-i)
push(list[i], result)
needed--
return result
Using Seq.fold to construct using lazy evaluation random sub-set:
let rnd = new Random()
let subset2 xs = let insertAt n xs x = Seq.concat [Seq.take n xs; seq [x]; Seq.skip n xs]
let randomInsert xs = insertAt (rnd.Next( (Seq.length xs) + 1 )) xs
xs |> Seq.fold randomInsert Seq.empty |> Seq.take (rnd.Next( Seq.length xs ) + 1)

Resources