How to combine large data sets

How to combine large data sets - f#

I need to full outer join two lists, parsed from a file, where each list has > 14 million entries. My first thought was using a simple solution using System.Linq
type AccountingData =
{ AccountingId: AccountingId
Year: int
Month: int
Value: decimal }
let combine xs ys =
let withDefaults (xs: AccountingData seq) (ys: AccountingData seq) =
query {
for x in xs do
leftOuterJoin y in ys on (x.AccountingId = y.AccountingId) into g
for y in g.DefaultIfEmpty() do
select (
if obj.ReferenceEquals(null, y) then
x, { x with Value = 0m }
else
x, y
)
}
Enumerable.Union
(withDefaults xs ys,
withDefaults ys xs |> Seq.map Tuple.swap)
let combined = combine xs ys
But this does not work well for such large data sets and it's extremely slow.
Maybe this could be rewritten using a tail recursive function? How?

If you need to process data that is too large to be handled conveniently in memory (especially if you need joins), it might be easier to use a database. However, in this case, I think you should be able to do what you need without a database.
The simplest approach that should be quite efficient would be to use a generic Dictionary<K, V> as a lookup table, populate this with values from ys and then find a matching value for each x. Something like:
let withDefaults (xs: AccountingData seq) (ys: AccountingData seq) =
// Create a lookup for finding 'ys' by accounting Id
let ylookup = Dictionary<_, _>()
for y in ys do ylookup.Add(y.AccountingId, y)
seq {
// Find a corresponding 'y' or default for each 'x'
for x in xs do
match ylookup.TryGetValue(x.AccountingId) with
| true, y -> x, y
| false, _ -> x, { x with Value = 0m } }
It is also worth thinking about whether you want to store data as a lazy sequence (i.e. seq<T>) or whether something else would be better - a lazy sequence has some nice benefits, but it can easily lead to a situation where you are re-running code multiple times by accident.

Related

F#, implement fold3, fold4, fold_n

I am interested to implement fold3, fold4 etc., similar to List.fold and List.fold2. e.g.
// TESTCASE
let polynomial (x:double) a b c = a*x + b*x*x + c*x*x*x
let A = [2.0; 3.0; 4.0; 5.0]
let B = [1.5; 1.0; 0.5; 0.2]
let C = [0.8; 0.01; 0.001; 0.0001]
let result = fold3 polynomial 0.7 A B C
// 2.0 * (0.7 ) + 1.5 * (0.7 )^2 + 0.8 * (0.7 )^3 -> 2.4094
// 3.0 * (2.4094) + 1.0 * (2.4094)^2 + 0.01 * (2.4094)^3 -> 13.173
// 4.0 * (13.173) + 0.5 * (13.173)^2 + 0.001 * (13.173)^3 -> 141.75
// 5.0 * (141.75) + 0.2 * (141.75)^2 + 0.0001 * (141.75)^3 -> 5011.964
//
// Output: result = 5011.964
My first method is grouping the 3 lists A, B, C, into a list of tuples, and then apply list.fold
let fold3 f x A B C =
List.map3 (fun a b c -> (a,b,c)) A B C
|> List.fold (fun acc (a,b,c) -> f acc a b c) x
// e.g. creates [(2.0,1.5,0.8); (3.0,1.0,0.01); ......]
My second method is to declare a mutable data, and use List.map3
let mutable result = 0.7
List.map3 (fun a b c ->
result <- polynomial result a b c // Change mutable data
// Output intermediate data
result) A B C
// Output from List.map3: [2.4094; 13.17327905; 141.7467853; 5011.963942]
// result mutable: 5011.963942
I would like to know if there are other ways to solve this problem. Thank you.

For fold3, you could just do zip3 and then fold:
let polynomial (x:double) (a, b, c) = a*x + b*x*x + c*x*x*x
List.zip3 A B C |> List.fold polynomial 0.7
But if you want this for the general case, then you need what we call "applicative functors".
First, imagine you have a list of functions and a list of values. Let's assume for now they're of the same size:
let fs = [ (fun x -> x+1); (fun x -> x+2); (fun x -> x+3) ]
let xs = [3;5;7]
And what you'd like to do (only natural) is to apply each function to each value. This is easily done with List.map2:
let apply fs xs = List.map2 (fun f x -> f x) fs xs
apply fs xs // Result = [4;7;10]
This operation "apply" is why these are called "applicative functors". Not just any ol' functors, but applicative ones. (the reason for why they're "functors" is a tad more complicated)
So far so good. But wait! What if each function in my list of functions returned another function?
let f1s = [ (fun x -> fun y -> x+y); (fun x -> fun y -> x-y); (fun x -> fun y -> x*y) ]
Or, if I remember that fun x -> fun y -> ... can be written in the short form of fun x y -> ...
let f1s = [ (fun x y -> x+y); (fun x y -> x-y); (fun x y -> x*y) ]
What if I apply such list of functions to my values? Well, naturally, I'll get another list of functions:
let f2s = apply f1s xs
// f2s = [ (fun y -> 3+y); (fun y -> 5+y); (fun y -> 7+y) ]
Hey, here's an idea! Since f2s is also a list of functions, can I apply it again? Well of course I can!
let ys = [1;2;3]
apply f2s ys // Result: [4;7;10]
Wait, what? What just happened?
I first applied the first list of functions to xs, and got another list of functions as a result. And then I applied that result to ys, and got a list of numbers.
We could rewrite that without intermediate variable f2s:
let f1s = [ (fun x y -> x+y); (fun x y -> x-y); (fun x y -> x*y) ]
let xs = [3;5;7]
let ys = [1;2;3]
apply (apply f1s xs) ys // Result: [4;7;10]
For extra convenience, this operation apply is usually expressed as an operator:
let (<*>) = apply
f1s <*> xs <*> ys
See what I did there? With this operator, it now looks very similar to just calling the function with two arguments. Neat.
But wait. What about our original task? In the original requirements we don't have a list of functions, we only have one single function.
Well, that can be easily fixed with another operation, let's call it "apply first". This operation will take a single function (not a list) plus a list of values, and apply this function to each value in the list:
let applyFirst f xs = List.map f xs
Oh, wait. That's just map. Silly me :-)
For extra convenience, this operation is usually also given an operator name:
let (<|>) = List.map
And now, I can do things like this:
let f x y = x + y
let xs = [3;5;7]
let ys = [1;2;3]
f <|> xs <*> ys // Result: [4;7;10]
Or this:
let f x y z = (x + y)*z
let xs = [3;5;7]
let ys = [1;2;3]
let zs = [1;-1;100]
f <|> xs <*> ys <*> zs // Result: [4;-7;1000]
Neat! I made it so I can apply arbitrary functions to lists of arguments at once!
Now, finally, you can apply this to your original problem:
let polynomial a b c (x:double) = a*x + b*x*x + c*x*x*x
let A = [2.0; 3.0; 4.0; 5.0]
let B = [1.5; 1.0; 0.5; 0.2]
let C = [0.8; 0.01; 0.001; 0.0001]
let ps = polynomial <|> A <*> B <*> C
let result = ps |> List.fold (fun x f -> f x) 0.7
The list ps consists of polynomial instances that are partially applied to corresponding elements of A, B, and C, and still expecting the final argument x. And on the next line, I simply fold over this list of functions, applying each of them to the result of the previous.

You could check the implementation for ideas:
https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs
let fold<'T,'State> (f : 'State -> 'T -> 'State) (acc: 'State) (array:'T[]) =
checkNonNull "array" array
let f = OptimizedClosures.FSharpFunc<_,_,_>.Adapt(f)
let mutable state = acc
for i = 0 to array.Length-1 do
state <- f.Invoke(state,array.[i])
state
here's a few implementations for you:
let fold2<'a,'b,'State> (f : 'State -> 'a -> 'b -> 'State) (acc: 'State) (a:'a array) (b:'b array) =
let mutable state = acc
Array.iter2 (fun x y->state<-f state x y) a b
state
let iter3 f (a: 'a[]) (b: 'b[]) (c: 'c[]) =
let f = OptimizedClosures.FSharpFunc<_,_,_,_>.Adapt(f)
if a.Length <> b.Length || a.Length <> c.Length then failwithf "length"
for i = 0 to a.Length-1 do
f.Invoke(a.[i], b.[i], c.[i])
let altIter3 f (a: 'a[]) (b: 'b[]) (c: 'c[]) =
if a.Length <> b.Length || a.Length <> c.Length then failwithf "length"
for i = 0 to a.Length-1 do
f (a.[i]) (b.[i]) (c.[i])
let fold3<'a,'b,'State> (f : 'State -> 'a -> 'b -> 'c -> 'State) (acc: 'State) (a:'a array) (b:'b array) (c:'c array) =
let mutable state = acc
iter3 (fun x y z->state<-f state x y z) a b c
state
NB. we don't have an iter3, so, implement that. OptimizedClosures.FSharpFunc only allow up to 5 (or is it 7?) params. There are a finite number of type slots available. It makes sense. You can go higher than this, of course, without using the OptimizedClosures stuff.
... anyway, generally, you don't want to be iterating too many lists / arrays / sequences at once. So I'd caution against going too high.
... the better way forward in such cases may be to construct a record or tuple from said lists / arrays, first. Then, you can just use map and iter, which are already baked in. This is what zip / zip3 are all about (see: "(array1.[i],array2.[i],array3.[i])")
let zip3 (array1: _[]) (array2: _[]) (array3: _[]) =
checkNonNull "array1" array1
checkNonNull "array2" array2
checkNonNull "array3" array3
let len1 = array1.Length
if len1 <> array2.Length || len1 <> array3.Length then invalidArg3ArraysDifferent "array1" "array2" "array3" len1 array2.Length array3.Length
let res = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked len1
for i = 0 to res.Length-1 do
res.[i] <- (array1.[i],array2.[i],array3.[i])
res
I'm working with arrays at the moment, so my solution pertained to those. Sorry about that. Here's a recursive version for lists.
let fold3 f acc a b c =
let mutable state = acc
let rec fold3 f a b c =
match a,b,c with
| [],[],[] -> ()
| [],_,_
| _,[],_
| _,_,[] -> failwith "length"
| ahead::atail, bhead::btail, chead::ctail ->
state <- f state ahead bhead chead
fold3 f atail btail ctail
fold3 f a b c
i.e. we define a recursive function within a function which acts upon/mutates/changes the outer scoped mutable acc variable (a closure in functional speak). Finally, this gets returned.
It's pretty cool how much type information gets inferred about these functions. In the array examples above, mostly I was explicit with 'a 'b 'c. This time, we let type inference kick in. It knows we're dealing with lists from the :: operator. That's kind of neat.
NB. the compiler will probably unwind this tail-recursive approach so that it is just a loop behind-the-scenes. Generally, get a correct answer before optimising. Just mentioning this, though, as food for later thought.

I think the existing answers provide great options if you want to generalize folding, which was your original question. However, if I simply wanted to call the polynomial function on inputs specified in A, B and C, then I would probably do not want to introduce fairly complex constructs like applicative functors with fancy operators to my code base.
The problem becomes a lot easier if you transpose the input data, so that rather than having a list [A; B; C] with lists for individual variables, you have a transposed list with inputs for calculating each polynomial. To do this, we'll need the transpose function:
let rec transpose = function
| (_::_)::_ as M -> List.map List.head M :: transpose (List.map List.tail M)
| _ -> []
Now you can create a list with inputs, transpose it and calculate all polynomials simply using List.map:
transpose [A; B; C]
|> List.map (function
| [a; b; c] -> polynomial 0.7 a b c
| _ -> failwith "wrong number of arguments")

There are many ways to solve this problem. Few are mentioned like first zip3 all three list, then run over it. Using Applicate Functors like Fyodor Soikin describes means you can turn any function with any amount of arguments into a function that expects list instead of single arguments. This is a good general solution that works with any numbers of lists.
While this is a general good idea, i'm sometimes shocked that so few use more low-level tools. In this case it is a good idea to use recursion and learn more about recursion.
Recursion here is the right-tool because we have immutable data-types. But you could consider how you would implement it with mutable lists and looping first, if that helps. The steps would be:
You loop over an index from 0 to the amount of elements in the lists.
You check if every list has an element for the index
If every list has an element then you pass this to your "folder" function
If at least one list don't have an element, then you abort the loop
The recursive version works exactly the same. Only that you don't use an index to access the elements. You would chop of the first element from every list and then recurse on the remaining list.
Otherwise List.isEmpty is the function to check if a List is empty. You can chop off the first element with List.head and you get the remaining list with the first element removed by List.tail. This way you can just write:
let rec fold3 f acc l1 l2 l3 =
let h = List.head
let t = List.tail
let empty = List.isEmpty
if (empty l1) || (empty l2) && (empty l3)
then acc
else fold3 f (f acc (h l1) (h l2) (h l3)) (t l1) (t l2) (t l3)
The if line checks if every list has at least one element. If that is true
it executes: f acc (h l1) (h l2) (h l3). So it executes f and passes it the first element of every list as an argument. The result is the new accumulator of
the next fold3 call.
Now that you worked on the first element of every list, you must chop off the first element of every list, and continue with the remaining lists. You achieve that with List.tail or in the above example (t l1) (t l2) (t l3). Those are the next remaining lists for the next fold3 call.
Creating a fold4, fold5, fold6 and so on isn't really hard, and I think it is self-explanatory. My general advice is to learn a little bit more about recursion and try to write recursive List functions without Pattern Matching. Pattern Matching is not always easier.
Some code examples:
fold3 (fun acc x y z -> x + y + z :: acc) [] [1;2;3] [10;20;30] [100;200;300] // [333;222;111]
fold3 (fun acc x y z -> x :: y :: z :: acc) [] [1;2;3] [10;20;30] [100;200;300] // [3; 30; 300; 2; 20; 200; 1; 10; 100]

Self-reference in F# sequence expression

Is there a way to have a self-reference in F# sequence expression? For example:
[for i in 1..n do if _f(i)_not_in_this_list_ do yield f(i)]
which prevents inserting duplicate elements.
EDIT: In general case, I would like to know the contents of this_list before applying f(), which is very computationally expensive.
EDIT: I oversimplified in the example above. My specific case is a computationally expensive test T (T: int -> bool) having a property T(i) => T(n*i) so the code snippet is:
[for i in 1..n do if _i_not_in_this_list_ && T(i) then for j in i..i..n do yield j]
The goal is to reduce the number of T() applications and use concise notation. I accomplished the former by using a mutable helper array:
let mutable notYet = Array.create n true
[for i in 1..n do if notYet.[i] && T(i) then for j in i..i..n do yield j; notYet.[j] <- false]

You can have recursive sequence expression e.g.
let rec allFiles dir =
seq { yield! Directory.GetFiles dir
for d in Directory.GetDirectories dir do
yield! allFiles d }
but circular reference is not possible.
An alternative is to use Seq.distinct from Seq module:
seq { for i in 1..n -> f i }
|> Seq.distinct
or to convert sequence to set using Set.ofSeq before consumption as per #John's comment.

You may also decide to maintain information about the previously generated elements in an explicit way; for example:
let genSeq n =
let elems = System.Collections.Generic.HashSet()
seq {
for i in 1..n do
if not (elems.Contains(i)) then
elems.Add(i) |> ignore
yield i
}

There are several considerations here.
First, you can't check if f(i) is in a list or not before actually computing f(i). So I guess you meant that your check function is expensive, not f(i) itself. Correct me if I'm wrong.
Second, if check is indeed very computationally expensive, you may look for a more effective algorithm. There's no guarantee you will find one for every sequence, but they often exist. Then your code will be nothing but a single Seq.unfold.
Third. When there's no such optimization, you may take another approach. Within [for...yield], you only build a current element and you can't access prior ones. Instead of returning an element, building an entire list manually seems to be the way to go:
// a simple algorithm checking if some F(x) exists in a sequence somehow
let check (x:string) xs = Seq.forall (fun el -> not (x.Contains el)) xs
// a converter i -> something else
let f (i: int) = i.ToString()
let generate f xs =
let rec loop ys = function
| [] -> List.rev ys
| x::t ->
let y = f x
loop (if check y ys then y::ys else ys) t
loop [] xs
// usage
[0..3..1000] |> generate f |> List.iter (printf "%O ")

Lexicographic sorting in F#

I am playing with a toy problem (Convex hull identification) and needed lexicographic sorting twice already. One of the cases was given a list of type Point = { X: float; Y: float }, I would like to sort by X coordinate, and in case of equality, by Y coordinate.
I ended up writing the following:
let rec lexiCompare comparers a b =
match comparers with
[ ] -> 0
| head :: tail ->
if not (head a b = 0) then head a b else
lexiCompare tail a b
let xComparer p1 p2 =
if p1.X > p2.X then 1 else
if p1.X < p2.X then -1 else
0
let yComparer p1 p2 =
if p1.Y > p2.Y then 1 else
if p1.Y < p2.Y then -1 else
0
let coordCompare =
lexiCompare [ yComparer; xComparer ]
Which allows me to do
let lowest (points: Point list) =
List.sortWith coordCompare points
|> List.head
So far, so good. However, this feels a bit heavy-handed. I have to create specific comparers returning -1, 0 or 1, and so far I can't see a straightforward way to use this in cases like List.minBy. Ideally, I would like to do something along the lines of providing a list of functions that can be compared (like [(fun p -> p.X); (fun p -> p.Y)]) and do something like lexicographic min of a list of items supporting that list of functions.
Is there a way to achieve this in F#? Or am I thinking about this incorrectly?

Is there a way to achieve this in F#? Or am I thinking about this incorrectly?
F# does this for you automatically when you define a record type like yours:
> type Point = { X: float; Y: float };;
type Point =
{X: float;
Y: float;}
You can immediately start comparing values. For example, defining a 3-element list of points and sorting it into lexicographic order using the built-in List.sort:
> [ { X = 2.0; Y = 3.0 }
{ X = 2.0; Y = 2.0 }
{ X = 1.0; Y = 3.0 } ]
|> List.sort;;
val it : Point list = [{X = 1.0;
Y = 3.0;}; {X = 2.0;
Y = 2.0;}; {X = 2.0;
Y = 3.0;}]
Note that the results were sorted first by X and then by Y.
You can compare two values of any comparable type using the built-in compare function.
If you want to use a custom ordering then you have two options. If you want to do all of your operations using your custom total order then it belongs in the type definition as an implementation of IComparable and friends. If you want to use a custom ordering for a few operations then you can use higher-order functions like List.sortBy and List.sortWith. For example, List.sortBy (fun p -> p.Y, p.X) will sort by Y and then X because F# generates the lexicographic comparison over 2-tuples for you (!).
This is one of the big advantages of F#.

Well, to start with, you can rely on F#'s built-in compare function:
let xComparer p1 p2 = compare p1.X p2.X
let yComparer p1 p2 = compare p1.Y p2.Y
Alternatively, you can clearly abstract this a bit if desired:
let compareWith f a b = compare (f a) (f b)
let xComparer = compareWith (fun p -> p.X)
let yComparer = compareWith (fun p -> p.Y)
Or, as you note, you could build this approach directly into the list handling function:
let rec lexiCompareWith l a b =
match l with
| [] -> 0
| f::fs ->
match compare (f a) (f b) with
| 0 -> lexiCompareWith fs a b
| n -> n
One important limitation here is that since you're putting them into a list, the functions must all have identical return types. This isn't a problem in your Point example (since both functions have type Point -> float), but it would prevent you from sorting two Person objects by name and then age (since the first projection would have type Person -> string but the second would have type Person -> int).

I don't think I understand your question correctly, but doesn't the following code work fine?
let lowest (points : Point list) = List.sort points |> List.head
It seems that F# performs implicit comparison on record data types. And my little experiment indicates that the comparison happens to be lexicographic. But I could not find any evidence to support that result.
So I'm not yet sure F# compares records lexicographically. I can still write in the following manner using tuple instead:
let lowest (points : Point list) =
let tuple = List.map (fun pt -> (pt.X, pt.Y)) points |> List.sort |> List.head
{ X = fst tuple; Y = snd tuple }
I hope this post could help.

What's the style for immutable set and map in F#

I have just solved problem23 in Project Euler, in which I need a set to store all abundant numbers. F# has a immutable set, I can use Set.empty.Add(i) to create a new set containing number i. But I don't know how to use immutable set to do more complicated things.
For example, in the following code, I need to see if a number 'x' could be written as the sum of two numbers in a set. I resort to a sorted array and array's binary search algorithm to get the job done.
Please also comment on my style of the following program. Thanks!
let problem23 =
let factorSum x =
let mutable sum = 0
for i=1 to x/2 do
if x%i=0 then
sum <- sum + i
sum
let isAbundant x = x < (factorSum x)
let abuns = {1..28123} |> Seq.filter isAbundant |> Seq.toArray
let inAbuns x = Array.BinarySearch(abuns, x) >= 0
let sumable x =
abuns |> Seq.exists (fun a -> inAbuns (x-a))
{1..28123} |> Seq.filter (fun x -> not (sumable x)) |> Seq.sum
the updated version:
let problem23b =
let factorSum x =
{1..x/2} |> Seq.filter (fun i->x%i=0) |> Seq.sum
let isAbundant x = x < (factorSum x)
let abuns = Set( {1..28123} |> Seq.filter isAbundant )
let inAbuns x = Set.contains x abuns
let sumable x =
abuns |> Seq.exists (fun a -> inAbuns (x-a))
{1..28123} |> Seq.filter (fun x -> not (sumable x)) |> Seq.sum
This version runs in about 27 seconds, while the first 23 seconds(I've run several times). So an immutable red-black tree actually does not have much speed down compared to a sorted array with binary search. The total number of elements in the set/array is 6965.

Your style looks fine to me. The different steps in the algorithm are clear, which is the most important part of making something work. This is also the tactic I use for solving Project Euler problems. First make it work, and then make it fast.
As already remarked, replacing Array.BinarySearch by Set.contains makes the code even more readable. I find that in almost all PE solutions I've written, I only use arrays for lookups. I've found that using sequences and lists as data structures is more natural within F#. Once you get used to them, that is.
I don't think using mutability inside a function is necessarily bad. I've optimized problem 155 from almost 3 minutes down to 7 seconds with some aggressive mutability optimizations. In general though, I'd save that as an optimization step and start out writing it using folds/filters etc. In the example case of problem 155, I did start out using immutable function composition, because it made testing and most importantly, understanding, my approach easy.
Picking the wrong algorithm is much more detrimental to a solution than using a somewhat slower immutable approach first. A good algorithm is still fast even if it's slower than the mutable version (couch hello captain obvious! cough).
Edit: let's look at your version
Your problem23b() took 31 seconds on my PC.
Optimization 1: use new algorithm.
//useful optimization: if m divides n, (n/m) divides n also
//you now only have to check m up to sqrt(n)
let factorSum2 n =
let rec aux acc m =
match m with
| m when m*m = n -> acc + m
| m when m*m > n -> acc
| m -> aux (acc + (if n%m=0 then m + n/m else 0)) (m+1)
aux 1 2
This is still very much in functional style, but using this updated factorSum in your code, the execution time went from 31 seconds to 8 seconds.
Everything's still in immutable style, but let's see what happens when an array lookup is used instead of a set:
Optimization 2: use an array for lookup:
let absums() =
//create abundant numbers as an array for (very) fast lookup
let abnums = [|1..28128|] |> Array.filter (fun n -> factorSum2 n > n)
//create a second lookup:
//a boolean array where arr.[x] = true means x is a sum of two abundant numbers
let arr = Array.zeroCreate 28124
for x in abnums do
for y in abnums do
if x+y<=28123 then arr.[x+y] <- true
arr
let euler023() =
absums() //the array lookup
|> Seq.mapi (fun i isAbsum -> if isAbsum then 0 else i) //mapi: i is the position in the sequence
|> Seq.sum
//I always write a test once I've solved a problem.
//In this way, I can easily see if changes to the code breaks stuff.
let test() = euler023() = 4179871
Execution time: 0.22 seconds (!).
This is what I like so much about F#, it still allows you to use mutable constructs to tinker under the hood of your algorithm. But I still only do this after I've made something more elegant work first.

You can easily create a Set from a given sequence of values.
let abuns = Set (seq {1..28123} |> Seq.filter isAbundant)
inAbuns would therefore be rewritten to
let inAbuns x = abuns |> Set.mem x
Seq.exists would be changed to Set.exists
But the array implementation is fine too ...
Note that there is no need to use mutable values in factorSum, apart from the fact that it's incorrect since you compute the number of divisors instead of their sum:
let factorSum x = seq { 1..x/2 } |> Seq.filter (fun i -> x % i = 0) |> Seq.sum

Here is a simple functional solution that is shorter than your original and over 100× faster:
let problem23 =
let rec isAbundant i t x =
if i > x/2 then x < t else
if x % i = 0 then isAbundant (i+1) (t+i) x else
isAbundant (i+1) t x
let xs = Array.Parallel.init 28124 (isAbundant 1 0)
let ys = Array.mapi (fun i b -> if b then Some i else None) xs |> Array.choose id
let f x a = x-a < 0 || not xs.[x-a]
Array.init 28124 (fun x -> if Array.forall (f x) ys then x else 0)
|> Seq.sum
The first trick is to record which numbers are abundant in an array indexed by the number itself rather than using a search structure. The second trick is to notice that all the time is spent generating that array and, therefore, to do it in parallel.

Get a random subset from a set in F#

I am trying to think of an elegant way of getting a random subset from a set in F#
Any thoughts on this?
Perhaps this would work: say we have a set of 2x elements and we need to pick a subset of y elements. Then if we could generate an x sized bit random number that contains exactly y 2n powers we effectively have a random mask with y holes in it. We could keep generating new random numbers until we get the first one satisfying this constraint but is there a better way?

If you don't want to convert to an array you could do something like this. This is O(n*m) where m is the size of the set.
open System
let rnd = Random(0);
let set = Array.init 10 (fun i -> i) |> Set.of_array
let randomSubSet n set =
seq {
let i = set |> Set.to_seq |> Seq.nth (rnd.Next(set.Count))
yield i
yield! set |> Set.remove i
}
|> Seq.take n
|> Set.of_seq
let result = set |> randomSubSet 3
for x in result do
printfn "%A" x

Agree with #JohannesRossel. There's an F# shuffle-an-array algorithm here you can modify suitably. Convert the Set into an array, and then loop until you've selected enough random elements for the new subset.

Not having a really good grasp of F# and what might be considered elegant there, you could just do a shuffle on the list of elements and select the first y. A Fisher-Yates shuffle even helps you in this respect as you also only need to shuffle y elements.

rnd must be out of subset function.
let rnd = new Random()
let rec subset xs =
let removeAt n xs = ( Seq.nth (n-1) xs, Seq.append (Seq.take (n-1) xs) (Seq.skip n xs) )
match xs with
| [] -> []
| _ -> let (rem, left) = removeAt (rnd.Next( List.length xs ) + 1) xs
let next = subset (List.of_seq left)
if rnd.Next(2) = 0 then rem :: next else next

Do you mean a random subset of any size?
For the case of a random subset of a specific size, there's a very elegant answer here:
Select N random elements from a List<T> in C#
Here it is in pseudocode:
RandomKSubset(list, k):
n = len(list)
needed = k
result = {}
for i = 0 to n:
if rand() < needed / (n-i)
push(list[i], result)
needed--
return result

Using Seq.fold to construct using lazy evaluation random sub-set:
let rnd = new Random()
let subset2 xs = let insertAt n xs x = Seq.concat [Seq.take n xs; seq [x]; Seq.skip n xs]
let randomInsert xs = insertAt (rnd.Next( (Seq.length xs) + 1 )) xs
xs |> Seq.fold randomInsert Seq.empty |> Seq.take (rnd.Next( Seq.length xs ) + 1)

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to combine large data sets - f#

Related

F#, implement fold3, fold4, fold_n

Self-reference in F# sequence expression

Lexicographic sorting in F#

What's the style for immutable set and map in F#

Get a random subset from a set in F#

Categories

Resources