The following F# code crashes on the third call with a no memory exception. Either I am missing something or Alea is not freeing memory correctly for some reason. I've tried it in both F# Interactive and Compiled. I've also tried calling dispose manually, but it did not work. Any idea why?
let squareGPU (inputs:float[]) =
use dInputs = worker.Malloc(inputs)
use dOutputs = worker.Malloc(inputs.Length)
let blockSize = 256
let numSm = worker.Device.Attributes.MULTIPROCESSOR_COUNT
let gridSize = Math.Min(16 * numSm, divup inputs.Length blockSize)
let lp = new LaunchParam(gridSize, blockSize)
worker.Launch <# squareKernel #> lp dOutputs.Ptr dInputs.Ptr inputs.Length
dOutputs.Gather()
let x = squareGPU [|0.0..0.001..100000.0|]
printfn "1"
let y = squareGPU [|0.0..0.001..100000.0|]
printfn "2"
let z = squareGPU [|0.0..0.001..100000.0|]
printfn "3"
I guess you got System.OutOfMemoryException, right? That doesn't mean GPU device memory running out, it means you are running out your host memory. in your example, you created a rather large array in host, and you calculate it, and you gather another large array as output. The point is, you use different variable name (x, y and z) to store the output array, and thus GC will have no chance to free it, so eventually you will run out your host memory.
I did a very simple test (I use the stop value 30000 instead 100000 as in your example), this test only uses host code, no GPU code:
let x1 = [|0.0..0.001..30000.0|]
printfn "1"
let x2 = [|0.0..0.001..30000.0|]
printfn "2"
let x3 = [|0.0..0.001..30000.0|]
printfn "3"
let x4 = [|0.0..0.001..30000.0|]
printfn "4"
let x5 = [|0.0..0.001..30000.0|]
printfn "5"
let x6 = [|0.0..0.001..30000.0|]
printfn "6"
And I ran this code in F# interactive (which is a 32bit process), I got this:
Microsoft (R) F# Interactive version 12.0.30815.0
Copyright (c) Microsoft Corporation. All Rights Reserved.
For help type #help;;
>
1
2
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Collections.Generic.List`1.set_Capacity(Int32 value)
at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
at System.Collections.Generic.List`1.Add(T item)
at Microsoft.FSharp.Collections.SeqModule.ToArray[T](IEnumerable`1 source)
at <StartupCode$FSI_0002>.$FSI_0002.main#() in C:\Users\Xiang\Documents\Inbox\ConsoleApplication6\Script1.fsx:line 32
Stopped due to error
>
So that means, after I created 2 such large array (x1 and x2), I ran out of the host memory.
To further confirm this, I use same variable name, which gives GC the chance to collect the old array, and this time it works:
let foo() =
let x = [|0.0..0.001..30000.0|]
printfn "1"
let x = [|0.0..0.001..30000.0|]
printfn "2"
let x = [|0.0..0.001..30000.0|]
printfn "3"
let x = [|0.0..0.001..30000.0|]
printfn "4"
let x = [|0.0..0.001..30000.0|]
printfn "5"
let x = [|0.0..0.001..30000.0|]
printfn "6"
>
val foo : unit -> unit
> foo()
;;
1
2
3
4
5
6
val it : unit = ()
>
and if I add GPU kernel it still works:
let foo() =
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "1"
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "2"
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "3"
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "4"
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "5"
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "6"
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "7"
let x = squareGPU [|0.0..0.001..30000.0|]
printfn "8"
> foo();;
1
2
3
4
5
6
7
8
val it : unit = ()
>
Alternatively, you can try to use 64bit process.
GC works in a separate background thread, so if you new huge arrays frequentely, it will easily throw that out of memory exception.
In this big array case, I suggest you to use a "in-place modification" style, that will be more stable. I created a test to show that: (NOTE, since the array is very big, you'd better go to project property page, in the Build tab, uncheck the "Prefer 32-bit", to make sure it runs as 64 bit process)
open System
open Alea.CUDA
open Alea.CUDA.Utilities
open NUnit.Framework
[<ReflectedDefinition>]
let squareKernel (outputs:deviceptr<float>) (inputs:deviceptr<float>) (n:int) =
let start = blockIdx.x * blockDim.x + threadIdx.x
let stride = gridDim.x * blockDim.x
let mutable i = start
while i < n do
outputs.[i] <- inputs.[i] * inputs.[i]
i <- i + stride
let squareGPUInplaceUpdate (worker:Worker) (lp:LaunchParam) (hData:float[]) (dData:DeviceMemory<float>) =
// instead of malloc a new device memory, you just reuse the device memory dData
// and scatter new data to it.
dData.Scatter(hData)
worker.Launch <# squareKernel #> lp dData.Ptr dData.Ptr hData.Length
// actually, there should be a counterpart of data.Scatter(hData) like data.Gather(hData)
// but unfortunately, that is missing, but there is a workaround of using worker.Gather.
worker.Gather(dData.Ptr, hData)
let squareGPUManyTimes (iters:int) =
let worker = Worker.Default
// actually during the many iters, you just malloc 2 host array (for data and expected value)
// and you malloc a device array. You keep reusing them, since they are big array.
// if you new the huge array very frequentely, GC is under pressure. and since GC works
// as a separate thread, so you will get System.OutOfMemoryException from time to time.
let hData = [|0.0..0.001..100000.0|]
let n = hData.Length
let expected = Array.zeroCreate n
use dData = worker.Malloc<float>(n)
let rng = Random()
let update () =
// in-place updating the data
for i = 0 to n - 1 do
hData.[i] <- rng.NextDouble()
expected.[i] <- hData.[i] * hData.[i]
let lp =
let blockSize = 256
let numSm = worker.Device.Attributes.MULTIPROCESSOR_COUNT
let gridSize = Math.Min(16 * numSm, divup n blockSize)
new LaunchParam(gridSize, blockSize)
for i = 1 to iters do
update()
squareGPUInplaceUpdate worker lp hData dData
Assert.AreEqual(expected, hData)
printfn "iter %d passed..." i
[<Test>]
let test() =
squareGPUManyTimes 5
Please note, exception System.OutOfMemoryException ALWAYS means host memory, the GPU memory will throw CUDAException if it find there is not enough memory.
BTW, each time you call DeviceMemory.Gather(), it will a new a .NET array and fill it. By using the in-place method shown in this example, you provide a .net array and let it to be filled by the data from the device.
Related
In F# the documentation provides two standard for loops. The for to expression is the loop which provides an index, incremented or decremented per item, depending on whether it is a for to or for downto expression.
I want to loop over an array and increment a variable amount of times; specifically twice. in C# this is very straight forward:
for(int i = 0; i < somelength; i += 2) { ... }
How would I achieve the same thing in F#?
You can specify the step using the following syntax:
for x in 0 .. 2 .. somelength do
printfn "%d" x
For more information, see the documentation for the for .. in expression. More generally, you can also use this for iterating over any sequence (IEnumerable), so this behaves more like C# foreach.
Tomas answer is correct and elegant it is worth considering that a in F# loop with an increment of 2 is slower than a loop with increment of 1.
Faster loops in F#:
let print x = printfn "%A" x
// Only increment by +1/-1 allowed for ints
let case0 () = for x = 0 to 10 do print x
let case1 () = for x = 10 downto 0 do print x
// Special handling in F# compiler ensures these are fast
let case2 () = for x in 0..10 do print x
let case3 (vs : int array) = for x in vs do print x
let case4 (vs : int list) = for x in vs do print x
let case5 (vs : string) = for x in vs do print x
Slower loops in F#:
let print x = printfn "%A" x
// Not int32s
let case0 () = for x in 0L..10L do print x
let case1 () = for x in 0s..10s do print x
let case2 () = for x in 0.0..10.0 do print x
// Not implicit +1/-1 increment
let case3 () = for x in 0..1..10 do print x
let case4 () = for x in 10..-1..0 do print x
let case5 () = for x in 0..2..10 do print x
let case6 () = for x in 10..-2..0 do print x
// Falls back on seq for all cases except arrays, lists and strings
let case7 (vs : int seq) = for x in vs do print x
let case8 (vs : int ResizeArray) = for x in vs do print x
// Very close to fast case 2 but creates an unnecessary list
let case9 () = for x in [0..10] do print x
When F# compiler don't have special handling to ensure quick iteration it falls back on generic code that looks a bit like this:
use e = (Operators.OperatorIntrinsics.RangeInt32 0 2 10).GetEnumerator()
while enumerator.MoveNext() do
print enumerator.Current
This might or might not be a problem to you but it's worth knowing about I think.
IMHO tail recursion is the way to loop as for and while has a kind of imperative taste to them and thanks to tail call optimization in F# tail recursion is fast if written correctly.
let rec loop i =
if i < someLength then
doSomething i
loop (i + 2)
loop 0
Tomas already answered your syntax question. Another answer suggests using tail recursion instead.
A third approach with a more f-sharpy feel to it would be something like this:
let myArray = [| 1; 2; 3 ; 4 |]
let stepper f step a =
a
|> Array.mapi (fun x i -> if i % step = 0 then Some (f x) else None)
|> Array.choose id
printfn "%A" <| stepper (fun x -> x * 2) 2 myArray
// prints [|2; 6|]
I'm kinda new to F# and trying out a simple calculator app. I take input from the user, and I want to the specific functions to be executed as per the input.
Right now, whenever I take any input from user, the program executes top to bottom. I want it to execute only specific functions matching with the input. Like if input is 6 then body of scientificFun() should be executed. Right now it executes all functions. Please help, I'm kinda stuck on this one!
The code is
open System
let mutable ok = true
while ok do
Console.WriteLine("Choose a operation:\n1.Addition\n2.Substraction\n3.Multiplication\n4.Division\n5.Modulo\n6.Scientific")
let input= Console.ReadLine()
let add =
Console.WriteLine("Ok, how many numbers?")
let mutable count = int32(Console.ReadLine())
let numberArray = Array.create count 0.0
for i in 0 .. numberArray.Length - 1 do
let no = float(Console.ReadLine())
Array.set numberArray i no
Array.sum numberArray
let sub x y = x - y
let mul x y = x * y
let div x y = x / y
let MOD x y = x % y
let scientificFun() =
printfn("1. Exponential")
match input with
| "1" -> printfn("The Result is: %f") (add)
| "6" -> (scientificFun())
| _-> printfn("Choose between 1 and 6")
Console.WriteLine("Would you like to use the calculator again? y/n")
let ans = Console.ReadLine()
if ans = "n" then
ok <- false
else Console.Clear()
You should define add as function: let add() = or let add inputNumbers =
Otherwise this simplified version below only executes the functions corresponding to the input number:
open System
[<EntryPoint>]
let main argv =
// define your functions
let hellofun() =
printfn "%A" "hello"
let worldfun() =
printfn "%A" "world"
let mutable cont = true
let run() = // set up the while loop
while cont do
printfn "%A" "\nChoose an operation:\n 1 hellofunc\n 2 worldfunc\n 3 exit"
let input = Console.ReadLine() // get the input
match input with // pattern match on the input to call the correct function
| "1" -> hellofun()
| "2" -> worldfun()
| "3" -> cont <- false;()
| _ -> failwith "Unknown input"
run() // kick-off the loop
0
The [<EntryPoint>] let main argv = is only necessary if you compile it. Otherwise just execute run().
I have a couple of books that I am going by, but as I am working on my F# problems, I find some difficulties in syntax here. If anyone thinks I should not be asking these questions here and have another book recommendation on a budget, please let me know.
Here is the code that reproduces the problem I am having with my project
[<EntryPoint>]
let main argv =
let mutable x = 0
let somefuncthattakesfunc v = ignore
let c() =
let y = x
ignore
somefuncthattakesfunc (fun () -> (x <- 1))
Console.ReadKey()
0 // return an integer exit code
I am getting the following compile error
The mutable variable 'x' is used in an invalid way. Mutable variables cannot be captured by closures. Consider eliminating this use of mutation or using a heap-allocated mutable reference cell via 'ref' and '!'.
Any clue ?
As the error explains, you can't close over mutable variables, which you are doing in:
let y = x
and
(fun () -> x = 1)
It suggests you use a ref instead if you need mutation:
let x = ref 0
let somefuncthattakesfunc v = ignore
let c() =
let y = !x
ignore
somefuncthattakesfunc (fun () -> x := 1)
As the error message says, mutable variables cannot be captured by closures, use a reference cell instead:
let main argv =
let x = ref 0
let somefuncthattakesfunc v = ignore
let c() =
let y = !x
ignore
somefuncthattakesfunc (fun () -> x := 1)
Console.ReadKey()
0 // return an integer exit code
Also see this answer.
Disclaim: This is micro-benchmark, please do not comment quotes such as "premature optimization is evil" if you feel unhappy about the topic.
Examples are release targeted for x64, .Net4.5 Visual Studio 2012 F# 3.0 and run in windows 7 x64
After profiling, I narrowed down the bottleneck of one of my applications, so that I want to raise this question:
Observation
If there is no loop inside for in loop or Seq.iter, then it is clear they are both of similar speed. (update2 vs update4)
If there is a loop inside for in loop or Seq.iter, it seems Seq.iter is 2x as faster as for in. (update vs update3) strange? (if run in fsi they would be similar)
If it is targeted for anycpu and run in x64, there is no difference in time. So the question becomes: Seq.iter (update3) would boost up 2x speed if target is x64
Time taken:
update: 00:00:11.4250483 // 2x as much as update3, why?
updatae2: 00:00:01.4447233
updatae3: 00:00:06.0863791
updatae4: 00:00:01.4939535
Source Code:
open System.Diagnostics
open System
[<EntryPoint>]
let main argv =
let pool = seq {1 .. 1000000}
let ret = Array.zeroCreate 100
let update pool =
for x in pool do
for y in 1 .. 200 do
ret.[2] <- x + y
let update2 pool =
for x in pool do
//for y in 1 .. 100 do
ret.[2] <- x
let update3 pool =
pool
|> Seq.iter (fun x ->
for y in 1 .. 200 do
ret.[2] <- x + y)
let update4 pool =
pool
|> Seq.iter (fun x ->
//for y in 1 .. 100 do
ret.[2] <- x)
let test n =
let run = match n with
| 1 -> update
| 2 -> update2
| 3 -> update3
| 4 -> update4
for i in 1 .. 50 do
run pool
let sw = new Stopwatch()
sw.Start()
test(1)
sw.Stop()
Console.WriteLine(sw.Elapsed);
sw.Restart()
test(2)
sw.Stop()
Console.WriteLine(sw.Elapsed)
sw.Restart()
test(3)
sw.Stop()
Console.WriteLine(sw.Elapsed)
sw.Restart()
test(4)
sw.Stop()
Console.WriteLine(sw.Elapsed)
0 // return an integer exit code
This isn't a complete answer, but hope it helps you to go further.
I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:
open System
let test1() =
let ret = Array.zeroCreate 100
let pool = {1 .. 1000000}
for x in pool do
for _ in 1..50 do
for y in 1..200 do
ret.[2] <- x + y
let test2() =
let ret = Array.zeroCreate 100
let pool = {1 .. 1000000}
Seq.iter (fun x ->
for _ in 1..50 do
for y in 1..200 do
ret.[2] <- x + y) pool
let time f =
let sw = new Diagnostics.Stopwatch()
sw.Start()
let result = f()
sw.Stop()
Console.WriteLine(sw.Elapsed)
result
[<EntryPoint>]
let main argv =
time test1
time test2
0
In this example, Seq.iter and for x in pool is executed once but there is still 2x time difference between test1 and test2:
00:00:06.9264843
00:00:03.6834886
Their ILs are very similar, so compiler optimization isn't a problem. It seems that x64 jitter fails to optimize test1 though it is able to do so with test2. Interestingly, if I refactor nested for loops in test1 as a function, JIT optimization succeeds again:
let body (ret: _ []) x =
for _ in 1..50 do
for y in 1..200 do
ret.[2] <- x + y
let test3() =
let ret = Array.zeroCreate 100
let pool = {1..1000000}
for x in pool do
body ret x
// 00:00:03.7012302
When I disable JIT optimization using the technique described here, execution times of these functions are comparable.
Why x64 jitter fails in the particular example, I don't know. You can disassemble optimized jitted code to compare ASM instructions line by line. Maybe someone with good ASM knowledge can find out their differences.
When I run the experiment on my machine (using F# 3.0 in VS 2012 in Release mode), I do not get the times you describe. Do you consistently get the same numbers when you run it repeatedly?
I tried it about 4 times and I always get numbers that are very similar. The version with Seq.iter tends to be slightly faster, but this is probably not statistically significant. Something like (using Stopwatch):
test(1) = 15321ms
test(2) = 5149ms
test(3) = 14290ms
test(4) = 4999ms
I'm running the test on a laptop with Intel Core2 Duo (2.26Ghz), using 64bit Windows 7.
Motivation
I have a long-running boolean function which should be executed in an array and I want to return immediately if an element in the array satisfies the condition. I would like to do the search in parallel and terminate other threads when the first complete thread returns an correct answer.
Question
What is a good way to implement parallel exists function in F#? Since my goal is performance, an efficient solution is preferred to an easy or idiomatic one.
Test case
Suppose that I want to find whether one value exists in an array or not. And the comparison function (equals) is simulated as a computation-expensive one:
open System.Diagnostics
open System.Threading
// Source at http://parallelpatterns.codeplex.com/releases/view/50473
let doCpuIntensiveOperation seconds (token:CancellationToken) throwOnCancel =
if (token.IsCancellationRequested) then
if (throwOnCancel) then token.ThrowIfCancellationRequested()
false
else
let ms = int64 (seconds * 1000.0)
let sw = new Stopwatch()
sw.Start()
let checkInterval = Math.Min(20000000, int (20000000.0 * seconds))
// Loop to simulate a computationally intensive operation
let rec loop i =
// Periodically check to see if the user has requested
// cancellation or if the time limit has passed
let check = seconds = 0.0 || i % checkInterval = 0
if check && token.IsCancellationRequested then
if throwOnCancel then token.ThrowIfCancellationRequested()
false
elif check && sw.ElapsedMilliseconds > ms then
true
else
loop (i + 1)
// Start the loop with 0 as the first value
loop 0
let inline equals x y =
doCpuIntensiveOperation 0.01 CancellationToken.None false |> ignore
x = y
The array consists of 1000 randomly generated elements and the searching value is guaranteed in the 2nd half of the array (so sequential search has to go through at least a half of the array):
let rand = new System.Random()
let m = 1000
let N = 1000000
let xs = [|for _ in 1..m -> rand.Next(N)|]
let i = rand.Next((m-1)/2, m-1);;
#time "on";;
let b1 = parallelExists (equals xs.[i]) xs;; // Parallel
let b2 = Array.exists (equals xs.[i]) xs;; // Sequential
I think you can take the following steps:
Spawn a number of workers (threads or async computations), and pass each an equal slice of the array and a cancellation token which will be shared by all workers
When a worker finds the searched item, it calls Cancel on the token (each worker should check the cancel state of the token on each iteration and bail if needed)
I don't have time at the moment to write the code, so there could be some detail I'm omitting.
This answer, and related question, may be helpful.
UPDATE
This is an example of what I'm thinking
open System
open System.Collections.Generic
open System.Threading
open System.Threading.Tasks
let getChunks size array =
let rec loop s n =
seq {
if n > 0 then
let r = n - size
if r > 0 then yield (s, size); yield! loop (s + size) r
else yield (s, size + r)
}
loop 0 (Array.length array)
[<Literal>]
let CHUNK_SIZE = 3
let parallelExists f (array:_[]) =
use cts = new CancellationTokenSource()
let rec checkSlice i n =
if n > 0 && not cts.IsCancellationRequested then
if f array.[i] then cts.Cancel()
else checkSlice (i + 1) (n - 1)
let workers =
array
|> getChunks CHUNK_SIZE
|> Seq.map (fun (s, c) -> Task.Factory.StartNew(fun () -> checkSlice s c))
|> Seq.toArray
try
Task.WaitAll(workers, cts.Token)
false
with :? OperationCanceledException -> true
Usage
let array = Array.init 10 id
let exists =
array |> parallelExists (fun i ->
Thread.Sleep(500)
i = 9)
printfn "%b" exists //true
The F# Powerpack has PSeq.exists which maps to PLINQ's ParallelEnumerable.Any which is part of the BCL. There's also ParallelEnumerable.First
I tried to decompile but wouldn't understand right away what was going on. So instead I went and executed the following side-effecting code to confirm that it's using some sort of cancellation once it found the element:
let elems = seq {
for x = 0 to 1000000 do
printfn "test"
yield x }
open System
open System.Linq;;
ParallelEnumerable.First (ParallelEnumerable.AsParallel(elems), Func<_,_>(fun x -> x = 1))