Does F# do automatic memoisation?

I have this code:
for i in 1 .. 10 do
    let (tree, interval) = time (fun () -> insert [12.; 6. + 1.0] exampletree 128.)
    printfn "insertion time: %A" interval.TotalMilliseconds
    ()
with the time function defined as
let time f =
    let start = DateTime.Now
    let res = f ()
    let finish = DateTime.Now
    (res, finish - start)
The function insert is not relevant here, other than the fact that it doesn't employ mutation and thus returns the same value every time.
I get the results:
insertion time: 218.75
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
insertion time: 0.0
The question is: why does the code calculate the result only once (judging from the insertion times; the result is always correct and equal)? Also, how do I force the program to do the computation multiple times (I need that for profiling purposes)?
Edit: Jared has supplied the right answer. Now that I know what to look for, I can get the stopwatch code from a timeit function for F#.
I had the following results:
insertion time: 243.4247
insertion time: 0.0768
insertion time: 0.0636
insertion time: 0.0617
insertion time: 0.065
insertion time: 0.0564
insertion time: 0.062
insertion time: 0.069
insertion time: 0.0656
insertion time: 0.0553

F# does not do automatic memoization of your functions. In this case memoization would be incorrect: even though you don't mutate items directly, you are accessing a mutable value (DateTime.Now) from within your function. Memoizing that, or a function accessing it, would be a mistake, since it can change from call to call.
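For reference, memoization in F# is always something you opt into explicitly. A minimal hand-rolled sketch (illustrative only, not a library function):
open System.Collections.Generic

// Wraps a function with a dictionary cache; only sensible for pure functions.
let memoize (f : 'a -> 'b) =
    let cache = Dictionary<'a, 'b>()
    fun x ->
        if cache.ContainsKey x then cache.[x]
        else
            let v = f x
            cache.[x] <- v
            v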
What you're seeing here is an effect of the .NET JIT. The first time this is run, the function f() is JIT-compiled, which produces a noticeable delay. On subsequent runs it is already JIT-compiled and executes in less time than the granularity of DateTime can measure.
One way to prove this is to use a more granular measuring class, like Stopwatch. This will show that the function does execute every time, just very quickly.
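For example, a minimal sketch of such a Stopwatch-based timer (timeIt and the sample closure are made up for illustration; they are not from the question):
open System.Diagnostics

// Stopwatch-based variant of the time function from the question.
let timeIt f =
    let sw = Stopwatch.StartNew()
    let res = f ()
    sw.Stop()
    (res, sw.Elapsed)

// Run the closure several times; after the first (JIT) run the timings stay small but non-zero.
for _ in 1 .. 10 do
    let (_, elapsed) = timeIt (fun () -> List.sum [1 .. 10000])
    printfn "time: %f ms" elapsed.TotalMilliseconds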

The first timing is probably due to JIT compilation. The actual code you're timing probably runs in less time than DateTime is able to measure.
Edit: Beaten by 18 secs... I'm just glad I had the right idea :)

Related

Is there an equivalent to Akka Streams' `conflate` and/or `batch` operators in Reactor?

I am looking for an equivalent of the batch and conflate operators from Akka Streams in Project Reactor, or some combination of operators that mimic their behavior.
The idea is to aggregate upstream items when the downstream backpressures in a reduce-like manner.
Note that this is different from this question because the throttleLatest / conflate operator described there is different from the one in Akka Streams.
Some background regarding what I need this for:
I am watching a change stream on a MongoDB and for every change I run an aggregate query on the MongoDB to update some metric. When lots of changes come in, the queries can't keep up and I'm getting errors. As I only need the latest value of the aggregate query, it is fine to aggregate multiple change events and run the aggregate query less often, but I want the metric to be as up-to-date as possible so I want to avoid waiting a fixed amount of time when there is no backpressure.
The closest I could come so far is this:
changeStream
    .window(Duration.ofSeconds(1))
    .concatMap { it.reduce(setOf<String>(), { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName }) }
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }
but this would always wait for 1 second regardless of backpressure.
What I would like is something like this:
changeStream
    .conflateWithSeed({ setOf(it.body.sourceReference.applicationName) }, { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName })
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }
I assume you always run only one query at a time, with no parallel execution. My idea is to buffer elements in a list (which can be aggregated easily) for as long as the query is running. As soon as the query finishes, the next list is processed.
I tested it with the following code:
// isQueryRunning is shared between lambdas running on different threads,
// so it is held in an AtomicBoolean rather than a plain local variable.
AtomicBoolean isQueryRunning = new AtomicBoolean(false);
Flux.range(0, 1000000)
    .delayElements(Duration.ofMillis(10))
    // keep buffering while a query is in flight; emit the buffer once it is free again
    .bufferUntil(aLong -> !isQueryRunning.get())
    .doOnNext(integers -> isQueryRunning.set(true))
    .concatMap(integers -> Mono.fromCallable(() -> {
            // simulate a query whose duration varies between 0 and 10 seconds
            int sleepTime = new Random().nextInt(10000);
            System.out.println("processing " + integers.size() + " elements. Sleep time: " + sleepTime);
            Thread.sleep(sleepTime);
            return "";
        })
        .subscribeOn(Schedulers.elastic()))
    .doOnNext(s -> isQueryRunning.set(false))
    .subscribe();
Which prints
processing 1 elements. Sleep time: 4585
processing 402 elements. Sleep time: 2466
processing 223 elements. Sleep time: 2613
processing 236 elements. Sleep time: 5172
processing 465 elements. Sleep time: 8682
processing 787 elements. Sleep time: 6780
It's clearly visible that the size of the next batch is proportional to the previous query's execution time (the sleep time).
Note that this is not a "real" backpressure solution, just a workaround. It's also not suited for parallel execution, and it might require some tuning to prevent running queries for empty batches.

Bad precision of usleep when is executed in background thread in Swift

I have been using usleep to pause a thread for some milliseconds, and I have found that it pauses for more time than expected.
I am sure I am doing something wrong (I am not an expert in Swift), but I don't understand what, because it is very easy to check. For example:
DispatchQueue.global(qos: .background).async {
    let timeStart: Int64 = Date().toMillis()
    usleep(20 * 1000) // 20 ms
    let timeEnd: Int64 = Date().toMillis()
    let timeDif = timeEnd - timeStart
    print("start: \(timeStart), end: \(timeEnd), dif: \(timeDif)")
}
The result:
start: 1522712546115, end: 1522712546235, dif: 120
If I execute the same in the main thread:
start: 1522712586996, end: 1522712587018, dif: 22
I think the way I create the thread is wrong for pausing it. How could I create a thread that works well with usleep?
Thanks
A couple of thoughts:
The responsiveness to usleep is a function of the quality of service (QoS) of the queue. For example, doing thirty 20 ms usleep calls on each of the five queue types resulted in the following averages and standard deviations (measured in ms):
QoS               mean    stdev
---------------   -----   -----
background        99.14   28.06
utility           74.12   23.66
default           23.88    1.83
userInitiated     22.09    1.87
userInteractive   20.99    0.29
The higher the quality of service, the closer to 20 ms it got, and with a lower standard deviation.
If you want accurate time measurements, you should avoid using Date and/or CFAbsoluteTimeGetCurrent(). As the documentation warns us:
Repeated calls to this function do not guarantee monotonically increasing results. The system time may decrease due to synchronization with external time references or due to an explicit user change of the clock.
You can use a mach_time based value, such as conveniently returned by CACurrentMediaTime(), to avoid this problem. For example:
let start = CACurrentMediaTime()
// do something
let elapsed = CACurrentMediaTime() - start
If you need even higher accuracy, see Apple Technical Q&A #2169.
I, too, get times around 120 ms when sleeping a thread on a background queue:
import Foundation
DispatchQueue.global(qos: .background).async {
    let timeStart = Date()
    usleep(20 * 1000) // 20 ms
    let timeEnd = Date()
    let timeDif = timeEnd.timeIntervalSince(timeStart) * 1000
    print("start: \(timeStart), end: \(timeEnd), dif: \(timeDif)")
    exit(0)
}
CFRunLoopRun()
outputs:
start: 2018-04-03 00:10:54 +0000, end: 2018-04-03 00:10:54 +0000, dif: 119.734048843384
However, with default QoS, my results are consistently closer to 20 ms:
import Foundation
DispatchQueue.global(qos: .default).async {
    let timeStart = Date()
    usleep(20 * 1000) // 20 ms
    let timeEnd = Date()
    let timeDif = timeEnd.timeIntervalSince(timeStart) * 1000
    print("start: \(timeStart), end: \(timeEnd), dif: \(timeDif)")
    exit(0)
}
CFRunLoopRun()
start: 2018-04-03 00:12:15 +0000, end: 2018-04-03 00:12:15 +0000, dif: 20.035982131958
So it appears that the .background QoS is causing the behavior you are seeing. Although I don't have direct knowledge of why this would be, it doesn't seem too far-fetched to speculate that the OS may be somewhat more lax about waking up sleeping threads that are marked with a background QoS. Indeed, this is what Apple's documentation has to say about it:
A quality of service (QoS) class allows you to categorize work to be performed by NSOperation, NSOperationQueue, NSThread objects, dispatch queues, and pthreads (POSIX threads). By assigning a QoS to work, you indicate its importance, and the system prioritizes it and schedules it accordingly. For example, the system performs work initiated by a user sooner than background work that can be deferred until a more optimal time. In some cases, system resources may be reallocated away from the lower priority work and given to the higher priority work.
The possibility for your background work to be "deferred until a more optimal time" seems a plausible explanation of the behavior you're seeing.

Is F# Interactive supposed to be much slower than compiled?

I've been using F# for nearly six months and was so sure that F# Interactive should have the same performance as compiled code that, when I finally bothered to benchmark it, I was convinced it was some kind of compiler bug. Though now it occurs to me that I should have checked here first before opening an issue.
For me it is roughly 3x slower and the optimization switch does not seem to be doing anything at all.
Is this supposed to be standard behavior? If so, I really got trolled by the #time directive. I have the timings for how long it takes to sum 100M elements on this Reddit thread.
Update:
Thanks to FuleSnabel, I uncovered some things.
I tried running the example script from both fsianycpu.exe (which is the default F# Interactive) and fsi.exe, and I am getting different timings for the two runs: 134 ms for the former and 78 ms for the latter. Those two timings also correspond to the timings from the unoptimized and optimized binaries, respectively.
What makes the matter even more confusing is that the first project I used to compile the thing is part of the game library (in script form) I am making, and it refuses to compile the optimized binary, instead silently falling back to the unoptimized one. I had to start a fresh project to get it to compile properly. It is a wonder the other test compiled properly.
So basically, something funky is going on here and I should look into switching fsianycpu.exe to fsi.exe as the default interpreter.
I tried the example code from the pastebin and I don't see the behavior you describe. These are the results from my performance run:
.\bin\Release\ConsoleApplication3.exe
Total iterations: 300000000, Outer: 10000, Inner: 30000
reduce sequence of list, result 450015000, time 2836 ms
reduce array, result 450015000, time 594 ms
for loop array, result 450015000, time 180 ms
reduce list, result 450015000, time 593 ms
fsi -O --exec .\Interactive.fsx
Total iterations: 300000000, Outer: 10000, Inner: 30000
reduce sequence of list, result 450015000, time 2617 ms
reduce array, result 450015000, time 589 ms
for loop array, result 450015000, time 168 ms
reduce list, result 450015000, time 603 ms
It's expected that Seq.reduce would be the slowest and the for loop the fastest, and that reduce over the list and the array would be roughly similar (this assumes locality of the list elements, which isn't guaranteed).
I rewrote your code to allow for longer runs without running out of memory and to improve the cache locality of the data. With short runs, the uncertainty of the measurements makes it hard to compare the data.
Program.fs:
module fs

let stopWatch =
    let sw = new System.Diagnostics.Stopwatch()
    sw.Start ()
    sw

let total = 300000000
let outer = 10000
let inner = total / outer

let timeIt (name : string) (a : unit -> 'T) : unit =
    let t = stopWatch.ElapsedMilliseconds
    let v = a ()
    for i = 2 to outer do
        a () |> ignore
    let d = stopWatch.ElapsedMilliseconds - t
    printfn "%s, result %A, time %d ms" name v d

[<EntryPoint>]
let sumTest(args) =
    let numsList = [1..inner]
    let numsArray = [|1..inner|]
    printfn "Total iterations: %d, Outer: %d, Inner: %d" total outer inner
    let sumsSeqReduce () = Seq.reduce (+) numsList
    timeIt "reduce sequence of list" sumsSeqReduce
    let sumsArray () = Array.reduce (+) numsArray
    timeIt "reduce array" sumsArray
    let sumsLoop () =
        let mutable total = 0
        for i in 0 .. inner - 1 do
            total <- total + numsArray.[i]
        total
    timeIt "for loop array" sumsLoop
    let sumsListReduce () = List.reduce (+) numsList
    timeIt "reduce list" sumsListReduce
    0
Interactive.fsx:
#load "Program.fs"
fs.sumTest [||]
PS. I am running on Windows with Visual Studio 2015. 32-bit vs. 64-bit seemed to make only a marginal difference.
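For reference, the #time directive mentioned in the question toggles per-expression timing inside F# Interactive. A minimal usage sketch (the summed array is just an arbitrary workload for illustration):
#time "on"
// Each evaluated expression now reports real time, CPU time and GC counts.
Array.init 10000000 float |> Array.sum
#time "off"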

when is vectorization favored in Julia?

I have 2 functions for determining pi numerically in Julia. The second function (which I think is vectorized) is slower than the first.
Why is vectorization slower? Are there rules when to vectorize and when not to?
function determine_pi(n)
    area = zeros(Float64, n)
    sum = 0
    for i = 1:n
        if (rand()^2 + rand()^2) <= 1
            sum = sum + 1
        end
        area[i] = sum * 1.0 / i
    end
    return area
end
and another function
function determine_pi_vec(n)
    res = cumsum(map(x -> x <= 1 ? 1 : 0, rand(n).^2 + rand(n).^2)) ./ [1:n]
    return res
end
When run for n = 10^7, below are the execution times (after running a few times):
n = 10^7
@time returnArray = determine_pi(n)
# output: elapsed time: 0.183211324 seconds (80000128 bytes allocated)
@time returnArray2 = determine_pi_vec(n);
# elapsed time: 2.436501454 seconds (880001336 bytes allocated, 30.71% gc time)
Vectorization is good if:
- it makes the code easier to read, and performance isn't critical;
- it is a linear algebra operation, where a vectorized style can be good because Julia can use BLAS and LAPACK to perform your operation with very specialized, high-performance code.
In general, I personally find it best to start with vectorized code, look for any speed problems, then devectorize any troublesome issues.
Your second version is slow not so much because it is vectorized, but because of the use of an anonymous function: unfortunately, in Julia 0.3 these are normally quite a bit slower. map in general doesn't perform very well, I believe because Julia can't infer the output type of the function (it's still "anonymous" from the perspective of the map function). I wrote a different vectorized version which avoids anonymous functions, and is possibly a little bit easier to read:
function determine_pi_vec2(n)
    return cumsum((rand(n).^2 .+ rand(n).^2) .<= 1) ./ (1:n)
end
Benchmarking with
function bench(n, f)
    f(10)
    srand(1000)
    @time f(n)
    srand(1000)
    @time f(n)
    srand(1000)
    @time f(n)
end
bench(10^8, determine_pi)
gc()
bench(10^8, determine_pi_vec)
gc()
bench(10^8, determine_pi_vec2)
gives me the results
elapsed time: 5.996090409 seconds (800000064 bytes allocated)
elapsed time: 6.028323688 seconds (800000064 bytes allocated)
elapsed time: 6.172004807 seconds (800000064 bytes allocated)
elapsed time: 14.09414031 seconds (8800005224 bytes allocated, 7.69% gc time)
elapsed time: 14.323797823 seconds (8800001272 bytes allocated, 8.61% gc time)
elapsed time: 14.048216404 seconds (8800001272 bytes allocated, 8.46% gc time)
elapsed time: 8.906563284 seconds (5612510776 bytes allocated, 3.21% gc time)
elapsed time: 8.939001114 seconds (5612506184 bytes allocated, 4.25% gc time)
elapsed time: 9.028656043 seconds (5612506184 bytes allocated, 4.23% gc time)
so vectorized code can definitely be about as good as devectorized code in some cases, even when we aren't in the linear-algebra case.

let bindings and constructors in F#

I am defining a stopwatch in F#:
open System.Diagnostics
let UptimeStopwatch = Stopwatch.StartNew()
and I am printing every 3 seconds with
printfn "%A" UptimeStopwatch.Elapsed
and every time I'm getting "00:00:00.0003195" or something similarly small. Is F# calling the constructor every time I reference UptimeStopwatch? If so, how do I get around this and achieve the desired result? This is a confusing intermingling of functional and imperative programming.
F# seems to interpret statements like
let MyFunction = DoSomething()
and
let MyFunction() = DoSomething()
differently. The first binds the return value of DoSomething() to the variable MyFunction, and the second binds the action DoSomething() to the function MyFunction().
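A minimal illustration of the difference (the names below are made up for the example):
open System

// Evaluated exactly once, when the binding is created; the value never changes afterwards.
let startupTime = DateTime.Now

// A function of unit: the body runs again on every call.
let currentTime () = DateTime.Now

Printing startupTime repeatedly shows the same instant, while currentTime () keeps advancing. The Stopwatch binding above works the same way: UptimeStopwatch is constructed once and then simply referenced.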
My usage of UptimeStopwatch was correct, and the error was elsewhere in my implementation.
I see you already found a problem elsewhere in your code, but the two lines in your question still take some time to run and, interestingly, you can make the overhead smaller.
When I run the two lines in your sample, it prints a value around 0.0002142. You can make that smaller by storing the elapsed time using let, because there is some overhead associated with constructing a representation of the printf format string (the first argument):
let UptimeStopwatch = Stopwatch.StartNew()
let elapsed = UptimeStopwatch.Elapsed
printfn "%A" elapsed
This prints on average a number around 0.0000878 (two times smaller). If you use Console, the result is similar (because the Elapsed property is obtained before Console is called and nothing else needs to be done in the meantime):
let UptimeStopwatch = Stopwatch.StartNew()
System.Console.WriteLine(UptimeStopwatch.Elapsed)
