I am playing with the F# Interactive Console and comparing the runtime of some numeric operations. On this code the total runtime seems to double only be repeating the declaration of one variable.
In VS 2010 I do:
open System.Diagnostics
let mutable a=1
Then I select this below and run it with Alt+Enter
let stopWatch = Stopwatch.StartNew()
for i=1 to 200100100 do
a <- a + 1
stopWatch.Stop()
printfn "stopWatch.Elapsed: %f" stopWatch.Elapsed.TotalMilliseconds
it takes more or less: 320 ms
now i select this and hit Alt+Enter:
let mutable a=1
let stopWatch = Stopwatch.StartNew()
for i=1 to 200100100 do
a <- a + 1
stopWatch.Stop()
printfn "stopWatch.Elapsed: %f" stopWatch.Elapsed.TotalMilliseconds
almost double: 620 ms
The same block but including the declaration at the top takes almost double. Shouldn't it be the same since I declare the variable before Stopwatch.StartNew() ? Does this have to do with the interactive console?
I have the same results using the #time directive.
I'm not convinced any of the answers yet are quite right. I think a is represented the same in both cases. And by that I mean an individual type is emitted dynamically wrapping that mutable value (certainly on the heap!). It needs to do this since in both cases, a is a top-level binding that can be accessed by subsequent interactions.
So all that said, my theory is this: in the first case, the dynamic type emitted by FSI is loaded at the end of the interaction, in order to output its default value. In the second case, however, the type wrapping a is not loaded until it is first accessed, in the loop, after the StopWatch was started.
Daniel is right - if you define a variable in a separate FSI interaction, then it is represented differently.
To get a correct comparison, you need to declare the variable as local in both cases. The easiest way to do that is to nest the code under do (which turns everything under do into a local scope) and then evaluate the entire do block at once. See the result of the following two examples:
// A version with variable declaration excluded
do
let mutable a=1
let stopWatch = Stopwatch.StartNew()
for i=1 to 200100100 do
a <- a + 1
stopWatch.Stop()
printfn "stopWatch.Elapsed: %f" stopWatch.Elapsed.TotalMilliseconds
// A version with variable declaration included
do
let stopWatch = Stopwatch.StartNew()
let mutable a=1
for i=1 to 200100100 do
a <- a + 1
stopWatch.Stop()
printfn "stopWatch.Elapsed: %f" stopWatch.Elapsed.TotalMilliseconds
The difference I get when I run them is not measureable.
Related
I have the following program :
let readStoriesInQAForm =
printfn "Inside the let!"
0
[<EntryPoint>]
let main argv =
printfn "Ah! entering point to the main!"
System.Console.ReadKey() |> ignore
0 // return an integer exit code
I was expecting to get the following output (since main is the entry point, and there is no function call):
Ah! entering point to the main!
But I get this, when I compiler and run it in VS 2013:
Inside the let!
Ah! entering point to the main!
What is my mistake?
As John Palmer describes in his answer, F# code is executed from top to bottom. The let keyword binds a value to a name - in this case, the value 0 is bound to the readStoriesInQAForm name.
Apart from primitive values, you can also bind functions to names; F# is a Functional programming language, so functions are values too. If you bind a function to a name, you can execute that function by invoking it.
However, readStoriesInQAForm isn't a function - it's a primitive value (0), and it gets bound before main is called, in order to make that value available for main. In this particular case, the way the let binding is defined, it has a side-effect of printing to Standard Out when the binding occurs. (In general, in Functional Programming, the more you can avoid side-effects, the better.)
If you want to avoid this behaviour, change the let binding from a a primitive value to a function:
let readStoriesInQAForm () =
printfn "Inside the let!"
0
Better yet would be to bind the name to the value without any side-effect:
let readStoriesInQAForm = 0
In a F# program, the code is essentially run from top to bottom, so that any values needed later are available.
For example, if you had written:
[<EntryPoint>]
let main argv =
printfn "Ah! entering point to the main!"
printfn readStoriesInQAForm
System.Console.ReadKey() |> ignore
0 // return an integer exit code
The observed behaviour makes perfect sense as it would be illogical to jump out of main to calculate a constant value.
To avoid this issue, you want to make readStoriesInQAForm a function, like so:
let readStoriesInQAForm() = ...
You're not declaring a function, you're declaring a value. It's missing ():
let readStoriesInQAForm() =
In reading John Palmer's answer to What is the difference between mutable values and immutable value redefinition?, John notes that
This sort of redefinition will only work in fsi.
In working with F# Interactive (fsi) I guess I subconsciously knew it, but never paid attention to it.
Now that it is apparent, why the difference?
More specifically please explain how the internals are different between fsi and the compiler such that this occurs by design or result of differences?
If the answer can elaborate on the internal structures that hold the bindings that would be appreciated.
The semantics are consistent with the way FSI compiles interactive submissions to an FSI session: each interactive submission is compiled as module which is open to subsequent interactive submissions.
The following is close to what FSI actual does and illustrates how let binding shadowing works across interactive submissions:
FSI Submission #1: let x = 1;;
module FSI_1 =
let x = 1
open FSI_1 //FSI_1.x is now bound to 1 and available at the top level
FSI Submission #2: let x = 2;;
module FSI_2 =
let x = 2
open FSI_2 //FSI_1.x is now shadowed by FSI_2.x which is bound to 2 and available at the top level
You can see the actual details how how the dynamic FSI assembly is compiled by using reflection on the FSI_ASSEMBLY assembly within the FSI app domain. Each interactive submission is literally emitted as a module (.NET class) with the naming pattern FSI_####. FsEye uses these facts to discover the state of FSI top-level bindings: https://code.google.com/p/fseye/source/browse/tags/2.0.1/FsEye/Fsi/SessionQueries.fs#24
The key takeaway in regard to #JohnPalmer's answer is that top-level FSI definitions cannot be mutated, when they are "redefined" they are merely being shadowed. We can show this as follows:
> let x = 1;; //our original definition of x
val x : int = 1
> let f () = x;; //capture x
val f : unit -> int
> let x = 2;; //shadow our original definition of x
val x : int = 2
> f();; //returns the original x value, which is still 1 rather than 2
val it : int = 1
I'm just beginning F# and haven't really done functional programming since my programming languages class 15 years ago (exception being "modern" C#).
I'm looking at this F# snippet using LINQPad 4:
let add x y = x + y
let lazyPlusOne x = lazy add x 1
let e = lazyPlusOne 15
Dump e
let plusOne x = add x 1
let f = plusOne 15
Dump f
The output it produces is:
Lazy<Int32>
Value is not created.
IsValueCreated False
Value 16
16
I understand the lazy keyword to delay evaluation until needed, same as C# delayed execution.
What is the meaning of: "Value is not created" here?
If you use lazy keyword to construct a lazy value (as in your lazyPlusOne function), then the result is a value of type Lazy<int>. This represents a value of type int that is evaluated only when it is actually needed.
I assume that Dump function tries to print the value including all its properties - when it starts printing, the value is not evaluated, so ToString method prints Value is not created. Then it iterates over other properties and when it accesses Value, the lazy value is evaluated (because its value is now needed). After evaluation, the property returns 16, which is then printed.
You can replace Dump with an F#-friendly printing function (or just use F# Interactive, which is extremely convenient way to play with F# inside Visual Studio with the usual IntelliSense, background error checking etec.)
F#-friendly printing function like printfn "%A" doesn't access the Value property, so it doesn't accidentally evaluate the value. Here is a snippet from F# Interactive:
> let a = lazy (1 + 2);;
val a : Lazy<int> = Value is not created. // Creates lazy value that's not evaluated
> a;;
val it : Lazy<int> = Value is not created. // Still not evaluated!
> a.Value;; // Now, the lazy value needs to be evaluated (to get the Value)
val it : int = 3
> a;; // After evaluation, the value stays cached
val it : Lazy<int> = 3
As of 'Dump e', 'lazyPlusOne 15' has not been evaluated. The 'let e = lazyPlusOne 15' does not require the evaluation of 'lazyPlusOne 15'. We don't yet need to know what e evaluates to yet.
The dump is triggering the evaluation and that is semantically different that just dumping the value after the evaluation.
I'm just beginning F# so please be kind if this is basic.
I've read that a function marked lazy is evaluated only once and then cached. For example:
let lazyFunc = lazy (1 + 1)
let theValue = Lazy.force lazyFunc
Compared to this version which would actually run each time it's called:
let eagerFunc = (1 + 1)
let theValue = eagerFunc
Based on that, should all functions be made lazy? When would you not want to? This is coming from material in the book "Beginning F#".
First of all, it may be helpful to note that none of the things you have defined is a function - eagerFunc and theValue are values of type int and lazyFunc is a value of type Lazy<int>. Given
let lazyTwo = lazy (1 + 1)
and
let eagerTwo = 1 + 1
the expression 1 + 1 will not be evaluated more than once no matter how many times you use eagerTwo. The difference is that 1 + 1 will be evaluated exactly once when defining eagerTwo, but will be evaluated at most once when lazyTwo is used (it will be evaluated the first time that the Value property is accessed, and then cached so that further uses of Value do not need to recalculated it). If lazyTwo's Value is never accessed, then its body 1 + 1 will never be evaluated.
Typically, you won't see much benefit to using lazy values in a strict language like F#. They add a small amount of overhead since accessing the Value property requires checking whether the value has already been calculated. They might save you a bit of calculation if you have something like let lazyValue = lazy someVeryExpensiveCalculationThatMightNotBeNeeded(), since the expensive calculation will only take place if the value is actually used. They can also make some algorithms terminate which otherwise would not, but this is not a major issue in F#. For instance:
// throws an exception if x = 0.0
let eagerDivision x =
let oneOverX = 1.0 / x
if x = 0.0 then
printfn "Tried to divide by zero" // too late, this line is never reached
else
printfn "One over x is: %f" oneOverX
// succeeds even if x = 0.0, since the quotient is lazily evaluated
let lazyDivision x =
let oneOverX = lazy (1.0 / x)
if x = 0.0 then
printfn "Tried to divide by zero"
else
printfn "One over x is: %f" oneOverX.Value
If function executions have side-effects and it is important to see the side-effects each time the function is called (say it wraps an I/O function) you would not want it to be lazy.
There are also functions that are so trivial that executing them each time is faster than caching the value--
let eagerFunc = (1 + 1) is a let binding, and will only execute once. let eagerFunc() = (1 + 1) is a function accepting unit (nothing) and returning an int. It will execute each time it's called. In a sense, every function is lazy, that is, it only executes when called. However, the lazy keyword (and System.Lazy, which it returns) will execute the expression/function given to it at most once. Subsequent calls to the Value property will return the cached result. This is useful when computation of the value is expensive.
Many functions will not be suitable for use with lazy because they are either non-deterministic (may return a different result with each invocation) or parameterized. Of course, it's possible to use a fully-applied (a value is supplied for each parameter) version of such functions, but generally the variability is desired.
I'm writing a parser in F#, and it needs to be as fast as possible (I'm hoping to parse a 100 MB file in less than a minute). As normal, it uses mutable variables to store the next available character and the next available token (i.e. both the lexer and the parser proper use one unit of lookahead).
My current partial implementation uses local variables for these. Since closure variables can't be mutable (anyone know the reason for this?) I've declared them as ref:
let rec read file includepath =
let c = ref ' '
let k = ref NONE
let sb = new StringBuilder()
use stream = File.OpenText file
let readc() =
c := stream.Read() |> char
// etc
I assume this has some overhead (not much, I know, but I'm trying for maximum speed here), and it's a little inelegant. The most obvious alternative would be to create a parser class object and have the mutable variables be fields in it. Does anyone know which is likely to be faster? Is there any consensus on which is considered better/more idiomatic style? Is there another option I'm missing?
You mentioned that local mutable values cannot be captured by a closure, so you need to use ref instead. The reason for this is that mutable values captured in the closure need to be allocated on the heap (because closure is allocated on the heap).
F# forces you to write this explicitly (using ref). In C# you can "capture mutable variable", but the compiler translates it to a field in a heap-allocated object behind the scene, so it will be on the heap anyway.
Summary is: If you want to use closures, mutable variables need to be allocated on the heap.
Now, regarding your code - your implementation uses ref, which creates a small object for every mutable variable that you're using. An alternative would be to create a single object with multiple mutable fields. Using records, you could write:
type ReadClosure = {
mutable c : char
mutable k : SomeType } // whatever type you use here
let rec read file includepath =
let state = { c = ' '; k = NONE }
// ...
let readc() =
state.c <- stream.Read() |> char
// etc...
This may be a bit more efficient, because you're allocating a single object instead of a few objects, but I don't expect the difference will be noticeable.
There is also one confusing thing about your code - the stream value will be disposed after the function read returns, so the call to stream.Read may be invalid (if you call readc after read completes).
let rec read file includepath =
let c = ref ' '
use stream = File.OpenText file
let readc() =
c := stream.Read() |> char
readc
let f = read a1 a2
f() // This would fail!
I'm not quite sure how you're actually using readc, but this may be a problem to think about. Also, if you're declaring it only as a helper closure, you could probably rewrite the code without closure (or write it explicitly using tail-recursion, which is translated to imperative loop with mutable variables) to avoid any allocations.
I did the following profiling:
let test() =
tic()
let mutable a = 0.0
for i=1 to 10 do
for j=1 to 10000000 do
a <- a + float j
toc("mutable")
let test2() =
tic()
let a = ref 0.0
for i=1 to 10 do
for j=1 to 10000000 do
a := !a + float j
toc("ref")
the average for mutable is 50ms, while ref 600ms. The performance difference is due to that mutable variables are in stack, while ref variables are in managed heap.
The relative difference is big. However, 10^8 times of access is a big number. And the total time is acceptable. So don't worry too much about the performance of ref variables. And remember:
Premature optimization is the root of
all evil.
My advice is you first finish your parser, then consider optimizing it. You won't know where the bottomneck is until you actually run the program. One good thing about F# is that its terse syntax and functional style well support code refactoring. Once the code is done, optimizing it would be convenient. Here's an profiling example.
Just another example, we use .net arrays everyday, which is also in managed heap:
let test3() =
tic()
let a = Array.create 1 0.0
for i=1 to 10 do
for j=1 to 10000000 do
a.[0] <- a.[0] + float j
toc("array")
test3() runs about the same as ref's. If you worry too much of variables in managed heap, then you won't use array anymore.