F# Lazy Evaluation vs Non-Lazy

I'm just beginning F# so please be kind if this is basic.
I've read that a function marked lazy is evaluated only once and then cached. For example:
let lazyFunc = lazy (1 + 1)
let theValue = Lazy.force lazyFunc
Compared to this version which would actually run each time it's called:
let eagerFunc = (1 + 1)
let theValue = eagerFunc
Based on that, should all functions be made lazy? When would you not want to? This is coming from material in the book "Beginning F#".

First of all, it may be helpful to note that none of the things you have defined is a function - eagerFunc and theValue are values of type int and lazyFunc is a value of type Lazy<int>. Given
let lazyTwo = lazy (1 + 1)
and
let eagerTwo = 1 + 1
the expression 1 + 1 will not be evaluated more than once no matter how many times you use eagerTwo. The difference is that 1 + 1 will be evaluated exactly once when eagerTwo is defined, but at most once for lazyTwo: it is evaluated the first time the Value property is accessed, and the result is then cached so that further uses of Value do not need to recalculate it. If lazyTwo's Value is never accessed, its body 1 + 1 is never evaluated.
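A small sketch of when each body actually runs, assuming you paste it into F# Interactive or a script. The trace list is an illustration device added here, not part of the question: each body records its name at the moment it is evaluated.

```fsharp
// trace records which bodies have actually run (illustration only).
let trace = ResizeArray<string>()

// Eager binding: the right-hand side runs right here, exactly once.
let eagerTwo = (trace.Add "eager"; 1 + 1)

// Lazy bindings: nothing runs yet, only a Lazy<int> is created.
let lazyTwo = lazy (trace.Add "lazy"; 1 + 1)
let unusedTwo = lazy (trace.Add "unused"; 1 + 1)

// The first access runs the body and caches the result...
let first = lazyTwo.Value
// ...later accesses return the cached result without running the body again.
let second = lazyTwo.Value

printfn "%A" (List.ofSeq trace)
```

The printed trace contains only "eager" and "lazy": the body of unusedTwo never runs, and lazyTwo's body runs once even though Value is read twice.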
Typically, you won't see much benefit to using lazy values in a strict language like F#. They add a small amount of overhead since accessing the Value property requires checking whether the value has already been calculated. They might save you a bit of calculation if you have something like let lazyValue = lazy someVeryExpensiveCalculationThatMightNotBeNeeded(), since the expensive calculation will only take place if the value is actually used. They can also make some algorithms terminate which otherwise would not, but this is not a major issue in F#. For instance:
// throws DivideByZeroException if x = 0.0M
// (decimal division throws; float division by zero would just return infinity)
let eagerDivision (x: decimal) =
    let oneOverX = 1.0M / x
    if x = 0.0M then
        printfn "Tried to divide by zero" // too late, this line is never reached
    else
        printfn "One over x is: %M" oneOverX
// succeeds even if x = 0.0M, since the quotient is lazily evaluated
let lazyDivision (x: decimal) =
    let oneOverX = lazy (1.0M / x)
    if x = 0.0M then
        printfn "Tried to divide by zero"
    else
        printfn "One over x is: %M" oneOverX.Value

If function executions have side-effects and it is important to see the side-effects each time the function is called (say it wraps an I/O function) you would not want it to be lazy.
There are also functions so trivial that executing them each time is faster than caching the value.

let eagerFunc = (1 + 1) is a let binding, and will only execute once. let eagerFunc() = (1 + 1) is a function accepting unit (nothing) and returning an int. It will execute each time it's called. In a sense, every function is lazy, that is, it only executes when called. However, the lazy keyword (and System.Lazy, which it returns) will execute the expression/function given to it at most once. Subsequent calls to the Value property will return the cached result. This is useful when computation of the value is expensive.
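A minimal sketch contrasting a unit-taking function with a lazy value. The counters are added here purely to make the number of evaluations visible; they are not part of the original code.

```fsharp
// Counters so we can see how often each body runs (illustration only).
let functionRuns = ref 0
let lazyRuns = ref 0

// A function taking unit: its body runs on every call.
let eagerFunc () =
    functionRuns.Value <- functionRuns.Value + 1
    1 + 1

// A lazy value: its body runs at most once, on the first Value access.
let lazyFunc = lazy (lazyRuns.Value <- lazyRuns.Value + 1; 1 + 1)

let fromFunction = [ eagerFunc (); eagerFunc (); eagerFunc () ]
let fromLazy = [ lazyFunc.Value; lazyFunc.Value; lazyFunc.Value ]

printfn "function body ran %d times, lazy body ran %d time(s)" functionRuns.Value lazyRuns.Value
```

Both lists contain the same values; the difference is that the function body ran three times and the lazy body once.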
Many functions will not be suitable for use with lazy because they are either non-deterministic (may return a different result with each invocation) or parameterized. Of course, it's possible to use a fully-applied (a value is supplied for each parameter) version of such functions, but generally the variability is desired.
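A sketch of why non-deterministic and parameterized functions are awkward to make lazy. nextTicket is a hypothetical stand-in for a function whose result changes between calls; wrapping one call in lazy freezes the first result, which is usually not what you want.

```fsharp
// A hypothetical non-deterministic function: each call returns a new ticket number.
let counter = ref 0
let nextTicket () =
    counter.Value <- counter.Value + 1
    counter.Value

// Wrapping one fully-applied call in lazy freezes whatever that call returns first.
let frozenTicket = lazy (nextTicket ())

// A parameterized function cannot be made lazy as such; only a fully-applied
// call such as square 12 can be wrapped.
let square x = x * x
let lazySquareOf12 = lazy (square 12)

let t1 = nextTicket ()         // 1
let t2 = nextTicket ()         // 2
let f1 = frozenTicket.Value    // 3: evaluated now, on first access
let f2 = frozenTicket.Value    // 3 again: cached, nextTicket is not called
let t3 = nextTicket ()         // 4: the plain function still varies
```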

Related

Creating an 'add' computation expression

I'd like the example computation expression and values below to return 6. For some reason the numbers aren't yielding like I'd expect. What's the step I'm missing to get my result? Thanks!
type AddBuilder() =
    let mutable x = 0
    member _.Yield i = x <- x + i
    member _.Zero() = 0
    member _.Return() = x
let add = AddBuilder()
(* Compiler tells me that each of the numbers in add don't do anything
and suggests putting '|> ignore' in front of each *)
let result = add { 1; 2; 3 }
(* Currently the result is 0 *)
printfn "%i should be 6" result
Note: This is just for creating my own computation expression to expand my learning. Seq.sum would be a better approach. I'm open to the idea that this example completely misses the value of computation expressions and is no good for learning.
There is a lot wrong here.
First, let's start with mere mechanics.
In order for the Yield method to be called, the code inside the curly braces must use the yield keyword:
let result = add { yield 1; yield 2; yield 3 }
But now the compiler will complain that you also need a Combine method. See, the semantics of yield is that each of them produces a finished computation, a resulting value. And therefore, if you want to have more than one, you need some way to "glue" them together. This is what the Combine method does.
Since your computation builder doesn't actually produce any results, but instead mutates its internal variable, the ultimate result of the computation should be the value of that internal variable. So that's what Combine needs to return:
member _.Combine(a, b) = x
But now the compiler complains again: you need a Delay method. Delay is not strictly necessary from a semantic point of view, but the compiler requires it here in order to mitigate performance pitfalls. When the computation consists of many "parts" (like in the case of multiple yields), it's often the case that some of them should be discarded. In these situations, it would be inefficient to evaluate all of them and then discard some. So the compiler inserts a call to Delay: it receives a function which, when called, evaluates a "part" of the computation, and Delay has the opportunity to put this function in some sort of deferred container, so that later Combine can decide which of those containers to discard and which to evaluate.
In your case, however, since the result of the computation doesn't matter (remember: you're not returning any results, you're just mutating the internal variable), Delay can just execute the function it receives to have it produce the side effects (namely, mutating the variable):
member _.Delay(f) = f ()
And now the computation finally compiles, and behold: its result is 6. This result comes from whatever Combine is returning. Try modifying it like this:
member _.Combine(a, b) = "foo"
Now suddenly the result of your computation becomes "foo".
And now, let's move on to semantics.
The above modifications will let your program compile and even produce expected result. However, I think you misunderstood the whole idea of the computation expressions in the first place.
The builder isn't supposed to have any internal state. Instead, its methods are supposed to manipulate complex values of some sort, some methods creating new values, some modifying existing ones. For example, the seq builder[1] manipulates sequences. That's the type of values it handles. Different methods create new sequences (Yield) or transform them in some way (e.g. Combine), and the ultimate result is also a sequence.
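One concrete reason internal state is a problem: a builder instance is usually created once and reused. This sketch assembles the "mechanics only" builder from the first half of this answer (renamed StatefulAddBuilder here to avoid clashing with the final version) and runs the same expression twice.

```fsharp
// The mechanics-only builder from above: a running total in a mutable field.
type StatefulAddBuilder() =
    let mutable x = 0
    member _.Yield i = x <- x + i
    member _.Combine(a, b) = x
    member _.Delay(f) = f ()

let add = StatefulAddBuilder()

let first = add { yield 1; yield 2; yield 3 }
// Same builder instance, and its hidden total was never reset.
let second = add { yield 1; yield 2; yield 3 }

printfn "%d %d" first second
```

The second run prints 12, not 6, because the total from the first run is still sitting inside the shared builder.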
In your case, it looks like the values that your builder needs to manipulate are numbers. And the ultimate result would also be a number.
So let's look at the methods' semantics.
The Yield method is supposed to create one of those values that you're manipulating. Since your values are numbers, that's what Yield should return:
member _.Yield x = x
The Combine method, as explained above, is supposed to combine two of such values that got created by different parts of the expression. In your case, since you want the ultimate result to be a sum, that's what Combine should do:
member _.Combine(a, b) = a + b
Finally, the Delay method should just execute the provided function. In your case, since your values are numbers, it doesn't make sense to discard any of them:
member _.Delay(f) = f()
And that's it! With these three methods, you can add numbers:
type AddBuilder() =
    member _.Yield x = x
    member _.Combine(a, b) = a + b
    member _.Delay(f) = f ()
let add = AddBuilder()
let result = add { yield 1; yield 2; yield 3 }
I think numbers are not a very good example for learning about computation expressions, because numbers lack the inner structure that computation expressions are supposed to handle. Try instead creating a maybe builder to manipulate Option<'a> values.
Added bonus - there are already implementations you can find online and use for reference.
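As a starting point, here is one minimal sketch of a maybe builder, assuming you only need let! and return. MaybeBuilder, tryParseInt and addStrings are names made up for this example; implementations you find online may add more members (Zero, Combine, Delay and so on).

```fsharp
open System

// A minimal "maybe" builder over Option<'a> (a sketch, not a complete builder).
type MaybeBuilder() =
    // let! x = m: stop with None, or continue with the unwrapped value.
    member _.Bind(m: 'a option, f: 'a -> 'b option) : 'b option = Option.bind f m
    // return x: wrap a plain value.
    member _.Return(x: 'a) : 'a option = Some x
    // return! m: pass an option through unchanged.
    member _.ReturnFrom(m: 'a option) : 'a option = m

let maybe = MaybeBuilder()

// Hypothetical helper: parse an int, or None if the text is not a number.
let tryParseInt (s: string) : int option =
    match Int32.TryParse(s) with
    | true, n -> Some n
    | _ -> None

// Adds two numbers given as text; any parse failure short-circuits to None.
let addStrings a b =
    maybe {
        let! x = tryParseInt a
        let! y = tryParseInt b
        return x + y
    }

printfn "%A %A" (addStrings "1" "2") (addStrings "1" "x")
```

Unlike the number example, here the builder has real structure to manage: Bind decides whether the rest of the expression runs at all.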
[1] seq is not actually a computation expression. It predates computation expressions and is treated in a special way by the compiler. But good enough for examples and comparisons.

This expression was expected to have type bool but here has type unit error

I'm getting an error when I try to run this code, and I can't figure out why:
let validCol column value : bool =
    for i in 0..8 do
        if sudokuBoard.[i,column] = value then
            false
        else true
As Tyler Hartwig says, a for loop cannot return any value except unit.
On the other hand, inside a list comprehension or a seq computation expression you can use for to yield the values and then test whether the one you are looking for exists:
let validCol column value : bool =
    seq { for i in 0..8 do yield sudokuBoard.[i,column] }
    |> Seq.exists (fun cell -> cell = value) // Seq.exists takes a predicate, not a value
    |> not
In F#, the value of the last expression is what gets returned, and you have explicitly declared that you are returning a bool.
The for loop is unable to return or aggregate multiple values, but instead returns unit.
let validCol column value : bool =
    for i in 0..8 do
        if sudokuBoard.[i,column] = value then
            false
        else
            true
Here, you'll need to figure out how to aggregate all the bool to get your final result. I'm not quite sure what this is supposed to return, or I'd give an example.
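One way to do that aggregation, sketched under the assumption that "valid" means "the value does not already appear in the column": Seq.forall combines the per-row checks into a single bool and stops at the first row that fails. The board below is a hypothetical stand-in for the question's sudokuBoard.

```fsharp
// Hypothetical board: every cell empty (0) except a 7 at row 4, column 2.
let sudokuBoard : int[,] = Array2D.zeroCreate 9 9
sudokuBoard.[4, 2] <- 7

// Valid only if every row of the column differs from the value.
let validCol column value : bool =
    seq { 0 .. 8 } |> Seq.forall (fun i -> sudokuBoard.[i, column] <> value)

printfn "%b %b %b" (validCol 2 7) (validCol 2 5) (validCol 3 7)
```

This prints false true true: 7 is already in column 2, while 5 is not, and column 3 is empty.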
It looks like you are looking for a shortcut out of the loop, the way you can use continue, break or return to exit a loop in C#.
In F#, the way to accomplish that with good performance is tail recursion. You could achieve it with while loops, but that requires mutable variables, which tail recursion doesn't need (although we sometimes use them).
A tail-recursive function is one that calls itself at the very end and doesn't look at the result:
So this is tail-recursive:
let rec loop acc i = if i > 0 then loop (acc + i) (i - 1) else acc
whereas this isn't:
let rec fib i = if i < 2 then 1 else fib (i - 1) + fib (i - 2)
If the F# compiler determines that a function is tail-recursive, it applies tail-call optimization (TCO): essentially, it compiles the function into an efficient loop that looks a lot like the loop you would write in C#.
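A small sketch of the usual accumulator transformation, using hypothetical sumTo functions: the first version still has work to do after the recursive call returns, the second passes the running total along so the recursive call is the very last thing it does.

```fsharp
// Not tail-recursive: after the recursive call returns, n still has to be added,
// so each call keeps its stack frame alive.
let rec sumTo n =
    if n = 0 then 0 else n + sumTo (n - 1)

// Tail-recursive: the recursive call is the last thing evaluated and the running
// total travels in the accumulator, so the compiler can turn it into a loop.
let rec sumToAcc acc n =
    if n = 0 then acc else sumToAcc (acc + n) (n - 1)

printfn "%d %d" (sumTo 100) (sumToAcc 0 100)
```

Both print 5050; the difference only shows up in stack usage for large inputs.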
So here is one way to write validCol using tail-recursion:
let validCol column value : bool =
    // loop is tail-recursive
    let rec loop column value i =
        if i < 9 then
            if sudokuBoard.[i,column] = value then
                false // The value already exists in the column, not valid
            else
                loop column value (i + 1) // Check next row.
        else
            true // Reached the end, the value is valid
    loop column value 0
Unfortunately, the F# compiler doesn't have an attribute to force TCO (as Scala and Kotlin do), so if you make a slight mistake you might end up with a function that doesn't get TCO. I think I saw a GitHub issue about adding such an attribute. (F# 8 has since added a [<TailCall>] attribute, which makes the compiler warn when the marked function is not used in a tail-recursive way.)
PS. seq comprehensions are nice in many cases, but for a sudoku solver I assume you are looking for something that is as fast as possible. I think seq comprehensions (and LINQ) add too much overhead for a sudoku solver, whereas tail recursion is about as fast as you can get in F#.
PS. In .NET, 2D arrays are slower than 1D arrays, just FYI. I'm unsure whether this has improved with .NET Core.
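If you want to try the 1D layout, a common approach is to store the 81 cells in one flat array and compute the index as row * 9 + col. The helpers below (board, getCell, setCell, validColFlat) are hypothetical, and whether this is measurably faster on your runtime is worth benchmarking rather than assuming.

```fsharp
// Hypothetical flat layout: a 9x9 board stored as 81 cells in one 1D array,
// with cell (row, col) at index row * 9 + col.
let board : int[] = Array.zeroCreate 81

let getCell row col = board.[row * 9 + col]
let setCell row col v = board.[row * 9 + col] <- v

// The same tail-recursive column check, reading from the flat array.
let validColFlat column value =
    let rec loop i =
        if i >= 9 then true                         // reached the end: valid
        elif getCell i column = value then false    // value already in the column
        else loop (i + 1)
    loop 0

setCell 4 2 7
printfn "%b %b" (validColFlat 2 7) (validColFlat 2 5)
```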

let bindings and constructors in F#

I am defining a stopwatch in F#:
open System.Diagnostics
let UptimeStopwatch = Stopwatch.StartNew()
and I am printing every 3 seconds with
printfn "%A" UptimeStopwatch.Elapsed
and every time I'm getting "00:00:00.0003195" or something similarly small. Is F# calling the constructor every time I reference UptimeStopwatch? If so, how do I get around this and achieve the desired result? This is a confusing intermingling of functional and imperative programming.
F# seems to interpret statements like
let MyFunction = DoSomething()
and
let MyFunction() = DoSomething()
differently. The first binds the return value of DoSomething() to the variable MyFunction, and the second binds the action DoSomething() to the function MyFunction().
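A sketch of the two forms side by side with the question's Stopwatch, assuming a short Thread.Sleep is enough to make the difference visible. freshStopwatch is a made-up name for the function form.

```fsharp
open System
open System.Diagnostics
open System.Threading

// Value binding: StartNew() runs once, when this line executes.
let uptimeStopwatch = Stopwatch.StartNew()

// Function binding: StartNew() runs on every call, so each call
// returns a brand-new stopwatch that has only just started.
let freshStopwatch () = Stopwatch.StartNew()

Thread.Sleep 100

// The same stopwatch keeps measuring from when it was created.
let sinceStart = uptimeStopwatch.Elapsed
// A freshly created stopwatch has almost nothing elapsed.
let sinceFresh = (freshStopwatch ()).Elapsed

printfn "value binding: %A, function call: %A" sinceStart sinceFresh
```

The value-bound stopwatch reports roughly the sleep time, while the function form reports a tiny duration, which is the kind of output the question describes.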
My usage of UptimeStopwatch was correct, and the error was elsewhere in my implementation.
I see you already found a problem elsewhere in your code, but the two lines in your question still take some time to run and, interestingly, you can make the overhead smaller.
When I run the two lines in your sample, it prints a value around 0.0002142. You can make that smaller by storing the elapsed time using let, because there is some overhead associated with constructing a representation of the printf format string (the first argument):
let UptimeStopwatch = Stopwatch.StartNew()
let elapsed = UptimeStopwatch.Elapsed
printfn "%A" elapsed
This prints on average a number around 0.0000878 (less than half). If you use Console, the result is similar (because the Elapsed property is obtained before Console is called and nothing else needs to be done in the meantime):
let UptimeStopwatch = Stopwatch.StartNew()
System.Console.WriteLine(UptimeStopwatch.Elapsed)

F# noob help understanding Lazy and Value

I'm just beginning F# and haven't really done functional programming since my programming languages class 15 years ago (exception being "modern" C#).
I'm looking at this F# snippet using LINQPad 4:
let add x y = x + y
let lazyPlusOne x = lazy add x 1
let e = lazyPlusOne 15
Dump e
let plusOne x = add x 1
let f = plusOne 15
Dump f
The output it produces is:
Lazy<Int32>
Value is not created.
IsValueCreated False
Value 16
16
I understand the lazy keyword to delay evaluation until needed, same as C# delayed execution.
What is the meaning of: "Value is not created" here?
If you use lazy keyword to construct a lazy value (as in your lazyPlusOne function), then the result is a value of type Lazy<int>. This represents a value of type int that is evaluated only when it is actually needed.
I assume that Dump function tries to print the value including all its properties - when it starts printing, the value is not evaluated, so ToString method prints Value is not created. Then it iterates over other properties and when it accesses Value, the lazy value is evaluated (because its value is now needed). After evaluation, the property returns 16, which is then printed.
You can replace Dump with an F#-friendly printing function (or just use F# Interactive, which is an extremely convenient way to play with F# inside Visual Studio, with the usual IntelliSense, background error checking, etc.)
F#-friendly printing function like printfn "%A" doesn't access the Value property, so it doesn't accidentally evaluate the value. Here is a snippet from F# Interactive:
> let a = lazy (1 + 2);;
val a : Lazy<int> = Value is not created. // Creates lazy value that's not evaluated
> a;;
val it : Lazy<int> = Value is not created. // Still not evaluated!
> a.Value;; // Now, the lazy value needs to be evaluated (to get the Value)
val it : int = 3
> a;; // After evaluation, the value stays cached
val it : Lazy<int> = 3
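The same behaviour can be checked in code through the IsValueCreated property, as a small sketch assuming that %A formatting behaves as in the F# Interactive transcript above (it does not read Value, unlike LINQPad's Dump).

```fsharp
let a = lazy (1 + 2)

let before = a.IsValueCreated           // false: the body has not run
let text = sprintf "%A" a               // %A does not read Value, so it does not force a
let afterPrinting = a.IsValueCreated    // still false
let v = a.Value                         // 3: evaluated now and cached
let afterValue = a.IsValueCreated       // true
```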
As of 'Dump e', the lazy body 'add 15 1' has not been evaluated. The binding 'let e = lazyPlusOne 15' only creates the Lazy<int>; it does not need to know what that lazy value evaluates to yet.
The dump is triggering the evaluation, and that is semantically different from just dumping the value after the evaluation.

F# ref-mutable vars vs object fields

I'm writing a parser in F#, and it needs to be as fast as possible (I'm hoping to parse a 100 MB file in less than a minute). As normal, it uses mutable variables to store the next available character and the next available token (i.e. both the lexer and the parser proper use one unit of lookahead).
My current partial implementation uses local variables for these. Since closure variables can't be mutable (anyone know the reason for this?) I've declared them as ref:
let rec read file includepath =
    let c = ref ' '
    let k = ref NONE
    let sb = new StringBuilder()
    use stream = File.OpenText file
    let readc() =
        c := stream.Read() |> char
    // etc
I assume this has some overhead (not much, I know, but I'm trying for maximum speed here), and it's a little inelegant. The most obvious alternative would be to create a parser class object and have the mutable variables be fields in it. Does anyone know which is likely to be faster? Is there any consensus on which is considered better/more idiomatic style? Is there another option I'm missing?
You mentioned that local mutable values cannot be captured by a closure, so you need to use ref instead. The reason for this is that mutable values captured in the closure need to be allocated on the heap (because the closure itself is allocated on the heap).
F# forces you to write this explicitly (using ref). In C# you can "capture a mutable variable", but the compiler translates it to a field in a heap-allocated object behind the scenes, so it ends up on the heap anyway. (Since F# 4.0, the F# compiler does the same: a let mutable local captured by a closure is implicitly converted to a ref cell.)
Summary is: If you want to use closures, mutable variables need to be allocated on the heap.
Now, regarding your code - your implementation uses ref, which creates a small object for every mutable variable that you're using. An alternative would be to create a single object with multiple mutable fields. Using records, you could write:
type ReadClosure = {
    mutable c : char
    mutable k : SomeType } // whatever type you use here
let rec read file includepath =
    let state = { c = ' '; k = NONE }
    // ...
    let readc() =
        state.c <- stream.Read() |> char
    // etc...
This may be a bit more efficient, because you're allocating a single object instead of a few objects, but I don't expect the difference will be noticeable.
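A runnable sketch of the single-state-object approach, assuming a TextReader (here a StringReader) in place of the file stream from the question. ReadState, makeReader and the count field are illustrative names, not part of the original code.

```fsharp
open System.IO

// All mutable lexer state in one heap object (a record with mutable fields),
// instead of one ref cell per variable.
type ReadState =
    { mutable c : char
      mutable count : int }

// Hypothetical helper: returns a readc closure plus the state it updates.
let makeReader (reader: TextReader) =
    let state = { c = ' '; count = 0 }
    let readc () =
        let next = reader.Read()        // -1 signals end of input
        if next < 0 then
            false
        else
            state.c <- char next
            state.count <- state.count + 1
            true
    readc, state

let readc, state = makeReader (new StringReader("abc"))
while readc () do
    printfn "read %c (%d so far)" state.c state.count
```

The closure captures only the state reference (an immutable binding), so no ref cells are needed; the mutation happens through the record's fields.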
There is also one confusing thing about your code - the stream value will be disposed after the function read returns, so the call to stream.Read may be invalid (if you call readc after read completes).
let rec read file includepath =
    let c = ref ' '
    use stream = File.OpenText file
    let readc() =
        c := stream.Read() |> char
    readc
let f = read a1 a2
f() // This would fail!
I'm not quite sure how you're actually using readc, but this may be a problem to think about. Also, if you're declaring it only as a helper closure, you could probably rewrite the code without a closure (or write it explicitly using tail recursion, which is compiled to an imperative loop with mutable variables) to avoid any allocations.
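A sketch of the closure-free version, assuming the real task is a simple scan over a TextReader; countLetters and countLettersIn are hypothetical helpers. The running count is a parameter rather than a ref cell, and because the use binding sits in the function that runs the whole loop, the reader is disposed only after reading has finished, which sidesteps the disposal problem described above.

```fsharp
open System
open System.IO

// Tail-recursive scan: the running count is a parameter, not a ref cell or
// a mutable field, and no closure captures local state.
let rec countLetters (reader: TextReader) acc =
    let next = reader.Read()            // -1 signals end of input
    if next < 0 then acc
    elif Char.IsLetter(char next) then countLetters reader (acc + 1)
    else countLetters reader acc

// The use binding keeps the reader open for the whole loop and disposes it afterwards.
let countLettersIn (text: string) =
    use reader = new StringReader(text)
    countLetters reader 0

printfn "%d" (countLettersIn "ab1 c!")
```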
I did the following profiling:
let test() =
    tic()
    let mutable a = 0.0
    for i=1 to 10 do
        for j=1 to 10000000 do
            a <- a + float j
    toc("mutable")
let test2() =
    tic()
    let a = ref 0.0
    for i=1 to 10 do
        for j=1 to 10000000 do
            a := !a + float j
    toc("ref")
The average for mutable is 50 ms, while for ref it is 600 ms. The performance difference is because mutable variables live on the stack, while ref variables live on the managed heap.
The relative difference is big. However, 10^8 accesses is a large number, and the total time is acceptable. So don't worry too much about the performance of ref variables. And remember:
Premature optimization is the root of all evil.
My advice is to first finish your parser, then consider optimizing it. You won't know where the bottleneck is until you actually run the program. One good thing about F# is that its terse syntax and functional style support code refactoring well. Once the code is done, optimizing it will be convenient. Here's a profiling example.
As another example, we use .NET arrays every day, and they also live on the managed heap:
let test3() =
    tic()
    let a = Array.create 1 0.0
    for i=1 to 10 do
        for j=1 to 10000000 do
            a.[0] <- a.[0] + float j
    toc("array")
test3() runs at about the same speed as the ref version. If you worry too much about variables on the managed heap, you won't use arrays anymore.
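Since tic and toc are not shown, here is a self-contained sketch of the same three loops timed with System.Diagnostics.Stopwatch. The names time, sumMutable, sumRef and sumArray are made up for illustration, and the loop sizes are parameters. Expect the numbers to depend on compiler version, Debug/Release settings and hardware; newer compilers and JITs may narrow the gap.

```fsharp
open System.Diagnostics

// Stand-in for the tic()/toc() helpers (not shown in the answer): times one run of f.
let time label (f: unit -> float) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    printfn "%s: %d ms" label sw.ElapsedMilliseconds
    result

// Accumulator in a local mutable.
let sumMutable reps n =
    let mutable a = 0.0
    for _ in 1 .. reps do
        for j in 1 .. n do
            a <- a + float j
    a

// Accumulator in a heap-allocated ref cell.
let sumRef reps n =
    let a = ref 0.0
    for _ in 1 .. reps do
        for j in 1 .. n do
            a.Value <- a.Value + float j
    a.Value

// Accumulator in a one-element array, which also lives on the managed heap.
let sumArray reps n =
    let a = Array.create 1 0.0
    for _ in 1 .. reps do
        for j in 1 .. n do
            a.[0] <- a.[0] + float j
    a.[0]

// The answer used reps = 10 and n = 10000000; smaller numbers run faster.
let reps, n = 10, 1000000
time "mutable" (fun () -> sumMutable reps n) |> ignore
time "ref" (fun () -> sumRef reps n) |> ignore
time "array" (fun () -> sumArray reps n) |> ignore
```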
