Option type benchmark using F# - f#

I need to use Some/None options in heavy numerical simulations. The following micro benchmark gives me Fast = 485 and Slow = 5890.
I do not like nulls and even if I liked them I cannot use null because The type 'float' does not have 'null' as a proper value.
Ideally there would be a compiler option that would compile Some/None into value/null so there would be no runtime penalty. Is that possible? Or how shall I make Some/None efficient?
let s = System.Diagnostics.Stopwatch()
s.Start()
for h in 0 .. 1000 do
Array.init 100000 (fun i -> (float i + 1.)) |> ignore
printfn "Fast = %d" s.ElapsedMilliseconds
s.Restart()
for h in 0 .. 1000 do
Array.init 100000 (fun i -> Some (float i + 1.)) |> ignore
printfn "Slow = %d" s.ElapsedMilliseconds

None is actually already represented as null. But since option<_> is a reference type (which is necessary for null to be a valid value in the .NET type system), creating Some instances will necessarily require heap allocations. One alternative is to use the .NET System.Nullable<_> type, which is similar to option<_>, except that:
it's a value type, so no heap allocation is needed
it only supports value types as elements, so you can create an option<string>, but not a Nullable<string>. For your use case this seems like an unimportant factor.
it has runtime support so that boxing a nullable without a value results in a null reference, which would be impossible otherwise
Keep in mind that your benchmark does very little work, so the results are probably not typical of what you'd see with your real workload. Try to use a more meaningful benchmark based on your actual scenario if at all possible.
As a side note, you get more meaningful diagnostics (including garbage collection statistics) if you use the #time directive in F# rather than bothering with the Stopwatch.

Related

FsCheck: Override generator for a type, but only in the context of a single parent generator

I seem to often run into cases where I want to generate some complex structure, but a special variation with a member type generated differently.
For example, consider this tree
type Tree<'LeafData,'INodeData> =
| LeafNode of 'LeafData
| InternalNode of 'INodeData * Tree<'LeafData,'INodeData> list
I want to generate cases like
No internal node is childless
There are no leaf-type nodes
Only a limited subset of leaf types are used
These are simple to do if I override all generation of a corresponding child type.
The problem is that it seems register is inherently a thread-level action, and there is no gen-local alternative.
For example, what I want could look like
let limitedLeafs =
gen {
let leafGen = Arb.generate<LeafType> |> Gen.filter isAllowedLeaf
do! registerContextualArb (leafGen |> Arb.fromGen)
return! Arb.generate<Tree<NodeType, LeafType>>
}
This Tree example specifically can work around with some creative type shuffling, but that's not always possible.
It's also possible to use some sort of recursive map that enforces assumptions, but that seems relatively complex if the above is possible. I might be misunderstanding the nature of FsCheck generators though.
Does anyone know how to accomplish this kind of gen-local override?
There's a few options here - I'm assuming you're on FsCheck 2.x but keep scrolling for an option in FsCheck 3.
The first is the most natural one but is more work, which is to break down the generator explicitly to the level you need, and then put humpty dumpty together again. I.e don't rely on the type-based generator derivation so much - if I understand your example correctly that would mean implementing a recursive generator - relying on Arb.generate<LeafType> for the generic types.
Second option - Config has an Arbitrary field which you can use to override Arbitrary instances. These overrides will take effect even if the overridden types are part of the automatically generated ones. So as a sketch you could try:
Check.One ({Config.Quick with Arbitrary = [| typeof<MyLeafArbitrary>) |]) (fun safeTree -> ...)
More extensive example which uses FsCheck.Xunit's PropertyAttribute but the principle is the same, set on the Config instead.
Final option! :) In FsCheck 3 (prerelease) you can configure this via a new (as of yet undocumented) concept ArbMap which makes the map from type to Arbitrary instance explicit, instead of this static global nonsense in 2.x (my bad of course. seemed like a good idea at the time.) The implementation is here which may not tell you all that much - the idea is that you put an ArbMap instance together which contains your "safe" generators for the subparts, then you ArbMap.mergeWith that safe map with ArbMap.defaults (thus overriding the default generators with your safe ones, in the resulting ArbMap) and then you use ArbMap.arbitrary or ArbMap.generate with the resulting map.
Sorry for the long winded explanation - but all in all that should give you the best of both worlds - you can reuse the generic union type generator in FsCheck, while surgically overriding certain types in that context.
FsCheck guidance on this is:
To define a generator that generates a subset of the normal range of values for an existing type, say all the even ints, it makes properties more readable if you define a single-case union case, and register a generator for the new type:
As an example, they suggest you could define arbitrary even integers like this:
type EvenInt = EvenInt of int with
static member op_Explicit(EvenInt i) = i
type ArbitraryModifiers =
static member EvenInt() =
Arb.from<int>
|> Arb.filter (fun i -> i % 2 = 0)
|> Arb.convert EvenInt int
Arb.register<ArbitraryModifiers>() |> ignore
You could then generate and test trees whose leaves are even integers like this:
let ``leaves are even`` (tree : Tree<EvenInt, string>) =
let rec leaves = function
| LeafNode leaf -> [leaf]
| InternalNode (_, children) ->
children |> List.collect leaves
leaves tree
|> Seq.forall (fun (EvenInt leaf) ->
leaf % 2 = 0)
Check.Quick ``leaves are even`` // output: Ok, passed 100 tests.
To be honest, I like your idea of a "gen-local override" better, but I don't think FsCheck supports it.

Creating an 'add' computation expression

I'd like the example computation expression and values below to return 6. For some the numbers aren't yielding like I'd expect. What's the step I'm missing to get my result? Thanks!
type AddBuilder() =
let mutable x = 0
member _.Yield i = x <- x + i
member _.Zero() = 0
member _.Return() = x
let add = AddBuilder()
(* Compiler tells me that each of the numbers in add don't do anything
and suggests putting '|> ignore' in front of each *)
let result = add { 1; 2; 3 }
(* Currently the result is 0 *)
printfn "%i should be 6" result
Note: This is just for creating my own computation expression to expand my learning. Seq.sum would be a better approach. I'm open to the idea that this example completely misses the value of computation expressions and is no good for learning.
There is a lot wrong here.
First, let's start with mere mechanics.
In order for the Yield method to be called, the code inside the curly braces must use the yield keyword:
let result = add { yield 1; yield 2; yield 3 }
But now the compiler will complain that you also need a Combine method. See, the semantics of yield is that each of them produces a finished computation, a resulting value. And therefore, if you want to have more than one, you need some way to "glue" them together. This is what the Combine method does.
Since your computation builder doesn't actually produce any results, but instead mutates its internal variable, the ultimate result of the computation should be the value of that internal variable. So that's what Combine needs to return:
member _.Combine(a, b) = x
But now the compiler complains again: you need a Delay method. Delay is not strictly necessary, but it's required in order to mitigate performance pitfalls. When the computation consists of many "parts" (like in the case of multiple yields), it's often the case that some of them should be discarded. In these situation, it would be inefficient to evaluate all of them and then discard some. So the compiler inserts a call to Delay: it receives a function, which, when called, would evaluate a "part" of the computation, and Delay has the opportunity to put this function in some sort of deferred container, so that later Combine can decide which of those containers to discard and which to evaluate.
In your case, however, since the result of the computation doesn't matter (remember: you're not returning any results, you're just mutating the internal variable), Delay can just execute the function it receives to have it produce the side effects (which are - mutating the variable):
member _.Delay(f) = f ()
And now the computation finally compiles, and behold: its result is 6. This result comes from whatever Combine is returning. Try modifying it like this:
member _.Combine(a, b) = "foo"
Now suddenly the result of your computation becomes "foo".
And now, let's move on to semantics.
The above modifications will let your program compile and even produce expected result. However, I think you misunderstood the whole idea of the computation expressions in the first place.
The builder isn't supposed to have any internal state. Instead, its methods are supposed to manipulate complex values of some sort, some methods creating new values, some modifying existing ones. For example, the seq builder1 manipulates sequences. That's the type of values it handles. Different methods create new sequences (Yield) or transform them in some way (e.g. Combine), and the ultimate result is also a sequence.
In your case, it looks like the values that your builder needs to manipulate are numbers. And the ultimate result would also be a number.
So let's look at the methods' semantics.
The Yield method is supposed to create one of those values that you're manipulating. Since your values are numbers, that's what Yield should return:
member _.Yield x = x
The Combine method, as explained above, is supposed to combine two of such values that got created by different parts of the expression. In your case, since you want the ultimate result to be a sum, that's what Combine should do:
member _.Combine(a, b) = a + b
Finally, the Delay method should just execute the provided function. In your case, since your values are numbers, it doesn't make sense to discard any of them:
member _.Delay(f) = f()
And that's it! With these three methods, you can add numbers:
type AddBuilder() =
member _.Yield x = x
member _.Combine(a, b) = a + b
member _.Delay(f) = f ()
let add = AddBuilder()
let result = add { yield 1; yield 2; yield 3 }
I think numbers are not a very good example for learning about computation expressions, because numbers lack the inner structure that computation expressions are supposed to handle. Try instead creating a maybe builder to manipulate Option<'a> values.
Added bonus - there are already implementations you can find online and use for reference.
1 seq is not actually a computation expression. It predates computation expressions and is treated in a special way by the compiler. But good enough for examples and comparisons.

How do I turn this Extension Method into an Extension Property?

I have an extension method
type System.Int32 with
member this.Thousand() = this * 1000
but it requires me to write like this
(5).Thousand()
I'd love to get rid of both parenthesis, starting with making it a property instead of a method (for learning sake) how do I make this a property?
Jon's answer is one way to do it, but for a read-only property there's also a more concise way to write it:
type System.Int32 with
member this.Thousand = this * 1000
Also, depending on your preferences, you may find it more pleasing to write 5 .Thousand (note the extra space) than (5).Thousand (but you won't be able to do just 5.Thousand, or even 5.ToString()).
I don't really know F# (shameful!) but based on this blog post, I'd expect:
type System.Int32 with
member this.Thousand
with get() = this * 1000
I suspect that won't free you from the first set of parentheses (otherwise F# may try to parse the whole thing as a literal), but it should help you with the second.
Personally I wouldn't use this sort of thing for a "production" extension, but it's useful for test code where you're working with a lot of values.
In particular, I've found it neat to have extension methods around dates, e.g. 19.June(1976) as a really simple, easy-to-read way of building up test data. But not for production code :)
It's not beautiful, but if you really want a function that will work for any numeric type, you can do this:
let inline thousand n =
let one = LanguagePrimitives.GenericOne
let thousand =
let rec loop n i =
if i < 1000 then loop (n + one) (i + 1)
else n
loop one 1
n * thousand
5.0 |> thousand
5 |> thousand
5I |> thousand

F# Functions vs. Values

This is a pretty simple question, and I just wanted to check that what I'm doing and how I'm interpreting the F# makes sense. If I have the statement
let printRandom =
x = MyApplication.getRandom()
printfn "%d" x
x
Instead of creating printRandom as a function, F# runs it once and then assigns it a value. So, now, when I call printRandom, instead of getting a new random value and printing it, I simply get whatever was returned the first time. I can get around this my defining it as such:
let printRandom() =
x = MyApplication.getRandom()
printfn "%d" x
x
Is this the proper way to draw this distinction between parameter-less functions and values? This seems less than ideal to me. Does it have consequences in currying, composition, etc?
The right way to look at this is that F# has no such thing as parameter-less functions. All functions have to take a parameter, but sometimes you don't care what it is, so you use () (the singleton value of type unit). You could also make a function like this:
let printRandom unused =
x = MyApplication.getRandom()
printfn "%d" x
x
or this:
let printRandom _ =
x = MyApplication.getRandom()
printfn "%d" x
x
But () is the default way to express that you don't use the parameter. It expresses that fact to the caller, because the type is unit -> int not 'a -> int; as well as to the reader, because the call site is printRandom () not printRandom "unused".
Currying and composition do in fact rely on the fact that all functions take one parameter and return one value.
The most common way to write calls with unit, by the way, is with a space, especially in the non .NET relatives of F# like Caml, SML and Haskell. That's because () is a singleton value, not a syntactic thing like it is in C#.
Your analysis is correct.
The first instance defines a value and not a function. I admit this caught me a few times when I started with F# as well. Coming from C# it seems very natural that an assignment expression which contains multiple statements must be a lambda and hence delay evaluated.
This is just not the case in F#. Statements can be almost arbitrarily nested (and it rocks for having locally scoped functions and values). Once you get comfortable with this you start to see it as an advantage as you can create functions and continuations which are inaccessible to the rest of the function.
The second approach is the standard way for creating a function which logically takes no arguments. I don't know the precise terminology the F# team would use for this declaration though (perhaps a function taking a single argument of type unit). So I can't really comment on how it would affect currying.
Is this the proper way to draw this
distinction between parameter-less
functions and values? This seems less
than ideal to me. Does it have
consequences in currying, composition,
etc?
Yes, what you describe is correct.
For what its worth, it has a very interesting consequence able to partially evaluate functions on declaration. Compare these two functions:
// val contains : string -> bool
let contains =
let people = set ["Juliet"; "Joe"; "Bob"; "Jack"]
fun person -> people.Contains(person)
// val contains2 : string -> bool
let contains2 person =
let people = set ["Juliet"; "Joe"; "Bob"; "Jack"]
people.Contains(person)
Both functions produce identical results, contains creates its people set on declaration and reuses it, whereas contains2 creates its people set everytime you call the function. End result: contains is slightly faster. So knowing the distinction here can help you write faster code.
Assignment bodies looking like function bodies have cought a few programmers unaware. You can make things even more interesting by having the assignment return a function:
let foo =
printfn "This runs at startup"
(fun () -> printfn "This runs every time you call foo ()")
I just wrote a blog post about it at http://blog.wezeku.com/2010/08/23/values-functions-and-a-bit-of-both/.

F#: Why aren't option types compatible with nullable types?

Why aren't option types like "int option" compatible with nullable types like "Nullable"?
I assume there is some semantic reason for the difference, but I can't figure what that is.
An option in F# is used when a value may or may not exist. An option has an underlying type and may either hold a value of that type or it may not have a value.
http://msdn.microsoft.com/en-us/library/dd233245%28VS.100%29.aspx
That sure sounds like the Nullable structure.
Because of the runtime representation choice for System.Nullable<'T>.
Nullable tries to represent the absent of values by the null pointer, and present values by pointers to those values.
(new System.Nullable<int>() :> obj) = null
|> printfn "%b" // true
(new System.Nullable<int>(1) :> obj).GetType().Name
|> printfn "%s" // Int32
Now consider strings. Unfortunately, strings are nullable. So this is valid:
null : string
But now a null runtime value is ambiguous - it can refer to either the absence of a value or a presence of a null value. For this reason, .NET does not allow constructing a System.Nullable<string>.
Contrast this with:
(Some (null : string) :> obj).GetType().Name
|> printfn "%s" // Option`1
That being said, one can define a bijection:
let optionOfNullable (a : System.Nullable<'T>) =
if a.HasValue then
Some a.Value
else
None
let nullableOfOption = function
| None -> new System.Nullable<_>()
| Some x -> new System.Nullable<_>(x)
If you observe the types, these functions constrain 'T to be a structure and have a zero-argument constructor. So perhaps F# compiler could expose .NET functions receiving/returning Nullable<'T> by substituting it for an Option<'T where 'T : struct and 'T : (new : unit -> 'T)>, and inserting the conversion functions where necessary..
The two have different semantics. Just to name one, Nullable is an idempotent data constructor that only works on value types, whereas option is a normal generic type. So you can't have a
Nullable<Nullable<int>>
but you can have an
option<option<int>>
Generally, though there are some overlapping scenarios, there are also things you can do with one but not the other.
Key difference is that must test the option type to see if it has a value. See this question for a good description of its semantics: How does the option type work in F#
Again, this is from my limited understanding, but the problem probably lies in how each gets rendered in the IL. The "nullable" structure probably gets handled slightly different from the option type.
You will find that the interactions between various .Net languages really boils down to how the IL gets rendered. Mostof the time it works just fine but on occasion, it causes issues. (check out this). Just when you thought it was safe to trust the level of abstraction. :)

Resources