Enum vs non-member discriminated union - f#

I've just noticed that there's only a little difference in declaring a non-member discriminated union:
type Color =
| Red
| Green
| Blue
and declaring an enum:
type Color =
| Red = 0
| Green = 1
| Blue = 2
What are their main differences in terms of performance, usage, etc? Do you have suggestions when to use what?

Enums are structs and are therefore allocated on the stack, while discriminated unions are reference types and so are heap allocated. So you would expect DUs to be slightly less performant than enums, though in reality you'll probably never notice the difference.
More importantly, a discriminated union can only ever be one of the cases declared, whereas an enum is really just an integer, so you can cast an integer that isn't a member of the enum to the enum type. This means that when pattern matching, the compiler can assert that the match is complete once you've covered all the cases of a DU, but for an enum you must always add a default catch-all case. That is, for an enum you'll always need pattern matching like:
match enumColor with
| Color.Red -> 1
| Color.Green -> 2
| Color.Blue -> 3
| _ -> failwith "not an enum member"
whereas the last case would not be necessary with a DU.
One final point: enums are natively supported in both C# and VB.NET, whereas DUs are not, so enums are often a better choice when creating a public API for consumption by other languages.

In addition to what Robert has said, pattern matching on unions is done in one of two ways. For unions with only nullary cases, i.e., cases without an associated value (this corresponds closely to enums), the compiler-generated Tag property is checked, which is an int. In this case you can expect performance to be the same as with enums. For unions having non-nullary cases, a type test is used, which I assume is also pretty fast. As Robert said, if there is a performance discrepancy it's negligible. But in the former case it should be exactly the same.
Regarding the inherent "incompleteness" of enums, when a pattern match fails what you really want to know is whether a valid case wasn't covered by the match. You don't generally care whether an invalid integer value was cast to the enum; in that case you want the match to fail. I almost always prefer unions, but when I have to use enums (usually for interoperability), inside the obligatory wildcard case I pass the unmatched value to a function that distinguishes between valid and invalid values and raises the appropriate error.
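A minimal sketch of the kind of helper described above. The name failUnmatched and the error messages are assumptions for illustration, not the answerer's actual code; Enum.IsDefined is the standard .NET check for whether a value is a declared member.

```fsharp
open System

type Color =
    | Red = 0
    | Green = 1
    | Blue = 2

// Hypothetical helper: called from the wildcard case to tell a forgotten
// member apart from an integer that was never a member of the enum.
let failUnmatched (value: Color) : 'a =
    if Enum.IsDefined(typeof<Color>, box value) then
        failwithf "Valid case %A was not handled by the match" value
    else
        invalidArg "value" (sprintf "%d is not a member of Color" (int value))

let describe (color: Color) =
    match color with
    | Color.Red -> "red"
    | Color.Green -> "green"
    | other -> failUnmatched other // Blue is deliberately left out here
```

Calling describe Color.Blue reports a forgotten case, while describe (enum<Color> 7) raises an ArgumentException for a value that was never valid.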

As of F# 4.1 there are struct discriminated unions.
These have the performance benefits of stack allocation, like enums.
They have the superior matching of discriminated unions.
They are F# specific, so if you need to be understood by other .NET languages you should still use enums.
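As a rough illustration, a struct union is declared by adding the [<Struct>] attribute. The Shape type below is a made-up example to show that, when cases carry data, a multi-case struct union needs a unique field name per case.

```fsharp
[<Struct>]
type Color =
    | Red
    | Green
    | Blue

// No wildcard case is needed: the compiler knows these are all the cases.
let toCode color =
    match color with
    | Red -> 1
    | Green -> 2
    | Blue -> 3

// Hypothetical type with payloads: each case field gets its own name.
[<Struct>]
type Shape =
    | Circle of radius: float
    | Square of side: float

let isValueType = typeof<Color>.IsValueType
```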

Related

Meaning of Discriminated Union in F#

I do understand the meaning of "discriminated" and "union" in their standalone contexts, but I am at a loss when it comes to F#'s "Discriminated Union".
FYI, English is not my first language and I am not good at math either. So I hope someone out there can shed some light on this feature of F#. Please.
What I need to know is:
The use case for a discriminated union. What is it normally used for?
Its equivalent among OOP features or terms, if there is one.
Is it like a set operation, where we use Venn diagrams to represent the data?
Or you can help by pointing me to links.
A discriminated union is a union of two sets where you can tell which set an item originally belonged to; even if they are the same thing, you can discriminate between them, i.e. tell them apart.
For instance, if you have a discriminated union of two sets of integers, both containing the number 2, you can discriminate between the 2s because you know which original set it came from.
As an example, consider points in the 2-dimensional plane.
These can be expressed as a pair of reals in two ways, either using rectangular (or Cartesian) coordinates (x coordinate, y coordinate) or using polar coordinates (angle of rotation, distance).
But if someone just gives you a pair of numbers, you wouldn't know what they meant.
We can form a discriminated union, though:
type Point2D =
| Rectangular of float * float
| Polar of float * float
Now any Point2D value makes the intended interpretation clear, and the compiler can make sure that we don't try to mix the representations or fail to handle a case.
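To make this concrete, here is a small sketch of a function that consumes a Point2D. It assumes F#'s float for the reals and an (angle in radians, distance) order for the polar case; the name toRectangular is an illustration only.

```fsharp
type Point2D =
    | Rectangular of float * float // (x, y)
    | Polar of float * float       // (angle in radians, distance)

// Each case must be handled explicitly, so the two representations
// cannot be mixed up by accident.
let toRectangular point =
    match point with
    | Rectangular (x, y) -> (x, y)
    | Polar (angle, distance) -> (distance * cos angle, distance * sin angle)
```

Removing either branch makes the compiler warn about an incomplete match.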
In an OO setting, you would build a class hierarchy with an abstract base class, or have a "kind" member that you could inspect.
It's more common to form unions of different types, though - if you wrote an interpreter for a programming language you might have something that looks like
type Expression =
| Integer of int
| String of string
| Identifier of string
| Operator of string
| Conditional of Expression * Expression * Expression
| Definition of string * Expression
and so on.
Discriminated unions are also called "sum types", and tuples are called "product types".
These terms come from the discipline of type algebra, and the resultant types are called "Algebraic Data Types".
(When functional programmers mention "ADT", the "A" is usually for "Algebraic", not "Abstract".)
The question is very broad but I will try to give a succinct answer.
You can think of DUs as Enums on steroids - you can define a new datatype with distinct cases - just like enums - but on top of this each case may (or may not) contain additional data.
A simple example could be:
type Contact =
| Email of string
| Phone of string
| None
And just as DUs are enums on steroids, pattern matching is switch on steroids: you can deconstruct the data while you branch on it.
let contactToString = function
    | Email e -> e
    | Phone p -> p
    | None -> "no contact given"
Basically, you use DUs in F# and other FP languages all the time to structure data in the obvious way. You gain a lot from this; for example, the compiler will warn you if you miss a case.
The equivalent in OOP (indeed it's compiled into something very similar) is a base class Contact with a subclass for each case (EmailContact : Contact, etc.) that contains the data part.
But there is an important difference: you can extend such OOP inheritance structures, but you cannot externally extend DUs (see the Expression Problem).
And finally: no, it has nothing to do with Venn diagrams or set theory.
The relation to math is that these structures are called algebraic data types or sum types, because if you count the different values such a type can have, you have to sum up the values for each case. (See tuples too: these are product types for similar reasons.)

F# limitations of discriminated unions

I am trying to port a small compiler from C# to F# to take advantage of features like pattern matching and discriminated unions. Currently, I am modeling the AST using a pattern based on System.Linq.Expressions: an abstract base "Expression" class, derived classes for each expression type, and a NodeType enum allowing for switching on expressions without lots of casting. I had hoped to greatly reduce this using an F# discriminated union, but I've run into several seeming limitations:
Forced public default constructor (I'd like to do type-checking and argument validation on expression construction, as System.Linq.Expressions does with its static factory methods)
Lack of named properties (seems like this is fixed in F# 3.1)
Inability to refer to a case type directly. For example, it seems like I can't declare a function that takes in only one type from the union (e.g. let f (x : TYPE) = x compiles for Expression (the union type) but not for Add or Expression.Add). This seems to sacrifice some type-safety over my C# approach.
Are there good workarounds for these or design patterns which make them less frustrating?
I think, you are stuck a little too much with the idea that a DU is a class hierarchy. It is more helpful to think of it as data, really. As such:
Forced public default constructor (I'd like to do type-checking and argument validation on expression construction, as System.Linq.Expressions does with its static factory methods)
A DU is just data, pretty much like, say, a string or a number, not functionality. Why don't you write a function that returns an Expression option, to express that your data might be invalid?
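A minimal sketch of such a validating function. The two-case Expression type, the name tryDivide and the rule it enforces (no division by a literal zero) are all assumptions made up for illustration.

```fsharp
type Expression =
    | Constant of int
    | Divide of Expression * Expression

// Hypothetical validation rule: refuse to build a division by a literal zero.
// Returning an option makes the possibility of invalid input explicit.
let tryDivide numerator denominator =
    match denominator with
    | Constant 0 -> None
    | _ -> Some (Divide (numerator, denominator))
```

Callers that go through tryDivide have to deal with the None case before they can get at the expression.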
Lack of named properties (seems like this is fixed in F# 3.1)
If you feel like you need named properties, you probably have an inappropriate type, such as string * string * string * int * float, as the data for your Expression. Better to make a record instead, something like AddInfo, and have the DU case use that, as in | Add of AddInfo. This way you get properties in pattern matches, IntelliSense, etc.
Inability to refer to a case type directly. For example, it seems like I can't declare a function that takes in only one type from the union (e.g. let f (x : TYPE) = x compiles for Expression (the union type) but not for Add or Expression.Add). This seems to sacrifice some type-safety over my C# approach.
You cannot require something to be the Add case, but you definitely can write a function that takes an AddInfo. Plus, you can always do it in a monadic way and have functions that take any Expression and return an option. In that case, you pattern match to check that your input is of the appropriate case and return None if it is not. At the call site, you can then "use" the value in the good case with functions like Option.bind.
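Both ideas can be sketched as follows. The field names of AddInfo and the helper names (swapOperands, tryGetAdd, tryLeftConstant, leftConstantOfAdd) are hypothetical; only the AddInfo record and Option.bind come from the answer.

```fsharp
type AddInfo = { Left: Expression; Right: Expression }
and Expression =
    | Constant of int
    | Add of AddInfo

// Takes only the payload of the Add case, so it cannot be called with a Constant.
let swapOperands (info: AddInfo) = { Left = info.Right; Right = info.Left }

// Takes any Expression and returns None when it is not the Add case.
let tryGetAdd expression =
    match expression with
    | Add info -> Some info
    | _ -> None

let tryLeftConstant (info: AddInfo) =
    match info.Left with
    | Constant n -> Some n
    | _ -> None

// Option.bind chains the two steps and stops at the first None.
let leftConstantOfAdd expression =
    expression |> tryGetAdd |> Option.bind tryLeftConstant
```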
Basically try not to think of a DU as a set of classes, but really just cases of data. Kind of like an enum.
You can make the implementation private. This allows you the full power of DUs in your implementation but presents a limited view to consumers of your API. See this answer to a related question about records (although it also applies to DUs).
EDIT
I can't find the syntax on MSDN, but here it is:
type T =
    private
    | A
    | B
private here means "private to the module."

Use case for F# Choice type

I've been aware of the F# Choice type for a while, but can't think of any place I'd use it rather than defining my own union type with meaningfully named cases.
The MSDN documentation doesn't offer much advice ("Helper types for active patterns with two choices.") and doesn't have any example usages.
I must be missing something - what are the key advantages of this type over a custom union?
In my opinion the use cases for the Choice type are quite similar to the use cases for tuple types. In either case, you will often want to define your own more specific type which is isomorphic to the type (a custom DU for choices; a custom record type for tuples). Nonetheless, over limited scopes or in very generic situations (where good naming may become difficult), it's nice to have anonymous variants.
Sure, a more specific union type might be nice for a particular situation, but having a general Choice union means that your code can mesh well with other code when using general constructs such as Workflows, Functors, etc.
IIRC there's no implementation of the Either monad (a Workflow in F# lingo) in the standard FSharp.Core library, but there is one in the FSharpx library (though I couldn't find a constructor for it, so I had to roll my own; thanks to @MauricioScheffer for pointing me to choose).
From my limited, mostly C# interop, F# experience, Choice and Option aren't baked into F#'s standard methods as much as Haskell's Maybe and Either algebraic data types are baked into its standard libraries, so you don't get as much of a "this is useful" sense when using them in F# as you might in Haskell, but they are quite useful.
As for an example: in an application I recently wrote I returned Choice1Of2 from methods when I had a successful result and Choice2Of2 with an error message when something went wrong -- whether an exception being caught or a precondition not being met -- and ran my code in a Workflow for flow control. This is one standard use of this union type.
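A small sketch of that convention, without the workflow part. The function names and messages are made up; Choice1Of2 carries the successful result and Choice2Of2 the error message.

```fsharp
// Hypothetical parser: success goes in Choice1Of2, an error message in Choice2Of2.
let parseAge (text: string) : Choice<int, string> =
    match System.Int32.TryParse(text) with
    | true, age when age >= 0 -> Choice1Of2 age
    | true, _ -> Choice2Of2 "age must not be negative"
    | false, _ -> Choice2Of2 (sprintf "'%s' is not a number" text)

let describe text =
    match parseAge text with
    | Choice1Of2 age -> sprintf "age %d" age
    | Choice2Of2 error -> sprintf "error: %s" error
```

A custom union with named cases would read better at the call site; the generic Choice mainly pays off when the result has to flow through general-purpose code.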
Best example is Async.Catch where you either return a result or an exception, both of which are meaningful.
Having said that, the use of Choice is relatively limited in F# code and most of the time people use a DU. However, a Choice might be used when you can't be bothered to define a DU.
Choice may also have better behavior when interacting with C#.
I have observed its value when attempting to enforce the creation of a discriminated union value via a dedicated function and Active Patterns.
Here's the example that was posted:
module File1 =
    type EmailAddress =
        private
        | Valid of string
        | Invalid of string
    let createEmailAddress (address: System.String) =
        if address.Length > 0
        then Valid address
        else Invalid address
    // Exposed patterns go here
    let (|Valid|Invalid|) (input: EmailAddress) : Choice<string, string> =
        match input with
        | Valid str -> Valid str
        | Invalid str -> Invalid str
module File2 =
    open File1
    let validEmail = Valid "" // Compiler error
    let isValid = createEmailAddress "" // works
    let result = // also works
        match isValid with
        | Valid x -> true
        | _ -> false

Forcing a field of an F# type to be null

I understand well the benefit of option, but in this case, I want to avoid using option for performance reasons. option wraps a type in a class, which just means more work for the garbage collector -- and I want to avoid that.
In this case especially, I have multiple fields that are all Some under the same circumstances, but I don't want to put them in a tuple because, again, tuples are classes -- and puts additional stress on the GC. So I end up accessing field.Value -- which defeats the purpose of option.
So unless there's an optimization I don't know about that causes option types to be treated as references that are potentially null, I want to just use null. Is there a way that I can do that?
Edit: To expand on what I'm doing, I'm making a bounding volume hierarchy, which is really a binary tree with data only at the leaf nodes. I'm implementing it as a class rather than as a discriminated union because keeping the items immutable isn't an option for performance reasons, and discriminated unions can't have mutable members, only refs -- again, adding to GC pressure.
As silly as it is in a functional language, I may just end up doing each node type as an inheritance of a Node parent type. Downcasting isn't exactly the fastest operation, but as far as XNA and WP7 are concerned, almost anything is better than angering the GC.
According to this MSDN documentation, if you decorate your type with the [<AllowNullLiteral>] attribute, null becomes a valid value for it, and you can use Unchecked.defaultof<T> (a type function, so no parentheses) to build a null for you.
That seems to be the only way within F# to do what you want. Otherwise, you could marshal out to another .NET language and get nulls from there, but I'm guessing that is not what you want at all.
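A minimal sketch, assuming a mutable tree node roughly like the one in the question. The Node class and its members are hypothetical, not the asker's actual code.

```fsharp
// Hypothetical tree node: child links are plain references that may be null,
// so no option wrapper is allocated for them.
[<AllowNullLiteral>]
type Node(value: int) =
    member val Value = value with get, set
    member val Left: Node = null with get, set
    member val Right: Node = null with get, set

let isLeaf (node: Node) = isNull node.Left && isNull node.Right

let leaf = Node(1)
let parent = Node(2)
parent.Left <- leaf

// Unchecked.defaultof gives the same null without writing the literal.
let empty: Node = Unchecked.defaultof<Node>
```

The price is that the compiler no longer forces a check before a child is dereferenced.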
Now there are Value Options, which may give you the best of both worlds:
[<StructuralEquality; StructuralComparison>]
[<Struct>]
type ValueOption<'T> =
    | ValueNone
    | ValueSome of 'T
No class wrapping, with syntax and semantics close to those of Option<'T>.
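A short usage sketch, assuming an FSharp.Core version that already ships ValueOption (F# 4.5 or later), so the type is not redefined here. The function names are made up.

```fsharp
// Returns a struct-based option, so no wrapper object is allocated
// for the result.
let rec tryFirstPositive values =
    match values with
    | [] -> ValueNone
    | v :: _ when v > 0 -> ValueSome v
    | _ :: rest -> tryFirstPositive rest

// Matching works the same way as with Some and None.
let describe values =
    match tryFirstPositive values with
    | ValueSome v -> sprintf "found %d" v
    | ValueNone -> "none"
```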

Explaining pattern matching vs switch

I have been trying to explain the difference between switch statements and pattern matching (F#) to a couple of people, but I haven't really been able to explain it well. Most of the time they just look at me and say "so why don't you just use if..then..else".
How would you explain it to them?
EDIT! Thanks everyone for the great answers, I really wish I could mark multiple right answers.
Having formerly been one of "those people", I don't know that there's a succinct way to sum up why pattern-matching is such tasty goodness. It's experiential.
Back when I had just glanced at pattern-matching and thought it was a glorified switch statement, I think that I didn't have experience programming with algebraic data types (tuples and discriminated unions) and didn't quite see that pattern matching was both a control construct and a binding construct. Now that I've been programming with F#, I finally "get it". Pattern-matching's coolness is due to a confluence of features found in functional programming languages, and so it's non-trivial for the outsider-looking-in to appreciate.
I tried to sum up one aspect of why pattern-matching is useful in the second of a short two-part blog series on language and API design; check out part one and part two.
Patterns give you a small language to describe the structure of the values you want to match. The structure can be arbitrarily deep and you can bind variables to parts of the structured value.
This allows you to write things extremely succinctly. You can illustrate this with a small example, such as a derivative function for a simple type of mathematical expressions:
type expr =
    | Int of int
    | Var of string
    | Add of expr * expr
    | Mul of expr * expr;;
let rec d(f, x) =
    match f with
    | Var y when x=y -> Int 1
    | Int _ | Var _ -> Int 0
    | Add(f, g) -> Add(d(f, x), d(g, x))
    | Mul(f, g) -> Add(Mul(f, d(g, x)), Mul(g, d(f, x)));;
Additionally, because pattern matching is a static construct for static types, the compiler can (i) verify that you covered all cases (ii) detect redundant branches that can never match any value (iii) provide a very efficient implementation (with jumps etc.).
Excerpt from this blog article:
Pattern matching has several advantages over switch statements and method dispatch:
Pattern matches can act upon ints, floats, strings and other types as well as objects.
Pattern matches can act upon several different values simultaneously: parallel pattern matching. Method dispatch and switch are limited to a single value, e.g. "this".
Patterns can be nested, allowing dispatch over trees of arbitrary depth. Method dispatch and switch are limited to the non-nested case.
Or-patterns allow subpatterns to be shared. Method dispatch only allows sharing when methods are from classes that happen to share a base class. Otherwise you must manually factor out the commonality into a separate function (giving it a name) and then manually insert calls from all appropriate places to your unnecessary function.
Pattern matching provides redundancy checking which catches errors.
Nested and/or parallel pattern matches are optimized for you by the F# compiler. The OO equivalent must be written by hand and constantly reoptimized by hand during development, which is prohibitively tedious and error prone so production-quality OO code tends to be extremely slow in comparison.
Active patterns allow you to inject custom dispatch semantics.
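Several of the points in this list can be seen in one small sketch: nested patterns, or-patterns, an active pattern and parallel matching on a tuple. The Expr type mirrors the expression type used earlier; the names simplify, sameShape and the Zero/NonZero pattern are made up, and simplify is a single-pass illustration rather than a complete simplifier.

```fsharp
type Expr =
    | Int of int
    | Var of string
    | Add of Expr * Expr
    | Mul of Expr * Expr

// Active pattern (hypothetical): custom dispatch on a plain int.
let (|Zero|NonZero|) n = if n = 0 then Zero else NonZero

let rec simplify expr =
    match expr with
    // Nested patterns joined by an or-pattern: both shapes bind e.
    | Add (Int 0, e) | Add (e, Int 0) -> simplify e
    | Mul (Int 1, e) | Mul (e, Int 1) -> simplify e
    // The active pattern is used inside a nested pattern.
    | Mul (Int Zero, _) | Mul (_, Int Zero) -> Int 0
    | Add (a, b) -> Add (simplify a, simplify b)
    | Mul (a, b) -> Mul (simplify a, simplify b)
    | Int _ | Var _ -> expr

// Parallel pattern matching: two values are inspected at once via a tuple.
let sameShape a b =
    match a, b with
    | (Int _, Int _) | (Var _, Var _) -> true
    | (Add _, Add _) | (Mul _, Mul _) -> true
    | _ -> false
```

Writing the same dispatch with switch or virtual methods would need nested conditionals or one method per node type.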
Off the top of my head:
The compiler can tell if you haven't covered all possibilities in your matches
You can use a match as an assignment
If you have a discriminated union, each match can have a different 'type'
Tuples have "," and Variants have Ctor args .. these are constructors, they create things.
Patterns are destructors, they rip them apart.
They're dual concepts.
To put this more forcefully: the notion of a tuple or variant cannot be described merely by its constructor: the destructor is required or the value you made is useless. It is these dual descriptions which define a value.
Generally we think of constructors as data, and destructors as control flow. Variant destructors are alternate branches (one of many), tuple destructors are parallel threads (all of many).
The parallelism is evident in operations like
(f * g) . (h * k) = (f . h * g . k)
if you think of control flowing through a function, tuples provide a way to split up a calculation into parallel threads of control.
Looked at this way, expressions are ways to compose tuples and variants to make complicated data structures (think of an AST).
And pattern matches are ways to compose the destructors (again, think of an AST).
Switch is the two front wheels.
Pattern-matching is the entire car.
Pattern matches in OCaml, in addition to being more expressive as mentioned in several ways that have been described above, also give some very important static guarantees. The compiler will prove for you that the case-analysis embodied by your pattern-match statement is:
exhaustive (no cases are missed)
non-redundant (no cases that can never be hit because they are pre-empted by a previous case)
sound (no patterns that are impossible given the datatype in question)
This is a really big deal. It's helpful when you're writing the program for the first time, and enormously useful when your program is evolving. Used properly, match-statements make it easier to change the types in your code reliably, because the type system points you at the broken match statements, which are a decent indicator of where you have code that needs to be fixed.
If-Else (or switch) statements are about choosing different ways to process a value (input) depending on properties of the value at hand.
Pattern matching is about defining how to process a value given its structure (also note that single-case pattern matches make sense).
Thus pattern matching is more about deconstructing values than making choices. This makes it a very convenient mechanism for defining (recursive) functions on inductive structures (recursive union types), which explains why it is so abundantly used in languages like OCaml.
PS: You might know the pattern-match and If-Else "patterns" from their ad-hoc use in math;
"if x has property A then y else z" (If-Else)
"some term in p1..pn where .... is the prime decomposition of x.." ((single case) pattern match)
Perhaps you could draw an analogy with strings and regular expressions? You describe what you are looking for, and let the compiler figure out how for itself. It makes your code much simpler and clearer.
As an aside: I find that the most useful thing about pattern matching is that it encourages good habits. I deal with the corner cases first, and it's easy to check that I've covered every case.

Resources