I would like to understand which is the difference between these two programming concepts. The first represents the absence of data type and at the latter the type exists but there is no information. Additionally, I recognize that Unit comes from functional programming theoretical foundation but I still cannot understand what is the usability of the unit primitive (e.g., in an F# program).
In functional programming, we usually speak of mapping inputs to outputs. This literally means mapping an argument to its return value(s). But if something is going to be a function in the mathematical/category-theory sense, it has to return something. A void value represents that a function returns nothing, which is nonsensical in these terms.
unit is the functional answer to void. It's essentially a type with only one value, (). It has a number of uses, but here's a simple one. Let's say you had something like this in a more traditional imperative language:
public static <T, U> List<U> map(List<T> in, Function<T, U> func) {
List<U> out = new ArrayList<U>(in.size());
for (T t : in) {
out.add(func.apply(t));
}
return out;
}
This applies a particular function func to every element on the list, producing a new list of func's output type. But what happens if you pass in a Function that just prints its arguments? It won't have an output type, so what can you put for U?
In some languages, passing in such a function would break this code (like in C#, where you can't assign void to a generic type). You'd have to resort to workarounds like having an Action<T>, which can get clunky.
This is where the concept of unit is useful: it is a type, but one that may only take on a single value. That greatly simplifies things like chaining and composition, and vastly reduces the number of special cases you have to worry about.
The unit type just makes everything more regular. To an extent you can think of every function in F# as taking a single parameter and returning a single result. Functions that don't need any parameters actually take "unit" as a parameter, and functions that don't return any results return "unit" as a result. This has a variety of advantages; for one, consider how in C# you need both a slew of "Func" delegates to represent functions of various arities that return values, as well as a slew of "Action" delegates that do not return values (because e.g. Func<int,void> is not legal - void cannot be used that way, since it's not quite a 'real' type).
See also F# function types: fun with tuples and currying
Related
increment([]) -> [];
increment([H|T]) -> [H+1|increment(T)].
decrement([]) -> [];
decrement([H|T]) -> [H-1|decrement(T)].
So I have this code but I don't know how they properly work like in java.
Java and Erlang are different beasts. I don't recommend trying to make comparisons to Java when learning Erlang, especially if Java is the only language you know so far. The code you've posted is a good example of the paradigm known as "functional programming". I'd suggest doing some reading on that subject to help you understand what's going on. To try to break this down as far as Erlang goes, you need to understand that an Erlang function is completely different from a Java method.
In Java, your method signature is composed of the method name and the types of its arguments. The return type can also be significant. A Java increment method like the function you wrote might be written like List<Integer> increment(List<Integer> input). The body of the Java method would probably iterate through the list an element at a time and set each element to itself plus one:
List<Integer> increment(List<Integer> input) {
for (int i = 0; i < input.size; i++) {
input.set(i, input.get(i) + 1);
}
}
Erlang has almost nothing in common with this. To begin with, an erlang function's "signature" is the name and arity of the function. Arity means how many arguments the function accepts. So your increment function is known as increment/1, and that's its unique signature. The way you write the argument list inside the parentheses after the function name has less to do with argument types than with the pattern of the data passed to it. A function like increment([]) -> ... can only successfully be called by passing it [], the empty list. Likewise, the function increment([Item]) -> ... can only be successfully called by passing it a list with one item in it, and increment([Item1, Item2]) -> ... must be passed a list with two items in it. This concept of matching data to patterns is quite aptly known as "pattern matching", and you'll find it in many functional languages. In Erlang functions, it's used to select which head of the function to execute. This bears a rough similarity to Java's method overloading, where you can have many methods with the same name but different argument types; however a pattern in an Erlang function head can bind variables to different pieces of the arguments that match the pattern.
In your code example, the function increment/1 has two heads. The first head is executed only if you pass an empty list to the function. The second head is executed only if you pass a non-empty list to the function. When that happens, two variables, H and T, are bound. H is bound to the first item of the list, and T is bound to the rest of the list, meaning all but the first item. That's because the pattern [H|T] matches a non-empty list, including a list with one element, in which case T would be bound to the empty list. The variables thus bound can be used in the body of the function.
The bodies of your functions are a very typical form of iterating a list in Erlang to produce a new list. It's typical because of another important difference from Java, which is that Erlang data is immutable. That means there's no such concept as "setting an element of a list" like I did in the Java code above. If you want to change a list, you have to build a new one, which is what your code does. It effectively says:
The result of incrementing the empty list is the empty list.
The result of incrementing a non-empty list is:
Take the first element of the list: H.
Increment the rest of the list: increment(T).
Prepend H+1 to the result of incrementing the rest of the list.
Note that you want to be careful about how you build lists in Erlang, or you can end up wasting a lot of resources. The List Handling User's Guide is a good place to learn about that. Also note that this code uses a concept known as "recursion", meaning that the function calls itself. In many popular languages, including Java, recursion is of limited usefulness because each new function call adds a stack frame, and your available memory space for stack frames is relatively limited. Erlang and many functional languages support a thing known as "tail call elimination", which is a feature that allows properly written code to recurse indefinitely without exhausting any resources.
Hopefully this helps explain things. If you can ask a more specific question, you might get a better answer.
What's the advantage of having a type represent a function?
For example, I have observed the following snippet:
type Soldier = Soldier of PieceProperties
type King = King of PieceProperties
type Crown = Soldier -> King
Is it just to support Partial Application when additional args have yet to be satisfied?
As Fyodor Soikin says in the comments
Same reason you give names to everything else - values, functions,
modules, etc.
In other words, think about programming in assembly which typically does not use types, (yes I am aware of typed assembly) and all of the problems that one can have and then how many of those problems are solved or reduced by adding types.
So before you programmed with a language that supported functions but that used static typing, you typed everything. Now that you are using F# which has static typing and functions, just extend what you have been using typing for but now add the ability to type the functions.
To quote Benjamin C. Pierce from "Types and Programming Languages"
A type system is a tractable syntactic method for proving the absence
of certain program behaviors by classifying phrases according to the
kinds of values they compute.
As noted in "Types and Programming Languages" Section 1.2
What Type Systems Are Good For
Detecting Errors
Abstraction
Documentation
Language Safety
Efficiency
TL;DR
One of the places that I find named type function definitions invaluable is when I am building parser combinators. During the construction of the functions I fully type the functions so that I know what the types are as opposed to what type inferencing will infer they are which might be different than what I want. Since the function types typically have several parameters it is easier to just give the function type a name, and then use that name everywhere it is needed. This also saves time because the function definition is consistent and avoid having to debug an improperly declared function definition; yes I have made mistakes by doing each function type by hand and learned my lesson. Once all of the functions work, I then remove the type definitions from the functions, but leave the type definition as comments so that it makes the code easier to understand.
A side benefit of using the named type definitions is that when creating test cases, the typing rules in the named function will ensure that the data used for the test is of the correct type. This also makes understanding the data for the test much easier to understand when you come back to it after many months.
Another advantage is that using function names makes the code easier to understand because when a person new to the code looks at if for the first time they can spot the consistency of the names. Also if the names are meaningful then it makes understanding the code much easier.
You have to remember that functions are also values in F#. And you can do pretty much the same stuff with them as other types. For example you can have a function that returns other functions. Or you can have a list that stores functions. In these cases it will help if you are explicit about the function signature. The function type definition will help you to constrain on the parameters and return types. Also, you might have a complicated type signature, a type definition will make it more readable. This maybe a bit contrived but you can do fun(ky) stuff like this:
type FuncX = int -> int
type FuncZ = float -> float -> float
let addxy (x:int) :FuncX = (+) x
let subxy :FuncX = (-) x
let addz (x:float) :FuncZ =
fun (x:float) -> (fun y -> x + y)
let listofFunc = [addxy 10;addxy 20; subxy 10]
If you check the type of listofFunc you will see it's FuncX list. Also the :FuncX refers to the return type of the function. But we could you use it as an input type as well:
let compFunc (x:FuncX) (z:FuncX) =
[(x 10);(z 10)]
compFunc (addxy 10) (addxy 20)
Dart has both:
an equality operator == and
a top-level function named identical().
By the choice of syntax, it feels natural to want to use Dart's == operator more frequently than identical(), and I like that. In fact, the Section on Equality of the Idiomatic Dart states that "in practice, you will rarely need to use" identical().
In a recent answer to one of my questions concerning custom filters, it seems that Angular Dart favors use of identical() rather than == when trying to determine whether changes to a model have reached a steady state. (Which can make sense, I suppose, for large models for reasons of efficiency.)
This got me to thinking about identity of int's and so I wrote some tests of identical() over ints. While I expected that small ints might be "interned/cached" (e.g. similar to what is done by Java's Integer.valueOf()), to my surprise, I can't seem to generate two ints that are equal but not identical. I get similar results for double.
Are int and double values being interned/cached? Or maybe identical() is treating them specially? Coming from a Java background, I used to equate equate Dart's:
== to Java's equal() method and
identical() to Java's equality test ==.
But that now seems wrong. Anyone know what is going on?
Numbers are treated specially. If their bit-pattern is the same they must be identical (although it is still debated if this includes the different versions of NaNs).
The main reasons are expectations, leaking of internal details and efficiency.
Expectations: users expect numbers to be identical. It goes against common sense that x == y (for two integers) but not identical(x, y).
Leaking of internal details: the VM uses SMIs (SMall Integers) to represent integers in a specific range (31 bits on 32-bit machines, 63 on 64-bit machines). These are canonicalized and are always identical. Exposing this internal implementation detail would lead to inconsistent results depending on which platform you run.
Efficiency: the VM wants to unbox numbers wherever it can. For example, inside a method doubles are frequently moved into registers. However, keeping track of the original box can be cumbersome and difficult.
foo(x, y) {
var result = x;
while(y-- > 0) {
result += x;
}
return result;
}
Suppose, that the VM optimizes this function and moves result into a register (unboxing x in the process). This allows for a tight loop where result is then efficiently modified. The difficult case happens, when y is 0. The loop wouldn't execute and foo would return x directly. In other words, the following would need to be true:
var x = 5.0;
identical(x, foo(x, 0)); // should be true.
If the VM unboxed the result variable in the method foo it would need to allocate a fresh box for the result and the identical call would therefore return false.
By modifying the definition of identical all these problems are avoided. It comes with a small cost to the identical check, though.
Seems like I posted too quickly. I just stumbled on Dart Issue 13084: Spec says identical(1.0, 1) is true, even if they have different types which led me to the Dart section on Object Identity of the language spec. (I had previously search for equality in the spec but not object identity.)
Here is an excerpt:
The predefined dart function identical() is defined such that identical(c1, c2) iff:
- c1 evaluates to either null or an instance of
bool and c1 == c2, OR
- c1 and c2 are instances of int and c1 == c2, OR
- c1 and c2 are constant strings and c1 == c2, OR
- c1 and c2 are instances of double and one of the following holds: ...
and there are more clauses dealing with lists, maps and constant objects. See the language spec for the full details. Hence, identical() is much more than just a simple test for reference equality.
I can't remember the source for this, but somewhere on dartlang.org or the issue tracker it was said that num, int and double are indeed getting special treatment. One of those special treatments is that you can't subclass those types for performance reasons, but there may be more. What exactly this special treatment entails can probably only be answered by the developers, or maybe someone who knows the specification by heart, but one thing can be inferred:
The numeric types are dart objects - they have methods you can call on their instances. But they also have qualities of primitive data types, as you can do int i = 3;, while a pure object should have a new keyword somewhere. This is different from Java, where there are real primitive types and real objects wrapping them and exposing instance methods.
While the technical details certainly are more complex, if you think about dart numerics as a blend of object and primitive, your comparison to Java still makes sense. In Java, new Integer(5).equals(new Integer(5)) evaluates to true, and so does 5==5.
I am aware this is not a very technically correct answer, but I hope it's still useful to make sense of the behaviour of dart numerics when coming from a Java background.
I am trying to port a small compiler from C# to F# to take advantage of features like pattern matching and discriminated unions. Currently, I am modeling the AST using a pattern based on System.Linq.Expressions: A an abstract base "Expression" class, derived classes for each expression type, and a NodeType enum allowing for switching on expressions without lots of casting. I had hoped to greatly reduce this using an F# discriminated union, but I've run into several seeming limitations:
Forced public default constructor (I'd like to do type-checking and argument validation on expression construction, as System.Linq.Expressions does with it's static factory methods)
Lack of named properties (seems like this is fixed in F# 3.1)
Inability to refer to a case type directly. For example, it seems like I can't declare a function that takes in only one type from the union (e. g. let f (x : TYPE) = x compiles for Expression (the union type) but not for Add or Expression.Add. This seems to sacrifice some type-safety over my C# approach.
Are there good workarounds for these or design patterns which make them less frustrating?
I think, you are stuck a little too much with the idea that a DU is a class hierarchy. It is more helpful to think of it as data, really. As such:
Forced public default constructor (I'd like to do type-checking and argument validation on expression construction, as
System.Linq.Expressions does with it's static factory methods)
A DU is just data, pretty much like say a string or a number, not functionality. Why don't you make a function that returns you an Expression option to express, that your data might be invalid.
Lack of named properties (seems like this is fixed in F# 3.1)
If you feel like you need named properties, you probably have an inappropriate type like say string * string * string * int * float as the data for your Expression. Better make a record instead, something like AddInfo and make your case of the DU use that instead, like say | Add of AddInfo. This way you have properties in pattern matches, intellisense, etc.
Inability to refer to a case type directly. For example, it seems like I can't declare a function that takes in only one type from the
union (e. g. let f (x : TYPE) = x compiles for Expression (the union
type) but not for Add or Expression.Add. This seems to sacrifice some
type-safety over my C# approach.
You cannot request something to be the Add case, but you definitely do can write a function, that takes an AddInfo. Plus you can always do it in a monadic way and have functions that take any Expression and only return an option. In that case, you can pattern match, that your input is of the appropriate type and return None if it is not. At the call site, you then can "use" the value in the good case, using functions like Option.bind.
Basically try not to think of a DU as a set of classes, but really just cases of data. Kind of like an enum.
You can make the implementation private. This allows you the full power of DUs in your implementation but presents a limited view to consumers of your API. See this answer to a related question about records (although it also applies to DUs).
EDIT
I can't find the syntax on MSDN, but here it is:
type T =
private
| A
| B
private here means "private to the module."
I'm started to learn F#, and I noticed that one of the major differences in syntax from C# is that type inference is used much more than in C#. This is usually presented as one of the benefits of F#. Why is type inference presented as benefit?
Imagine, you have a class hierarchy and code that uses different classes from it. Strong typing allows you quickly detect which classes are used in any method.
With type inference it will not be so obvious and you have to use hints to understand, which class is used. Are there any techniques that exist to make F# code more readable with type inference?
This question assumes that you are using object-oriented programming (e.g. complex class hierarchies) in F#. While you can certainly do that, using OO concepts is mainly useful for interoperability or for wrapping some F# functionality in a .NET library.
Understanding code. Type inference becomes much more useful when you write code in the functional style. It makes code shorter, but also helps you understand what is going on. For example, if you write map function over list (the Select method in LINQ):
let map f list =
seq { for el in list -> f el }
The type inference tells you that the function type is:
val map : f:('a -> 'b) -> list:seq<'a> -> seq<'b>
This matches our expectations about what we wanted to write - the argument f is a function turning values of type 'a into values of type 'b and the map function takes a list of 'a values and produces a list of 'b values. So you can use the type inference to easily check that your code does what you would expect.
Generalization. Automatic generalization (mentioned in the comments) means that the above code is automatically as reusable as possible. In C#, you might write:
IEnumerable<int> Select(IEnumerable<int> list, Func<int, int> f) {
foreach(int el in list)
yield return f(el);
}
This method is not generic - it is Select that works only on collections of int values. But there is no reason why it should be restricted to int - the same code would work for any types. The type inference mechanism helps you discover such generalizations.
More checking. Finally, thanks to the inference, the F# language can more easily check more things than you could if you had to write all types explicitly. This applies to many aspects of the language, but it is best demonstrated using units of measure:
let l = 1000.0<meter>
let s = 60.0<second>
let speed = l/s
The F# compiler infers that speed has a type float<meter/second> - it understands how units of measure work and infers the type including unit information. This feature is really useful, but it would be hard to use if you had to write all units by hand (because the types get long). In general, you can use more precise types, because you do not have to (always) type them.