Why don't F# lists have a tail pointer - f#

Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?). What advantages does dropping the tail pointer have?

F#, like many other functional[-ish] languages, has a cons-list (the terminology originally comes from LISP, but the concept is the same). In F# the :: operator (or List.Cons) is used for cons'ing: note the signature is ‘a –> ‘a list –> ‘a list (see Mastering F# Lists).
Do not confuse a cons-list with an opaque Linked List implementation which contains a discrete first[/last] node - every cell in a cons-list is the start of a [different] list! That is, a "list" is simply the chain of cells that starts at a given cons-cell.
This offers some advantages when used in a functional-like manner: one is that all the "tail" cells are shared and since each cons-cell is immutable (the "data" might be mutable, but that's a different issue) there is no way to make a change to a "tail" cell and flub up all the other lists which contain said cell.
Because of this property, [new] lists can be efficiently built - that is, they do not require a copy - simply by cons'ing to the front. In addition, it is also very efficient to deconstruct a list to head :: tail - once again, no copy - which is often very useful in recursive functions.
This immutable property generally does not exist in a [standard mutable] Double Linked List implementation in that appending would add side-effects: the internal 'last' node (the type is now opaque) and one of the "tail" cells are changed. (There are immutable sequence types that allow an "effectively constant time" append/update such as immutable.Vector in Scala -- however, these are heavy-weight objects compared to a cons-list that is nothing more than a series of cells cons'ed together.)
As mentioned, there are also disadvantages a cons-list is not appropriate for all tasks - in particular, creating a new list except by cons'ing to the head is an O(n) operation, fsvo n, and for better (or worse) the list is immutable.
I would recommend creating your own version of concat to see how this operation is really done. (The article Why I love F#: Lists - The Basics covers this.)
Happy coding.
Also see related post: Why can you only prepend to lists in functional languages?

F# lists are immutable, there's no such thing as "append/concat", rather there's just creating new lists (that may reuse some suffixes of old lists). Immutability has many advantages, outside the scope of this question. (All pure languages, and most functional languages have this data structure, it is not an F#-ism.)
See also
http://diditwith.net/2008/03/03/WhyILoveFListsTheBasics.aspx
which has good picture diagrams to explain things (e.g. why consing on the front is cheaper than at the end for an immutable list).

In addition to what the others said: if you need efficient, but yet immutable data structures (which should be an idiomatic F# way), you have to consider reading Chris Okasaki, Purely Functional Data Structures. There is also a thesis available (on which the book is based).

In addition to what has been already said, the Introducing Functional Programming section on MSDN has an article about Working with Functional Lists that explains how lists work and also implements them in C#, so it may be a good way to understand how they work (and why adding reference to the last element would not allow efficient implementation of append).
If you need to append things to the end of the list, as well as to the front, then you need a different data structure. For example, Norman Ramsey posted source code for DList which has these properties here (The implementation is not idiomatic F#, but it should be easy to fix).

If you find you want a list with better performance for append operations, have a look at the QueueList in the F# PowerPack and the JoinList in the FSharpx extension libraries.
QueueList encapsulates two lists. When you prepend using the cons, it prepends an element to the first list, just as a cons-list. However, if you want to append a single element, it can be pushed to the top of the second list. When the first list runs out of elements, List.rev is run on the second list, and the two are swapped putting your list back in order and freeing the second list to append new elements.
JoinList uses a discriminated union to more efficiently append whole lists and is a bit more involved.
Both are obviously less performant for standard cons-list operations but offer better performance for other scenarios.
You can read more about these structures in the article Refactoring Pattern Matching.

As others have pointed out, an F# list could be represented by a data structure:
List<T> { T Value; List<T> Tail; }
From here, the convention is that a list goes from the List you have a reference to until Tail is null. Based on that definition, the benefits/features/limitations in the other answers come naturally.
However, your original question seems to be why the list is not defined more like:
List<T> { Node<T> Head; Node<T> Tail; }
Node<T> { T Value; Node<T> Next; }
Such a structure would allow both appending and prepending to the list without any visible effects to the a reference to the original list, since it still only sees a "window" of the now expanded list. Although this would (sometimes) allow O(1) concatenation, there are several issues such a feature would face:
The concatenation only works once. This can lead to unexpected performance behavior where one concatenation is O(1), but the next is O(n). Say for example:
listA = makeList1 ()
listB = makeList2 ()
listC = makeList3 ()
listD = listA + listB //modified Node at tail of A for O(1)
listE = listA + listC //must now make copy of A to concat with C
You could argue that the time savings for the cases where possible are worth it, but the surprise of not knowing when it will be O(1) and when O(n) are strong arguments against the feature.
All lists now take up twice as much space, even if you never plan to concatenate them.
You now have a separate List and Node type. In the current implementation, I believe F# only uses a single type like the beginning of my answer. There may be a way to do what you are suggesting with only one type, but it is not obvious to me.
The concatenation requires mutating the original "tail" node instance. While this shouldn't affect programs, it is a point of mutation, which most functional languages tend to avoid.

Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?).
If by "tail pointer" you mean a pointer from every list to the last element in the list, that alone cannot be used to provide either of the benefits you cite. Although you could then get at the last element in the list quickly, you cannot do anything with it because it is immutable.
You could write a mutable doubly-linked list as you say but the mutability would make programs using it significantly harder to reason about because every function you call with one might change it.
As Brian said, there are purely functional catenable lists. However, they are many times slower at common operations than the simple singly-linked list that F# uses.
What advantages does dropping the tail pointer have?
30% less space usage and better performance on virtually all list operations.

Related

What are the steps in doing incrementation in erlang?

increment([]) -> [];
increment([H|T]) -> [H+1|increment(T)].
decrement([]) -> [];
decrement([H|T]) -> [H-1|decrement(T)].
So I have this code but I don't know how they properly work like in java.
Java and Erlang are different beasts. I don't recommend trying to make comparisons to Java when learning Erlang, especially if Java is the only language you know so far. The code you've posted is a good example of the paradigm known as "functional programming". I'd suggest doing some reading on that subject to help you understand what's going on. To try to break this down as far as Erlang goes, you need to understand that an Erlang function is completely different from a Java method.
In Java, your method signature is composed of the method name and the types of its arguments. The return type can also be significant. A Java increment method like the function you wrote might be written like List<Integer> increment(List<Integer> input). The body of the Java method would probably iterate through the list an element at a time and set each element to itself plus one:
List<Integer> increment(List<Integer> input) {
for (int i = 0; i < input.size; i++) {
input.set(i, input.get(i) + 1);
}
}
Erlang has almost nothing in common with this. To begin with, an erlang function's "signature" is the name and arity of the function. Arity means how many arguments the function accepts. So your increment function is known as increment/1, and that's its unique signature. The way you write the argument list inside the parentheses after the function name has less to do with argument types than with the pattern of the data passed to it. A function like increment([]) -> ... can only successfully be called by passing it [], the empty list. Likewise, the function increment([Item]) -> ... can only be successfully called by passing it a list with one item in it, and increment([Item1, Item2]) -> ... must be passed a list with two items in it. This concept of matching data to patterns is quite aptly known as "pattern matching", and you'll find it in many functional languages. In Erlang functions, it's used to select which head of the function to execute. This bears a rough similarity to Java's method overloading, where you can have many methods with the same name but different argument types; however a pattern in an Erlang function head can bind variables to different pieces of the arguments that match the pattern.
In your code example, the function increment/1 has two heads. The first head is executed only if you pass an empty list to the function. The second head is executed only if you pass a non-empty list to the function. When that happens, two variables, H and T, are bound. H is bound to the first item of the list, and T is bound to the rest of the list, meaning all but the first item. That's because the pattern [H|T] matches a non-empty list, including a list with one element, in which case T would be bound to the empty list. The variables thus bound can be used in the body of the function.
The bodies of your functions are a very typical form of iterating a list in Erlang to produce a new list. It's typical because of another important difference from Java, which is that Erlang data is immutable. That means there's no such concept as "setting an element of a list" like I did in the Java code above. If you want to change a list, you have to build a new one, which is what your code does. It effectively says:
The result of incrementing the empty list is the empty list.
The result of incrementing a non-empty list is:
Take the first element of the list: H.
Increment the rest of the list: increment(T).
Prepend H+1 to the result of incrementing the rest of the list.
Note that you want to be careful about how you build lists in Erlang, or you can end up wasting a lot of resources. The List Handling User's Guide is a good place to learn about that. Also note that this code uses a concept known as "recursion", meaning that the function calls itself. In many popular languages, including Java, recursion is of limited usefulness because each new function call adds a stack frame, and your available memory space for stack frames is relatively limited. Erlang and many functional languages support a thing known as "tail call elimination", which is a feature that allows properly written code to recurse indefinitely without exhausting any resources.
Hopefully this helps explain things. If you can ask a more specific question, you might get a better answer.

Why are mutables allowed in F#? When are they essential?

Coming from C#, trying to get my head around the language.
From what I understand one of the main benefits of F# is that you ditch the concept of state, which should (in many cases) make things much more robust.
If this is the case (and correct me if it's not), why allow us to break this principle with mutables? To me it feels like it they don't belong in the language. I understand you don't have to use them, but it gives you the tools to go off track and think in an OOP manner.
Can anyone provide an example of where a mutable value is essential?
Current compilers for declarative (stateless) code are not very smart. This results in lots of memory allocations and copy operations, which are rather expensive. Mutating some property of an object allows to re-use the object in its new state, which is much faster.
Imagine you make a game with 10000 units moving around at 60 ticks a second. You can do this in F#, including collisions with a mutable quad- or octree, on a single CPU core.
Now imagine the units and quadtree are immutable. The compiler would have no better idea than to allocate and construct 600000 units per second and create 60 new trees per second. And this excludes any changes in other management structures. In a real-world use case with complex units, this kind of solution will be too slow.
F# is a multi-paradigm language that enables the programmer to write functional, object-oriented, and, to an extent, imperative programs. Currently, each variant has its valid uses. Maybe, at some point in the future, better compilers will allow for better optimization of declarative programs, but right now, we have to fall back to imperative programming when performance becomes an issue.
Having the ability to use mutable state is often important for performance reasons, among other things.
Consider implementing the API List.take: count : int -> list : 'a list -> 'a list which returns a list consisting of only the first count elements from the input list.
If you are bound by immutability, Lists can only be built up back-to-front. Implementing take then boils down to
Build up result list back-to-front with first count guys from input: O(count)
Reverse that result and return O(count)
The F# runtime, for performance reasons, has the magic special ability to build Lists front-to-back when needed (i.e. to mutate the tail of the last guy to point to a new tail element). The basic algorithm used for List.take is:
Build up result list front-to-back with first count guys from input: O(count)
Return the result
Same asymptotic performance, but in practical terms it's twice as fast to use mutation in this case.
Pervasive mutable state can be a nightmare as code is difficult to reason about. But if you factor your code so that mutable state is tightly encapsulated (e.g. in implementation details of List.take), then you can enjoy its benefits where it makes sense. So making immutability the default, but still allowing mutability, is a very practical and useful feature of the language.
First of all, what makes F# powerful is, in my opinion, not just the immutability by default, but a whole mix of features like: immutability by default, type inference, lightweight syntax, sum (DUs) and product types (tuples), pattern matching and currying by default. Possibly more.
These make F# very functional by default and they make you program in a certain way. In particular they make you feel uncomfortable when you use mutable state, as it requires the mutable keyword. Uncomfortable in this sense means more careful. And that is exactly what you should be.
Mutable state is not forbidden or evil per se, but it should be controlled. The need to explicitly use mutable is like a warning sign making you aware of danger. And good ways how to control it, is using it internally within a function. That way you can have your own internal mutable state and still be perfectly thread-safe because you don't have shared mutable state. In fact, your function can still be referentially transparent even if it uses mutable state internally.
As for why F# allows mutable state; it would be very difficult to write usual real-world code without the possibility for it. For instance in Haskell, something like a random number cannot be done in the same way as it can be done in F#, but rather needs threading through the state explicitly.
When I write applications, I tend to have about 95% of the code base in a very functional style that would be pretty much 1:1 portable to say Haskell without any trouble. But then at the system boundaries or at some performance-critical inner loop mutable state is used. That way you get the best of both worlds.

Why implement an immutable list as a linked-list?

According to F#'s list documentation:
"A list in F# is an ordered, immutable series of elements of the same type"
"Lists in F# are implemented as singly linked lists"
Why not implement it contiguously in memory since it immutable and thus has a fixed size? Why ever use an F# list instead of an F# array?
They serve different purposes, for instance:
You use an Array in F# to store big amounts of data that needs to be accessed randomly with relative low overhead.
A List in F# is useful when you need to accumulate something over iterations of a recursive function. Arrays don't play well here, since they have a fixed size.
With a list, you can prepend all elements of ListM (size M) to ListN (size N) in O(M) time. Similarly, you can prepend a single Element to any list in O(1) time.

Comparing arrays elements in Erlang

I'm trying to learn how to think in a functional programming way, for this, I'm trying to learn Erlang and solving easy problems from codingbat. I came with the common problem of comparing elements inside a list. For example, compare a value of the i-th position element with the value of the i+1-th position of the list. So, I have been thinking and searching how to do this in a functional way in Erlang (or any functional language).
Please, be gentle with me, I'm very newb in this functional world, but I want to learn
Thanks in advance
Define a list:
L = [1,2,3,4,4,5,6]
Define a function f, which takes a list
If it matches a list of one element or an empty list, return the empty list
If it matches the first element and the second element then take the first element and construct a new list by calling the rest of the list recursivly
Otherwise skip the first element of the list.
In Erlang code
f ([]) -> [];
f ([_]) -> [];
f ([X, X|Rest]) -> [X | f(Rest)];
f ([_|Rest]) -> f(Rest).
Apply function
f(L)
This should work... haven't compiled and run it but it should get you started. Also in case you need to do modifications to it to behave differently.
Welcome to Erlang ;)
I try to be gentle ;-) So main thing in functional approach is thinking in terms: What is input? What should be output? There is nothing like comparing the i-th element with the i+1-th element alone. There have to be always purpose of it which will lead to data transformation. Even Mazen Harake's example doing it. In this example there is function which returns only elements which are followed by same value i.e. filters given list. Typically there are very different ways how do similar thing which depends of purpose of it. List is basic functional structure and you can do amazing things with it as Lisp shows us but you have to remember it is not array.
Each time you need access i-th element repeatable it indicates you are using wrong data structure. You can build up different data structures form lists and tuples in Erlang which can serve your purposes better. So when you face problem to compare i-th with i+1-th element you should stop and think. What is purpose of it? Do you need perform some stream data transformation as Mazen Harake does or You need random access? If second you should use different data structure (array for example). Even then you should think about your task characteristics. If you will be mostly read and almost never write then you can use list_to_tuple(L) and then read using element/2. When you need write occasionally you will start thinking about partition it to several tuples and as your write ratio will grow you will end up with array implementation.
So you can use lists:nth/2 if you will do it only once or several times but on short list and you are not performance freak as I'm. You can improve it using [X1,X2|_] = lists:nthtail(I-1, L) (L = lists:nthtail(0,L) works as expected). If you are facing bigger lists and you want call it many times you have to rethink your approach.
P.S.: There are many other fascinating data structures except lists and trees. Zippers for example.

id values of different variables in python 3

I am able to understand immutability with python (surprisingly simple too). Let's say I assign a number to
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is, does python interpreter allot ids to all the numbers, alphabets, True/False in the memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it please?
What you're seeing is an implementation detail (an internal optimization) calling interning. This is a technique (used by implementations of a number of languages including Java and Lua) which aliases names or variables to be references to single object instances where that's possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
I've heard that the C Python implementation does include a set of the first hundred or so integers as statically instantiated immutable objects. I suspect that other very high level language run-time libraries are likely to include similar optimizations: the first hundred integers are used very frequently by most non-trivial fragments of code.
In terms of how such things are implemented ... for strings and larger integers it would make sense for Python to maintain these as dictionaries. Thus any expression yielding an integer (and perhaps even floats) and strings (at least sufficiently short strings) would be hashed, looked up in the appropriate (internal) object dictionary, added if necessary and then returned as references to the resulting object.
You can do your own similar interning of any sorts of custom object you like by wrapping the instantiation in your own calls to your own class static dictionary.

Resources