According to F#'s list documentation:
"A list in F# is an ordered, immutable series of elements of the same type"
"Lists in F# are implemented as singly linked lists"
Why not implement it contiguously in memory since it immutable and thus has a fixed size? Why ever use an F# list instead of an F# array?
They serve different purposes, for instance:
You use an Array in F# to store big amounts of data that needs to be accessed randomly with relative low overhead.
A List in F# is useful when you need to accumulate something over iterations of a recursive function. Arrays don't play well here, since they have a fixed size.
With a list, you can prepend all elements of ListM (size M) to ListN (size N) in O(M) time. Similarly, you can prepend a single Element to any list in O(1) time.
Related
I am new to .NET programming. Sorry if this question has been asked before.
I am currently learning F#. What are the differences between Dictionary, Hashtable and Map? When should I use each?
I also have another question that is not mentioned in the title. When should I use Async.RunSynchronously? It seems rather self-contradictory to me, so I am sure that I am missing something.
The choice between Dictionary, Hashtable and Map depends on the uses cases. You should however know the characteristics of each. This is not an exhaustive list but just some key differences you might want to start from :
Hashtable Represents a collection of key/value pairs that are organized based on the hash code of the key. This is a mutable collection from .NET BCL
Dictionary<> this is a generic implementation of the hashtable. Also a mutable collection from .NET BCL
Map This is the F# immutable type. It is implemented based on AVL trees which is an entirely different data structure with different performance characteristics and use cases.
If you are doing many writes, Hash tables collections have significantly better fill rate performance than AVL trees.
Retrieving a value from a Dictionary by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.
F# maps are implemented as immutable AVL trees, an efficient data structure which forms a self-balancing binary tree. AVL trees are well-known for their efficiency, in which they can search, insert, and delete elements in the tree in O(log n) time, where n is the number of elements in the tree.
As for the map uses case, if you’ve got a set of static data (such as configuration data that’s loaded when your application starts up) you need to look up by key frequently, a Map is as good a choice as any, its immutability in this case ensures that the static data cannot be modified by mistake and has little impact to performance as you never need to mutate it once initialized.
Async.RunSynchronously runs the provided asynchronous computation and awaits its result. You can use it in F# interactive window for example to test your asynchronous workflows.
Seq.windowed in F# returns a sequence where each window within is an array. Is there a reason why each window is returned as an array (a very concrete type) as opposed to say, another sequence or IList<'T>? An IList<'T>, for example, would be sufficient if the purpose was to communicate that the items of the window can be randomly accessed but an array says two things: elements are mutable and randomly accessible. If you can rationalise the choice of array, how is windowed different from Seq.groupBy? Why does that latter (or operators in the same vein) not also return the members of a group as an array?
I'm wondering if this is simply a design oversight or is there a deeper, contractual reason for an array?
I do not know what is the design principle behind this. I suppose it might just be an accidental aspect of the implementation - Seq.windowed can be quite easily implemented by storing items in arrays, while Seq.groupBy probably needs to use some more complicated structure.
In general, I think that F# APIs either use 'T[] if using array is the natural efficient implementation, or return seq<'T> when the data source may be infinite, lazy, or when the implementation would have to convert the data to an array explicitly (then this can be left to the caller).
For Seq.windowed, I think that array makes a good sense, because you know the length of the array and so you are likely to use indexing. For example, assuming that prices is a sequence of date-price tuples (seq<DateTime * float>) you can write:
prices
|> Seq.windowed 5
|> Seq.map (fun win -> fst (win.[2]), Seq.averageBy snd win)
The sample calculates floating average and uses indexing to get the date in the middle.
In summary, I do not really have a good explanation for the design rationale, but I'm quite happy with the choices made - they seem to work really well with the usual use cases for the functions.
A couple of thoughts.
First, know that in their current version, both Seq.windowed and Seq.groupBy use non-lazy collections in their implementation. windowed uses arrays and returns you arrays. groupBy builds up a Dictionary<'tkey, ResizeArray<'tvalue>>, but keeps that a secret and returns the group values back as a seq instead of ResizeArray.
Returning ResizeArrays from groupBy wouldn't fit with anything else, so that obviously needs to be hidden. The other alternative is to give back the ToArray() of the data. This would require another copy of the data to be created, which is a downside. And there isn't really much upside, since you don't know ahead of time how big your group will be anyways, so you aren't expecting to do random access or any of the other special stuff arrays enable. So simply wrapping in a seq seems like a good option.
For windowed, it's a different story. You want an array back in this case. Why? Because you already know how big that array is going to be, so you can safely do random access or, better yet, pattern matching. That's a big upside. The downside remains, though - the data needs to be re-copied into a newly-allocated array for every window.
seq{1 .. 100} |> Seq.windowed 3 |> Seq.map (fun [|x; _; y|] -> x + y)
There is still the open question - "but couldn't we avoid the array allocation/copy downside by internally only using true lazy seqs, and returning them as such? Isn't that more in the 'spirit of the seq'?" It would be kind of tricky (would need some fancy cloning of enumerators?), but sure, probably with some careful coding. There's a huge downside to this, though. You'd need to cache the entire unspooled seq in memory to make it work, which kind of negates the whole goal of doing things lazily. Unlike lists or arrays, enumerating a seq multiple times is not guaranteed to yield the same results (e.g. a seq that returns random numbers), so the backing data for these seq windows you are returning needs to be cached somewhere. When that window is eventually accessed, you can't just tap in and re-enumerate through the original source seq - you might get back different data, or the seq might end at a different place. This points to the other upside of using arrays in Seq.windowed - only windowSize elements need to be kept in memory at once.
This is of course pure guess. I think this is related to the way both functions are implemented.
As already mentioned, in Seq.groupBy the groups are of variable length and in Seq.windowed they are of a fixed size.
So in the implementation from Seq.windowed it makes more sense to use a fixed size array, as opposed to the Generic.List used in Seq.groupBy, which btw in F# is called ResizeArray.
Now to the outside world, an Array though mutable is widely used in F# code and libraries and F# provides syntactic support for creating, initializing and manipulating arrays, whereas the ResizeArray is not that widely used in F# code and the language provides no syntactic support apart from the type alias, so I think that's why they decided to expose it as a Seq.
Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?). What advantages does dropping the tail pointer have?
F#, like many other functional[-ish] languages, has a cons-list (the terminology originally comes from LISP, but the concept is the same). In F# the :: operator (or List.Cons) is used for cons'ing: note the signature is ‘a –> ‘a list –> ‘a list (see Mastering F# Lists).
Do not confuse a cons-list with an opaque Linked List implementation which contains a discrete first[/last] node - every cell in a cons-list is the start of a [different] list! That is, a "list" is simply the chain of cells that starts at a given cons-cell.
This offers some advantages when used in a functional-like manner: one is that all the "tail" cells are shared and since each cons-cell is immutable (the "data" might be mutable, but that's a different issue) there is no way to make a change to a "tail" cell and flub up all the other lists which contain said cell.
Because of this property, [new] lists can be efficiently built - that is, they do not require a copy - simply by cons'ing to the front. In addition, it is also very efficient to deconstruct a list to head :: tail - once again, no copy - which is often very useful in recursive functions.
This immutable property generally does not exist in a [standard mutable] Double Linked List implementation in that appending would add side-effects: the internal 'last' node (the type is now opaque) and one of the "tail" cells are changed. (There are immutable sequence types that allow an "effectively constant time" append/update such as immutable.Vector in Scala -- however, these are heavy-weight objects compared to a cons-list that is nothing more than a series of cells cons'ed together.)
As mentioned, there are also disadvantages a cons-list is not appropriate for all tasks - in particular, creating a new list except by cons'ing to the head is an O(n) operation, fsvo n, and for better (or worse) the list is immutable.
I would recommend creating your own version of concat to see how this operation is really done. (The article Why I love F#: Lists - The Basics covers this.)
Happy coding.
Also see related post: Why can you only prepend to lists in functional languages?
F# lists are immutable, there's no such thing as "append/concat", rather there's just creating new lists (that may reuse some suffixes of old lists). Immutability has many advantages, outside the scope of this question. (All pure languages, and most functional languages have this data structure, it is not an F#-ism.)
See also
http://diditwith.net/2008/03/03/WhyILoveFListsTheBasics.aspx
which has good picture diagrams to explain things (e.g. why consing on the front is cheaper than at the end for an immutable list).
In addition to what the others said: if you need efficient, but yet immutable data structures (which should be an idiomatic F# way), you have to consider reading Chris Okasaki, Purely Functional Data Structures. There is also a thesis available (on which the book is based).
In addition to what has been already said, the Introducing Functional Programming section on MSDN has an article about Working with Functional Lists that explains how lists work and also implements them in C#, so it may be a good way to understand how they work (and why adding reference to the last element would not allow efficient implementation of append).
If you need to append things to the end of the list, as well as to the front, then you need a different data structure. For example, Norman Ramsey posted source code for DList which has these properties here (The implementation is not idiomatic F#, but it should be easy to fix).
If you find you want a list with better performance for append operations, have a look at the QueueList in the F# PowerPack and the JoinList in the FSharpx extension libraries.
QueueList encapsulates two lists. When you prepend using the cons, it prepends an element to the first list, just as a cons-list. However, if you want to append a single element, it can be pushed to the top of the second list. When the first list runs out of elements, List.rev is run on the second list, and the two are swapped putting your list back in order and freeing the second list to append new elements.
JoinList uses a discriminated union to more efficiently append whole lists and is a bit more involved.
Both are obviously less performant for standard cons-list operations but offer better performance for other scenarios.
You can read more about these structures in the article Refactoring Pattern Matching.
As others have pointed out, an F# list could be represented by a data structure:
List<T> { T Value; List<T> Tail; }
From here, the convention is that a list goes from the List you have a reference to until Tail is null. Based on that definition, the benefits/features/limitations in the other answers come naturally.
However, your original question seems to be why the list is not defined more like:
List<T> { Node<T> Head; Node<T> Tail; }
Node<T> { T Value; Node<T> Next; }
Such a structure would allow both appending and prepending to the list without any visible effects to the a reference to the original list, since it still only sees a "window" of the now expanded list. Although this would (sometimes) allow O(1) concatenation, there are several issues such a feature would face:
The concatenation only works once. This can lead to unexpected performance behavior where one concatenation is O(1), but the next is O(n). Say for example:
listA = makeList1 ()
listB = makeList2 ()
listC = makeList3 ()
listD = listA + listB //modified Node at tail of A for O(1)
listE = listA + listC //must now make copy of A to concat with C
You could argue that the time savings for the cases where possible are worth it, but the surprise of not knowing when it will be O(1) and when O(n) are strong arguments against the feature.
All lists now take up twice as much space, even if you never plan to concatenate them.
You now have a separate List and Node type. In the current implementation, I believe F# only uses a single type like the beginning of my answer. There may be a way to do what you are suggesting with only one type, but it is not obvious to me.
The concatenation requires mutating the original "tail" node instance. While this shouldn't affect programs, it is a point of mutation, which most functional languages tend to avoid.
Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?).
If by "tail pointer" you mean a pointer from every list to the last element in the list, that alone cannot be used to provide either of the benefits you cite. Although you could then get at the last element in the list quickly, you cannot do anything with it because it is immutable.
You could write a mutable doubly-linked list as you say but the mutability would make programs using it significantly harder to reason about because every function you call with one might change it.
As Brian said, there are purely functional catenable lists. However, they are many times slower at common operations than the simple singly-linked list that F# uses.
What advantages does dropping the tail pointer have?
30% less space usage and better performance on virtually all list operations.
If I want to make a time series type in F# to hold stock prices, which basic type should I use? We need
Select a subset based on time index,
Calculate basic statistics for a subset like mean, STD or for several subsets like correlations,
Append item for new data and fast update statistics or technical indicators,
Do linear regression between time series, etc
I have read that array has a better performance, seq has a smaller memory footnote, list is better for adding items and F# vector is easier for certain math calculation. To balance all the trade offs, how would you model a stock price time series in f#? Thanks.
As a concrete representation you can choose either array or list or some other .NET colllection type. A sequence seq<'T> is an abstract type and both array and list are automatically also sequences - this means that when you write some code that works with sequences, it will work with any concrete data type (array, list or any other .NET collection).
So, when writing data processing, you can use Seq by default (as it gives you great flexibility - it doesn't matter what concrete representation you use) and then optimize some operations to use the concrete representation (whatever that will be) if you need something to run faster.
Regarding the concrete representation - I think the crucial question is whether you want to add elements without changing original data structure (immutable list or array used in an immutable way) or whether you want to mutate the data structure (e.g. use some mutable .NET collection).
If you need to add new items freuqently then you can either use immutable list (which supports appending elements to front) or a mutable collection (array won't do as it cannot be resized).
If you're working on a more sophisticated system, I would recommend taking a look at ObservableCollection<T> (see MSDN). This is a collection that automatically notifies you when it is changed. In response to the notification, you could update your statistics (it also tells you which elements were added, so you don't need to recalculate everything). However, F# doesn't have any libraries for working with this type, so you'll need to write a lot of things yourself.
If you're adding data only rarely or adding them in larger groups, you could use array (and allocate new array each time you add items). If you have only relatively small number of items in the collection, you could use lists (where adding item is easy).
For numerical calculations, the F# PowerPack (and types like vector) offer only quite limitied set of features, so you may need to look at some thrid party libraries. Extreme optimizations is a commercial library with some F# examples and Math.NET is an open source alternative.
Otherwise, it is difficult to give any concrete advice - can you add some more details about your system? (e.g. how large the data set is, how many items need to be added how often etc...)
I can do (x : int array)
But I need only 300 elements array , so how do I (x : int[300]) ?
Can't find such information over msdn )
#Marcelo Cantos No reason , but I always used sized arrays. Why not ?
No. The F# type system does not support types such as "array of size 300", and even if it did, using the type system to check potential array overflows at compile time is too impractical to implement.
Besides, "has exactly 300 elements" is an useless property in F# in almost all situations, because there is a wealth of functions and primitives that work on arrays of arbitrary size without any risk of overflow (map or iter, for instance). Why write code that works for 300 elements when you can just as easily write code that works for any number of elements ?
If you really need to represent the "has exactly 300 elements" property, the simplest thing you could do is create a wrapper type around the native array type. This lets you restrict those operations that return arrays to only operations that respect the 300-element invariant (such as a map from another 300-element array, or a create where the length property is always 300). I'm afraid this isn't as simple as you hoped, but since F# does not natively support 300-element arrays, you will need to describe all the function invariants yourself.