Why does F#'s Seq.windowed return a seq of arrays?

Seq.windowed in F# returns a sequence where each window is an array. Is there a reason why each window is returned as an array (a very concrete type) as opposed to, say, another sequence or an IList<'T>? An IList<'T>, for example, would be sufficient if the purpose were to communicate that the items of the window can be accessed randomly, but an array says two things: the elements are mutable, and they are randomly accessible. If you can rationalise the choice of array, how is windowed different from Seq.groupBy? Why does the latter (or operators in the same vein) not also return the members of a group as an array?
I'm wondering whether this is simply a design oversight or whether there is a deeper, contractual reason for an array.

I do not know what the design principle behind this is. I suppose it might just be an accidental aspect of the implementation: Seq.windowed can be implemented quite easily by storing items in arrays, while Seq.groupBy probably needs to use some more complicated structure.
In general, I think that F# APIs either use 'T[] if an array is the natural, efficient implementation, or return seq<'T> when the data source may be infinite or lazy, or when the implementation would otherwise have to convert the data to an array explicitly (in which case this can be left to the caller).
For Seq.windowed, I think that an array makes good sense, because you know the length of the array and so you are likely to use indexing. For example, assuming that prices is a sequence of date-price tuples (seq<DateTime * float>), you can write:
prices
|> Seq.windowed 5
|> Seq.map (fun win -> fst (win.[2]), Seq.averageBy snd win)
The sample calculates a moving average and uses indexing to get the date in the middle of each window.
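A self-contained version of the snippet above might look like this. The sample data (seven made-up daily prices) and the names `prices` and `movingAverages` are illustrative assumptions. Because every window from `Seq.windowed 5` has exactly five elements, `win.[2]` is always the middle one:

```fsharp
open System

// Hypothetical sample data: (date, price) pairs for seven consecutive days.
let prices : seq<DateTime * float> =
    [ for day in 1 .. 7 -> (DateTime(2020, 1, day), float day * 10.0) ]
    |> Seq.ofList

// Each window is a 5-element array: index 2 is the middle date,
// and the whole window is averaged on the price component.
let movingAverages =
    prices
    |> Seq.windowed 5
    |> Seq.map (fun win -> fst (win.[2]), Seq.averageBy snd win)
    |> List.ofSeq
```

With seven inputs there are three windows, centred on January 3, 4 and 5.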
In summary, I do not really have a good explanation for the design rationale, but I'm quite happy with the choices made - they seem to work really well with the usual use cases for the functions.

A couple of thoughts.
First, know that in their current version, both Seq.windowed and Seq.groupBy use non-lazy collections in their implementation. windowed uses arrays and returns you arrays. groupBy builds up a Dictionary<'tkey, ResizeArray<'tvalue>>, but keeps that a secret and returns the group values back as a seq instead of ResizeArray.
Returning ResizeArrays from groupBy wouldn't fit with anything else, so that obviously needs to be hidden. The other alternative is to give back the ToArray() of the data. This would require another copy of the data to be created, which is a downside. And there isn't really much upside, since you don't know ahead of time how big your group will be anyway, so you aren't expecting to do random access or any of the other special things arrays enable. So simply wrapping it in a seq seems like a good option.
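The consequence for callers can be sketched as follows: Seq.groupBy gives each group back as a seq<'T>, and a caller who does want indexing makes the copy explicitly. The names `groups`, `groupArrays` and `firstMembers` are illustrative only:

```fsharp
// Each group comes back as a seq<int>, not as an array.
let groups : seq<int * seq<int>> =
    [ 1 .. 10 ] |> Seq.groupBy (fun n -> n % 3)

// The caller opts into the extra copy only because it wants random access.
let groupArrays : (int * int[])[] =
    groups
    |> Seq.map (fun (key, members) -> key, Array.ofSeq members)
    |> Array.ofSeq

// Indexing is now available, e.g. the first member of every group.
let firstMembers =
    groupArrays |> Array.map (fun (key, members) -> key, members.[0])
```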
For windowed, it's a different story. You want an array back in this case. Why? Because you already know how big that array is going to be, so you can safely do random access or, better yet, pattern matching. That's a big upside. The downside remains, though - the data needs to be re-copied into a newly-allocated array for every window.
seq{1 .. 100} |> Seq.windowed 3 |> Seq.map (fun [|x; _; y|] -> x + y)
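The one-liner above compiles with an incomplete-match warning (FS0025), because the compiler cannot know that every window has three elements. Below is a sketch of the same idea with a complete match, plus a check that each window is its own array, so mutating one window does not affect the others. The names `sums` and `windows` are illustrative:

```fsharp
// Sum the first and last element of every 3-element window.
// The wildcard case only makes the match complete; Seq.windowed 3
// never produces a window of any other length.
let sums =
    seq { 1 .. 100 }
    |> Seq.windowed 3
    |> Seq.map (fun window ->
        match window with
        | [| x; _; y |] -> x + y
        | _ -> failwith "unreachable: windows always have 3 elements")
    |> List.ofSeq

// Each window is a separate array: changing the first one
// leaves the second window untouched.
let windows = [ 1; 2; 3; 4 ] |> Seq.windowed 2 |> Array.ofSeq
windows.[0].[1] <- 100
```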
There is still the open question - "but couldn't we avoid the array allocation/copy downside by internally only using true lazy seqs, and returning them as such? Isn't that more in the 'spirit of the seq'?" It would be kind of tricky (would need some fancy cloning of enumerators?), but sure, probably with some careful coding. There's a huge downside to this, though. You'd need to cache the entire unspooled seq in memory to make it work, which kind of negates the whole goal of doing things lazily. Unlike lists or arrays, enumerating a seq multiple times is not guaranteed to yield the same results (e.g. a seq that returns random numbers), so the backing data for these seq windows you are returning needs to be cached somewhere. When that window is eventually accessed, you can't just tap in and re-enumerate through the original source seq - you might get back different data, or the seq might end at a different place. This points to the other upside of using arrays in Seq.windowed - only windowSize elements need to be kept in memory at once.
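To make the memory argument concrete, here is a minimal sketch of a windowing function that buffers only windowSize source elements in a circular buffer and copies each window out into a fresh array. The name `windowedSketch` is hypothetical, and this is not FSharp.Core's actual implementation:

```fsharp
let windowedSketch (windowSize: int) (source: seq<'T>) : seq<'T[]> =
    if windowSize <= 0 then invalidArg "windowSize" "The window size must be positive."
    seq {
        // Circular buffer: at most windowSize source elements are held at once.
        let buffer : 'T[] = Array.zeroCreate windowSize
        // Total number of elements read from the source so far.
        let count = ref 0
        for item in source do
            buffer.[count.Value % windowSize] <- item
            count.Value <- count.Value + 1
            if count.Value >= windowSize then
                // The oldest element sits at the next write position.
                let start = count.Value % windowSize
                // Copy the window out, oldest first, into a new array
                // so later writes to the buffer cannot change it.
                yield Array.init windowSize (fun i -> buffer.[(start + i) % windowSize])
    }
```

The buffer is reused, but every yielded window is a new copy. That is the trade-off described above: one allocation per window, and no need to cache the whole source.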

This is, of course, pure guesswork. I think this is related to the way both functions are implemented.
As already mentioned, in Seq.groupBy the groups are of variable length, while in Seq.windowed they are of a fixed size.
So in the implementation of Seq.windowed it makes more sense to use a fixed-size array, as opposed to the Generic.List used in Seq.groupBy, which, by the way, is called ResizeArray in F#.
To the outside world, an array, although mutable, is widely used in F# code and libraries, and F# provides syntactic support for creating, initializing and manipulating arrays. ResizeArray, on the other hand, is not that widely used in F# code, and the language provides no syntactic support for it apart from the type alias. I think that's why they decided to expose it as a seq.
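The syntactic difference mentioned above can be seen in a few lines. ResizeArray<'T> is just an F# abbreviation for System.Collections.Generic.List<'T>, and it has no literal syntax. The names below are illustrative:

```fsharp
// ResizeArray<'T> is only a type abbreviation for the .NET generic List<'T>.
let sameType = typeof<ResizeArray<int>> = typeof<System.Collections.Generic.List<int>>

// Arrays have dedicated literal syntax...
let fixedSize = [| 1; 2; 3 |]

// ...whereas a ResizeArray is built through ordinary method calls.
let growable = ResizeArray<int>()
growable.Add 1
growable.Add 2
```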

Related

Why is there no Seq.partition in F#

In F# we have List.partition and Array.partition which return a tuple of lists and a tuple of arrays respectively.
So why is there no Seq.partition returning a tuple of sequences?
Here is a very simple implementation (on F# Snippets).
So... why isn't this part of the core?
In F# 4.0 (Visual Studio 2015), the core libraries are a lot more uniform than before, but they still do not come with an implementation of Seq.partition. You can find more about this in the F# language design discussion: Regular functional operators producing two or more output collections.
The summary is that the Seq.partition function is quite tricky, and having it could introduce potential performance issues. There are a couple of ways it can work:
It can iterate over the input collection twice (like the FsSnip version), which can cause issues if you have complex delayed computation (you're doing everything twice)
It can iterate over the input once, but then it would have to do some complex mutable state sharing (which could secretly allocate memory).
So, Seq.partition cannot be implemented reasonably while keeping all the good properties that you would expect about the seq<'T> type.
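The cost of the first option, iterating over the input twice, can be sketched with a source that counts how many elements it produces. The function `partitionTwoPass` is a hypothetical two-pass version written for illustration. It is not claimed to be the FsSnip code:

```fsharp
// Hypothetical two-pass partition: each half is an independent filter over
// the same source, so enumerating both halves enumerates the source twice.
let partitionTwoPass (predicate: 'T -> bool) (source: seq<'T>) : seq<'T> * seq<'T> =
    Seq.filter predicate source, Seq.filter (predicate >> not) source

// Counts how many elements the source produces (a stand-in for expensive work).
let evaluations = ref 0
let source =
    seq {
        for i in 1 .. 5 do
            evaluations.Value <- evaluations.Value + 1
            yield i
    }

let evens, odds = partitionTwoPass (fun n -> n % 2 = 0) source
let evenList = List.ofSeq evens
let oddList = List.ofSeq odds
// evaluations.Value is now 10: every element was computed once per half.
```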
Seq.partition is just a specialized version of Seq.groupBy, so the standard library could implement the former as a wrapper around the latter without introducing any new issues.
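let partition predicate source =
    // Group the elements by the predicate's result (true/false); this enumerates the source once.
    let map =
        source
        |> Seq.groupBy predicate
        |> Map.ofSeq
    // A key is missing when no element falls into that half, so fall back to an empty seq.
    let get flag =
        map
        |> Map.tryFind flag
        |> Option.defaultValue Seq.empty
    get true, get false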

Why implement an immutable list as a linked list?

According to F#'s list documentation:
"A list in F# is an ordered, immutable series of elements of the same type"
"Lists in F# are implemented as singly linked lists"
Why not implement it contiguously in memory, since it is immutable and thus has a fixed size? Why ever use an F# list instead of an F# array?
They serve different purposes, for instance:
You use an array in F# to store large amounts of data that need to be accessed randomly with relatively low overhead.
A List in F# is useful when you need to accumulate something over iterations of a recursive function. Arrays don't play well here, since they have a fixed size.
With a list, you can prepend all elements of ListM (size M) to ListN (size N) in O(M) time. Similarly, you can prepend a single Element to any list in O(1) time.
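The costs above come from sharing. Prepending reuses the existing list as the tail instead of copying it. The sketch below checks this with reference equality. It assumes, consistent with the O(M) claim above, that FSharp.Core's `@` copies the cells of the left list and reuses the right list unchanged:

```fsharp
let listN = [ 4; 5; 6 ]

// O(1): one new cell whose tail is listN itself.
let longer = 3 :: listN

// O(M): listM's two cells are copied; listN becomes the shared tail.
let listM = [ 1; 2 ]
let combined = listM @ listN
```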

Increasing the length of a tuple in Erlang

How can I increase the length of a tuple in Erlang? For example, suppose Tup={1,2,3}, and now I want to add another element to it. Is there any way to do this?
A tuple is not supposed to be a flexible data structure. If you are resizing it often, then you should consider using other Erlang data structures like lists, maps or sets, depending on your expectations. Here is a nice introduction to key-value stores.
But if you really have to extend that tuple, then you can use erlang:append_element/2:
{1,2,3,4} = erlang:append_element({1,2,3}, 4).
Tuples aren't mutable so you can't, strictly speaking, increase the length.
Generally, if you want a variable-number-of-things datatype, a tuple will be very inconvenient. For example, iterating over all elements of a list is highly idiomatic, while iterating over all elements of a tuple whose size is unknown at compile-time is a pain.
However, a common pattern is to get a tuple as a result from some function and return elements of that tuple plus additions.
country_coords(Name) ->
    {Lat, Lng} = find_address(Name),
    {_Street, _City, _Zip, Country} = geocode(Lat, Lng),
    {ok, Lat, Lng, Country}.
erlang:append_element(Tuple, Element) is the built-in function for this, but tuples are not meant to be resized, so avoid using this function unless there is no other way.

Why don't F# lists have a tail pointer

Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?). What advantages does dropping the tail pointer have?
F#, like many other functional[-ish] languages, has a cons-list (the terminology originally comes from LISP, but the concept is the same). In F#, the :: operator (or List.Cons) is used for cons'ing: note the signature is 'a -> 'a list -> 'a list (see Mastering F# Lists).
Do not confuse a cons-list with an opaque Linked List implementation which contains a discrete first[/last] node - every cell in a cons-list is the start of a [different] list! That is, a "list" is simply the chain of cells that starts at a given cons-cell.
This offers some advantages when used in a functional-like manner: one is that all the "tail" cells are shared and since each cons-cell is immutable (the "data" might be mutable, but that's a different issue) there is no way to make a change to a "tail" cell and flub up all the other lists which contain said cell.
Because of this property, [new] lists can be efficiently built - that is, they do not require a copy - simply by cons'ing to the front. In addition, it is also very efficient to deconstruct a list to head :: tail - once again, no copy - which is often very useful in recursive functions.
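Both points (building by cons'ing onto a shared tail and deconstructing with head :: tail) can be seen in a short sketch. The names `sumList`, `shared`, `first` and `second` are illustrative:

```fsharp
// Sums a list by repeatedly splitting it into head :: tail. The match does
// not copy anything: tail is just a reference to the next cell.
let sumList (values: int list) : int =
    let rec loop acc remaining =
        match remaining with
        | [] -> acc
        | head :: tail -> loop (acc + head) tail   // tail-recursive call
    loop 0 values

// Two new lists built by cons'ing onto the same immutable tail.
let shared = [ 3; 4 ]
let first = 1 :: shared
let second = 2 :: shared
```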
This immutable property generally does not exist in a [standard mutable] doubly linked list implementation, in that appending would add side-effects: the internal 'last' node (the type is now opaque) and one of the "tail" cells are changed. (There are immutable sequence types that allow an "effectively constant time" append/update, such as immutable.Vector in Scala; however, these are heavy-weight objects compared to a cons-list that is nothing more than a series of cells cons'ed together.)
As mentioned, there are also disadvantages: a cons-list is not appropriate for all tasks. In particular, creating a new list except by cons'ing to the head is an O(n) operation (for a suitable n), and, for better or worse, the list is immutable.
I would recommend creating your own version of concat to see how this operation is really done. (The article Why I love F#: Lists - The Basics covers this.)
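Following that suggestion, a hand-written concatenation could look like the sketch below. The name `myAppend` is hypothetical (chosen so it does not clash with List.append). It shows why the cost depends only on the left list: every cell of `left` is rebuilt, while `right` is reused as-is:

```fsharp
let rec myAppend (left: 'T list) (right: 'T list) : 'T list =
    match left with
    | [] -> right                                   // right is reused, not copied
    | head :: tail -> head :: myAppend tail right   // one new cell per element of left
```

This version is not tail-recursive, so a very long left list could overflow the stack. A production version would need to be written differently.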
Happy coding.
Also see related post: Why can you only prepend to lists in functional languages?
F# lists are immutable; there's no such thing as "append/concat", rather there's just creating new lists (that may reuse some suffixes of old lists). Immutability has many advantages, outside the scope of this question. (All pure languages, and most functional languages, have this data structure; it is not an F#-ism.)
See also
http://diditwith.net/2008/03/03/WhyILoveFListsTheBasics.aspx
which has good diagrams to explain things (e.g. why consing on the front is cheaper than at the end for an immutable list).
In addition to what the others said: if you need efficient yet immutable data structures (which should be the idiomatic F# way), you should consider reading Chris Okasaki's Purely Functional Data Structures. There is also a thesis available (on which the book is based).
In addition to what has been already said, the Introducing Functional Programming section on MSDN has an article about Working with Functional Lists that explains how lists work and also implements them in C#, so it may be a good way to understand how they work (and why adding reference to the last element would not allow efficient implementation of append).
If you need to append things to the end of the list, as well as to the front, then you need a different data structure. For example, Norman Ramsey posted source code for DList which has these properties here (The implementation is not idiomatic F#, but it should be easy to fix).
If you find you want a list with better performance for append operations, have a look at the QueueList in the F# PowerPack and the JoinList in the FSharpx extension libraries.
QueueList encapsulates two lists. When you prepend using the cons, it prepends an element to the first list, just as a cons-list. However, if you want to append a single element, it can be pushed to the top of the second list. When the first list runs out of elements, List.rev is run on the second list, and the two are swapped putting your list back in order and freeing the second list to append new elements.
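The two-list scheme just described can be sketched in a few lines. The type `TwoList` and its functions are hypothetical names written in the spirit of that description; this is not the F# PowerPack's QueueList API:

```fsharp
// Front holds elements in order; Back holds appended elements in reverse.
type TwoList<'T> = { Front: 'T list; Back: 'T list }

module TwoList =
    let empty<'T> : TwoList<'T> = { Front = []; Back = [] }

    // O(1): prepend onto the front list, exactly like a cons-list.
    let cons (item: 'T) (q: TwoList<'T>) = { q with Front = item :: q.Front }

    // O(1): append by pushing onto the top of the back list.
    let snoc (q: TwoList<'T>) (item: 'T) = { q with Back = item :: q.Back }

    // Take the first element; when Front runs out, reverse Back and swap it in.
    let rec tryUncons (q: TwoList<'T>) : ('T * TwoList<'T>) option =
        match q.Front, q.Back with
        | head :: rest, _ -> Some (head, { q with Front = rest })
        | [], [] -> None
        | [], back -> tryUncons { Front = List.rev back; Back = [] }

    // All elements in order (allocates a new list).
    let toList (q: TwoList<'T>) : 'T list = q.Front @ List.rev q.Back
```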
JoinList uses a discriminated union to more efficiently append whole lists and is a bit more involved.
Both are obviously less performant for standard cons-list operations but offer better performance for other scenarios.
You can read more about these structures in the article Refactoring Pattern Matching.
As others have pointed out, an F# list could be represented by a data structure:
List<T> { T Value; List<T> Tail; }
From here, the convention is that a list goes from the List you have a reference to until Tail is null. Based on that definition, the benefits/features/limitations in the other answers come naturally.
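In F# itself, the single-type structure above is naturally written as a discriminated union. This is a hypothetical `ConsList` mirroring the { Value; Tail } sketch, where an explicit `Empty` case plays the role of the null Tail (it is not how FSharp.Core defines its list type):

```fsharp
type ConsList<'T> =
    | Empty
    | Cons of 'T * ConsList<'T>

// Walk the chain of cells until Empty is reached.
let rec length (list: ConsList<'T>) : int =
    match list with
    | Empty -> 0
    | Cons (_, tail) -> 1 + length tail

// Every cell is the start of a list, so two lists can share one tail.
let sharedTail = Cons (2, Cons (3, Empty))
let listA = Cons (1, sharedTail)
let listB = Cons (0, sharedTail)
```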
However, your original question seems to be why the list is not defined more like:
List<T> { Node<T> Head; Node<T> Tail; }
Node<T> { T Value; Node<T> Next; }
Such a structure would allow both appending and prepending to the list without any visible effects on a reference to the original list, since it still only sees a "window" of the now expanded list. Although this would (sometimes) allow O(1) concatenation, there are several issues such a feature would face:
The concatenation only works once. This can lead to unexpected performance behavior where one concatenation is O(1), but the next is O(n). Say for example:
listA = makeList1 ()
listB = makeList2 ()
listC = makeList3 ()
listD = listA + listB //modified Node at tail of A for O(1)
listE = listA + listC //must now make copy of A to concat with C
You could argue that the time savings in the cases where it is possible are worth it, but the surprise of not knowing when it will be O(1) and when O(n) is a strong argument against the feature.
All lists now take up twice as much space, even if you never plan to concatenate them.
You now have a separate List and Node type. In the current implementation, I believe F# only uses a single type like the beginning of my answer. There may be a way to do what you are suggesting with only one type, but it is not obvious to me.
The concatenation requires mutating the original "tail" node instance. While this shouldn't affect programs, it is a point of mutation, which most functional languages tend to avoid.
Or phrased another way, what kind of benefits do you get from having a basic, singly linked list with only a head pointer? The benefits of a tail pointer that I can see are:
O(1) list concatenation
O(1) Appending stuff to the right side of the list
Both of which are rather convenient things to have, as opposed to O(n) list concatenation (where n is the length of the left-side list?).
If by "tail pointer" you mean a pointer from every list to the last element in the list, that alone cannot be used to provide either of the benefits you cite. Although you could then get at the last element in the list quickly, you cannot do anything with it because it is immutable.
You could write a mutable doubly-linked list as you say but the mutability would make programs using it significantly harder to reason about because every function you call with one might change it.
As Brian said, there are purely functional catenable lists. However, they are many times slower at common operations than the simple singly-linked list that F# uses.
What advantages does dropping the tail pointer have?
30% less space usage and better performance on virtually all list operations.

Comparing array elements in Erlang

I'm trying to learn how to think in a functional programming way. For this, I'm learning Erlang and solving easy problems from codingbat. I came across the common problem of comparing elements inside a list: for example, comparing the value of the i-th element with the value of the i+1-th element of the list. So I have been thinking and searching how to do this in a functional way in Erlang (or any functional language).
Please be gentle with me; I'm very new to this functional world, but I want to learn.
Thanks in advance
Define a list:
L = [1,2,3,4,4,5,6]
Define a function f, which takes a list
If it matches a list of one element or an empty list, return the empty list
If the first element matches the second element, take the first element and construct a new list by calling f recursively on the rest of the list
Otherwise, skip the first element of the list.
In Erlang code
f ([]) -> [];
f ([_]) -> [];
f ([X, X|Rest]) -> [X | f(Rest)];
f ([_|Rest]) -> f(Rest).
Apply function
f(L)
This should work... I haven't compiled and run it, but it should get you started, including if you need to modify it to behave differently.
Welcome to Erlang ;)
I'll try to be gentle ;-) The main thing in the functional approach is thinking in terms of: What is the input? What should the output be? There is no such thing as comparing the i-th element with the i+1-th element on its own. There always has to be a purpose for it, which leads to a data transformation. Even Mazen Harake's example does this: its function returns only the elements that are followed by the same value, i.e. it filters the given list. Typically there are very different ways to do similar things, depending on the purpose. The list is the basic functional structure, and you can do amazing things with it, as Lisp shows us, but you have to remember that it is not an array.
Each time you need to access the i-th element repeatedly, it indicates you are using the wrong data structure. You can build up different data structures from lists and tuples in Erlang which can serve your purposes better. So when you face the problem of comparing the i-th with the i+1-th element, you should stop and think: what is the purpose of it? Do you need to perform some stream data transformation, as Mazen Harake does, or do you need random access? If the latter, you should use a different data structure (an array, for example). Even then, you should think about your task's characteristics. If you will mostly read and almost never write, then you can use list_to_tuple(L) and then read using element/2. When you need to write occasionally, you will start thinking about partitioning it into several tuples, and as your write ratio grows, you will end up with an array implementation.
So you can use lists:nth/2 if you will do it only once, or several times on a short list, and you are not a performance freak like me. You can improve on it using [X1,X2|_] = lists:nthtail(I-1, L) (L = lists:nthtail(0,L) works as expected). If you are facing bigger lists and you want to call it many times, you have to rethink your approach.
P.S.: There are many other fascinating data structures besides lists and trees. Zippers, for example.
