Is it possible to override Values in Collections.Seq module? - f#

I have implemented IEnumerable for a collection I built, and (although I have not tested them all) the Seq values appear to work correctly. Is it possible to override some Seq values, for instance "last", when the native performance of a value of my collection is better than using Seq's IEnumerable based function? I have not found any information on overriding Seq.

No -- the functions in the Seq module can't be overridden. However, some of them do try to optimize performance by checking their input value (the seq<'T> instance you pass them) to see if it's an instance of IList<'T> or 'T[]; if it is, the functions will take some optimized code path. For example, if you pass an array ('T[]) to Seq.length, it'll be able to quickly determine the length by using the .Length property of arrays.
If you're stuck on using the Seq module, the only performance optimization I can think of would be to have your collection also implement ICollection<'T> and/or IList<'T>. That may optimize some cases, but it won't be all cases.

As already said in the other answer, there is no way you can override the functions in the Seq module. If you're implementing a custom collection, then the best thing to do is to follow the standard pattern used by the core F# libraries.
The Seq module contains the most often used functions and functions that can be reasonably provided for any sequence.
Modules like Array or List provide more efficient implementations for a specific collection type and they add more functions (not available in Seq) that are specific to the collection (for example, functions List.tail and Array.get).
The best way when adding your own collection is to follow this pattern:
Implement IEnumerable<'T> so that the functions from Seq module work for your type
Create MyCollection module that contains efficient implementations of standard functions (at least those that matter to you) and adds more functionality that is specific to your collection.

Related

Why does F# Set need IComparable?

So I am trying to use the F# Set as a hash table. But my element type doesn't implement the IComparable interface (although it implements IEquatable). I got an error saying the construction is not allowed because of comparison constraint. And through some further read, I discovered that F# Set is implemented using binary tree, which makes insertion causes O(log(n)). This looks weird to me, why is the Set structure designed this way?
Edit: So I learned that Set in F# is actually a SortedSet. And I guess the question becomes, why is Sorted Set somehow more preferable than a general Hash Set as an immutable/functional data structure?
There are two important points that should help you understand how sets in F# (and in functional languages in general) work and how they are used:
Implementing immutable hashtables (like .NET HashSet) is hard - when you remove or add elements, you want to avoid copying everything in the data structure and (as far as I know) there is no general way of doing that (you would end up copying too much, so it would be inefficient).
For this reason, most functional sets are implemented as (some form of trees). Those require comparison to build a sorted tree. The nice property of balanced trees is that removing and adding elements does not have to copy everything in the tree, so even the worst case scenario is reasonably efficient (though mutable hashtable is still faster).
Now, F# is functional-first, which means that immutable structures are preferred, but it is perfectly fine to use mutable data structures (especially if you limit the usage to some well defined and restricted scope). For this reason, F# programmers often use Dictionary or HashSet, especially when this is only within the scope of a single function.

Should I use Nullable<'a> or Option<'a> in F#?

Which way is more idiomatic to use Nullable<'a> or to use Option<'a> for representing a nullable int?
Option is far more idiomatic in F# code.
It has far nicer syntax when used in match and has large amounts of support from the standard library.
However, if you plan to access the code from C# or some other language you should probably expose the interface with Nullable which is easier to use in C#.
As John said, Option<T> is definitely more idiomatic type in F#. I would certainly use options as my default choice - the Option module provides many useful functions, pattern matching works nicely on options and F# libraries are generally designed to work with options.
That said, there are two cases when you might want to use nullable:
When creating arrays of optional values - Nullable<T> is a value type (sort of) and if you create an array Nullable<T>[] then it is allocated as continuous memory block. On the other hand options are reference types and option<T>[] will be an array of references to heap-allocated objects.
When you need to write some calculations and propagate missing values - in F# 3.0, there is a module Microsoft.FSharp.Linq.NullableOperators which implements various operators for dealing with nullable values (see MSDN documentation) which lets you write e.g.:
let one = Nullable(1)
let two = Nullable(2)
// Add constant to nullable, then compare value of two nullables
(one ?+ 2) ?>=? two

IEnumerable<T> in OCaml

I use F# a lot. All the basic collections in F# implement IEumberable interface, thus it is quite natural to access them using the single Seq module in F#. Is this possible in OCaml?
The other question is that 'a seq in F# is lazy, e.g. I can create a sequence from 1 to 100 using {1..100} or more verbosely:
seq { for i=1 to 100 do yield i }
In OCaml, I find myself using the following two methods to work around with this feature:
generate a list:
let rec range a b =
if a > b then []
else a :: range (a+1) b;;
or resort to explicit recursive functions.
The first generates extra lists. The second breaks the abstraction as I need to operate on the sequence level using higher order functions such as map and fold.
I know that the OCaml library has Stream module. But the functionality of it seems to be quite limited, not as general as 'a seq in F#.
BTW, I am playing Project Euler problems using OCaml recently. So there are quite a few sequences operations, that in an imperative language would be loops with a complex body.
This Ocaml library seems to offer what you are asking. I've not used it though.
http://batteries.forge.ocamlcore.org/
Checkout this module, Enum
http://batteries.forge.ocamlcore.org/doc.preview:batteries-beta1/html/api/Enum.html
I somehow feel Enum is a much better name than Seq. It eliminates the lowercase/uppercase confusion on Seqs.
An enumerator, seen from a functional programming angle, is exactly a fold function. Where a class would implement an Enumerable interface in an object-oriented data structures library, a type comes with a fold function in a functional data structure library.
Stream is a slightly quirky imperative lazy list (imperative in that reading an element is destructive). CamlP5 comes with a functional lazy list library, Fstream. This already cited thread offers some alternatives.
It seems you are looking for something like Lazy lists.
Check out this SO question
The Batteries library mentioned also provides the (--) operator:
# 1 -- 100;;
- : int BatEnum.t = <abstr>
The enum is not evaluated until you traverse the items, so it provides a feature similar to your second request. Just beware that Batteries' enums are mutable. Ideally, there would also be a lazy list implementation that all data structures could be converted to/from, but that work has not been done yet.

Is there a reason workflow builders in F# don't use interfaces?

This is a question just out of curiosity: when you implement a workflow factory, you don't do it as an interface implementation, but rather just make sure the function signatures of the monad functions match. Is there a design reason for this?
For one thing, the lack of higher-kinded types in .NET means that you can't give the methods useful signatures. For instance, ListBuilder.Return should have type 't -> 't list, while OptionBuilder.Return should have type 't -> 't option. There's no way to create an interface with a Return method that has a signature supporting both of these methods.
I think that the lack of higher-kinded types as mentioned by kvb is probably the main reason. There are ways to workaround that, but that makes the code a bit obscure (see this snippet).
Another reason is that F# computation expressions allow you to define different combinations of methods. It's not always just Bind and Return. For example:
Some define Yield, YieldFrom, Combine to allow generating results
Some define Return, ReturnFrom, Bind to define a monad
Some define Return, ReturnFrom, Bind, Combine to define a monad that can return multiple things
Some also define Delay or Delay and Run to handle laziness
... so computation expressions would need to be defined as quite a few different interfaces. I think the current design leaves some nice flexibility in what features of computations you can support.

Time series modeling in f#-- seq vs array vs vector vs list vs generic list

If I want to make a time series type in F# to hold stock prices, which basic type should I use? We need
Select a subset based on time index,
Calculate basic statistics for a subset like mean, STD or for several subsets like correlations,
Append item for new data and fast update statistics or technical indicators,
Do linear regression between time series, etc
I have read that array has a better performance, seq has a smaller memory footnote, list is better for adding items and F# vector is easier for certain math calculation. To balance all the trade offs, how would you model a stock price time series in f#? Thanks.
As a concrete representation you can choose either array or list or some other .NET colllection type. A sequence seq<'T> is an abstract type and both array and list are automatically also sequences - this means that when you write some code that works with sequences, it will work with any concrete data type (array, list or any other .NET collection).
So, when writing data processing, you can use Seq by default (as it gives you great flexibility - it doesn't matter what concrete representation you use) and then optimize some operations to use the concrete representation (whatever that will be) if you need something to run faster.
Regarding the concrete representation - I think the crucial question is whether you want to add elements without changing original data structure (immutable list or array used in an immutable way) or whether you want to mutate the data structure (e.g. use some mutable .NET collection).
If you need to add new items freuqently then you can either use immutable list (which supports appending elements to front) or a mutable collection (array won't do as it cannot be resized).
If you're working on a more sophisticated system, I would recommend taking a look at ObservableCollection<T> (see MSDN). This is a collection that automatically notifies you when it is changed. In response to the notification, you could update your statistics (it also tells you which elements were added, so you don't need to recalculate everything). However, F# doesn't have any libraries for working with this type, so you'll need to write a lot of things yourself.
If you're adding data only rarely or adding them in larger groups, you could use array (and allocate new array each time you add items). If you have only relatively small number of items in the collection, you could use lists (where adding item is easy).
For numerical calculations, the F# PowerPack (and types like vector) offer only quite limitied set of features, so you may need to look at some thrid party libraries. Extreme optimizations is a commercial library with some F# examples and Math.NET is an open source alternative.
Otherwise, it is difficult to give any concrete advice - can you add some more details about your system? (e.g. how large the data set is, how many items need to be added how often etc...)

Resources