Why does F# Set need IComparable?

So I am trying to use the F# Set as a hash table. But my element type doesn't implement the IComparable interface (although it implements IEquatable), and I got an error saying the construction is not allowed because of the comparison constraint. Through some further reading, I discovered that F# Set is implemented as a binary tree, which makes insertion O(log n). This looks weird to me; why is the Set structure designed this way?
Edit: So I learned that Set in F# is actually a sorted set. I guess the question becomes: why is a sorted set preferable to a general hash set as an immutable/functional data structure?

There are two important points that should help you understand how sets in F# (and in functional languages in general) work and how they are used:
Implementing an immutable hashtable (an immutable counterpart of .NET's HashSet) is hard - when you remove or add elements, you want to avoid copying everything in the data structure, and (as far as I know) there is no general way of doing that (you would end up copying too much, so it would be inefficient).
For this reason, most functional sets are implemented as some form of balanced tree. Trees require a comparison in order to build a sorted structure. The nice property of balanced trees is that removing and adding elements does not have to copy everything in the tree, so even the worst-case scenario is reasonably efficient (though a mutable hashtable is still faster).
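To make the sharing concrete, here is a minimal F# sketch: adding an element to an immutable Set returns a new set that reuses most of the old tree's nodes, leaving the original untouched.
let s1 = Set.ofList [ 1; 2; 3 ]
let s2 = Set.add 4 s1              // new set; most of s1's nodes are shared
printfn "%b" (Set.contains 4 s1)   // false - s1 is unchanged
printfn "%b" (Set.contains 4 s2)   // true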
Now, F# is functional-first, which means that immutable structures are preferred, but it is perfectly fine to use mutable data structures (especially if you limit their usage to some well-defined and restricted scope). For this reason, F# programmers often use Dictionary or HashSet, especially when the usage stays within the scope of a single function.
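A minimal sketch of that pattern (the function name is just for illustration): the mutable HashSet below lives entirely inside the function, so from the caller's point of view the function is pure - and HashSet only needs equality, not comparison.
open System.Collections.Generic

let distinctCount (xs: seq<int>) =
    let seen = HashSet<int>()   // mutable, but never escapes this function
    for x in xs do
        seen.Add x |> ignore
    seen.Count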

Related

There seem to be a lot of Ruby methods that are very similar. How do I pick which one to use?

I'm relatively new to Ruby, so this is a pretty general question. I have found through the Ruby Docs page a lot of methods that seem to do the exact same thing, or something very similar. For example, chars vs split(' '), and each vs map vs collect. Sometimes there are small differences and other times I see no difference at all.
My question here is how do I know which is best practice, or is it just personal preference? I'm sure this varies from instance to instance, so if I can learn some of the more important ones to be cognizant of I would really appreciate that because I would like to develop good habits early.
I am a bit confused by your specific examples:
map and collect are aliases. They don't "do the exact same thing", they are the exact same thing. They are just two names for the same method. You can use whichever name you wish, whatever reads best in context, or whatever your team has decided on as a coding standard. The community seems to have settled on map.
each and map/collect are completely different; there is no similarity there, apart from the general fact that they both operate on collections. map transforms a collection by mapping every element to a new element using a transformation operation. It returns a new collection (an Array, actually) with the transformed elements. each performs a side-effect for every element of the collection. Since it is only used for its side-effect, the return value is irrelevant (it might just as well return nil like Kernel#puts does; in languages like C, C++, Java, or C♯, it would return void), but it is specified to always return its receiver.
split splits a String into an Array of Strings based on a delimiter, which can be either a Regexp (in which case you can also influence whether the delimiter itself gets captured in the output or ignored), a String, or nil (in which case the global default separator gets used). chars returns an Array of the individual characters (represented as Strings of length 1, since Ruby doesn't have a specific Character type). chars belongs to a family with bytes and codepoints, which do the same thing for bytes and codepoints, respectively. split can only serve as a replacement for one method in this family (chars), and it is much more general than that.
So, in the examples you gave, there really isn't much similarity at all, and I cannot imagine any situation where it would be unclear which one to choose.
In general, you have a problem and you look for the method (or combination of methods) that solve it. You don't look at a bunch of methods and look for the problem they solve.
There'll typically be only one method that fits a specific problem. Larger problems can be broken down into different subproblems in different ways, so it is indeed possible that you may end up with different combinations of methods to solve the same larger problem, but for each individual subproblem, there will generally be only one applicable method.
When the documentation states that two methods do the same thing, it's just a matter of preference. To learn the details, you should always start with the Ruby API documentation.

Does the coder we select significantly affect performance?

I'm having trouble understanding the purpose of "coders". My understanding is that we choose coders in order to "teach" Dataflow how a particular object should be encoded in byte format and how equality and hash code should be evaluated.
By default, and perhaps by mistake, I tend to add implements Serializable to almost all my custom classes. This has the advantage that Dataflow tends not to complain. However, because some of these classes are huge objects, I'm wondering whether performance suffers, and whether I should instead implement a custom coder in which I specify exactly which one or two fields can be used to determine equality and hash code, etc. Does this make sense? Put another way, does creating a custom coder (which may only use one or two small primitive fields) instead of the default serialization-based coder improve performance for very large classes?
Java serialization is very slow compared to other forms of encoding, and can definitely cause performance problems. However, only serializing part of your object means that the rest of the object will be dropped when it is sent between processes.
Much better than using Serializable, and pretty much just as easy: you can use AvroCoder by annotating your classes with
@DefaultCoder(AvroCoder.class)
This will automatically deduce an Avro schema from your class. Note that this does not work for generic types, so you'll likely want to use a custom coder in that case.

Why are mutables allowed in F#? When are they essential?

Coming from C#, trying to get my head around the language.
From what I understand, one of the main benefits of F# is that you ditch the concept of state, which should (in many cases) make things much more robust.
If this is the case (and correct me if it's not), why allow us to break this principle with mutables? To me it feels like they don't belong in the language. I understand you don't have to use them, but the language gives you the tools to go off track and think in an OOP manner.
Can anyone provide an example of where a mutable value is essential?
Current compilers for declarative (stateless) code are not very smart. This results in lots of memory allocations and copy operations, which are rather expensive. Mutating some property of an object allows the object to be reused in its new state, which is much faster.
Imagine you make a game with 10000 units moving around at 60 ticks a second. You can do this in F#, including collisions with a mutable quad- or octree, on a single CPU core.
Now imagine the units and quadtree are immutable. The compiler would have no better option than to allocate and construct 600,000 units per second and create 60 new trees per second. And that excludes any changes in other management structures. In a real-world use case with complex units, this kind of solution would be too slow.
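A minimal F# sketch of the difference, using hypothetical unit types (the names and fields are just for illustration):
type MutableUnit = { mutable X: float; mutable Y: float }
type ImmutableUnit = { X: float; Y: float }

// Reuses the existing object in its new state - no allocation per tick.
let moveInPlace (u: MutableUnit) dx dy =
    u.X <- u.X + dx
    u.Y <- u.Y + dy

// Allocates a fresh record on every tick.
let move (u: ImmutableUnit) dx dy =
    { u with X = u.X + dx; Y = u.Y + dy }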
F# is a multi-paradigm language that enables the programmer to write functional, object-oriented, and, to an extent, imperative programs. Currently, each variant has its valid uses. Maybe, at some point in the future, better compilers will allow for better optimization of declarative programs, but right now, we have to fall back to imperative programming when performance becomes an issue.
Having the ability to use mutable state is often important for performance reasons, among other things.
Consider implementing the API List.take: count : int -> list : 'a list -> 'a list, which returns a list consisting of only the first count elements from the input list.
If you are bound by immutability, Lists can only be built up back-to-front. Implementing take then boils down to
Build up the result list back-to-front with the first count guys from the input: O(count)
Reverse that result and return it: O(count)
The F# runtime, for performance reasons, has the magic special ability to build lists front-to-back when needed (i.e. to mutate the tail of the last guy to point to a new tail element). The basic algorithm used for List.take is:
Build up the result list front-to-back with the first count guys from the input: O(count)
Return the result
Same asymptotic performance, but in practical terms it's twice as fast to use mutation in this case.
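For reference, here is a sketch of the purely functional two-pass version described above (the front-to-back variant relies on privileged access to the list representation and cannot be written in ordinary F#; this reimplementation is purely for illustration, since F# ships List.take already):
let take count (list: 'a list) =
    let rec loop n acc rest =
        match n, rest with
        | 0, _ -> List.rev acc                        // second pass: O(count)
        | _, [] -> invalidArg "count" "not enough elements"
        | n, x :: xs -> loop (n - 1) (x :: acc) xs    // first pass: O(count)
    loop count [] list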
Pervasive mutable state can be a nightmare as code is difficult to reason about. But if you factor your code so that mutable state is tightly encapsulated (e.g. in implementation details of List.take), then you can enjoy its benefits where it makes sense. So making immutability the default, but still allowing mutability, is a very practical and useful feature of the language.
First of all, what makes F# powerful is, in my opinion, not just the immutability by default, but a whole mix of features like: immutability by default, type inference, lightweight syntax, sum (DUs) and product types (tuples), pattern matching and currying by default. Possibly more.
These make F# very functional by default and they make you program in a certain way. In particular they make you feel uncomfortable when you use mutable state, as it requires the mutable keyword. Uncomfortable in this sense means more careful. And that is exactly what you should be.
Mutable state is not forbidden or evil per se, but it should be controlled. The need to explicitly write mutable is like a warning sign making you aware of danger. And a good way to control it is to use it internally, within a single function. That way you can have your own internal mutable state and still be perfectly thread-safe, because you don't have shared mutable state. In fact, your function can still be referentially transparent even if it uses mutable state internally.
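A minimal sketch of that idea (the function name is illustrative): the mutable accumulator never leaves the function, so callers cannot observe any mutation and the function stays referentially transparent.
let sumOfSquares (xs: int list) =
    let mutable acc = 0        // mutable, but local to this function
    for x in xs do
        acc <- acc + x * x
    acc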
As for why F# allows mutable state: it would be very difficult to write usual real-world code without the possibility of it. For instance, in Haskell, something like generating a random number cannot be done the way it is done in F#, but instead needs the state threaded through explicitly.
When I write applications, I tend to have about 95% of the code base in a very functional style that would be pretty much 1:1 portable to say Haskell without any trouble. But then at the system boundaries or at some performance-critical inner loop mutable state is used. That way you get the best of both worlds.

F#: Difference between Dictionary, Hashtable and Map

I am new to .NET programming. Sorry if this question has been asked before.
I am currently learning F#. What are the differences between Dictionary, Hashtable and Map? When should I use each?
I also have another question that is not mentioned in the title. When should I use Async.RunSynchronously? It seems rather self-contradictory to me, so I am sure that I am missing something.
The choice between Dictionary, Hashtable and Map depends on the use cases. You should, however, know the characteristics of each. This is not an exhaustive list, just some key differences you might want to start from:
Hashtable represents a collection of key/value pairs that are organized based on the hash code of the key. It is a mutable collection from the .NET BCL.
Dictionary<TKey, TValue> is the generic implementation of the hashtable. It is also a mutable collection from the .NET BCL.
Map is the immutable F# type. It is implemented on top of AVL trees, an entirely different data structure with different performance characteristics and use cases.
If you are doing many writes, hash-table-based collections have significantly better insertion performance than AVL trees.
Retrieving a value from a Dictionary by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.
F# maps are implemented as immutable AVL trees, an efficient data structure which forms a self-balancing binary tree. AVL trees are well-known for their efficiency, in which they can search, insert, and delete elements in the tree in O(log n) time, where n is the number of elements in the tree.
As for the Map use case: if you've got a set of static data (such as configuration data that's loaded when your application starts up) that you need to look up by key frequently, a Map is as good a choice as any. Its immutability in this case ensures that the static data cannot be modified by mistake, and it has little impact on performance, as you never need to mutate it once it is initialized.
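A minimal sketch contrasting the two (the names are illustrative):
let config = Map.ofList [ "host", "localhost"; "port", "8080" ]
let host = Map.find "host" config            // O(log n) lookup in the AVL tree
let config2 = config.Add("timeout", "30")    // returns a new map; config is unchanged

open System.Collections.Generic
let cache = Dictionary<string, int>()
cache.["answer"] <- 42                       // close to O(1); mutates in place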
Async.RunSynchronously runs the provided asynchronous computation and awaits its result. You can use it in the F# Interactive window, for example, to test your asynchronous workflows.
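A minimal sketch (the sleep stands in for real asynchronous work):
let work = async {
    do! Async.Sleep 100    // placeholder for real async I/O
    return 42
}
let result = Async.RunSynchronously work    // blocks until the result is available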

Is there an idiomatic way to order function arguments in Erlang?

It seems like it's inconsistent in the lists module. For example, split has the number as the first argument and the list as the second, but sublist has the list as the first argument and the length as the second.
OK, a little history as I remember it and some principles behind my style.
As Christian has said the libraries evolved and tended to get the argument order and feel from the impulses we were getting just then. So for example the reason why element/setelement have the argument order they do is because it matches the arg/3 predicate in Prolog; logical then but not now. Often we would have the thing being worked on first, but unfortunately not always. This is often a good choice as it allows "optional" arguments to be conveniently added to the end; for example string:substr/2/3. Functions with the thing as the last argument were often influenced by functional languages with currying, for example Haskell, where it is very easy to use currying and partial evaluation to build specific functions which can then be applied to the thing. This is very noticeable in the higher order functions in lists.
The only influence we didn't have was from the OO world. :-)
Usually we at least managed to be consistent within a module, but not always. See lists again. We did try to have some consistency, so the argument order in the higher order functions in dict/sets match those of the corresponding functions in lists.
The problem was also aggravated by the fact that we, especially me, had a rather cavalier attitude to libraries. I just did not see them as a selling point for the language, so I wasn't that worried about it. "If you want a library which does something then you just write it" was my motto. This meant that my libraries were structured, just not always with the same structure. :-) That was how many of the initial libraries came about.
This, of course, creates unnecessary confusion and breaks the law of least astonishment, but we have not been able to do anything about it. Any suggestions of revising the modules have always been met with a resounding "no".
My own personal style is usually structured, though I don't know if it conforms to any written guidelines or standards.
I generally have the thing or things I am working on as the first arguments, or at least very close to the beginning; the order depends on what feels best. If there is a global state which is chained through the whole module, which there usually is, it is placed as the last argument and given a very descriptive name like St0, St1, ... (I belong to the church of short variable names). Arguments which are chained through functions (both input and output) I try to keep the same argument order as return order. This makes it much easier to see the structure of the code. Apart from that I try to group together arguments which belong together. Also, where possible, I try to preserve the same argument order throughout a whole module.
None of this is very revolutionary, but I find if you keep a consistent style then it is one less thing to worry about and it makes your code feel better and generally be more readable. Also I will actually rewrite code if the argument order feels wrong.
A small example which may help:
fubar({f,A0,B0}, Arg2, Ch0, Arg4, St0) ->
    %% Ch (local) and St (global state) are chained through in call order.
    {A1,Ch1,St1} = foo(A0, Arg2, Ch0, St0),
    {B1,Ch2,St2} = bar(B0, Arg4, Ch1, St1),
    Res = baz(A1, B1),
    {Res,Ch2,St2}.
Here Ch is a local chained-through variable while St is a more global state. Check out the code on GitHub for LFE, especially the compiler, if you want a longer example.
This became much longer than it should have been, sorry.
P.S. I used the word thing instead of object to avoid confusion about what I was talking about.
No, there is no consistently-used idiom in the sense that you mean.
However, there are some useful relevant hints that apply especially when you're going to be making deeply recursive calls. For instance, keeping whichever arguments will remain unchanged during tail calls in the same order/position in the argument list allows the virtual machine to make some very nice optimizations.
