How to sort by a variable ascending in F#?

How would I sort a list by a variable but ascending?
Something like:
Data |> List.sortBy(fun t -> t.Date,ascending(t.Value))
The above is an example, I know that this will not work if run.

Based on your example, it looks like you want to use multiple sorting keys, with some of them in ascending order and others in descending order. I think this is a scenario that has not been answered by any of the other questions.
In general, you can use multiple sorting keys in F# by using tuples. F# has functions List.sortBy and List.sortByDescending which give you the two possible orders:
data |> Seq.sortByDescending (fun x -> x.FirstKey, x.SecondKey)
However, this way the sort order for both of the keys will be the same. There is no easy way to use one key in one order and another key in another order. In many cases, you could just use numerical minus and do something like:
data |> Seq.sortByDescending (fun x -> x.FirstKey, -x.SecondKey)
This is not entirely bullet-proof, because it only works for numeric keys and negating the smallest integer (Int32.MinValue) overflows, but it will often work. In F# query expressions (which are inspired by how LINQ works), you can use multiple sorting keys using sortBy and thenBy (or sortByDescending and thenByDescending):
query {
    for x in data do
    sortByDescending x.FirstKey
    thenBy x.SecondKey }
Here, the first key will be used for descending sort and, when there are multiple items with the same FirstKey, the second key will be used for ascending sort within that group. I suspect this is probably what you need in the general case - but it's a bit unfortunate there is no nice way of writing this with the pipeline syntax.
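To make the query expression concrete, here is a minimal runnable sketch. The Row record type and the sample values are assumptions made up for illustration; only sortByDescending and thenBy come from the answer itself.

```fsharp
// Hypothetical record type standing in for the elements of `data`.
type Row = { FirstKey: int; SecondKey: int }

let data =
    [ { FirstKey = 1; SecondKey = 7 }
      { FirstKey = 2; SecondKey = 5 }
      { FirstKey = 1; SecondKey = 3 }
      { FirstKey = 2; SecondKey = 1 } ]

// FirstKey descending; ties are broken by SecondKey ascending.
let sorted =
    query {
        for x in data do
        sortByDescending x.FirstKey
        thenBy x.SecondKey
        select x
    }
    |> Seq.toList

// Prints "2 1", "2 5", "1 3", "1 7" on separate lines.
sorted |> List.iter (fun r -> printfn "%d %d" r.FirstKey r.SecondKey)
```

The trailing Seq.toList is only there to turn the query result back into a list; it can be dropped if a sequence is enough.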

You can easily sort by multiple keys, ascending, descending, or in any other complex order, with List.sortWith and the power of function composition.
All you need is a couple of helper functions and an operator:
let asc f a b = compare (f a) (f b)
let desc f a b = compare (f b) (f a)
let (&>) c1 c2 a b = match c1 a b with 0 -> c2 a b | r -> r
Both asc and desc receive a key-retrieval function of type 'T -> 'K and call the generic function compare to sort in ascending or descending order.
The operator &> lets you compose them to sort by as many keys as you like. And since you can also add your custom comparers, any kind of sorting is possible with this technique:
let ls = [ "dd"; "a"; "b"; "c"; "aa"; "bb"; "cc"]
ls |> List.sortWith(desc Seq.length &>
                    asc id)
// result = ["aa"; "bb"; "cc"; "dd"; "a"; "b"; "c"]
ls |> List.sortWith( asc Seq.length &>
                     desc id)
// result = ["c"; "b"; "a"; "dd"; "cc"; "bb"; "aa"]
Your example would look like this:
Data |> List.sortWith( desc (fun t -> t.Date) &>
                       asc (fun t -> t.Value))
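Put together as a self-contained script, the technique could look roughly like this. The Entry record type and the sample data are assumptions, since the question does not show the real type; the three helpers are copied from the answer.

```fsharp
open System

// Helpers from the answer above.
let asc f a b = compare (f a) (f b)
let desc f a b = compare (f b) (f a)
let (&>) c1 c2 a b = match c1 a b with 0 -> c2 a b | r -> r

// Hypothetical record type; the question does not show the real one.
type Entry = { Date: DateTime; Value: int }

let data =
    [ { Date = DateTime(2020, 1, 1); Value = 5 }
      { Date = DateTime(2021, 6, 1); Value = 9 }
      { Date = DateTime(2021, 6, 1); Value = 2 }
      { Date = DateTime(2020, 1, 1); Value = 1 } ]

// Newest date first; within the same date, smallest value first.
let sorted =
    data |> List.sortWith (desc (fun t -> t.Date) &> asc (fun t -> t.Value))

printfn "%A" (sorted |> List.map (fun e -> e.Value)) // [2; 9; 1; 5]
```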

Related

A more functional way to create tuples from two arrays

I've created a function that gets all integers from 1 to n and then combines with the same sequence to create a sequence of tuples of all combinations. So passing it the integer 2 would give you [(1,1);(1,2);(2,1);(2,2)]:
let allTuplesUntil x =
    let primary = seq { 1 .. x }
    let secondary = seq { 1 .. x }
    [for x in primary do
        for y in secondary do
            yield (x,y)]
This implementation works, but it uses an inner and outer for loop, similar to what I would do in C#.
Could this be achieved in a more idiomatic functional way? Would a more functional way typically be more desirable or is this acceptable in a functional language because of its brevity and clarity?
I'm relatively new to F# and looking for some feedback.
These loops are part of what's called a computation expression, which is quite idiomatic in F#. It's just made to look like familiar loops. I can't see any problem with your code being written in this way. If what you want is to get rid of the loops, you could hide them in functions:
let cartesianProduct xs ys =
    xs |> Seq.collect (fun x -> ys |> Seq.map (fun y -> x, y))
cartesianProduct [1;2;3] ['a';'b';'c']
val it : seq<int * char> = seq [(1, 'a'); (1, 'b'); (1, 'c'); (2, 'a'); ...]
First, just because there is a for doesn't mean it's not functional. In this example you go over each element and yield a new element that becomes part of a new immutable list. This feature is also called a "list comprehension" and is part of languages like Haskell. The imperative approach would be to loop over a list and mutate it.
Second, remember that other functions like map, fold, and filter also just loop over each element, like a for expression. They are just less powerful than a for loop.
Third, even if it were "not 100% functional", who cares? Code should be easily readable and understandable. The intention of two for loops is easy to understand.
Fourth, the equivalent function of the for expression is usually bind or, in this case, the Seq.collect function. You could also write this code:
[for x in primary do
    for y in secondary do
        yield (x,y)]
Like this:
primary |> Seq.collect (fun x ->
    secondary |> Seq.collect (fun y ->
        [x,y]
    ))
I prefer the for loops for readability!
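As a further option not mentioned in the answers above, newer versions of the core library ship a ready-made function for this. The sketch below assumes FSharp.Core 4.1 or later, where List.allPairs is available; the parameter name n is just illustrative.

```fsharp
// Assumes FSharp.Core 4.1 or later, where List.allPairs is available.
let allTuplesUntil n =
    List.allPairs [ 1 .. n ] [ 1 .. n ]

printfn "%A" (allTuplesUntil 2) // [(1, 1); (1, 2); (2, 1); (2, 2)]
```

This returns a list rather than a lazy sequence, which matches the original list comprehension.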

What are the essential functions to find duplicate elements within a list?

What are the essential functions to find duplicate elements within a list?
Translated, how can I simplify the following function:
let numbers = [ 3;5;5;8;9;9;9 ]
let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.filter (fun set -> set.Length > 1)
                            |> List.map (fun set -> set.[0])
I'm sure this is a duplicate. However, I am unable to locate the question on this site.
UPDATE
let getDuplicates numbers =
    numbers |> List.groupBy id
            |> List.choose (fun (k,v) -> match v.Length with
                                         | x when x > 1 -> Some k
                                         | _ -> None)
Simplifying your function:
Whenever you have a filter followed by a map, you can probably replace the pair with a choose. The purpose of choose is to run a function for each value in the list, and return only the items which return Some value (None values are removed, which is the filter portion). Whatever value you put inside Some is the map portion:
let getDuplicates = numbers |> List.groupBy id
                            |> List.map snd
                            |> List.choose( fun( set ) ->
                                if set.Length > 1
                                then Some( set.[0] )
                                else None )
We can take it one additional step by removing the map. In this case, keeping the tuple which contains the key is helpful, because it eliminates the need to get the first item of the list:
let getDuplicates = numbers |> List.groupBy id
                            |> List.choose( fun( key, set ) ->
                                if set.Length > 1
                                then Some key
                                else None )
Is this simpler than the original? Perhaps. Because choose combines two purposes, it is by necessity more complex than those purposes kept separate (the filter and the map), and this makes it harder to understand at a glance, perhaps undoing the more "simplified" code. More on this later.
Decomposing the concept
Simplifying the code wasn't the direct question, though. You asked about functions useful in finding duplicates. At a high level, how do you find a duplicate? It depends on your algorithm and specific needs:
Your given algorithm uses the "put items in buckets based on their value", and "look for buckets with more than one item". This is a direct match to List.groupBy and List.choose (or filter/map)
A different algorithm could be to "iterate through all items", "modify an accumulator as we see each", then "report all items which have been seen multiple times". This is kind of like the first algorithm, where something like List.fold is replacing List.groupBy, but if you need to drag some other kind of state along, it may be helpful.
Perhaps you need to know how many times there are duplicates. A different algorithm satisfying these requirements may be "sort the items so they are always ascending", and "flag if the next item is the same as the current item". In this case, you have a List.sort followed by a List.toSeq then Seq.windowed:
let getDuplicates = numbers |> List.sort
                            |> List.toSeq
                            |> Seq.windowed 2
                            |> Seq.choose( function
                                | [|x; y|] when x = y -> Some x
                                | _ -> None )
Note that this returns a sequence with [5; 9; 9], informing you that 9 is duplicated twice.
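If the goal is to know how often each duplicated value occurs, a counting function may express that more directly than windowing. The following is only a sketch of that idea using List.countBy, which is not used in the answer itself; the name duplicateCounts is made up for illustration.

```fsharp
let numbers = [ 3; 5; 5; 8; 9; 9; 9 ]

// Pair each distinct value with its number of occurrences,
// then keep only the values seen more than once.
let duplicateCounts =
    numbers
    |> List.countBy id
    |> List.filter (fun (_, count) -> count > 1)

printfn "%A" duplicateCounts // contains (5, 2) and (9, 3)
```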
These were algorithms mostly based on List functions. There are already two answers, one mutable, the other not, which are based on sets and existence.
My point is, a complete list of functions helpful to finding duplicates would read like a who's who list of existing collection functions -- it all depends on what you're trying to do and your specific requirements. I think your choice of List.groupBy and List.choose is probably about as simple as it gets.
Simplifying for maintainability
The last thought on simplification is to remember that simplifying code will improve the readability of your code to a certain extent. "Simplifying" beyond that point will most likely involve tricks, or obscure intent. If I were to look back at a sample of code I wrote even several weeks and a couple of projects ago, the shortest and perhaps simplest code would probably not be the easiest to understand. Thus the last point: simplifying future code maintainability may be your goal. If this is the case, your original algorithm, modified only to keep the groupBy tuple and to add comments about what each step of the pipeline is doing, may be your best bet:
// combine numbers into common buckets specified by the number itself
let getDuplicates = numbers |> List.groupBy id
                            // only look at buckets with more than one item
                            |> List.filter( fun (_,set) -> set.Length > 1)
                            // change each bucket to only its key
                            |> List.map( fun (key,_) -> key )
The original question comments already show that your code was unclear to people unfamiliar with it. Is this a question of experience? Definitely. But, regardless of whether we work on a team, or are lone wolves, optimizing code (where possible) for quick understanding should probably be close to everyone's top priority. (climbing down off soapbox...) :)
Regardless, best of luck.
If you don't mind using a mutable collection in a local scope, this could do it:
open System.Collections.Generic
let getDuplicates numbers =
    let known = HashSet()
    numbers |> List.filter (known.Add >> not) |> set
You can wrap the last three operations in a List.choose:
let duplicates =
    numbers
    |> List.groupBy id
    |> List.choose ( function
        | _, x::_::_ -> Some x
        | _ -> None )
Here's a solution which uses only basic functions and immutable data structures:
let findDups elems =
    let findDupsHelper (oneOccurrence, manyOccurrences) elem =
        if oneOccurrence |> Set.contains elem
        then (oneOccurrence, manyOccurrences |> Set.add elem)
        else (oneOccurrence |> Set.add elem, manyOccurrences)
    List.fold findDupsHelper (Set.empty, Set.empty) elems |> snd

F#: Generating a word count summary

I am new to programming and F# is my first .NET language.
I would like to read the contents of a text file, count the number of occurrences of each word, and then return the 10 most common words and the number of times each of them appears.
My questions are: Is using a dictionary encouraged in F#? How would I write the code if I wish to use a dictionary? (I have browsed through the Dictionary class on MSDN, but I am still puzzling over how I can update the value to a key.) Do I always have to resort to using Map in functional programming?
While there's nothing wrong with the other answers, I'd like to point out that there's already a specialized function to count the occurrences of each unique key in a sequence: Seq.countBy. Plumbing the relevant parts of Reed's and torbonde's answers together:
let countWordsTopTen (s : string) =
    s.Split([|','|])
    |> Seq.countBy (fun s -> s.Trim())
    |> Seq.sortBy (snd >> (~-))
    |> Seq.truncate 10
"one, two, one, three, four, one, two, four, five"
|> countWordsTopTen
|> printfn "%A" // seq [("one", 3); ("two", 2); ("four", 2); ("three", 1); ...]
My questions are: Is using a dictionary encouraged in F#?
Using a Dictionary is fine from F#, though it does use mutability, so it's not quite as common.
How would I write the code if I wish to use a dictionary?
If you read the file, and have a string with comma separated values, you could parse using something similar to:
open System.Collections.Generic
// Just an example of input - this would come from your file...
let strings = "one, two, one, three, four, one, two, four, five"
let words =
    strings.Split([|','|])
    |> Array.map (fun s -> s.Trim())
let dict = Dictionary<_,_>()
// Increment the count for a known word, or start a new word at 1
words
|> Array.iter (fun w ->
    match dict.TryGetValue w with
    | true, v -> dict.[w] <- v + 1
    | false, _ -> dict.[w] <- 1)
// Creates a sequence of tuples, with (word,count) in order
let topTen =
    dict
    |> Seq.sortBy (fun kvp -> -kvp.Value)
    |> Seq.truncate 10
    |> Seq.map (fun kvp -> kvp.Key, kvp.Value)
I would say an obvious choice for this task is to use the Seq module, which is really one of the major workhorses in F#. As Reed said, using a dictionary is not as common, since it is mutable. Sequences, on the other hand, are immutable. An example of how to do this using sequences is:
let strings = "one, two, one, three, four, one, two, four, five"
let words =
    strings.Split([|','|])
    |> Array.map (fun s -> s.Trim())
let topTen =
    words
    |> Seq.groupBy id
    |> Seq.map (fun (w, ws) -> (w, Seq.length ws))
    |> Seq.sortBy (snd >> (~-))
    |> Seq.truncate 10
I think the code speaks pretty much for itself, although maybe the second last line requires a short explanation:
The snd-function gives the second entry in a pair (i.e. snd (a,b) is b), >> is the functional composition operator (i.e. (f >> g) a is the same as g (f a)) and ~- is the unary minus operator. Note here that operators are essentially functions, but when using (and declaring) them as functions, you have to wrap them in parentheses. That is, -3 is the same as (~-) 3, where in the last case we have used the operator as a function.
In total, what the second last line does, is sort the sequence by the negative value of the second entry in the pair (the number of occurrences).
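As an alternative to negating the count, the sort direction can be stated directly. This sketch assumes F# 4.0 or later, where Seq.sortByDescending is available; it is a variation on the code above, not something the original answers used, and the final Seq.toList is added only to make the result easy to inspect.

```fsharp
let countWordsTopTen (s: string) =
    s.Split([| ',' |])
    |> Seq.countBy (fun w -> w.Trim())
    |> Seq.sortByDescending snd // highest count first, no negation needed
    |> Seq.truncate 10
    |> Seq.toList

let top = countWordsTopTen "one, two, one, three, four, one, two, four, five"
printfn "%A" (List.head top) // ("one", 3)
```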

F# sort using head::tail

I am trying to write a recursive function that uses head::tail. I understand that head is the first element of the list and tail is all other elements in the list. I also understand how recursion works. What I am wondering is how to go about sorting the elements in the list. Is there a way to compare the head to every element in the tail and then choose the smallest element? My background is in C++ and I am not allowed to use List.sort(). Any idea of how to go about it? I have looked at the tutorials on the MSDN site and still have had no luck.
Here is a recursive, list-based implementation of the quicksort algorithm in F#:
let rec quicksort list =
    match list with
    | [] -> []
    | h::t ->
        let lesser = List.filter ((>) h) t
        let greater = List.filter ((<=) h) t
        (quicksort lesser) @ [h] @ (quicksort greater)
You need to decide a sorting methodology before worrying about the data structure used. If you were to do, say, insertion sort, you would likely want to start from the end of the list and insert an item at each recursion level, being careful how you handle the insertion itself.
Technically, at any particular level you only have access to one data element; however, you can pass a particular data element as a parameter to preserve it. For instance, here is the inserting part of an insertion sort algorithm; it assumes the list is already sorted.
let rec insert i l =
    match l with
    | [] -> [i]
    | h::t -> if h > i then
                  i::l
              else
                  h::(insert i t)
Note how I now have access to two elements, the cached one and the remainder. Another variation would be a merge sort where you had two sorted lists and therefore two items to work with any particular iteration.
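To show how the insert step fits into a complete sort using only head::tail matching, here is one possible sketch. The insertionSort name is an assumption for illustration; the insert function is the same as above, repeated so the block runs on its own.

```fsharp
// Insert i into an already sorted list l (same function as above).
let rec insert i l =
    match l with
    | [] -> [ i ]
    | h :: t -> if h > i then i :: l else h :: insert i t

// Sort the tail recursively, then insert the head into the sorted result.
let rec insertionSort list =
    match list with
    | [] -> []
    | h :: t -> insert h (insertionSort t)

printfn "%A" (insertionSort [ 3; 1; 4; 1; 5; 9; 2; 6 ]) // [1; 1; 2; 3; 4; 5; 6; 9]
```

Neither function is tail recursive, so this is meant for understanding the pattern rather than for very long lists.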
Daniel's commented answer mentions a particular implementation (quicksort) if you are interested.
Finally, lists aren't optimal for sorting algorithms due to their rigid structure and the number of allocations required. Given that all comparison-based sorting algorithms are worse than O(n) in complexity, you can convert your list to and from an array (an O(n) step) in order to improve performance without hurting your asymptotic performance.
EDIT:
Note that the above isn't tail recursive. To make it so, you would need to do something like this:
let insert i l =
    let rec insert i l acc =
        match l with
        // acc holds the smaller elements in reverse order; the fold restores their order
        | [] -> List.fold (fun a e -> e :: a) [i] acc
        | h::t -> if h > i then
                      List.fold (fun a e -> e :: a) (i::l) acc
                  else
                      insert i t (h::acc)
    insert i l []
I don't remember offhand the best way to reverse a list so went with an example from https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/lists

F# iterating over two arrays, using a function from a C# library

I have a list of words and a list of associated part of speech tags. I want to iterate over both, simultaneously (matched index) using each indexed tuple as input to a .NET function. Is this the best way (it works, but doesn't feel natural to me):
let taggingModel = SeqLabeler.loadModel(lthPath +
                                        "models\penn_00_18_split_dict.model");
let lemmatizer = new Lemmatizer(lthPath + "v_n_a.txt")
let input = "the rain in spain falls on the plain"
let words = Preprocessor.tokenizeSentence( input )
let tags = SeqLabeler.tagSentence( taggingModel, words )
let lemmas = Array.map2 (fun x y -> lemmatizer.lookup(x,y)) words tags
Your code looks quite good to me - most of it deals with some loading and initialization, so there isn't much you could do to simplify that part. Alternatively to Array.map2, you could use Seq.zip combined with Seq.map - the zip function combines two sequences into a single one that contains pairs of elements with matching indices:
let lemmas = Seq.zip words tags
             |> Seq.map (fun (x, y) -> lemmatizer.lookup (x, y))
Since the lookup function takes a tuple as its argument, and zip produces exactly such tuples, you could write:
// standard syntax using the pipelining operator
let lemmas = Seq.zip words tags |> Seq.map lemmatizer.lookup
// .. an alternative syntax doing exactly the same thing
let lemmas = (words, tags) ||> Seq.zip |> Seq.map lemmatizer.lookup
The ||> operator used in the second version takes a tuple containing two values and passes them to the function on the right side as two arguments, meaning that (a, b) ||> f means f a b. The |> operator takes only a single value on the left, so (a, b) |> f would mean f (a, b) (which would work if the function f expected tuple instead of two, space separated, parameters).
If you need lemmas to be an array at the end, you'll need to add Array.ofSeq to the end of the processing pipeline (all Seq functions work with sequences, which correspond to IEnumerable<T>).
One more alternative is to use sequence expressions (you can use [| .. |] to construct an array directly if that's what you need):
let lemmas = [| for wt in Seq.zip words tags do // wt is tuple (string * string)
                    yield lemmatizer.lookup wt |]
Whether to use sequence expressions or not - that's just a personal preference. The first option seems to be more succinct in this case, but sequence expressions may be more readable for people less familiar with things like partial function application (in the shorter version using Seq.map).
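The variants discussed in this answer can be compared side by side without the original library. In the sketch below, lookup is a hypothetical stand-in for lemmatizer.lookup (it just joins its two arguments), and the sample words and tags are made up; the Array.zip variant is an extra option that keeps everything as arrays.

```fsharp
let words = [| "rains"; "falls" |]
let tags = [| "VBZ"; "VBZ" |]

// Hypothetical stand-in for lemmatizer.lookup; like the original method,
// it takes its two arguments as a single tuple.
let lookup (word: string, tag: string) = sprintf "%s/%s" word tag

// The original approach from the question.
let viaMap2 = Array.map2 (fun w t -> lookup (w, t)) words tags

// zip produces tuples, so lookup can be passed directly.
let viaZip = Seq.zip words tags |> Seq.map lookup |> Array.ofSeq

// (a, b) ||> f means f a b, so this is the same call to Seq.zip.
let viaDoublePipe = (words, tags) ||> Seq.zip |> Seq.map lookup |> Array.ofSeq

// Array.zip avoids the conversion back from a sequence.
let viaArrayZip = Array.zip words tags |> Array.map lookup

printfn "%A" viaMap2 // [|"rains/VBZ"; "falls/VBZ"|]
```

All four bindings hold the same array, so the choice between them is mostly a matter of readability.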

Resources