A more functional way to create tuples from two arrays - f#

I've created a function that gets all integers from 1 to n and then combines them with the same sequence to create a list of tuples of all combinations. So passing it the integer 2 would give you [(1,1);(1,2);(2,1);(2,2)]:
let allTuplesUntil x =
    let primary = seq { 1 .. x }
    let secondary = seq { 1 .. x }
    [for x in primary do
        for y in secondary do
            yield (x,y)]
This implementation works, but it uses an inner and an outer for loop, similar to what I would do in C#.
Could this be achieved in a more idiomatic functional way? Would a more functional way typically be more desirable, or is this acceptable in a functional language because of its brevity and clarity?
I'm relatively new to F# and looking for some feedback.

These loops are part of what's called a computation expression (here, a list expression), which is quite idiomatic in F#; it's just made to look like familiar loops. I can't see any problem with your code being written this way. If you want to get rid of the loops, you could hide them in functions:
let cartesianProduct xs ys =
    xs |> Seq.collect (fun x -> ys |> Seq.map (fun y -> x, y))

cartesianProduct [1;2;3] ['a';'b';'c']
// val it : seq<int * char> = seq [(1, 'a'); (1, 'b'); (1, 'c'); (2, 'a'); ...]
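As a small sketch, the question's allTuplesUntil can be rebuilt on top of cartesianProduct so that no loop appears at the call site. The List.ofSeq at the end is an assumption that you want a list back, as in the original function.

```fsharp
// Pairs every element of xs with every element of ys, row by row
let cartesianProduct xs ys =
    xs |> Seq.collect (fun x -> ys |> Seq.map (fun y -> x, y))

// The question's function, expressed with cartesianProduct instead of nested loops
let allTuplesUntil n =
    cartesianProduct [1 .. n] [1 .. n] |> List.ofSeq

printfn "%A" (allTuplesUntil 2) // [(1, 1); (1, 2); (2, 1); (2, 2)]
```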

First, just because there is a for doesn't mean it's not functional. In this example you go over each element and yield a new element, which becomes an element of a new immutable list. This feature is also called a "list comprehension" and is part of languages like Haskell. The imperative version would loop over a list and mutate it.
Second, remember that other functions like map, fold and filter also just loop over each element, like a for expression. They are just less powerful than a for loop.
Third, even if it were "not 100% functional", who cares? Code should be easy to read and understand, and the intention of two for loops is easy to understand.
Fourth, the function equivalent to the for expression is usually bind, or in this case Seq.collect. You could also write this code:
[for x in primary do
    for y in secondary do
        yield (x,y)]
Like this:
primary |> Seq.collect (fun x ->
    secondary |> Seq.collect (fun y ->
        [x,y]))
I prefer the for loops for readability!
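To make the second and fourth points concrete, here is a minimal sketch, assuming small 1..3 ranges as input, that checks the nested for version and the Seq.collect version produce the same pairs, and shows map and filter written as comprehensions. The names withFor, withCollect, squares and evens are illustrative only.

```fsharp
let primary = seq { 1 .. 3 }
let secondary = seq { 1 .. 3 }

// Nested for loops inside a list expression
let withFor =
    [ for x in primary do
        for y in secondary do
            yield (x, y) ]

// The same computation written with Seq.collect (bind for sequences)
let withCollect =
    primary
    |> Seq.collect (fun x ->
        secondary |> Seq.collect (fun y -> [ x, y ]))
    |> List.ofSeq

// map and filter read as comprehensions over each element
let squares (xs : int list) = [ for x in xs -> x * x ]                  // like List.map
let evens (xs : int list) = [ for x in xs do if x % 2 = 0 then yield x ] // like List.filter
```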

Related

F#: Generating a word count summary

I am new to programming and F# is my first .NET language.
I would like to read the contents of a text file, count the number of occurrences of each word, and then return the 10 most common words and the number of times each of them appears.
My questions are: Is using a dictionary encouraged in F#? How would I write the code if I wish to use a dictionary? (I have browsed through the Dictionary class on MSDN, but I am still puzzling over how I can update the value of a key.) Do I always have to resort to using Map in functional programming?
While there's nothing wrong with the other answers, I'd like to point out that there's already a specialized function for counting the occurrences of each key in a sequence: Seq.countBy. Plumbing the relevant parts of Reed's and torbonde's answers together:
let countWordsTopTen (s : string) =
    s.Split([|','|])
    |> Seq.countBy (fun s -> s.Trim())
    |> Seq.sortBy (snd >> (~-))
    |> Seq.truncate 10

"one, two, one, three, four, one, two, four, five"
|> countWordsTopTen
|> printfn "%A" // seq [("one", 3); ("two", 2); ("four", 2); ("three", 1); ...]
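The question reads a text file rather than a comma-separated string, so here is a hedged sketch of the same countBy pipeline applied to free text. The separator set and the lowercasing are assumptions about what counts as a "word"; topWords and topTenInFile are hypothetical helper names.

```fsharp
open System
open System.IO

// Assumed word separators: whitespace and some common punctuation
let separators = [| ' '; '\t'; '\r'; '\n'; ','; '.'; ';'; ':'; '!'; '?' |]

// Returns the n most common words in text, ignoring case
let topWords (n : int) (text : string) =
    text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    |> Seq.map (fun w -> w.ToLowerInvariant())
    |> Seq.countBy id
    |> Seq.sortBy (snd >> (~-))
    |> Seq.truncate n
    |> List.ofSeq

// Reads the whole file and reports its ten most common words
let topTenInFile (path : string) =
    File.ReadAllText path |> topWords 10
```

Reading the whole file at once is fine for modest sizes; for very large files, File.ReadLines with Seq.collect over the lines would avoid holding everything in memory.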
My questions are: Is using a dictionary encouraged in F#?
Using a Dictionary is fine in F#, though it does use mutability, so it's not quite as common.
How would I write the code if I wish to use a dictionary?
If you read the file and have a string with comma-separated values, you could parse it using something similar to:
open System.Collections.Generic

// Just an example of input - this would come from your file...
let strings = "one, two, one, three, four, one, two, four, five"
let words =
    strings.Split([|','|])
    |> Array.map (fun s -> s.Trim())

let dict = Dictionary<_,_>()
words
|> Array.iter (fun w ->
    // TryGetValue returns (found, value); update or insert accordingly
    match dict.TryGetValue w with
    | true, v -> dict.[w] <- v + 1
    | false, _ -> dict.[w] <- 1)

// Creates a sequence of (word, count) tuples, most frequent first
let topTen =
    dict
    |> Seq.sortBy (fun kvp -> -kvp.Value)
    |> Seq.truncate 10
    |> Seq.map (fun kvp -> kvp.Key, kvp.Value)
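For comparison with the mutable Dictionary, here is a sketch of the same counting done with an immutable Map and a fold, which is one way to answer "do I have to use Map?". The countWords helper name is hypothetical; each fold step returns a new map instead of updating one in place.

```fsharp
// Counts words into an immutable Map; each step returns a new map
let countWords (words : string seq) : Map<string, int> =
    words
    |> Seq.fold (fun counts w ->
        match Map.tryFind w counts with
        | Some n -> Map.add w (n + 1) counts
        | None -> Map.add w 1 counts) Map.empty

let words =
    "one, two, one, three, four, one, two, four, five".Split([| ',' |])
    |> Array.map (fun s -> s.Trim())

// Same top-ten step as above, starting from the Map's (key, value) pairs
let topTen =
    countWords words
    |> Map.toSeq
    |> Seq.sortBy (fun (_, n) -> -n)
    |> Seq.truncate 10
    |> List.ofSeq
```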
I would say an obvious choice for this task is the Seq module, which is really one of the major workhorses in F#. As Reed said, using a dictionary is not as common, since it is mutable. Sequences, on the other hand, are immutable. An example of how to do this using sequences is:
let strings = "one, two, one, three, four, one, two, four, five"
let words =
    strings.Split([|','|])
    |> Array.map (fun s -> s.Trim())

let topTen =
    words
    |> Seq.groupBy id
    |> Seq.map (fun (w, ws) -> (w, Seq.length ws))
    |> Seq.sortBy (snd >> (~-))
    |> Seq.truncate 10
I think the code pretty much speaks for itself, although the second-to-last line may need a short explanation:
The snd function gives the second entry in a pair (i.e. snd (a,b) is b), >> is the function composition operator (i.e. (f >> g) a is the same as g (f a)), and ~- is the unary minus operator. Note that operators are essentially functions, but when using (or declaring) them as functions, you have to wrap them in parentheses. That is, -3 is the same as (~-) 3, where in the latter case we have used the operator as a function.
In total, the second-to-last line sorts the sequence by the negated second entry of each pair (the number of occurrences), which puts the most frequent words first.
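The pieces of snd >> (~-) can be checked one at a time. This is a small worked example on made-up pairs; the type annotation on negCount is only there to avoid a value-restriction error when the composed function is bound to a name.

```fsharp
// snd picks the second element of a pair
let second = snd ("one", 3)            // 3

// (~-) is unary minus used as an ordinary function
let negated = (~-) 3                   // -3

// Composition: (f >> g) x = g (f x), so this negates the count of a pair
let negCount : string * int -> int = snd >> (~-)
let key = negCount ("one", 3)          // -3

// Sorting by the negated count puts the largest counts first
let sorted =
    [ ("a", 1); ("b", 3); ("c", 2) ]
    |> List.sortBy (snd >> (~-))       // [("b", 3); ("c", 2); ("a", 1)]
```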

efficient way to create map of lists in functional style

Given a dataset, for example a CSV file that might look like this:
x,y
1,2
1,5
2,1
2,2
1,1
...
I wish to create a map of lists containing the y's for a given x... The result could look like this:
{1:[2,5,1], 2:[1,2]}
In Python this would be straightforward to do in an imperative manner, and would probably look somewhat like this:
from collections import defaultdict

d = defaultdict(list)
for x, y in csv_data:
    d[x].append(y)
How would you go about achieving the same using functional programming techniques in F#?
Is it possible to do it as short, efficient and concise (and readable) as in the given Python example using only functional style, or would you have to fall back to an imperative style with mutable data structures?
Note: this is not a homework assignment, just me trying to wrap my head around functional programming
EDIT: My conclusion based on answers thus far
I tried timing each of the provided answers on a relatively big CSV file, just to get a feeling for the performance. Furthermore, I did a small test with the imperative approach:
open System.Collections.Generic

let res = new Dictionary<string, List<string>>()
for row in l do
    if not (res.ContainsKey(fst row)) then
        res.[fst row] <- new List<string>()
    res.[fst row].Add(snd row)
The imperative approach completed in ~0.34 sec.
I think that the answer provided by Lee is the most general FP one; however, its running time was ~4 sec.
The answer given by Daniel ran in ~1.55 sec.
And finally, the answer given by jbtule ran in ~0.26 sec. (I find it very interesting that it beat the imperative approach.)
I used System.Diagnostics.Stopwatch for timing, and the code was executed as F# 3.0 on .NET 4.5.
EDIT2: fixed a stupid error in the imperative F# code, and ensured that it uses the same list as the other solutions.
[
    1,2
    1,5
    2,1
    2,2
    1,1
]
|> Seq.groupBy fst
|> Seq.map (fun (x, ys) -> x, [for _, y in ys -> y])
|> Map.ofSeq
let addPair m (x, y) =
    match Map.tryFind x m with
    | Some(l) -> Map.add x (y::l) m
    | None -> Map.add x [y] m

let csv (pairs : (int * int) list) = List.fold addPair Map.empty pairs
Note that this adds the y values to each list in reverse order.
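If the original order matters, one option (a sketch, not part of the answer above) is to keep the fold as is and reverse each list once at the end, which stays O(n) overall. csvInOrder is a hypothetical name.

```fsharp
let addPair m (x, y) =
    match Map.tryFind x m with
    | Some l -> Map.add x (y :: l) m
    | None -> Map.add x [y] m

// Fold as in the answer, then reverse each list once to restore input order
let csvInOrder (pairs : (int * int) list) =
    List.fold addPair Map.empty pairs
    |> Map.map (fun _ ys -> List.rev ys)

let result = csvInOrder [ (1, 2); (1, 5); (2, 1); (2, 2); (1, 1) ]
// map [(1, [2; 5; 1]); (2, [1; 2])]
```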
Use LINQ in F#; LINQ is functional.
open System.Linq

let data =
    [ 1,2
      1,5
      2,1
      2,2
      1,1 ]

let lookup = data.ToLookup(fst, snd)
lookup.[1] // seq [2; 5; 1]
lookup.[2] // seq [1; 2]
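If you want an F# Map of lists rather than an ILookup, each grouping in the lookup exposes a Key and enumerates its values, so a short conversion works. This sketch uses explicit lambdas for the key and element selectors; the answer's ToLookup(fst, snd) form should be equivalent. asMap is a hypothetical name.

```fsharp
open System.Linq

let data = [ (1, 2); (1, 5); (2, 1); (2, 2); (1, 1) ]

// Groups the second elements by the first, keeping their original order
let lookup = data.ToLookup((fun p -> fst p), (fun p -> snd p))

// Each IGrouping has a Key and is itself a sequence of the grouped values
let asMap =
    lookup
    |> Seq.map (fun g -> g.Key, List.ofSeq g)
    |> Map.ofSeq
```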
For fun, an implementation using a query expression:
let res =
    query {
        for (k, v) in data do
        groupValBy v k into g
        select (g.Key, List.ofSeq g)
    }
    |> Map.ofSeq

Different argument order for getting N-th element of Array, List or Seq

Is there a good reason for the different argument order in the functions that get the N-th element of an Array, List or Seq?
Array.get source index
List.nth source index
Seq.nth index source
I would like to use the pipe operator, and it seems possible only with Seq:
s |> Seq.nth n
Is there a way to have the same notation with Array or List?
I can't think of any good reason to define Array.get and List.nth this way. Given that pipelining is very common in F#, they should have been defined so that the source argument comes last.
In the case of List.nth, it doesn't change much, because you can use Seq.nth and the time complexity is still O(n), where n is the length of the list:
[1..100] |> Seq.nth 10
It's not a good idea to use Seq.nth on arrays because you lose random access. To keep O(1) running time of Array.get, you can define:
[<RequireQualifiedAccess>]
module Array =
    /// Get n-th element of an array in O(1) running time
    let inline nth index source = Array.get source index
In general, different argument order can be alleviated by using flip function:
let inline flip f x y = f y x
You can use it directly on the functions above:
[1..100] |> flip List.nth 10
[|1..100|] |> flip Array.get 10
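For what it's worth, F# 4.0 later added Array.item, List.item and Seq.item, which all take the index first (List.nth and Seq.nth were deprecated in their favour). A sketch assuming F# 4.0 or later, next to the flip approach from this answer:

```fsharp
// flip swaps the first two arguments of a curried function
let inline flip f x y = f y x

// Pipelining the source-first Array.get via flip
let viaFlip = [| 1 .. 100 |] |> flip Array.get 10   // element at index 10

// F# 4.0+: item takes the index first for all three collections
let fromArray = [| 1 .. 100 |] |> Array.item 10
let fromList = [ 1 .. 100 ] |> List.item 10
let fromSeq = seq { 1 .. 100 } |> Seq.item 10
```

Array.item keeps O(1) access, while List.item and Seq.item remain O(n).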
    
Just use the backward pipe operator:
[1..1000] |> List.nth <| 42
Since both operators are left associative, x |> f <| y is parsed as (x |> f) <| y, and this does the trick.
The backward pipe operator is also useful if you want to remove parentheses: f (very long expression) can be replaced with f <| very long expression.
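Both uses can be checked in a few lines. This sketch uses Array.get (source first) instead of List.nth, since List.nth is deprecated in newer F# versions; the values are arbitrary.

```fsharp
// x |> f <| y parses as (x |> f) <| y, i.e. f x y,
// so a source-first function can still be pipelined
let arr = [| 1 .. 1000 |]
let picked = arr |> Array.get <| 42    // same as Array.get arr 42, i.e. 43

// <| replaces the parentheses around a long final argument
let total = List.sum <| List.map (fun x -> x * x) [ 1 .. 3 ]   // 14
```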
Since Pad and bytebuster answered your last question, I will focus on the why part.
This is based on my current knowledge and not on historical facts.
F# is derived from OCaml, which has Array and List but not Seq, and which (at the time) lacked the pipeline operator. Since F# uses |> for natural pipelining and for type checking, the authors of F# made the switch for Seq. But to stay backward compatible with OCaml, they obviously did not switch everything.

Functions vs methods

I'm very new to F#. One of the first things I noticed was that collection operations are defined as functions rather than as methods.
As an experiment, I defined a couple of methods on list:
type List<'a> with
    member this.map f = List.map f this
    member this.filter f = List.filter f this
Then, given these helpers:
let square x = x * x
let isEven n = n % 2 = 0
here's an example of using the methods:
[1 .. 10].filter(isEven).map(square)
And here's the traditional way:
[1 .. 10] |> List.filter isEven |> List.map square
So concision clearly wasn't a reason to choose functions over methods. :-)
From a library design perspective, why were functions chosen over methods?
My guess is that it's because you can pass List.filter around, but can't really pass the filter method around unless it's "tied" to a list or wrapped in an anonymous function (i.e. (fun (ls : 'a list) -> ls.filter) effectively turns the filter method back into a function on list).
However, even with that reason, it seems the most common case of invoking an operation directly would give favor to methods since they are more concise. So I'm wondering if there's another reason.
Edit:
My second guess is function specialization, i.e. it's straightforward to specialize List.filter (e.g. let evens = List.filter isEven). It seems more verbose to have to define an evens method.
What functions have over methods is function specialization and the easy factoring it enables.
Here's the example expression involving functions:
let square x = x * x
let isEven n = n % 2 = 0
[1 .. 10] |> List.filter isEven |> List.map square
Let's factor out a function called evens for filtering evens:
let evens = List.filter isEven
And now let's factor out a function which squares a list of ints:
let squarify = List.map square
Our original expression is now:
[1 .. 10] |> evens |> squarify
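Put together, the factored version is a complete script. The evenSquares line is an extra illustration (not from the answer) of how the partially applied steps also compose with >>.

```fsharp
let square x = x * x
let isEven n = n % 2 = 0

// Partially applied library functions become reusable named steps
let evens = List.filter isEven
let squarify = List.map square

let result = [ 1 .. 10 ] |> evens |> squarify   // [4; 16; 36; 64; 100]

// The named steps compose into a new function without mentioning the list
let evenSquares = evens >> squarify
```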
Now let's go back to the original method based expression:
[1 .. 10].filter(isEven).map(square)
Factoring out a filter on evens isn't as trivial in this case.
I think your guess is correct. Concision aside, being able to treat List.filter as a first class thing that may be passed around (your first guess) and partially applied (your second guess) is key. It's a verb- rather than noun-oriented way of looking at the world. I think Steve Yegge said it best :)

f# iterating over two arrays, using function from a c# library

I have a list of words and a list of associated part-of-speech tags. I want to iterate over both simultaneously (matched by index), using each indexed tuple as input to a .NET function. Is this the best way? (It works, but it doesn't feel natural to me.)
let taggingModel =
    SeqLabeler.loadModel(lthPath + @"models\penn_00_18_split_dict.model")
let lemmatizer = new Lemmatizer(lthPath + "v_n_a.txt")
let input = "the rain in spain falls on the plain"
let words = Preprocessor.tokenizeSentence( input )
let tags = SeqLabeler.tagSentence( taggingModel, words )
let lemmas = Array.map2 (fun x y -> lemmatizer.lookup(x,y)) words tags
Your code looks quite good to me. Most of it deals with loading and initialization, so there isn't much you could do to simplify that part. As an alternative to Array.map2, you could use Seq.zip combined with Seq.map; the zip function combines two sequences into a single one that contains pairs of elements with matching indices:
let lemmas =
    Seq.zip words tags
    |> Seq.map (fun (x, y) -> lemmatizer.lookup (x, y))
Since the lookup method takes a tuple, which is exactly what Seq.zip produces, you could write:
// standard syntax using the pipelining operator
let lemmas = Seq.zip words tags |> Seq.map lemmatizer.lookup
// .. an alternative syntax doing exactly the same thing
let lemmas = (words, tags) ||> Seq.zip |> Seq.map lemmatizer.lookup
The ||> operator used in the second version takes a tuple containing two values and passes them to the function on the right side as two arguments, meaning that (a, b) ||> f means f a b. The |> operator takes only a single value on the left, so (a, b) |> f would mean f (a, b) (which would work if the function f expected a tuple instead of two space-separated parameters).
If you need lemmas to be an array at the end, you'll need to add Array.ofSeq to the end of the processing pipeline (all Seq functions work with sequences, which correspond to IEnumerable<T>).
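Because Lemmatizer and SeqLabeler come from an external library, this sketch replaces lemmatizer.lookup with a hypothetical lookup function that takes a (word, tag) tuple, so the ||> and Array.ofSeq points can be run in isolation. The Array.zip line is an alternative that stays in arrays throughout.

```fsharp
// Hypothetical stand-in for lemmatizer.lookup: takes a (word, tag) tuple
let lookup (word : string, tag : string) = word + "/" + tag

let words = [| "the"; "rain"; "falls" |]
let tags = [| "DT"; "NN"; "VBZ" |]

// (a, b) ||> f means f a b, so this is Seq.zip words tags
let lemmaSeq = (words, tags) ||> Seq.zip |> Seq.map lookup

// Seq functions produce a seq (IEnumerable<T>); Array.ofSeq materializes an array
let lemmaArray = lemmaSeq |> Array.ofSeq

// Array.zip keeps everything as arrays without the conversion step
let lemmaArray2 = Array.zip words tags |> Array.map lookup
```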
One more alternative is to use sequence expressions (you can use [| .. |] to construct an array directly if that's what you need):
let lemmas = [| for wt in Seq.zip words tags do // wt is tuple (string * string)
                    yield lemmatizer.lookup wt |]
Whether to use sequence expressions or not is just a personal preference. The first option seems more succinct in this case, but sequence expressions may be more readable for people less familiar with things like partial function application (as in the shorter version using Seq.map).
