F# noob - map & reduce words - f#

I am trying out F# and want to write a map/reduce that turns a list of words into a list of (word, count) pairs.
Here's what I have so far,
let data1 = ["Hello"; "Hello"; "How"; "How"; "how"; "are"]
let map = data1 |> List.map (fun x -> (x, 1))
printfn "%A" map
which gives the following output:
val map : (string * int) list =
[("Hello", 1); ("Hello", 1); ("How", 1); ("How", 1); ("how", 1); ("are", 1)]
but
let reduce = ...???
Now I am confused about how to design a reduce function that produces a list of (word, count) pairs. Any suggestions? I appreciate your help! Thanks

There's a built-in function for that:
data1 |> Seq.countBy id
which will give you a sequence of tuples:
val it : seq<string * int> =
seq [("Hello", 2); ("How", 2); ("how", 1); ("are", 1)]
The id function is another built-in function that takes a value and returns the same value, so in this case it means that you count by the strings themselves.
If you would rather have a list than a seq, you can use Seq.toList:
> data1 |> Seq.countBy id |> Seq.toList;;
val it : (string * int) list =
[("Hello", 2); ("How", 2); ("how", 1); ("are", 1)]
If you want a map, this is also easy:
> data1 |> Seq.countBy id |> Map.ofSeq;;
val it : Map<string,int> =
map [("Hello", 2); ("How", 2); ("are", 1); ("how", 1)]
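To tie this back to the map step in the question, a reduce over the (word, 1) pairs could be sketched as follows. This is only an illustration, assuming List.groupBy and List.sumBy from FSharp.Core 4.0 or later; the names mapped, reduce and counts are chosen for the example and are not from the original posts.

```fsharp
let data1 = ["Hello"; "Hello"; "How"; "How"; "how"; "are"]

// Map step from the question: pair every word with a count of 1.
let mapped = data1 |> List.map (fun word -> (word, 1))

// Reduce step (hypothetical helper): group the pairs by word,
// then sum the counts inside each group.
let reduce pairs =
    pairs
    |> List.groupBy fst
    |> List.map (fun (word, group) -> (word, List.sumBy snd group))

let counts = reduce mapped
printfn "%A" counts
// [("Hello", 2); ("How", 2); ("how", 1); ("are", 1)]
```

For plain counting, Seq.countBy id is shorter; the explicit version is mainly useful when the mapped value is something other than 1.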

You don't actually need the map list. It's simpler to directly put the list into an associative map:
let reduce x =
    x
    |> List.fold
        (fun m x ->
            match Map.tryFind x m with
            | None -> Map.add x 1 m
            | Some c -> Map.add x (c + 1) m)
        Map.empty
Let's try it in the interpreter:
> reduce data1
val it : Map<string,int> = map [("Hello", 2); ("How", 2); ("are", 1); ("how", 1)]
There is a good explanation of how to use the reducing function fold here, and a good explanation of how to use the associative map data structure Map<'Key,'T> here.

Here is a relatively inefficient but easy to understand solution:
data1 |> Seq.groupBy id |> Seq.map (fun (a,b) -> a,Seq.length b)
essentially, do the grouping and then see how many elements are in each group.
@ildjarn pointed out an improvement, which is probably the most efficient option and is also simpler:
data1 |> Seq.countBy id
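As a quick sanity check, the two pipelines can be compared directly. This is a small sketch; the names viaGroupBy and viaCountBy are made up for the comparison.

```fsharp
let data1 = ["Hello"; "Hello"; "How"; "How"; "how"; "are"]

// Group first, then measure the size of each group.
let viaGroupBy =
    data1
    |> Seq.groupBy id
    |> Seq.map (fun (word, group) -> word, Seq.length group)
    |> Seq.toList

// Count directly, without materialising the groups.
let viaCountBy = data1 |> Seq.countBy id |> Seq.toList

printfn "%b" (viaGroupBy = viaCountBy)
```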


Decorate an F# sequence with (simple) computed state for each element

I have a solution to this, and several working-but-unsatisfactory solutions, but it took a lot of work and seems unnecessarily complex.
Am I missing something in F#?
The Problem
I have a sequence of numbers
let nums = seq { 9; 12; 4; 17; 9; 7; 13; }
I want to decorate each number with an "index", so the result is
seq [(9, 0); (12, 1); (4, 2); (17, 3); ...]
Looks simple!
In practice the input can be very large and of indeterminate size. In my application, it is coming from a REST service.
Further
the operation must support lazy evaluation (because of the REST backend)
it must be purely functional, which eliminates the obvious seq { let mutable i = 0; for num in nums do .. } solution, and likewise while ... do ...
Let's call the function decorate, of type (seq<'a> -> seq<'a * int>), so it would work as follows:
nums
|> decorate
|> Seq.iter (fun (n,index) -> printfn "%d: %d" index n)
Producing:
0: 9
1: 12
2: 4
...
6: 13
This is a trivial problem with Lists (apart from the lazy evaluation), but tricky with Sequences.
My solution is to use Seq.unfold, as follows:
let decorate numSeq =
    (0, numSeq)
    |> Seq.unfold
        (fun (count, (nums : int seq)) ->
            if Seq.isEmpty nums then
                None
            else
                let result = ((Seq.head nums), count)
                let remaining = Seq.tail nums
                Some (result, (count + 1, remaining)))
This meets all requirements, and is the best I've come up with.
Here's the whole solution, with diagnostics to show lazy evaluation:
let nums =
seq {
// With diagnostic
let getN n =
printfn "get: %d" n
n
getN 9;
getN 12;
getN 4;
getN 17;
getN 9;
getN 7;
getN 13
}
let decorate numSeq =
(0,numSeq)
|> Seq.unfold
(fun (count,(nums : int seq)) ->
if Seq.isEmpty nums then
None
else
let result = ((Seq.head nums),count)
let remaining = Seq.tail nums
printfn "unfold: %A" result
Some( result, (count+1,remaining)))
nums
|> Seq.cache
// To prevent re-computation of the sequence.
// Will be necessary for any solution. This solution required only one.
|> decorate
|> Seq.iter (fun (n,index) -> printfn "ITEM %d: %d" index n)
PROBLEM: This took a LOT of work to reach. It looks complex, compared to the (apparently) simple requirement.
QUESTION: Is there a simpler solution?
Discussion of some alternatives.
All work, but are unsatisfactory for the reasons given
// Most likely: Seq.mapFold
// Fails lazy evaluation. The final state must be evaluated, even if not used
let decorate numSeq =
numSeq
|> Seq.mapFold
(fun count num ->
let result = (num,count)
printfn "yield: %A" result
(result,(count + 1)))
0
|> fun (nums,finalState) -> nums // And, no, using "_" doesn't fix it!
// 'for' loop, with MUTABLE
// Lazy evaluation works
// Not extensible, as the state 'count' is specific to this problem
let decorate numSeq =
let mutable count = 0
seq {
for num in numSeq do
let result = num,count
printfn "yield: %A" result
yield result;
count <- count+1
}
// 'for' loop, without mutable
// Fails lazy evaluation, and is ugly
let decorate numSeq =
seq {
for index in 0..((Seq.length numSeq) - 1) do
let result = ((Seq.item index numSeq), // Ugly!
index)
printfn "yield: %A" result
yield result
}
// "List" like recursive descent,
// Fails lazy evaluation. Ugly, because we are not meant to use recursion on Sequences
// https://stackoverflow.com/questions/11451727/recursive-functions-for-sequences-in-f
let rec decorate' count (nums : int seq) =
if Seq.isEmpty nums then
Seq.empty
else
let hd = Seq.head nums
let tl = Seq.tail nums
let result = (hd,count)
let tl' = decorate' (count+1) tl
printfn "yield: %A" result
seq { yield result; yield! tl'}
let decorate : (seq<'a> -> seq<'a * int>) = decorate' 0
You can use Seq.mapi to do what you need.
let nums = seq { 9; 12; 4; 17; 9; 7; 13; }
nums |> Seq.mapi (fun i num -> (num, i))
This gives (9, 0); (12, 1); etc...
Seq is "lazy" in the same sense as IEnumerable in C#.
You can read about Seq.mapi here:
https://fsharp.github.io/fsharp-core-docs/reference/fsharp-collections-seqmodule.html#mapi
Read more about the use of map here:
https://fsharpforfunandprofit.com/posts/elevated-world/#map
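The laziness claim can be illustrated with an infinite source, which could never be decorated eagerly. This is a minimal sketch; source is a hypothetical stand-in for the large REST-backed input described in the question.

```fsharp
// Hypothetical infinite source standing in for a large, lazily produced input.
let source = Seq.initInfinite (fun i -> i * 10)

// Decorating the sequence does not evaluate anything yet.
let decorated = source |> Seq.mapi (fun i num -> (num, i))

// Only the first three elements are ever computed.
let firstThree = decorated |> Seq.take 3 |> Seq.toList
printfn "%A" firstThree
// [(0, 0); (10, 1); (20, 2)]
```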
In addition to the Seq.mapi function mentioned in Sean's answer, F# also has a built-in Seq.indexed function, which decorates a sequence with index. This does not do exactly what you're asking, because the index becomes the first element of the tuple, but depending on your use case, it may do the trick:
> let nums = seq { 9; 12; 4; 17; 9; 7; 13; };;
val nums : seq<int>
> Seq.indexed nums;;
val it : seq<int * int> = seq [(0, 9); (1, 12); (2, 4); (3, 17); ...]
If I was trying to implement this on my own using a more primitive function, it could be done using Seq.scan, which is a bit like fold but produces a lazy sequence of states. The only tricky thing is that you have to construct the initial state and then process the rest of the sequence:
Seq.tail nums
|> Seq.scan (fun (prevIndex, _) v -> (prevIndex+1, v)) (0, Seq.head nums)
This will not work for empty sequences, even though the function should logically be able to handle them.
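One possible way to cover the empty case while still using Seq.scan is to start from a dummy state that carries no element and filter it out afterwards. The sketch below is only an illustration; indexedViaScan is a hypothetical name, and the index comes first in the tuple, as with Seq.indexed.

```fsharp
// Hypothetical helper: index a sequence with Seq.scan, tolerating empty input.
let indexedViaScan (xs: seq<'a>) : seq<int * 'a> =
    xs
    // The initial state (-1, None) holds no element yet.
    |> Seq.scan (fun (i, _) x -> (i + 1, Some x)) (-1, None)
    // Drop the initial state and unwrap the real elements.
    |> Seq.choose (fun (i, v) -> v |> Option.map (fun x -> (i, x)))

let nums = seq { 9; 12; 4 }
printfn "%A" (indexedViaScan nums |> Seq.toList)
// [(0, 9); (1, 12); (2, 4)]
printfn "%A" (indexedViaScan Seq.empty<int> |> Seq.toList)
// []
```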
Using for is not bad or wrong. for and yield in a seq {} is how you write new seq functions when none of the functions provided in the Seq module is a good fit. It is neither wrong nor bad to use this special construct. It's the same as the C# foreach and yield syntax.
Using a mutable in a limited scope is also not wrong. Mutables are a bad idea if they escape their scope, for example when you return a mutable value from a function.
It's important to put the mutable inside the seq, and not outside. Your version is wrong.
Let's assume this
let xs = decorate [3;6;7;12;9]
for x in xs do
printfn "%A" x
for x in xs do
printfn "%A" x
Now you have two versions of decorate. The first version
let decorate numSeq =
    let mutable count = 0
    seq {
        for num in numSeq do
            yield (num,count)
            count <- count + 1
    }
will print:
(3, 0)
(6, 1)
(7, 2)
(12, 3)
(9, 4)
(3, 5)
(6, 6)
(7, 7)
(12, 8)
(9, 9)
In other words, the mutable is shared across every iteration of the sequence. As a general tip: if you want to return a seq, put all your code into the seq, and put the seq {} after the = sign. If you do this instead:
let decorate numSeq = seq {
    let mutable count = 0
    for num in numSeq do
        yield (num,count)
        count <- count + 1
}
you get the correct output:
(3, 0)
(6, 1)
(7, 2)
(12, 3)
(9, 4)
(3, 0)
(6, 1)
(7, 2)
(12, 3)
(9, 4)
Further, you explain that this version is not "extensible". But the version with mapi that you selected as "correct" has the same problem: it only provides an index, nothing more.
If you want a more generic version, you can always write a function that takes the state-update logic as a function argument. You could, for example, change the above function to this code:
let decorate2 f (state:'State) (xs:'T seq) = seq {
    let mutable state = state
    for x in xs do
        yield state, x
        let newState = f state x
        state <- newState
}
Now decorate2 expects a state that you can freely pass, and a function to change the state. With this function you could then write:
decorate2 (fun state _ -> state+1) 0 [3;6;7;12;9]
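To illustrate that the state need not be an index, here is a small sketch that reuses decorate2 (repeated so the snippet is self-contained) with a running sum as the state. The names indexed and runningSum are made up for the example.

```fsharp
// decorate2 as defined in this answer, repeated so the snippet runs on its own.
let decorate2 f (state: 'State) (xs: 'T seq) = seq {
    let mutable state = state
    for x in xs do
        yield state, x
        state <- f state x
}

// State is a counter: each element is paired with its index.
let indexed = decorate2 (fun state _ -> state + 1) 0 [3; 6; 7; 12; 9] |> Seq.toList
printfn "%A" indexed
// [(0, 3); (1, 6); (2, 7); (3, 12); (4, 9)]

// State is a running sum: each element is paired with the sum of the elements before it.
let runningSum = decorate2 (fun state x -> state + x) 0 [3; 6; 7; 12; 9] |> Seq.toList
printfn "%A" runningSum
// [(0, 3); (3, 6); (9, 7); (16, 12); (28, 9)]
```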
The function signature is nearly the same as Seq.scan, but still a little bit different. But if you want to create an indexed function, you could use scan like this:
let indexed xs =
    Seq.scan (fun (count,_) x -> (count+1,x)) (0,Seq.head xs) (Seq.skip 1 xs)
In my opinion, this version is harder to read and understand, and just fugly compared to decorate or decorate2.
And just a note: there is already a Seq.indexed function in the standard library that does what you want.
for x in Seq.indexed [3;6;7;12;9] do
printfn "%A" x
will print
(0, 3)
(1, 6)
(2, 7)
(3, 12)
(4, 9)

F# build a list/array of values + consecutive duplicates

I need to pack data like this:
let data = [1; 2; 2; 3; 2; 2; 2; 4]
let packed = [(1, 1); (2, 2); (3, 1); (2, 3); (4, 1)]
where each item says how many times the value occurs before the next different value. It must also handle values that recur non-adjacently, counting each run separately.
I can do this with classical imperative code, but I wonder how to do it functionally.
Also, Seq.countBy does not work because it takes all the values into account, not just adjacent ones.
If you already have an imperative version, you can follow a set of small steps to refactor to a recursive implementation.
Recursion
While I don't know what your imperative version looks like, here's a recursive version:
let pack xs =
    let rec imp acc = function
        | [] -> acc
        | h::t ->
            match acc with
            | [] -> imp [(h, 1)] t
            | (i, count) :: ta ->
                if h = i
                then imp ((i, count + 1) :: ta) t
                else imp ((h, 1) :: (i, count) :: ta) t
    xs |> imp [] |> List.rev
This function has the type 'a list -> ('a * int) list when 'a : equality. It uses a private 'implementation function' called imp to do the work. This function is recursive, and threads an accumulator (called acc) throughout. This accumulator is the result list, having the type ('a * int) list.
If the accumulator list is empty, the head of the original list (h), as well as the count 1, is created as a tuple as the only element of the updated accumulator, and the imp function is recursively called with that updated accumulator.
If the accumulator already contains at least one element, the element is extracted via pattern matching, and the element in that tuple (i) is compared to h. If h = i, the accumulator is updated; otherwise, a new tuple is consed on acc. In both cases, though, imp is recursively called with the new accumulator.
You can call it with a list equivalent to your original tuple like this:
> pack [1; 2; 2; 3; 2; 2; 2; 4];;
val it : (int * int) list = [(1, 1); (2, 2); (3, 1); (2, 3); (4, 1)]
Fold
Once you have a recursive version, you often have the recipe for a version using a fold. In this case, since the above pack function has to reverse the accumulator in the end (using List.rev), a right fold is most appropriate. In F#, this is done with the built-in List.foldBack function:
let pack' xs =
    let imp x = function
        | (i, count) :: ta when i = x -> (i, count + 1) :: ta
        | ta -> (x, 1) :: ta
    List.foldBack imp xs []
In this case, the function passed to List.foldBack is a bit too complex to pass as an anonymous function, so I chose to define it as a private inner function. It's equivalent to the recursive imp function used by the above pack function, but you'll notice that it doesn't have to call itself recursively. Instead, it just has to return the new value for the accumulator.
The result is the same:
> pack' [1; 2; 2; 3; 2; 2; 2; 4];;
val it : (int * int) list = [(1, 1); (2, 2); (3, 1); (2, 3); (4, 1)]
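A round trip is one way to convince yourself that the fold-based version behaves as intended. The sketch below repeats pack' so it runs on its own; unpack is a hypothetical helper that expands each (value, count) pair back into a run.

```fsharp
// pack' as defined in this answer, repeated so the snippet runs on its own.
let pack' xs =
    let imp x = function
        | (i, count) :: ta when i = x -> (i, count + 1) :: ta
        | ta -> (x, 1) :: ta
    List.foldBack imp xs []

// Hypothetical inverse: expand every (value, count) pair into count copies.
let unpack packed =
    packed |> List.collect (fun (x, n) -> List.replicate n x)

let data = [1; 2; 2; 3; 2; 2; 2; 4]
printfn "%b" (unpack (pack' data) = data)

// Works for any element type that supports equality.
printfn "%A" (pack' ["a"; "a"; "b"; "a"])
// [("a", 2); ("b", 1); ("a", 1)]
```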
My solution assumes the data collection is a list. If having it as a tuple (as per your example) was intentional then for my solution to work the tuple has to be converted to a list (an example how to do it can be found here).
let groupFunc list =
    let rec groupFuncRec acc lst init count =
        match lst with
        // End of input: flush the run that is still pending.
        | [] -> List.rev ((init, count)::acc)
        // Same value as the current run: extend it.
        | head::tail when head = init -> groupFuncRec acc tail init (count+1)
        // Different value: store the finished run and start a new one.
        | head::tail -> groupFuncRec ((init, count)::acc) tail head 1
    match list with
    | [] -> []
    | h::t -> groupFuncRec [] t h 1
When I run the function on your sample data I get back the expected result:
list = [(1, 1); (2, 2); (3, 1); (2, 3); (4, 1)]
You can get Seq.countBy to work by including some positional information in its argument. Of course, you need then to map back to your original data.
[1; 2; 2; 3; 2; 2; 2; 4]
|> Seq.scan (fun (s, i) x ->
match s with
| Some p when p = x -> Some x, i
| _ -> Some x, i + 1 ) (None, 0)
|> Seq.countBy id
|> Seq.choose (function
| (Some t, _), n -> Some(t, n)
| _ -> None )
|> Seq.toList
// val it : (int * int) list = [(1, 1); (2, 2); (3, 1); (2, 3); (4, 1)]

Extract elements from sequences, tuples

Say I have this:
let coor = seq { ... }
// val coor : seq<int * int> = seq[(12,34); (56, 78); (90, 12); ...]
I'm trying to get the value of the first number of the second element in the sequence, in this case 56. Looking at the MSDN Collection API reference, Seq.nth 1 coor returns (56, 78), of type seq <int * int>. How do I get 56 out of it?
I suggest you go through Tuple article:
http://msdn.microsoft.com/en-us/library/dd233200.aspx
A couple of excerpts that might shed some light on the problem:
Function fst is used to access the first element of the tuple:
(1, 2) |> fst // returns 1
Function snd is used to access the second element
(1, 2) |> snd // returns 2
In order to extract an element from wider tuples you can use the following syntax:
let _,_,a,_ = (1, 2, 3, 4) // a = 3
To use it in various collections (well, in lambdas that are passed to collection functions), let's start with the following sequence:
let s = seq {
    for i in 1..3 do yield i,-i
}
We end up with
seq<int * int> = seq [(1, -1); (2, -2); (3, -3)]
Let's say we want to extract only the first element (note the arguments of the lambda):
s |> Seq.map (fun (a, b) -> a)
Or even shorter:
s |> Seq.map fst
And let's finally go back to your question.
s |> Seq.nth 1 |> fst
It's a tuple, so you could use the function fst;
> let value = fst(Seq.nth 1 coor);;
val value : int = 56
...or access it via pattern matching;
> let value,_ = Seq.nth 1 coor;;
val value : int = 56
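In current versions of FSharp.Core, Seq.item is the preferred name for Seq.nth, and Seq.tryItem avoids an exception when the index is out of range. A minimal sketch, assuming a small sample sequence in place of the elided seq { ... } from the question:

```fsharp
// Hypothetical sample data standing in for the elided sequence in the question.
let coor = seq [ (12, 34); (56, 78); (90, 12) ]

// First number of the second element.
let value = coor |> Seq.item 1 |> fst
printfn "%d" value
// 56

// Safe variant: returns None instead of throwing when the index does not exist.
let missing = coor |> Seq.tryItem 10 |> Option.map fst
printfn "%A" missing
// None
```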

Initial state in F# List.scan

I have a simple problem and as I'm an F# newbie I can't seem to figure out how to do this. I have a list of tuples:
let l = [ (a, 2); (b, 3); (c, 2); (d, 6) ]
that I want to transform into this:
let r = [ (a, 2); (b, 5); (c, 7); (d, 13) ]
This simply adds the values of the second element in each tuple: 2 + 3 + 2 + 6. The objects a, b, c and d are complex objects that I simply want to keep.
I thought I should use List.scan for this. It takes a list, threads an accumulator through the computation and returns a list:
let r = l |> List.scan (fun (_, s) (o, i) -> (o, s + i)) (??, 0) |> List.tail
But I don't know what to fill in for the question marks. I'm not interested in the initial state except for the 0. And I don't want to specify some 'empty' instance of the first tuple element.
Or is there a simpler way of doing this?
You can use first element as an initial state:
let l = [ ("a", 2); ("b", 3); ("c", 2); ("d", 6) ]
let x::xs = l
let res = (x, xs) ||> List.scan (fun (_, x) (o, n) -> o, x + n) // [("a", 2); ("b", 5); ("c", 7); ("d", 13)]
The special case of an empty list should be handled separately, since the pattern let x::xs = l fails on an empty list.
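One way to fold the empty-list case into the same function is a pattern match, sketched here with string placeholders for the complex objects. The name runningTotals is hypothetical.

```fsharp
// Hypothetical helper: running totals of the second tuple element,
// using the first element as the initial state of List.scan.
let runningTotals items =
    match items with
    | [] -> []
    | first :: rest ->
        List.scan (fun (_, total) (o, n) -> (o, total + n)) first rest

let l = [ ("a", 2); ("b", 3); ("c", 2); ("d", 6) ]
printfn "%A" (runningTotals l)
// [("a", 2); ("b", 5); ("c", 7); ("d", 13)]
```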

How to remove imperative code from a function?

I'm new to the functional world and would appreciate help on this one.
I want to remove the ugly imperative code from this simple function, but I don't know how to do it.
What I want is to randomly pick an element from an IEnumerable (seq in F#) according to a probability value, the second item in each tuple (so an item with "probability" 0.7 will be picked more often than one with 0.1).
open System

/// seq<string * float>
let probabilitySeq = seq [ ("a", 0.7); ("b", 0.6); ("c", 0.5); ("d", 0.1) ]
/// seq<'a * float> -> 'a
let randomPick probSeq =
    let sum = Seq.fold (fun s dir -> s + snd dir) 0.0 probSeq
    let random = (new Random()).NextDouble() * sum
    // vvvvvv UGLY vvvvvv
    let mutable count = random
    let mutable ret = fst (Seq.head probSeq)
    let mutable found = false
    for item in probSeq do
        count <- count - snd item
        if (not found && (count < 0.0)) then
            ret <- fst item //return ret; //in C#
            found <- true
    // ^^^^^^ UGLY ^^^^^^
    ret
////////// at FSI: //////////
> randomPick probabilitySeq;;
val it : string = "a"
> randomPick probabilitySeq;;
val it : string = "c"
> randomPick probabilitySeq;;
val it : string = "a"
> randomPick probabilitySeq;;
val it : string = "b"
I think that randomPick is pretty straightforward to implement imperatively, but functionally?
This version is functional, but it takes a list, not a seq (which is what I wanted).
//('a * float) list -> 'a
let randomPick probList =
    let sum = Seq.fold (fun s dir -> s + snd dir) 0.0 probList
    let random = (new Random()).NextDouble() * sum
    let rec pick_aux p list =
        match p, list with
        | gt, h::t when gt >= snd h -> pick_aux (p - snd h) t
        | lt, h::t when lt < snd h -> fst h
        | _, _ -> failwith "Some error"
    pick_aux random probList
An F# solution using the principle suggested by Matajon:
let randomPick probList =
    let ps = Seq.skip 1 (Seq.scan (+) 0.0 (Seq.map snd probList))
    let random = (new Random()).NextDouble() * (Seq.fold (fun acc e -> e) 0.0 ps)
    Seq.find (fun (p, e) -> p >= random)
             (Seq.zip ps (Seq.map fst probList))
    |> snd
But I would probably also use a list-based approach in this case since the sum of the probability values needs to be precalculated anyhow...
I will provide only a Haskell version since I don't have F# on my notebook; the F# should be similar. The principle is to convert your sequence to a sequence like
[(0.7,"a"),(1.3,"b"),(1.8,"c"),(1.9,"d")]
where the first element of each tuple represents not a probability but the upper bound of a range. Then it is easy: pick a random number from 0 to the last number (1.9) and check which range it falls in. For example, if 0.5 is chosen, the result will be "a" because 0.5 is lower than 0.7.
Haskell code -
import System.Random (randomRIO)

probabilitySeq = [("a", 0.7), ("b", 0.6), ("c", 0.5), ("d", 0.1)]

modifySeq :: [(String, Double)] -> [(Double, String)]
modifySeq seq = modifyFunction 0 seq where
    modifyFunction (_) [] = []
    modifyFunction (acc) ((a, b):xs) = (acc + b, a) : modifyFunction (acc + b) xs

pickOne :: [(Double, String)] -> IO String
pickOne seq = let max = (fst . last) seq in
    do
        random <- randomRIO (0, max)
        return $ snd $ head $ dropWhile (\(a, b) -> a < random) seq

result :: [(String, Double)] -> IO String
result = pickOne . modifySeq
Example -
*Main> result probabilitySeq
"b"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"d"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"b"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"c"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"c"
The way I understand it, your logic works like this:
Sum all the weights, then select a random double somewhere between 0 and the sum of all the weights. Find the item which corresponds to your probability.
In other words, you want to map your list as follows:
Item  Val  Offset  Max (Val + Offset)
----  ---  ------  ------------------
a     0.7  0.0     0.7
b     0.6  0.7     1.3
c     0.5  1.3     1.8
d     0.1  1.8     1.9
Transforming a list of (item, probability) to (item, max) is straightforward:
let probabilityMapped prob =
    [
        let offset = ref 0.0
        for (item, probability) in prob do
            yield (item, probability + !offset)
            offset := !offset + probability
    ]
Although this falls back on mutables, it's pure, deterministic, and in the spirit of readable code. If you insist on avoiding mutable state, you can use this (not tail-recursive):
let probabilityMapped prob =
    let rec loop offset = function
        | [] -> []
        | (item, prob)::xs -> (item, prob + offset)::loop (prob + offset) xs
    loop 0.0 prob
Although we're threading state through the list, we're performing a map, not a fold operation, so we shouldn't use the Seq.fold or Seq.scan methods. I started writing code using Seq.scan, and it looked hacky and strange.
Whatever method you choose, once you get your list mapped, it's very easy to select a randomly weighted item in linear time:
let rnd = new System.Random()
let randomPick probSeq =
    let probMap =
        [
            let offset = ref 0.0
            for (item, probability) in probSeq do
                yield (item, probability + !offset)
                offset := !offset + probability
        ]
    let max = Seq.maxBy snd probMap |> snd
    let rndNumber = rnd.NextDouble() * max
    Seq.pick (fun (item, prob) -> if rndNumber <= prob then Some(item) else None) probMap
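If List.mapFold is available (an assumption: FSharp.Core 4.0 or later, which postdates this answer), the mapping step can thread the offset without mutable state or explicit recursion. In this sketch the names cumulativeBounds and pickAt are hypothetical, and the random number is passed in as a parameter so that the selection logic itself stays deterministic and easy to check.

```fsharp
// Hypothetical helper: turn (item, probability) pairs into (item, upper bound) pairs.
let cumulativeBounds (probs: ('a * float) list) =
    probs
    |> List.mapFold
        (fun offset (item, p) ->
            let upper = offset + p
            (item, upper), upper)
        0.0
    |> fst

// Hypothetical helper: pick the first item whose upper bound is at least r.
// Assumes 0.0 <= r <= sum of all probabilities; List.pick throws otherwise.
let pickAt r probs =
    cumulativeBounds probs
    |> List.pick (fun (item, upper) -> if r <= upper then Some item else None)

let probabilities = [ ("a", 0.7); ("b", 0.6); ("c", 0.5); ("d", 0.1) ]
printfn "%s" (pickAt 0.5 probabilities)   // a
printfn "%s" (pickAt 1.0 probabilities)   // b
printfn "%s" (pickAt 1.5 probabilities)   // c
```

A random pick is then pickAt (rnd.NextDouble() * total) probabilities, where total is the sum of the probabilities.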
I would use Seq.toList (Seq.to_list in older F# versions) to transform the input sequence into a list and then use the list-based approach. The list quoted is short enough that it shouldn't be an unreasonable overhead.
The simplest solution is to use a ref to store state between calls to the iterator function, with any suitable function from the Seq module:
let probabilitySeq = seq [ ("a", 0.7); ("b", 0.6); ("c", 0.5); ("d", 0.1) ]
let randomPick probSeq =
    let sum = Seq.fold (fun s (_,v) -> s + v) 0.0 probSeq
    let random = ref (System.Random().NextDouble() * sum)
    let aux = function
        | _,v when !random >= v ->
            random := !random - v
            None
        | s,_ -> Some s
    // Seq.tryPick and Seq.head are the current names for the older Seq.first and Seq.hd
    match Seq.tryPick aux probSeq with
    | Some r -> r
    | _ -> fst (Seq.head probSeq)
I would use your functional, list-based version, but adapt it to use LazyList from the F# PowerPack. Using LazyList.of_seq will give you the moral equivalent of a list, but without evaluating the whole thing at once. You can even pattern match on LazyLists with the LazyList.(|Cons|Nil|) pattern.
I think that cfern's suggestion is actually the simplest (and so probably the best) solution to this.
The entire input needs to be evaluated anyway, so seq's advantage of yield-on-demand is lost. The easiest approach seems to be to take the sequence as input and convert it to a list while computing the total sum at the same time. Then use the list for the list-based portion of the algorithm (the list will be in reverse order, but that doesn't matter for the calculation).
let rand = System.Random() // assumed generator; the original snippet uses rand without defining it
let randomPick moveList =
    let sum, L =
        moveList
        |> Seq.fold (fun (sum, L) dir -> sum + snd dir, dir::L) (0.0, [])
    let rec pick_aux p list =
        match p, list with
        | gt, h::t when gt >= snd h -> pick_aux (p - snd h) t
        | lt, h::t when lt < snd h -> fst h
        | _, _ -> failwith "Some error"
    pick_aux (rand.NextDouble() * sum) L
Thanks for your solutions, especially Juliet and Johan (I had to read them a few times to actually get them).
:-)
