Mapping a list of tuples by tuple members - f#

Suppose I have a list of decimal*decimal
let tup = [(1M, 2M); (2M, 3M); (3M, 3M); (4M, 5M); (5M, 6M); (7M, 6M); (8M, 9M); (10M, 9M)]
I need a function that can group all of the values together if they can be connected, e.g.,
map[(100, [1M; 2M; 3M]); (101, [4M; 5M; 6M; 7M]); (102, [8M; 9M; 10M])]
I can't just do a List.groupBy because that misses anything else that may be connected "down the line" by another decimal value. The int values in the map are arbitrary. I'd like to be able to "seed" the starting value then increase each incrementally by some value.
What's the function look like that can do this?

Am I right that by "connected" you mean that the numbers represent nodes and the tuples represent edges in an undirected graph? As far as I know, there is no function in the standard library that does this. You can search for a library that performs basic graph operations. The operation you want is partitioning the graph into its connected components.
You can also try to implement that function from scratch.
Here is one nice attempt.
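A minimal sketch of the connected-components idea, assuming the tuples are undirected edges. The name connectedComponents and the seed and step parameters are hypothetical, chosen to match the "seed and increment" wish in the question; this is not code from any of the answers here.

```fsharp
// Hypothetical helper: label the connected components of an undirected graph
// given as a list of edges. `seed` is the first label, `step` the increment.
let connectedComponents (seed: int) (step: int) (edges: (decimal * decimal) list) =
    // Build an adjacency map: node -> set of neighbours (both directions).
    let addEdge (adj: Map<decimal, Set<decimal>>) (a, b) =
        let link x y (m: Map<decimal, Set<decimal>>) =
            let existing = defaultArg (Map.tryFind x m) Set.empty
            Map.add x (Set.add y existing) m
        adj |> link a b |> link b a
    let adjacency = List.fold addEdge Map.empty edges

    // Depth-first traversal collecting every node reachable from the stack.
    let rec reach visited stack =
        match stack with
        | [] -> visited
        | n :: rest when Set.contains n visited -> reach visited rest
        | n :: rest ->
            let neighbours = Map.find n adjacency |> Set.toList
            reach (Set.add n visited) (neighbours @ rest)

    // Visit nodes in ascending order; each unseen node starts a new component.
    let nodes = adjacency |> Map.toList |> List.map fst
    let _, components =
        nodes
        |> List.fold (fun (seen, comps) n ->
            if Set.contains n seen then
                (seen, comps)
            else
                let comp = reach Set.empty [ n ]
                (Set.union seen comp, Set.toList comp :: comps)) (Set.empty, [])
    components
    |> List.rev
    |> List.mapi (fun i comp -> (seed + i * step, comp))
    |> Map.ofList
```

Calling connectedComponents 100 1 tup with the list from the question should label the three groups 100, 101 and 102, ordered by the smallest member of each group.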

What you're trying to accomplish seems to be adequately addressed already; as to how to accomplish it, here's one approach:
open System.Collections.Generic

let groupConnected initId idTups =
    // Fold the inputs into groups, merging each input into the first
    // existing group it overlaps, or starting a new group otherwise.
    let mergeGroups projectIds input =
        (List.empty<SortedSet<_>>, input)
        ||> List.fold (fun groups x ->
            let ids = projectIds x
            match groups |> List.tryFind (fun g -> g.Overlaps ids) with
            | Some g ->
                g.UnionWith ids
                groups
            | _ -> ids :: groups)
    idTups
    |> mergeGroups (fun (a, b) -> SortedSet([| a; b |]))
    |> mergeGroups id
    |> List.sortBy (fun g -> g.Min)
    |> Seq.mapi (fun i g -> initId + i, List.ofSeq g)
    |> Map.ofSeq
Testing with this and your followup question's inputs:
> groupConnected 100 [(1M, 2M); (2M, 3M); (3M, 3M); (4M, 5M); (5M, 6M); (7M, 6M);
(8M, 9M); (10M, 9M)];;
val it : Map<int,decimal list> =
map [(100, [1M; 2M; 3M]); (101, [4M; 5M; 6M; 7M]); (102, [8M; 9M; 10M])]
> groupConnected 100 [(1M, 1M); (2M, 18M); (3M, 3M); (4M, 5M); (5M, 24M); (24M, 6M);
(7M, 6M); (8M, 9M); (10M, 9M)];;
val it : Map<int,decimal list> =
map
[(100, [1M]); (101, [2M; 18M]); (102, [3M]); (103, [4M; 5M; 6M; 7M; 24M]);
(104, [8M; 9M; 10M])]

Here is one not very pretty solution:
let tup = [(1M, 2M); (2M, 3M); (3M, 3M); (4M, 5M); (5M, 6M); (7M, 6M); (8M, 9M); (10M, 9M)]
let findGroupings lst =
    let rec findGroup input previous acc =
        match input with
        | [] -> acc
        | (a, b) :: t ->
            match previous with
            | [] ->
                if a >= b then
                    findGroup t [] acc
                else
                    findGroup t [b; a] acc
            | h :: _ ->
                if a > h && a < b then
                    findGroup t (b :: (a :: previous)) acc
                elif a > h && a >= b then
                    let full = List.rev (a :: previous)
                    findGroup t [] (full :: acc)
                elif a >= b then
                    findGroup t [] ((List.rev previous) :: acc)
                elif a < h then
                    findGroup t [b; a] (previous :: acc)
                else // a = h and a < b
                    findGroup t (b :: previous) acc
    findGroup lst [] []
    |> List.rev
Using
let result = findGroupings tup
gives
val result : decimal list list = [[1M; 2M; 3M]; [4M; 5M; 6M; 7M]; [8M; 9M; 10M]]

Related

F# list group by running total?

I have the following list of tuples, ordered by the first item. I want to cluster the items by these rules:
If the second item of the tuple is greater than 50, it will be in its own cluster.
Otherwise, cluster the items whose sum is less than 50.
The order cannot be changed.
code:
let values =
    [("ACE", 78);
     ("AMR", 3);
     ("Aam", 6);
     ("Acc", 1);
     ("Adj", 23);
     ("Aga", 12);
     ("All", 2);
     ("Ame", 4);
     ("Amo", 60);
     //....
    ]
values |> Seq.groupBy(fun (k,v) -> ???)
The expected value will be
[["ACE"] // 78
["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"] // 47
["Ame"] // 4
["Amo"] // 60
....]
Ideally, I want to evenly distribute the second group (["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"] which got the sum of 47) and the third one (["Ame"] which has only 4).
How to implement it in F#?
I had the following solution. It uses a mutable variable. It's not F# idiomatic? Is for ... do imperative in F# or is it a syntactic sugar of some function construct?
seq {
    let mutable c = []
    for v in values |> Seq.sortBy (fun (k, _) -> k) do
        let sum = c |> Seq.map (fun (_, v) -> v) |> Seq.sum
        if not (c = []) && sum + (snd v) > 50 then
            yield c
            c <- [v]
        else
            c <- List.append c [v]
}
I think I got it. Not the nicest code ever, but works and is immutable.
let foldFn (acc: (string list * int) list) (name, value) =
    let addToLast last =
        let withoutLast = acc |> List.filter ((<>) last)
        let newLast = [((fst last) @ [name]), (snd last) + value]
        newLast |> List.append withoutLast
    match acc |> List.tryLast with
    | None -> [[name], value]
    | Some l ->
        if (snd l) + value <= 50 then addToLast l
        else [[name], value] |> List.append acc

values |> List.fold foldFn [] |> List.map fst
Update: Since append can be quite an expensive operation, I added a prepend-only version (it still fulfills the original requirement to keep the order).
let foldFn (acc: (string list * int) list) (name, value) =
    let addToLast last =
        let withoutLast = acc |> List.filter ((<>) last) |> List.rev
        let newLast = ((fst last) @ [name]), (snd last) + value
        (newLast :: withoutLast) |> List.rev
    match acc |> List.tryLast with
    | None -> [[name], value]
    | Some l ->
        if (snd l) + value <= 50 then addToLast l
        else ([name], value) :: (List.rev acc) |> List.rev
Note: There is still an @ operator on line 4 (when creating the new list of names in a cluster), but since the theoretical maximum number of names in a cluster is 50 (if all of the values were equal to 1), the performance cost here is negligible.
If you remove List.map fst on the last line, you get the sum for each cluster in the list as well.
Append operations are expensive. A straight-forward fold with prepended intermediate results is cheaper, even if the lists need to be reversed after processing.
["ACE", 78; "AMR", 3; "Aam", 6; "Acc", 1; "Adj", 23; "Aga", 12; "All", 2; "Ame", 4; "Amd", 6; "Amo", 60]
|> List.fold (fun (r, s1, s2) (t1, t2) ->
    if t2 > 50 then [t1] :: s1 :: r, [], 0
    elif s2 + t2 > 50 then s1 :: r, [t1], t2
    else r, t1 :: s1, s2 + t2) ([], [], 0)
|> fun (r, s1, _) -> s1 :: r
|> List.filter (not << List.isEmpty)
|> List.map List.rev
|> List.rev
// val it : string list list =
// [["ACE"]; ["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"]; ["Ame"; "Amd"];
// ["Amo"]]
Here is a recursive version, working much the same way as the fold versions:
let groupBySums data =
    let rec group cur sum acc lst =
        match lst with
        | [] ->
            // Flush the group still being built before returning.
            (List.rev cur :: acc) |> List.where (not << List.isEmpty) |> List.rev
        | (name, value) :: tail when value > 50 ->
            group [] 0 ([(name, value)] :: (cur |> List.rev) :: acc) tail
        | (name, value) :: tail ->
            match sum + value with
            | x when x > 50 ->
                // The new group starts with this item, so its running sum is value.
                group [(name, value)] value ((cur |> List.rev) :: acc) tail
            | _ -> group ((name, value) :: cur) (sum + value) acc tail
    (data |> List.sortBy (fun (name, _) -> name)) |> group [] 0 []

values |> groupBySums |> List.iter (printfn "%A")
Yet another solution using Seq.mapFold and Seq.groupBy:
let group values =
    values
    |> Seq.mapFold (fun (group, total) (name, count) ->
        let newTotal = count + total
        let newGroup = group + if newTotal > 50 then 1 else 0
        (newGroup, name), (newGroup, if newGroup = group then newTotal else count)) (0, 0)
    |> fst
    |> Seq.groupBy fst
    |> Seq.map (snd >> Seq.map snd >> Seq.toList)
Invoke it like this:
[ "ACE", 78
  "AMR", 3
  "Aam", 6
  "Acc", 1
  "Adj", 23
  "Aga", 12
  "All", 2
  "Ame", 4
  "Amo", 60 ]
|> group
|> Seq.iter (printfn "%A")
// ["ACE"]
// ["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"]
// ["Ame"]
// ["Amo"]

F# Finding distinct elements in list of tuples

I have a list of tuples:
(string * (int * int)) list
let st = [("a1",(100,10)); ("a2",(50,20)); ("a3",(25,40))]
Where I want to make a function which returns a bool on a few conditions: Both the ints have to be more than 0 and the strings in the list have to be distinct.
So far I have:
let rec inv st = List.forall (fun (a,(n,p)) -> n>0 && p>0) st
But I'm having trouble figuring out how to find out if all the strings in the list are distinct. Any hints?
Use distinctBy:
let inv st =
    List.length (List.distinctBy fst st) = List.length st && List.forall (fun (a,(n,p)) -> n>0 && p>0) st
or you can combine both checks in a single pipeline:
let inv st =
    st
    |> List.filter (fun (_,(n,p)) -> n>0 && p>0)
    |> List.distinctBy fst
    |> List.length
    |> (=) (List.length st)
The shortest way would be to just compile the list of all distinct strings (via List.distinct) and see if it ended up the same size as the original list:
let allDistinctStrings = st |> List.map fst |> List.distinct
let allStringsAreDistinct = List.length st = List.length allDistinctStrings
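For completeness, the distinctness check can be folded back into the asker's inv function. This is only a sketch that combines the two conditions stated in the question; the intermediate names namesAreDistinct and countsArePositive are hypothetical.

```fsharp
// Returns true when every name is unique and both ints of every entry are positive.
let inv (st: (string * (int * int)) list) =
    let namesAreDistinct =
        (st |> List.map fst |> List.distinct |> List.length) = List.length st
    let countsArePositive =
        st |> List.forall (fun (_, (n, p)) -> n > 0 && p > 0)
    namesAreDistinct && countsArePositive
```

Keeping the two conditions as separate named values makes it easier to see which one fails when the result is false.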

F# get set of subsets containing k elements

Given a set with n elements {1, 2, 3, ..., n}, I want to declare a function which returns the set containing the sets with k number of elements such as:
allSubsets 3 2
Would return [[1;2];[1;3];[2;3]] since those are the sets with 2 elements in a set created by 1 .. n
I've made the initial create-a-set-part but I'm a little stuck on how to find out all the subsets with k elements in it.
let allSubsets n k =
Set.ofList [1..n] |>
UPDATE:
I managed to get a working solution using yield:
let allSubsets n k =
    let setN = Set.ofList [1..n]
    let rec subsets s =
        set [
            if Set.count s = k then yield s
            for e in s do
                yield! subsets (Set.remove e s) ]
    subsets setN
allSubsets 3 2
val it : Set<Set<int>> = set [set [1; 2]; set [1; 3]; set [2; 3]]
But isn't it possible to do it a little cleaner?
What you have is pretty clean, but it's also pretty inefficient. Try running allSubsets 10 3 and you'll know what I mean.
This is what I came up with:
let input = Set.ofList [ 1 .. 15 ]
let subsets (size: int) (input: Set<'a>) =
    let rec inner elems =
        match elems with
        | [] -> [[]]
        | h :: t ->
            List.fold (fun acc e ->
                if List.length e < size then
                    (h :: e) :: e :: acc
                else e :: acc) [] (inner t)
    inner (Set.toList input)
    |> Seq.choose (fun subset ->
        if List.length subset = size then
            Some <| Set.ofList subset
        else None)
    |> Set.ofSeq

subsets 3 input
The inner recursive function is a modified power set function from here. My first hunch was to generate the power set and then filter it, which would be pretty elegant, but that proved to be rather inefficient as well.
If this was to be production-quality code, I'd look into generating lists of indices of a given length, and use them to index into the input array. This is how FsCheck generates subsets, for example.
You can calculate the powerset and then filter it to keep only the subsets with the specified length:
let powerset n k =
    let lst = Set.toList n
    seq [0..(lst.Length |> pown 2)-1]
    |> Seq.map (fun i ->
        set ([0..lst.Length-1] |> Seq.choose (fun x ->
            if i &&& (pown 2 x) = 0 then None else Some lst.[x])))
    |> Seq.filter (Seq.length >> (=) k)
However, this is not efficient for large sets (n), or where k is close to n. It is easy to optimize, though: filter out candidates early, based on the number of 1 digits in the binary representation of each number.
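The early filter could look roughly like the sketch below. The names bitCount and subsetsOfSize are hypothetical, and the sketch assumes the set has fewer than 31 elements so that every mask fits in an int. It still enumerates all 2^n masks; it only avoids building the sets that would be discarded.

```fsharp
// Hypothetical helper: count the 1 bits of a non-negative int.
let rec bitCount (i: int) =
    if i = 0 then 0 else (i &&& 1) + bitCount (i >>> 1)

// Build only the subsets whose bit mask has exactly k bits set.
let subsetsOfSize (k: int) (input: Set<'a>) =
    let items = Set.toArray input
    seq { 0 .. (pown 2 items.Length) - 1 }
    |> Seq.filter (fun mask -> bitCount mask = k) // reject masks before building any set
    |> Seq.map (fun mask ->
        items
        |> Seq.indexed
        |> Seq.choose (fun (pos, item) ->
            if (mask &&& (1 <<< pos)) <> 0 then Some item else None)
        |> Set.ofSeq)
    |> Set.ofSeq
```

For example, subsetsOfSize 2 (set [1; 2; 3]) should produce the same three two-element sets as allSubsets 3 2 in the question.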
This function implements the popular n-choose-k function:
let n_choose_k (arr: 'a []) (k: int) : 'a list list =
    let len = Array.length arr
    let rec choose lo x =
        match x with
        | 0 -> [[]]
        | i ->
            [ for j in lo..(len-1) do
                for ks in choose (j+1) (i-1) do
                    yield arr.[j] :: ks ]
    choose 0 k
> n_choose_k [|1..3|] 2;;
val it : int list list = [[1; 2]; [1; 3]; [2; 3]]
You can use Set.toArray and Set.ofList to convert to and from Set.
You can consider the following approach:
First, get the powerset:
let rec powerset xs =
    match xs with
    | [] -> [ [] ]
    | h :: t -> List.fold (fun ys s -> (h :: s) :: s :: ys) [] (powerset t)
Then filter for the subsets with the required number of elements:
let filtered xs k = List.filter (fun (x: 'a list) -> x.Length = k) xs
Finally, get the requested allSubsets:
let allSubsets n k = Set.ofList (List.map (fun xs -> Set.ofList xs) (filtered (powerset [ 1 .. n ]) k))
Just to check and play with you can use:
printfn "%A" (allSubsets 3 2) // set [ set [1; 2]; set [1; 3]; set [2; 3] ]

Subtract two Maps of Map<'a, int>

I have the following type:
type Multiset<'a when 'a: comparison> = MSet of Map<'a, int>
I want to declare a function for this type that subtracts two MSets.
Let's say I have the following two Multisets:
let f = MSet (Map.ofList [("a",1);("b",2);("c",1)])
let g = MSet (Map.ofList [("a",1);("b",3);("c",1)])
I have now tried to create this subtract function which takes two Multisets.
let subtract fms sms =
    match fms with
    | MSet fs ->
        match sms with
        | MSet ss ->
            let toList ms = Map.fold (fun keys key value -> keys @ [for i = 1 to value do yield key]) [] ms
            let fromList l =
                match l with
                | [] -> MSet(Map.ofList [])
                | x :: xs -> MSet(Map.ofList (x :: xs |> Seq.countBy id |> Seq.toList))
            let sfList = toList fs
            let ssList = toList ss
            fromList (List.filter (fun n -> not (List.contains n sfList)) ssList)
If I run :
subtract f g
It returns :
MSet (map [])
Which is not what I wanted. g contains one more b than f, so I would want it to return:
MSet(map [("b", 1)])
My implementation doesn't account for multiple occurrences of the same key. I am not quite sure how I can fix this, so I get the wanted functionality?
I suspect you just have your arguments reversed, that's all. Try subtract g f.
That said, your solution seems way more complicated than it needs to be. How about just updating the values in the first map by subtracting the counts in the second, then removing non-positive counts?
let sub (MSet a) (MSet b) =
    let bCount key = match Map.tryFind key b with | Some c -> c | None -> 0
    let positiveCounts, _ =
        a
        |> Map.map (fun key value -> value - (bCount key))
        |> Map.partition (fun _ value -> value > 0)
    MSet positiveCounts
Also, the nested match in your implementation doesn't need to be there. If you wanted to match on both arguments, you can just do:
match fms, sms with
| MSet fs, MSet ss -> ...
But even that is an overkill - you can just include the pattern in parameter declarations, like in my implementation above.
As for duplicate keys - in this case, there is no reason to worry: neither of the arguments can have duplicate keys (because they're both Maps), and the algorithm will never produce any.
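As an illustration of that point, here is a hedged alternative sketch that walks the second map and decrements the counts in the first. The name subtractCounts is hypothetical, and the type is repeated only to keep the snippet self-contained. Because both inputs are maps, each key is visited at most once.

```fsharp
type Multiset<'a when 'a: comparison> = MSet of Map<'a, int>

// Hypothetical alternative: fold over b, lowering or removing the matching entry in a.
let subtractCounts (MSet a) (MSet b) =
    (a, b)
    ||> Map.fold (fun acc key count ->
        match Map.tryFind key acc with
        | Some existing when existing > count -> Map.add key (existing - count) acc
        | Some _ -> Map.remove key acc // count drops to zero or below
        | None -> acc)                 // key absent from a, nothing to subtract
    |> MSet

let f = MSet (Map.ofList [("a",1);("b",2);("c",1)])
let g = MSet (Map.ofList [("a",1);("b",3);("c",1)])
```

With these definitions, subtractCounts g f should leave only one "b", while subtractCounts f g should be empty.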
The underlying issue, also evident in your other question, seems to be the unification of identical keys. This requires an equality constraint and can be easily effected by the high-level function Seq.groupBy. Since comparison isn't strictly necessary, I propose using a dictionary, but the approach would work also with maps.
Given a type
type MultiSet<'T> = MultiSet of System.Collections.Generic.IDictionary<'T, int>
and a helper which maps the keys, sums their values and validates the result:
let internal mapSum f =
    Seq.groupBy (fun (KeyValue(k, _)) -> f k)
    >> Seq.map (fun (k, kvs) -> k, Seq.sumBy (fun (KeyValue(_, v)) -> v) kvs)
    >> Seq.filter (fun (_, v) -> v > 0)
    >> dict
    >> MultiSet
your operations become:
let map f (MultiSet s) =
    mapSum f s
let add (MultiSet fms) (MultiSet sms) =
    Seq.append fms sms
    |> mapSum id
let subtract (MultiSet fms) (MultiSet sms) =
    Seq.map (fun (KeyValue(k, v)) ->
        System.Collections.Generic.KeyValuePair(k, -v)) sms
    |> Seq.append fms
    |> mapSum id

let f = MultiSet(dict["a", 1; "b", 2; "c", 1])
let g = MultiSet(dict["a", 1; "b", 3; "c", 1])
subtract f g
// val it : MultiSet<string> = MultiSet (seq [])
subtract g f
// val it : MultiSet<string> = MultiSet (seq [[b, 1] {Key = "b";
//                                                    Value = 1;}])

How to split a sequence in F# based on another sequence in an idiomatic way

I have, in F#, 2 sequences, each containing distinct integers, strictly in ascending order: listMaxes and numbers.
If not Seq.isEmpty numbers, then it is guaranteed that not Seq.isEmpty listMaxes and Seq.last listMaxes >= Seq.last numbers.
I would like to implement in F# a function that returns a list of list of integers, whose List.length equals Seq.length listMaxes, containing the elements of numbers divided in lists, where the elements of listMaxes limit each group.
For example: called with the arguments
listMaxes = seq [ 25; 56; 65; 75; 88 ]
numbers = seq [ 10; 11; 13; 16; 20; 25; 31; 38; 46; 55; 65; 76; 88 ]
this function should return
[ [10; 11; 13; 16; 20; 25]; [31; 38; 46; 55]; [65]; List.empty; [76; 88] ]
I can implement this function, iterating over numbers only once:
let groupByListMaxes listMaxes numbers =
    if Seq.isEmpty numbers then
        List.replicate (Seq.length listMaxes) List.empty
    else
        List.ofSeq (seq {
            use nbe = numbers.GetEnumerator ()
            ignore (nbe.MoveNext ())
            for lmax in listMaxes do
                yield List.ofSeq (seq {
                    if nbe.Current <= lmax then
                        yield nbe.Current
                        while nbe.MoveNext () && nbe.Current <= lmax do
                            yield nbe.Current
                })
        })
But this code feels unclean, ugly, imperative, and very un-F#-y.
Is there any functional / F#-idiomatic way to achieve this?
Here's a version that works on lists, which is quite functional in style. You can use Seq.toList to convert the inputs whenever you need to. You could also use List.scan in conjunction with List.partition ((>=) max) if you want to use only library functions, but beware that it's very easy to introduce quadratic complexity in either computation or memory when doing that.
This is linear in both:
let splitAt value lst =
    let rec loop l1 = function
        | [] -> List.rev l1, []
        | h :: t when h > value -> List.rev l1, (h :: t)
        | h :: t -> loop (h :: l1) t
    loop [] lst
let groupByListMaxes listMaxes numbers =
    let rec loop acc lst = function
        | [] -> List.rev acc
        | h :: t ->
            let out, lst' = splitAt h lst
            loop (out :: acc) lst' t
    loop [] numbers listMaxes
It can be done like this with pattern matching and tail recursion:
let groupByListMaxes listMaxes numbers =
    let rec inner acc numbers =
        function
        | [] -> acc |> List.rev
        | max :: tail ->
            let taken = numbers |> Seq.takeWhile ((>=) max) |> List.ofSeq
            let n = taken |> List.length
            inner (taken :: acc) (numbers |> Seq.skip n) tail
    inner [] numbers (listMaxes |> List.ofSeq)
Update: I also got inspired by fold and came up with the following solution that strictly refrains from converting the input sequences.
let groupByListMaxes maxes numbers =
    let rec inner (acc, (cur, numbers)) max =
        match numbers |> Seq.tryHead with
        // Add n to the current list of n's less
        // than the local max
        | Some n when n <= max ->
            let remaining = numbers |> Seq.tail
            inner (acc, (n :: cur, remaining)) max
        // Complete the current list by adding it
        // to the accumulated result and prepare
        // the next list for fold.
        | _ ->
            (List.rev cur) :: acc, ([], numbers)
    maxes |> Seq.fold inner ([], ([], numbers)) |> fst |> List.rev
I have found a better implementation myself. Tips for improvements are still welcome.
Dealing with 2 sequences is really a pain. And I really do want to iterate over numbers only once without turning that sequence into a list. But then I realized that turning listMaxes (generally the shorter of the two sequences) into a list is less costly. That way only 1 sequence remains, and I can use Seq.fold over numbers.
What state do we want to keep and update while iterating over numbers with Seq.fold? First, it should definitely include the remaining part of listMaxes; the previous maxes that we have already surpassed are no longer of interest. Second, it needs the lists accumulated so far, although, as in the other answers, these can be kept in reverse order. More to the point: the state is a pair whose first element is the list of remaining maxes and whose second element is a reversed list of reversed lists of the numbers so far.
let groupByListMaxes listMaxes numbers =
    let rec folder state number =
        match state with
        | m :: maxes, _ when number > m ->
            folder (maxes, List.empty :: snd state) number
        | m :: maxes, [] ->
            fst state, List.singleton (List.singleton number)
        | m :: maxes, h :: t ->
            fst state, (number :: h) :: t
        | [], _ ->
            failwith "Guaranteed not to happen"
    let listMaxesList = List.ofSeq listMaxes
    let initialState = listMaxesList, List.empty
    let reversed = snd (Seq.fold folder initialState numbers)
    let temp = List.rev (List.map List.rev reversed)
    let extraLength = List.length listMaxesList - List.length temp
    let extra = List.replicate extraLength List.empty
    List.concat [temp; extra]
I know this is an old question but I had a very similar problem and I think this is a simple solution:
let groupByListMaxes cs xs =
    List.scan (fun (_, xs) c -> List.partition (fun x -> x <= c) xs)
        ([], xs)
        cs
    |> List.skip 1
    |> List.map fst

Resources