I have data in the form of an array of arrays. For example:
[[1,3],
[4,3],
[1,2],
[7,2]]
I'd like to transform this to
[(3,[1,4])
(2,[1,7])]
That is: create an array of tuples, where the first member of each tuple comes from index 1 of the original rows and the second member is an array of all the index 0 values that share that index 1 value. I can solve this imperatively, but I would like to do it in a more functional way.
Using Seq.groupBy in combination with a few maps gives the desired result:
[[1;3];
 [4;3];
 [1;2];
 [7;2]]
|> Seq.groupBy (List.item 1) // key on the second element of each inner list
|> Seq.map (fun (key, rows) -> key, rows |> Seq.map List.head |> Seq.toList) // keep only the first elements
F# is a statically typed functional programming language so the first thing you want to do is convert your input into a typeful representation such as a list of pairs of ints:
[ 1, 3
  4, 3
  1, 2
  7, 2 ]
Then you can pipe it through the Seq.groupBy function using the snd function to key on the second element of each pair:
|> Seq.groupBy snd
This gives you [3, [1, 3; 4, 3]; ...], so you then map over the right-hand sides, extracting just the values (i.e. stripping out the keys) with the fst function:
|> Seq.map (fun (k, kvs) -> k, Seq.map fst kvs)
This gives your desired answer: [(3, [1; 4]); (2, [1; 7])].
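Put together as a single pipeline, the steps above might look like the following sketch. The name groupBySecond is hypothetical, and the List.ofSeq conversions are an addition so that the result prints as plain lists rather than lazy sequences.

```fsharp
// Hypothetical helper: collects the first elements of the pairs,
// grouped by the second element of each pair.
let groupBySecond (pairs: (int * int) list) =
    pairs
    |> Seq.groupBy snd                                                 // key on the second element
    |> Seq.map (fun (k, kvs) -> k, kvs |> Seq.map fst |> List.ofSeq)   // strip the keys from each group
    |> List.ofSeq

let result = groupBySecond [ 1, 3; 4, 3; 1, 2; 7, 2 ]
printfn "%A" result   // [(3, [1; 4]); (2, [1; 7])]
```

Keys appear in the order in which they are first seen in the input, which is why 3 comes before 2 here.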
Similar to @John's answer, but assume that inner collections are arrays with at least two elements:
[|[|1; 3|];
  [|4; 3|];
  [|1; 2|];
  [|7; 2|]|]
|> Seq.map (fun arr -> arr.[0], arr.[1])
|> Seq.groupBy snd
|> Seq.map (fun (k, v) -> k, Seq.map fst v)
// val it : seq<int * seq<int>> = seq [(3, seq [1; 4]); (2, seq [1; 7])]
My answer is not essentially different from the answers above, but it uses a bit of combinatory logic, so it looks more idiomatic (to me). It also includes a validity check.
Apply2 is essentially an S combinator.
let data =
    [[1;3];
     [4;3];
     [1;2];
     [7;2]]
// Apply2 operator applies two functions to x
// and returns both results as a tuple
let (.&.) f g x = f x, g x
// A naive validator for sequences
let assert' predicate message xs =
    if not <| Seq.forall predicate xs then
        failwith message
    xs
let aggregate data =
    data
    // validate the input
    |> assert' (List.length >> (=) 2) "All elements must be of length of two"
    // essentially, convert a 2-element list to a tuple
    |> Seq.map (List.head .&. (List.tail >> List.head))
    // group over the second element of a tuple
    |> Seq.groupBy snd
    // we no longer need the key element in a tuple, so remove it
    |> Seq.map (fst .&. (snd >> Seq.map fst))
aggregate data |> printf "%A"
How do I identify the max length from a Map's value set?
let numbers = [1;2;2;3;3;3;4;5;5]
let map = numbers |> Seq.groupBy id
|> Map.ofSeq
I want to do this:
map.Values |> List.max
or...
let longestSequence = Map.map (fun (k, v) -> List.max(List.ofSeq(v)));
You can get something similar to Dictionary.Values with Map.toSeq >> Seq.map snd, so you can get the longest collected sequence in your map like this:
> map |> Map.toSeq |> Seq.map snd |> Seq.maxBy Seq.length;;
val it : seq<int> = seq [3; 3; 3]
Of course, when your list is already sorted, it seems strange to take the detour through a Map, since
> numbers |> Seq.groupBy id |> Seq.map snd |> Seq.maxBy Seq.length;;
val it : seq<int> = seq [3; 3; 3]
will do the same ;)
Also, if you think about the problem, you can write a List.fold (with an additional map over the result) that does this as well and only needs to traverse the (sorted) list once ... maybe you can try this yourself ^^
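For readers who want to compare notes, here is one possible shape of such a fold. It is only a sketch, assuming the input is sorted (or at least that equal values are adjacent); the names longestRun and step are hypothetical, and it takes the second component of the fold state with snd instead of a separate map.

```fsharp
// Longest run of equal adjacent values, found in a single List.fold.
// The state is (current run, best run so far).
let longestRun (xs: int list) =
    let step (current, best) x =
        let current' =
            match current with
            | y :: _ when y = x -> x :: current   // same value: extend the current run
            | _ -> [ x ]                          // different value: start a new run
        // keep the first run that reaches the maximum length
        let best' = if List.length current' > List.length best then current' else best
        current', best'
    xs |> List.fold step ([], []) |> snd

let longest = longestRun [1;2;2;3;3;3;4;5;5]
printfn "%A" longest   // [3; 3; 3]
```

The repeated List.length calls keep the sketch short; tracking the lengths in the fold state would avoid recomputing them.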
Does groupBy guarantee that sort order is preserved in code like the following?
x
|> Seq.sortBy (fun (x, y) -> y)
|> Seq.groupBy (fun (x, y) -> x)
By preserving sort order, I mean can we guarantee that within each grouping by x, the result is still sorted by y.
This is true for simple examples,
[(1, 3);(2, 1);(1, 1);(2, 3)]
|> Seq.sortBy (fun (x, y) -> y)
|> Seq.groupBy (fun (x, y) -> x)
// seq [(2, seq [(2, 1); (2, 3)]); (1, seq [(1, 1); (1, 3)])]
I want to make sure there are no weird edge cases.
What do you mean by preserving sort order? Seq.groupBy changes the type of the sequence, so how can you even meaningfully compare before and after?
For a given xs of the type seq<'a * 'b>, the type of the expression xs |> Seq.sortBy snd is seq<'a * 'b>, whereas the type of the expression xs |> Seq.sortBy snd |> Seq.groupBy fst is seq<'a * seq<'a * 'b>>. Thus, whether or not the answer to the question is yes or no depends on what you mean by preserving the sort order.
As @Petr wrote in the comments, it's easy to test this. If you're worried about special cases, write a Property using FsCheck and see if it generalises:
open FsCheck.Xunit
open Swensen.Unquote
[<Property>]
let isSortOrderPreserved (xs : (int * int) list) =
    let actual = xs |> Seq.sortBy snd |> Seq.groupBy fst
    let expected = xs |> Seq.sortBy snd |> Seq.toList
    expected =! (actual |> Seq.map snd |> Seq.concat |> Seq.toList)
In this property, I've interpreted the property of sort order preservation to mean that if you subsequently concatenate the grouped sequences, the sort order is preserved. Your definition may be different.
Given this particular definition, however, running the property clearly demonstrates that the property doesn't hold:
Falsifiable, after 6 tests (13 shrinks) (StdGen (1448745695,296088811)):
Original:
[(-3, -7); (4, -7); (4, 0); (-4, 0); (-4, 7); (3, 7); (3, -1); (-5, -1)]
Shrunk:
[(3, 1); (3, 0); (0, 0)]
---- Swensen.Unquote.AssertionFailedException : Test failed:
[(3, 0); (0, 0); (3, 1)] = [(3, 0); (3, 1); (0, 0)]
false
Here we see that if the input is [(3, 1); (3, 0); (0, 0)], the grouped sequence doesn't preserve the sort order (which isn't surprising to me).
Based on the updated question, here's a property that examines that question:
[<Property(MaxTest = 10000)>]
let isSortOrderPreservedWithEachGroup (xs : (int * int) list) =
    let actual = xs |> Seq.sortBy snd |> Seq.groupBy fst
    let expected =
        actual
        |> Seq.map (fun (k, vals) -> k, vals |> Seq.sort |> Seq.toList)
        |> Seq.toList
    expected =!
        (actual |> Seq.map (fun (k, vals) -> k, Seq.toList vals) |> Seq.toList)
This property does, indeed, hold:
Ok, passed 10000 tests.
You should still consider carefully whether you want to rely on behaviour that isn't documented, since it could change in later incarnations of F#. Personally, I'd adopt a piece of advice from the Zen of Python:
Explicit is better than implicit.
BTW, the reason for all that conversion to F# lists is because lists have structural equality, while sequences don't.
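A small illustration of that last point, as a sketch: two separately built sequence expressions with the same elements do not compare equal, while their list conversions do. The binding names are made up for this example.

```fsharp
// Lists compare structurally, element by element.
let listsEqual = ([1; 2; 3] = [1; 2; 3])

// Two distinct sequence expressions with identical contents.
let s1 = seq { yield 1; yield 2; yield 3 }
let s2 = seq { yield 1; yield 2; yield 3 }

// Comparing the sequence objects themselves falls back to reference equality.
let seqsEqual = (s1 = s2)

// Converting to lists first restores structural comparison.
let seqsEqualAsLists = (Seq.toList s1 = Seq.toList s2)

printfn "%b %b %b" listsEqual seqsEqual seqsEqualAsLists   // true false true
```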
The documentation doesn't say explicitly (except through the example), but the implementation does preserve the order of the original sequence. It would be quite surprising if it didn't: the equivalent functions in other languages that I am aware of do.
Who cares. Instead of sorting and then grouping, just group and then sort and the ordering is guaranteed even if the F# implementation of groupBy eventually changes:
x
|> Seq.groupBy (fun (x, y) -> x)
|> Seq.map (fun (k, v) -> k, v |> Seq.sortBy (fun (x, y) -> y))
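Applied to the sample data from the question, the group-then-sort approach could look like this sketch. The conversions to lists are an addition so the result can be printed and compared; groupedThenSorted is a hypothetical name.

```fsharp
let groupedThenSorted =
    [ (1, 3); (2, 1); (1, 1); (2, 3) ]
    |> Seq.groupBy fst                                                 // group by x first
    |> Seq.map (fun (k, v) -> k, v |> Seq.sortBy snd |> Seq.toList)    // then sort each group by y
    |> Seq.toList

printfn "%A" groupedThenSorted
// [(1, [(1, 1); (1, 3)]); (2, [(2, 1); (2, 3)])]
```

Each group is sorted explicitly, so the result does not depend on how groupBy orders elements within a group.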
I have an integer array, and I want to pass every two elements from it to the constructor of another type.
Something like intArray |> Array.map (fun x, y -> new Point(x, y))
Is this possible? I'm new to F# and functional programming so I'm trying to avoid just looping through every 2 items in the array and adding the point to a list. I hope that's reasonable.
If using F# 4.0, use Gustavo's approach. For F# 3, you can do:
intArray
|> Seq.pairwise // get sequence of tuples of element (1,2); (2,3); (3,4); (4,5) etc
|> Seq.mapi (fun i xy -> i, xy) // combine the index with the tuple
|> Seq.filter (fun (i,_) -> i % 2 = 0) // Filter for only the even indices to get (1,2); (3,4)
|> Seq.map (fun xy -> Point xy) // make a point from the tuples
|> Array.ofSeq // convert back to array
You can use Array.chunkBySize:
intArray
|> Array.chunkBySize 2
|> Array.map (function
    | [|x; y|] -> new Point (x, y)
    | _ -> failwith "Array length is not even.")
An alternative to the existing answers would be to write a custom function that creates a list or array of tuples using pattern matching:
let chunkify arr =
    let rec chunkify acc lst =
        if (List.length lst) > 1 then (* proceed if there are at least two elements *)
            match lst with
            (* save every constructed pair, while the input is not empty *)
            | h1 :: h2 :: tail -> chunkify ([(h1, h2)] @ acc) tail
            | _ -> acc (* else return the list of pairs *)
        else (* return the list of pairs *)
            acc
    chunkify List.empty (List.ofSeq arr) |> List.rev |> Array.ofSeq
The function can be then used like this:
// helper
let print = (fun (x:'a, y:'a) -> printfn "new Object(%A,%A)" x y)
// ints
[|1;2;3;4;5;6|]
|> chunkify
|> Array.iter print
// strings
[|"a";"b";"c";"d";"e"|]
|> chunkify
|> Array.map print
|> ignore
The output is:
new Object(1,2)
new Object(3,4)
new Object(5,6)
new Object("a","b")
new Object("c","d")
This solution/approach uses pattern matching with lists.
I have the following variable:
data:seq<(DateTime*float)>
and I want to do something like the following F# code but using Deedle:
data
|> Seq.groupBy (fun (k,v) -> k.Year)
|> Seq.map (fun (k,v) ->
let vals = v |> Seq.pairwise
let first = seq { yield v |> Seq.head }
let diffs = vals |> Seq.map (fun ((t0,v0),(t1,v1)) -> (t1, v1 - v0))
(k, diffs |> Seq.append first))
|> Seq.collect snd
This works fine using F# sequences but I want to do it using Deedle series. I know I can do something like:
(data:Series<DateTime,float>) |> Series.groupBy (fun k v -> k.Year)...
But then I need to take the within-year diffs for each group, except for the head value, which should just be the value itself, and then flatten the results into one series... I am a bit confused by the Deedle syntax.
Thanks!
I think the following might be doing what you need:
ts
|> Series.groupInto
    (fun k _ -> k.Month)
    (fun m s ->
        let first = series [ fst s.KeyRange => s.[fst s.KeyRange]]
        Series.merge first (Series.diff 1 s))
|> Series.values
|> Series.mergeAll
The groupInto function lets you specify a function that should be called on each of the groups.
For each group, we create series with the differences using Series.diff and append a series with the first value at the beginning using Series.merge.
At the end, we get all the nested series & flatten them using Series.mergeAll.
I have some map-reduce code in which each thread groups by some key, and the reduce part then merges the results. My current approach is to search for a specific key's index in the accumulator and then use mapi to build the combined result for that key only, leaving the rest unmodified:
let rec groupFolder sequence acc =
    match sequence with
    | (by:string, what) :: rest ->
        let index = acc |> Seq.tryFindIndex( fun (byInAcc, _) -> byInAcc.Equals(by) )
        match index with
        | Some (idx) ->
            acc |> Seq.mapi( fun i (byInAcc, whatInAcc) -> if i = idx then (by, (what |> Array.append whatInAcc) ) else byInAcc, whatInAcc )
                |> groupFolder rest
        | None -> acc |> Seq.append( seq{ yield (by, what) } )
                      |> groupFolder rest
My question is: is there a more functional way to achieve this?
As an example input to this reducer:
let GroupsCommingFromMap = [| seq { yield! [|("key1", [|1;2;3|] ); ("key2", [|1;2;3|] ); ("key3", [|1;2;3|]) |] }, seq { yield! [|("key1", [|4;5;6|] ); ("key2", [|4;5;6|] ); ("key3", [|4;5;6|]) |] } |];;
GroupsCommingFromMap |> Seq.reduce( fun acc i ->
acc |> groupFolder (i |> Seq.toList))
The expected result should contain all of key1..key3, each with the array 1..6.
From the code you posted, it is not very clear what you're trying to do. Could you include some sample inputs (together with the output that you would like to get)? And does your code actually work on any of the inputs? (It has an incomplete pattern match, so I doubt that...)
Anyway, you can implement key-based map reduce using Seq.groupBy. For example:
let mapReduce mapper reducer input =
    input
    |> Seq.map mapper
    |> Seq.groupBy fst
    |> Seq.map (fun (k, vs) ->
        k, vs |> Seq.map snd |> Seq.reduce reducer)
Here:
The mapper takes a value from the input sequence and turns it into a key-value pair. The mapReduce function then groups the values using the key.
The reducer is then used to reduce all values associated with each key.
This lets you create a word count function like this (using a simple mapper that returns the word as the key with 1 as the value, and a reducer that just adds all the numbers):
"hello world hello people hello world".Split(' ')
|> mapReduce (fun w -> w, 1) (+)
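To see what the word count produces, here is a self-contained sketch that repeats the mapReduce definition and materialises the result as a list. The counts binding and the Seq.toList step are additions for illustration.

```fsharp
let mapReduce mapper reducer input =
    input
    |> Seq.map mapper                      // turn each input into a (key, value) pair
    |> Seq.groupBy fst                     // group the pairs by key
    |> Seq.map (fun (k, vs) ->
        k, vs |> Seq.map snd |> Seq.reduce reducer)   // reduce the values of each key

let counts =
    "hello world hello people hello world".Split(' ')
    |> mapReduce (fun w -> w, 1) (+)
    |> Seq.toList

printfn "%A" counts   // [("hello", 3); ("world", 2); ("people", 1)]
```

The keys come out in order of first appearance in the input.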
EDIT: The example you mentioned does not really have a "mapper" part; instead, it has an array of arrays as its input, so perhaps it is easier to write this directly using Seq.groupBy like this:
let GroupsCommingFromMap =
    [| [|("key1", [|1;2;3|] ); ("key2", [|1;2;3|] ); ("key3", [|1;2;3|]) |]
       [|("key1", [|4;5;6|] ); ("key2", [|4;5;6|] ); ("key3", [|4;5;6|]) |] |]
GroupsCommingFromMap
|> Seq.concat
|> Seq.groupBy fst
|> Seq.map (fun (k, vs) -> k, vs |> Seq.map snd |> Array.concat)