F#: grouping by recurring sequences of elements

I have a sequence of (key, value) pairs like
[("a", 1), ("a", 2), ("a", 111), ("b", 3), ("bb", 1), ("bb", -1), ...]
What is the most efficient way to convert it into a sequence like
[("a", [1,2,111]), ("b", [3]), ("bb", [1,-1])]
or similar?
The sequence has the following property: it's really big (>2 GB).
This makes Seq.groupBy very inefficient, and it is also incorrect here. Are there any other ways to do it?
P.S.: this sequence:
[("a", 1), ("a", 2), ("a", 111), ("bb", 1), ("bb", -1), ("a", 5), ("a", 6), ...]
should be converted to
[("a", [1,2,111]), ("bb", [1,-1]), ("a", [5,6]), ...]
--
Edit #1: fixed incorrect sample.
Edit #2: the sequence is big, so a lazy (or the fastest) solution is preferred.

If you want the option to get lazy results, then I don't think there's an elegant way without maintaining mutable state. Here's a relatively straightforward one with mutation. You keep track of the last key you saw and all the values that correspond to it:
let s = [("a", 1); ("a", 2); ("a", 111); ("bb", 1); ("bb", -1); ("a", 5); ("a", 6)]
let s2 =
    [
        let mutable prevKey = None
        let mutable values = System.Collections.Generic.List<_>()
        let init key value =
            prevKey <- Some key
            values.Clear()
            values.Add value
        for (key, value) in s do
            match prevKey with
            | None -> init key value
            | Some k when k = key -> values.Add value
            | Some k ->
                yield (k, List.ofSeq values)
                init key value
        match prevKey with
        | Some k -> yield (k, List.ofSeq values)
        | _ -> ()
    ]
This gives:
val s2 : (string * int list) list =
[("a", [1; 2; 111]); ("bb", [1; -1]); ("a", [5; 6])]
For lazy evaluation, replace the [ ... ] with seq { ... }
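A minimal sketch of that lazy variant, written as a function over any seq so it can stream a large input. The name chunkAdjacent is hypothetical, and it uses a ref cell instead of let mutable on the assumption that the state has to live inside the seq { } state machine:

```fsharp
/// Lazily groups runs of adjacent elements that share the same key.
/// Each group's values are materialized as a list; groups are yielded one at a time.
let chunkAdjacent (source: seq<'K * 'V>) : seq<'K * 'V list> =
    seq {
        // Key of the run currently being collected (None before the first element).
        let prevKey : 'K option ref = ref None
        // Values of the current run, in input order.
        let values = System.Collections.Generic.List<'V>()
        for (key, value) in source do
            match prevKey.Value with
            | Some k when k = key -> values.Add value
            | Some k ->
                // Key changed: emit the finished run, then start a new one.
                yield (k, List.ofSeq values)
                prevKey.Value <- Some key
                values.Clear()
                values.Add value
            | None ->
                // First element: start the first run.
                prevKey.Value <- Some key
                values.Add value
        // Emit the last run, if the input was non-empty.
        match prevKey.Value with
        | Some k -> yield (k, List.ofSeq values)
        | None -> ()
    }
```

Because a group is yielded as soon as the key changes, only the current run is held in memory, and a prefix of an infinite input can be consumed, e.g. `Seq.initInfinite (fun i -> (i / 2, i)) |> chunkAdjacent |> Seq.take 2`.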

A simple recursive approach with no mutable state. (It is not tail-recursive: the recursive call at each group boundary is not in tail position, so very long inputs may overflow the stack.)
let rec chunk inseq (accumelem, accumlist) =
    match inseq with
    | (a, b)::c ->
        match accumelem with
        | Some t ->
            if t = a then chunk c (accumelem, b::accumlist)
            else (t, List.rev accumlist)::(chunk c (Some a, [b]))
        | None -> chunk c (Some a, [b])
    | [] ->
        match accumelem with
        | Some t -> [(t, List.rev accumlist)]
        | None -> []
chunk [("a", 1); ("a", 2); ("a", 111); ("bb", 1); ("bb", -1); ("a", 5); ("a", 6)] (None, [])
val it : (string * int list) list =
  [("a", [1; 2; 111]); ("bb", [1; -1]); ("a", [5; 6])]

Here is a recursive solution:
let test = [("a", 1); ("a", 2); ("a", 111); ("bb", 1); ("bb", -1); ("a", 5); ("a", 6)]
let groupByAdjacentElements alist =
    let rec group a groupAcc prevElement adjacentAcc =
        match a with
        | [] ->
            match adjacentAcc with
            | [] -> groupAcc
            | _ -> (prevElement, List.rev adjacentAcc)::groupAcc
        | (b, c)::tail ->
            if b = prevElement then
                group tail groupAcc prevElement (c::adjacentAcc)
            else
                group tail ((prevElement, List.rev adjacentAcc)::groupAcc) b [c]
    group alist [] (fst alist.Head) []
    |> List.rev
let b = groupByAdjacentElements test
It returns: [("a", [1; 2; 111]); ("bb", [1; -1]); ("a", [5; 6])]
If you want lazy evaluation, you should consider trying LazyList
EDIT: Here's a script comparing LazyList from ExtCore to the accepted solution. It generates a large text file and then does the transformations asked for. Note that the LazyList is returned in reverse order:
open System.Diagnostics
open System.IO
open ExtCore
let fileName = "Test.txt"
let outFile = new StreamWriter(fileName)
for i in [1..20000*300] do
    outFile.WriteLine("a,1")
    outFile.WriteLine("a,2")
    outFile.WriteLine("a,111")
    outFile.WriteLine("bb,1")
    outFile.WriteLine("bb,-1")
    outFile.WriteLine("a,5")
    outFile.WriteLine("a,6")
    outFile.WriteLine("c,8")
outFile.Close()
printfn "Finished Writing to File"
let data =
    System.IO.File.ReadLines(fileName)
    |> Seq.map (fun i ->
        let parts = i.Split(',')
        (parts.[0], parts.[1]))
printfn "Finished Reading File"
let s2 data =
    [
        let mutable prevKey = None
        let mutable values = System.Collections.Generic.List<_>()
        let init key value =
            prevKey <- Some key
            values.Clear()
            values.Add value
        for (key, value) in data do
            match prevKey with
            | None -> init key value
            | Some k when k = key -> values.Add value
            | Some k ->
                yield (k, List.ofSeq values)
                init key value
        match prevKey with
        | Some key -> yield (key, List.ofSeq values)
        | _ -> ()
    ]
let groupByAdjacentElements aseq =
    let alist = LazyList.ofSeq aseq
    let rec group alist groupAcc prevElement adjacentAcc =
        match alist with
        | Cons((b, c), tail) ->
            if b = prevElement then
                group tail groupAcc prevElement (c::adjacentAcc)
            else
                group tail (LazyList.consDelayed (prevElement, List.rev adjacentAcc) (fun () -> groupAcc)) b [c]
        | Nil ->
            match adjacentAcc with
            | [] -> groupAcc
            | _ -> LazyList.consDelayed (prevElement, List.rev adjacentAcc) (fun () -> groupAcc)
    group alist LazyList.empty (fst (alist.Head())) []
let groupByAdjacentElementsList aseq =
    let alist = aseq |> Seq.toList
    let rec group a groupAcc prevElement adjacentAcc =
        match a with
        | [] ->
            match adjacentAcc with
            | [] -> groupAcc
            | _ -> (prevElement, List.rev adjacentAcc)::groupAcc
        | (b, c)::tail ->
            if b = prevElement then
                group tail groupAcc prevElement (c::adjacentAcc)
            else
                group tail ((prevElement, List.rev adjacentAcc)::groupAcc) b [c]
    group alist [] (fst alist.Head) []
    |> List.rev
[<EntryPoint>]
let main argv =
    let stopwatch = new Stopwatch()
    stopwatch.Start()
    let b = s2 data
    printfn "The result is: %A" b
    stopwatch.Stop()
    printfn "It took %A ms." stopwatch.ElapsedMilliseconds
    System.GC.WaitForFullGCComplete() |> ignore
    stopwatch.Reset()
    stopwatch.Start()
    let b = groupByAdjacentElements data
    printfn "The result is: %A" b
    stopwatch.Stop()
    printfn "It took %A ms." stopwatch.ElapsedMilliseconds
    System.GC.WaitForFullGCComplete() |> ignore
    stopwatch.Reset()
    stopwatch.Start()
    let b = groupByAdjacentElementsList data
    printfn "The result is: %A" b
    stopwatch.Stop()
    printfn "It took %A ms." stopwatch.ElapsedMilliseconds
    0
When using files of around 300 MB, LazyList was slightly slower than the seq solution (94 s versus 83 s). That said, LazyList has the major advantage that iterating over it is cached, unlike the sequence solution. The plain list solution was faster than both, even with List.rev (without it, it took around 73 s).

Grouping by adjacent keys can also be done without mutable bindings. With Seq.scan, it's possible to generate a lazy sequence with eager chunks. Seq.scan already provides for one of the special cases, the first element of the sequence; by wrapping the input sequence as options followed by None, we can take care of the other (the end of the sequence). Afterwards, we skip over intermediate results and strip out the state with Seq.choose.
For maximum versatility, I'd like to suggest a signature similar to Seq.groupBy,
f:('T -> 'Key) -> xs:seq<'T> -> seq<'Key * 'T list> when 'Key : equality
which takes a key projection function as first argument.
let chunkBy (f : 'T -> 'Key) xs =
    // Determine key and wrap in option
    seq {
        for x in xs do
            yield Some(f x, x)
        // Indicates end of sequence
        yield None }
    |> Seq.scan (fun (_, acc, previous) current ->
        match previous, current with
        | Some(pKey, _), Some(key, value) when pKey = key ->
            // No intermediate result, but add to accumulator
            None, value::acc, current
        | _ ->
            // New state is 3-tuple of previous key and completed chunk,
            // accumulator from current element, and new previous element
            Option.map (fun (k, _) -> k, List.rev acc) previous,
            Option.map snd current |> Option.toList, current )
        (None, [], None)
    |> Seq.choose (fun (result, _, _) -> result)
This can be adapted to the OP's requirements by also providing a result projection function.
let chunkBy2 (f : 'T -> 'Key) (g : 'T -> 'Result) =
    chunkBy f >> Seq.map (fun (k, gs) -> k, List.map g gs)
// val chunkBy2 :
// f:('T -> 'Key) -> g:('T -> 'Result) -> (seq<'T> -> seq<'Key * 'Result list>)
// when 'Key : equality
["a", 1; "a", 2; "a", 111; "b", 3; "bb", 1; "bb", -1]
|> chunkBy2 fst snd
// val it : seq<string * int list> =
// seq [("a", [1; 2; 111]); ("b", [3]); ("bb", [1; -1])]
Seq.initInfinite (fun x ->
    if (x / 2) % 2 = 0 then "a", x else "b", x)
|> chunkBy2 fst snd
|> Seq.skip 50000
// val it : seq<string * int list> =
// seq
// [("a", [100000; 100001]); ("b", [100002; 100003]); ("a", [100004; 100005]);
// ("b", [100006; 100007]); ...]

Related

F# convert array to array of tuples

Let's say I have an array
let arr = [|1;2;3;4;5;6|]
I would like to convert it to something like
[|(1,2);(3,4);(5,6)|]
I've seen Seq.windowed, but it generates something like
[|(1,2);(2,3);(3,4);(4,5);(5,6)|]
which is not what I want
You can use Array.chunkBySize and then map each sub-array into a tuple:
let input = [|1..10|]
Array.chunkBySize 2 input |> Array.map (fun xs -> (xs.[0], xs.[1]))
@Slugart's accepted answer is the best approach (IMO), assuming you know that the array has an even number of elements, but here's another approach that doesn't throw an exception if there happens to be an odd number (it just omits the trailing element):
let arr = [|1;2;3;4;5|]
seq { for i in 0 .. 2 .. arr.Length - 2 -> (arr.[i], arr.[i+1]) } |> Seq.toArray
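The same drop-the-trailing-element behavior can also be combined with Array.chunkBySize by keeping only the chunks that have exactly two elements. A sketch, with toPairs as a hypothetical name:

```fsharp
/// Pairs up consecutive elements: [|1;2;3;4;5|] -> [|(1,2); (3,4)|].
/// A trailing odd element ends up in a one-element chunk and is dropped.
let toPairs (arr: 'T[]) : ('T * 'T)[] =
    arr
    |> Array.chunkBySize 2
    |> Array.choose (function
        | [| a; b |] -> Some (a, b)
        | _ -> None)
```

Only the last chunk can ever have length 1, so the catch-all case discards at most one element.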
You could use Seq.pairwise, as long as you filter out every other tuple. The filtering needs to pass a state through the iteration, which is usually effected by the scan function.
[|1..10|]
|> Seq.pairwise
|> Seq.scan (fun s t ->
    match s with None -> Some t | _ -> None) None
|> Seq.choose id
|> Seq.toArray
// val it : (int * int) [] = [|(1, 2); (3, 4); (5, 6); (7, 8); (9, 10)|]
But then it's also possible to have scan generate the tuples directly, at the cost of an intermediate array.
[|1..10|]
|> Array.scan (function
    | Some x, _ -> fun y -> None, Some(x, y)
    | _ -> fun x -> Some x, None) (None, None)
|> Array.choose snd
Use Seq.pairwise to turn a sequence into tuples
[|1;2;3;4;5;6|]
|> Seq.pairwise
|> Seq.toArray
val it : (int * int) [] = [|(1, 2); (2, 3); (3, 4); (4, 5); (5, 6)|]
Should be:
let rec slice =
    function
    | [] -> []
    | a::b::rest -> (a,b) :: slice (rest)
    | _::[] -> failwith "cannot slice uneven list"

Slice/Group a sequence of equal chars in F#

I need to extract the sequence of equal chars in a text.
For example:
The string "aaaBbbcccccccDaBBBzcc11211" should be converted to a list of strings like
["aaa";"B";"bb";"ccccccc";"D";"a";"BBB";"z";"cc";"11";"2";"11"].
That's my solution until now:
let groupSequences (text:string) =
    let toString chars =
        System.String(chars |> Array.ofList)
    let rec groupSequencesRecursive acc chars = seq {
        match (acc, chars) with
        | [], c :: rest ->
            yield! groupSequencesRecursive [c] rest
        | _, c :: rest when acc.[0] <> c ->
            yield (toString acc)
            yield! groupSequencesRecursive [c] rest
        | _, c :: rest when acc.[0] = c ->
            yield! groupSequencesRecursive (c :: acc) rest
        | _, [] ->
            yield (toString acc)
        | _ ->
            yield ""
    }
    text
    |> List.ofSeq
    |> groupSequencesRecursive []
groupSequences "aaaBbbcccccccDaBBBzcc11211"
|> Seq.iter (fun x -> printfn "%s" x)
|> ignore
I'm an F# newbie.
Can this solution be improved?
Here a completely generic implementation:
let group xs =
    let folder x = function
        | [] -> [[x]]
        | (h::t)::ta when h = x -> (x::h::t)::ta
        | acc -> [x]::acc
    Seq.foldBack folder xs []
This function has the type seq<'a> -> 'a list list when 'a : equality, so it works not only on strings, but on any (finite) sequence of elements, as long as the element type supports equality comparison.
Used with the input string in the OP, the return value isn't quite in the expected shape:
> group "aaaBbbcccccccDaBBBzcc11211";;
val it : char list list =
[['a'; 'a'; 'a']; ['B']; ['b'; 'b']; ['c'; 'c'; 'c'; 'c'; 'c'; 'c'; 'c'];
['D']; ['a']; ['B'; 'B'; 'B']; ['z']; ['c'; 'c']; ['1'; '1']; ['2'];
['1'; '1']]
Instead of a string list, the return value is a char list list. You can easily convert it to a list of strings using a map:
> group "aaaBbbcccccccDaBBBzcc11211" |> List.map (List.toArray >> System.String);;
val it : System.String list =
["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
This takes advantage of the String constructor overload that takes a char[] as input.
As initially stated, this implementation is generic, so can also be used with other types of lists; e.g. integers:
> group [1;1;2;2;2;3;4;4;3;3;3;0];;
val it : int list list = [[1; 1]; [2; 2; 2]; [3]; [4; 4]; [3; 3; 3]; [0]]
How about with Seq.groupBy?
"aaaBbbcccccccD"
|> Seq.groupBy id
|> Seq.map (snd >> Seq.toArray)
|> Seq.map (fun t -> new string (t))
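One caveat with Seq.groupBy: it collects all equal elements wherever they occur, not just adjacent runs, so it only matches the expected output here because no character in "aaaBbbcccccccD" reappears later. A small check, using a hypothetical helper viaGroupBy:

```fsharp
/// Groups with Seq.groupBy and turns each group back into a string.
let viaGroupBy (text: string) =
    text
    |> Seq.groupBy id
    |> Seq.map (fun (_, chars) -> System.String(Seq.toArray chars))
    |> List.ofSeq

// "aaBaa" has three adjacent runs ("aa", "B", "aa"),
// but groupBy merges the two "aa" runs into a single "aaaa" group.
let merged = viaGroupBy "aaBaa"
printfn "%A" merged
```

For the full string "aaaBbbcccccccDaBBBzcc11211" the same effect merges the separate a, B, c and 1 runs, which is why the order-preserving approaches below are needed.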
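If input order matters, here is a method that works. Seq.pairwise produces one pair fewer than there are characters, so the fold is seeded with the last character:
let text = "aaaBbbcccccccDaBBBzcc11211"
text
|> Seq.pairwise
|> Seq.toArray
|> Array.rev
|> Array.fold (fun acc (ca, cb) ->
    match acc with
    | accum :: tail when ca = cb -> System.String.Concat(accum, string ca) :: tail
    | _ -> string ca :: acc) [ string (Seq.last text) ]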
This one is also based on recursion, though the matching gets away with a smaller number of checks.
let chop (txt:string) =
    let rec chopInner txtArr (word: char[]) (res: List<string>) =
        match txtArr with
        | h::t when word.[0] = h -> chopInner t (Array.append word [|h|]) res
        | h::t when word.[0] <> h ->
            let newWord = word |> (fun s -> System.String s)
            chopInner t [|h|] (List.append res [newWord])
        | [] ->
            let newWord = word |> (fun s -> System.String s)
            (List.append res [newWord])
    let lst = txt.ToCharArray() |> Array.toList
    chopInner lst.Tail [|lst.Head|] []
And the result is as expected:
val text : string = "aaaBbbcccccccDaBBBzcc11211"
> chop text;;
val it : string list =
["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
When you're folding, you'll need to carry along both the previous value and the accumulator holding the temporary results. The previous value is wrapped as option to account for the first iteration. Afterwards, the final result is extracted and reversed.
"aaaBbbcccccccDaBBBzcc11211"
|> Seq.map string
|> Seq.fold (fun state ca ->
    Some ca,
    match state with
    | Some cb, x::xs when ca = cb -> x + ca::xs
    | _, xss -> ca::xss) (None, [])
|> snd
|> List.rev
// val it : string list =
// ["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
I'm just curious why everyone is publishing solutions based on match-with. Why not use plain recursion?
let rec groups i (s:string) =
    let rec next j = if j = s.Length || s.[i] <> s.[j] then j else next (j+1)
    if i = s.Length then []
    else let j = next i in s.Substring(i, j - i) :: (groups j s)
"aaaBbbcccccccDaBBBzcc11211" |> groups 0
val it : string list = ["aaa"; "B"; "bb"; "ccccccc"; "D"; "a"; "BBB"; "z"; "cc"; "11"; "2"; "11"]
As someone else here said:
Know thy fold ;-)
let someString = "aaaBbbcccccccDaBBBzcc11211"
let addLists state elem =
    let (p, ls) = state
    elem,
    match p = elem, ls with
    | _, [] -> [ elem.ToString() ]
    | true, h :: t -> (elem.ToString() + h) :: t
    | false, h :: t -> elem.ToString() :: ls
someString
|> Seq.fold addLists ((char)0, [])
|> snd
|> List.rev

F# Array Of String: concatenating values

I have an array like this:
let items = ["A";"B";"C";"D"]
I want to transform it into an array like this:
let result = ["AB";"AC";"AD";"BC";"BD";"CD"]
I can't find anything in the language spec that does this, though I might be searching incorrectly. I thought of Seq.fold like this:
let result = items |> Seq.fold(fun acc x -> acc+x) ""
but I am getting "ABCD"
Does anyone know how to do this? Will a modified CartesianProduct work?
Thanks in advance
What you have there are lists, not arrays -- lists use the [...] syntax, arrays use the [|...|] syntax.
That said, here's a simple implementation:
let listProduct (items : string list) =
    items
    |> List.collect (fun x ->
        items
        |> List.choose (fun y ->
            if x < y then Some (x + y)
            else None))
If you put it into F# interactive:
> let items = ["A"; "B"; "C"; "D"];;
val items : string list = ["A"; "B"; "C"; "D"]
> items |> listProduct |> Seq.toList;;
val it : string list = ["AB"; "AC"; "AD"; "BC"; "BD"; "CD"]
Something like this should do it:
items
|> List.map (fun x -> items |> List.map (fun y -> (x, y)))
|> List.concat
|> List.filter (fun (x, y) -> x < y)
|> List.map (fun (x, y) -> x + y)
|> List.sort
I don't know if it's efficient for large lists, but it does produce this output:
["AB"; "AC"; "AD"; "BC"; "BD"; "CD"]
Breakdown
The first step produces a list of lists of tuples, by mapping items twice:
[[("A", "A"); ("A", "B"); ("A", "C"); ("A", "D")];
[("B", "A"); ("B", "B"); ("B", "C"); ("B", "D")];
[("C", "A"); ("C", "B"); ("C", "C"); ("C", "D")];
[("D", "A"); ("D", "B"); ("D", "C"); ("D", "D")]]
Second, List.concat turns the list of lists into a single list:
[("A", "A"); ("A", "B"); ("A", "C"); ("A", "D"); ("B", "A"); ("B", "B");
("B", "C"); ("B", "D"); ("C", "A"); ("C", "B"); ("C", "C"); ("C", "D");
("D", "A"); ("D", "B"); ("D", "C"); ("D", "D")]
Third, List.filter removes the tuples where the first element is equal to or larger than the second element:
[("A", "B"); ("A", "C"); ("A", "D"); ("B", "C"); ("B", "D"); ("C", "D")]
Fourth, List.map produces a list of concatenated strings:
["AB"; "AC"; "AD"; "BC"; "BD"; "CD"]
Finally, List.sort sorts the list, although in this case it's not necessary, as the list already has the correct order.
You might also consider using Seq.distinct to remove duplicates, if there are any.
You could create a function to create a list of all head/tail pairs in a list:
let rec dec = function
    | [] -> []
    | (x::xs) -> (x, xs) :: dec xs
or a tail-recursive version:
let dec l =
    let rec aux acc = function
        | [] -> acc
        | (x::xs) -> aux ((x, xs)::acc) xs
    aux [] l |> List.rev
You can then use this function to create your list:
let strs (l: string list) = l |> dec |> List.collect (fun (h, t) -> List.map ((+)h) t)
I'd do it like this:
let rec loop = function
    | [] -> []
    | x :: xs -> List.map ((^) x) xs @ loop xs
This has the advantage of not building every pair of elements from the list only to discard half. (I'll leave getting rid of the append as an exercise :-)
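One possible take on that exercise, as a sketch: thread an accumulator through the recursion and reverse it once at the end, so there is no @ and the recursive call is in tail position. pairConcats is a hypothetical name, and it uses + rather than ^ for string concatenation:

```fsharp
/// Concatenates every element with every element that comes after it,
/// e.g. ["A"; "B"; "C"] -> ["AB"; "AC"; "BC"], without using @.
let pairConcats (items: string list) =
    let rec loop acc = function
        | [] -> List.rev acc
        | x :: xs ->
            // Push x + y for every later y (newest first), then move on to the tail.
            let acc = List.fold (fun acc y -> (x + y) :: acc) acc xs
            loop acc xs
    loop [] items
```

The accumulator is built newest-first, so a single List.rev at the end restores the expected order.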
For me, it is a bit easier to tell what's going on here compared to some of the other proposed solutions. For this kind of problem, where processing an element x also requires access to the rest of the list xs, standard combinators won't always make solutions clearer.
let items = ["A";"B";"C";"D"]
let rec produce (l: string list) =
    match l with
    // if current list is empty or contains one element - return empty list
    | [] | [_] -> []
    // if current list is not empty - match x to head and xs to tail
    | x::xs ->
        [
            // (1)
            // iterate over the tail, return string concatenation of head and every item in tail
            for c in xs -> x + c
            // apply produce to tail, concat return values
            yield! produce xs
        ]
1st iteration: l = [A, B, C, D] is not empty, so in the second match case we have x = A, xs = [B, C, D]. The 'for' part of the list expression yields [AB, AC, AD], followed by the result of applying produce to xs.
2nd iteration: l = [B, C, D] is not empty, so in the second match case we have x = B, xs = [C, D]. The 'for' part yields [BC, BD], followed by the result of applying produce to xs.
3rd iteration: l = [C, D] is not empty, so in the second match case we have x = C, xs = [D]. The 'for' part yields [CD], followed by the result of applying produce to xs.
4th iteration: l = [D] contains one element, so produce returns an empty list.
The final result is the concatenation [AB, AC, AD] ++ [BC, BD] ++ [CD].
This is an apt motivating example for implementing a List monad in F#. Using F# computation expressions, we get:
type ListMonadBuilder() =
    member b.Bind(xs, f) = List.collect f xs
    member b.Delay(f) = fun () -> f()
    member b.Let(x, f) = f x
    member b.Return(x) = [x]
    member b.Zero() = []
let listM = new ListMonadBuilder()
Now, to solve the original problem we simply use our List monad.
let run = listM {
    let! x = ['A' .. 'D']
    let! y = List.tail [ x .. 'D']
    return string x + string y
}
run();; in F# Interactive will return the desired result.
For another example of using the List monad, we can get the Pythagorean triples <= n.
let pythagoreanTriples n = listM {
    let! c = [1 .. n]
    let! b = [1 .. c]
    let! a = [1 .. b]
    if a*a + b*b = c*c then return (a, b, c)
}
Running pythagoreanTriples 10 ();; in F# interactive returns:
val it : (int * int * int) list = [(3, 4, 5); (6, 8, 10)]

write records just in time when the value (for id of something) is changing

How can I write records just in time, when the value for the id of something is changing? That is, output the id of each record when its value changes true->false or false->true, for some list?
For example, given this table:
id  value
1   0
2   0
2   0
2   0
1   0
2   1   --> a change here (node with id 2 changed 0 -> 1)
2   1
1   0
2   0   --> and here (node with id 2 changed 1 -> 0)
1   1   --> node with id 1 changed 0 -> 1
the result table should be:
id  value
2   1
2   0
1   1
My idea is not functional and a bit weird; I'm looking for a functional (or LINQ) way of doing the same.
let oop = ref (filteredsq
               |> Seq.distinctBy(fun (node,v,k) -> k)
               |> Seq.map(fun (node,v,k) -> k, false )
               |> Array.ofSeq )
[for (node,value,key) in filteredsq do
    let i = ref 0
    for (k,v) in !oop do
        if key = k && value <> v then
            (!oop).[!i] <- (k,value)
            yield node
        i := !i + 1 ]
Thank you
I think if you define a function like the following:
let getChanges f items =
    items
    |> Seq.map (fun x -> f x, x)
    |> Seq.pairwise
    |> Seq.choose (fun ((a, _), (b, x)) -> if a <> b then Some x else None)
Then you can do:
filteredsq
|> Seq.groupBy (fun (_, _, k) -> k)
|> Seq.collect (fun (_, items) ->
    items
    |> getChanges (fun (_, value, _) -> value)
    |> Seq.map (fun (node, _, _) -> node))
|> Seq.toList
I'm not sure if I fully understand your question, but the following gives the right output according to your sample. The idea is to first filter out values that don't have the right key and then use Seq.pairwise (as in jpalmer's solution) to find the places where the value changes:
let input = [ (1, 0); (2, 0); (2, 0); (2, 0); (1, 0); (2, 1); (2, 1); (1, 0); (2, 0) ]
let findValueChanges key input =
    input
    |> Seq.filter (fun (k, v) -> k = key) // Get values with the right key
    |> Seq.pairwise // Make tuples with previous & next value
    |> Seq.filter (fun ((_, prev), (_, next)) -> prev <> next) // Find changing points
    |> Seq.map snd // Return the new key-value pair (after change)
If you wanted to find changes for all different keys, then you could use Seq.groupBy to find all possible keys (then you wouldn't need the first line in findValueChanges):
input
|> Seq.groupBy fst
|> Seq.map (fun (k, vals) -> findValueChanges k vals)
(For your input, there are no changes in values for the key 1, because the key-value pair is always (1, 0).)
I would do something like
input
|> List.toSeq
|> Seq.pairwise
|> Seq.pick (fun ((fa,fb),(sa,sb)) -> if fb <> sb then Some(sa,sb) else None)
I'd just use an internal mutable dictionary to keep track of the last-seen values for each key and yield (key,value) when any value is different from the last value at that key:
let filterChanges (s:('a*'b) seq) =
    let dict = new System.Collections.Generic.Dictionary<'a,'b>()
    seq {
        for (key,value) in s do
            match dict.TryGetValue(key) with
            | false,_ -> dict.[key] <- value
            | true,lastValue ->
                if lastValue <> value then
                    yield (key,value)
                    dict.[key] <- value
    }
Test:
> filterChanges [(1,0);(2,0);(2,0);(2,0);(1,0);(2,1);(2,1);(1,0);(2,0);(1,1)];;
val it : seq<int * int> = seq [(2, 1); (2, 0); (1, 1)]
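For the "functional or LINQ" part of the question, the same last-seen-value idea can be sketched without a mutable dictionary by threading an immutable Map through Seq.scan. filterChangesPure is a hypothetical name, and using Map assumes the key type supports comparison:

```fsharp
/// Yields (key, value) each time a key's value differs from the last value seen for that key.
/// The first occurrence of a key is never reported (there is nothing to compare it with).
let filterChangesPure (s: seq<'k * 'v>) : seq<'k * 'v> =
    s
    |> Seq.scan (fun (lastSeen: Map<'k, 'v>, _) (key, value) ->
        // Compare against the last value recorded for this key, if any.
        let changed =
            match Map.tryFind key lastSeen with
            | Some last when last <> value -> Some (key, value)
            | _ -> None
        // Record the current value and carry the (optional) change forward.
        Map.add key value lastSeen, changed) (Map.empty, None)
    // Drop the initial state and every step that reported no change.
    |> Seq.choose snd
```

Each step allocates an updated Map (logarithmic in the number of distinct keys) instead of mutating a dictionary in place, and the result stays lazy.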
Updated
open System.Collections.Generic
let filter (acc:'a) (f:('a -> 'b -> bool * 'a)) (s:'b seq) =
    let rec iter (acc:'a) (e:IEnumerator<'b>) =
        match e.MoveNext() with
        | false -> Seq.empty
        | true ->
            match f acc e.Current with
            | (true, newAcc) -> seq { yield e.Current; yield! iter newAcc e }
            | (false, newAcc) -> seq { yield! iter newAcc e }
    iter acc (s.GetEnumerator())
let skipUntilChange (f : 'a -> 'b) (s : 'a seq) =
    s
    |> Seq.skip 1
    |> filter (s |> Seq.head |> f)
              (fun a b -> if a = f b then false, f b else true, f b)
[(1,0);(2,0);(2,0);(2,0);(1,0);(2,1);(2,1);(1,0);(2,0);]
|> Seq.mapi (fun c (i,v) -> (i,v,c))
|> Seq.groupBy (fun (i,v,c) -> i)
|> Seq.map (snd >> skipUntilChange (fun (_,v,_) -> v))
|> Seq.concat |> Seq.sortBy (fun (i,v,c) -> c)
|> Seq.map (fun (i,v,c) -> (i,v))
|> printfn "%A"

How to remove imperative code from a function?

I'm new to functional world and appreciate help on this one.
I want to replace the ugly imperative code in this simple function, but don't know how to do it.
What I want is to randomly pick an element from an IEnumerable (seq in F#) with respect to a probability value, the second item in the tuple (so an item with "probability" 0.7 will be picked more often than one with 0.1).
open System
/// seq<string * float>
let probabilitySeq = seq [ ("a", 0.7); ("b", 0.6); ("c", 0.5); ("d", 0.1) ]
/// seq<'a * float> -> 'a
let randomPick probSeq =
    let sum = Seq.fold (fun s dir -> s + snd dir) 0.0 probSeq
    let random = (new Random()).NextDouble() * sum
    // vvvvvv UGLY vvvvvv
    let mutable count = random
    let mutable ret = fst (Seq.head probSeq)
    let mutable found = false
    for item in probSeq do
        count <- count - snd item
        if (not found && (count < 0.0)) then
            ret <- fst item //return ret; //in C#
            found <- true
    // ^^^^^^ UGLY ^^^^^^
    ret
////////// at FSI: //////////
> randomPick probabilitySeq;;
val it : string = "a"
> randomPick probabilitySeq;;
val it : string = "c"
> randomPick probabilitySeq;;
val it : string = "a"
> randomPick probabilitySeq;;
val it : string = "b"
I think that randomPick is pretty straightforward to implement imperatively, but functionally?
This is functional, but it takes a list, not a seq (which is what I wanted).
//('a * float) list -> 'a
let randomPick probList =
    let sum = Seq.fold (fun s dir -> s + snd dir) 0.0 probList
    let random = (new Random()).NextDouble() * sum
    let rec pick_aux p list =
        match p, list with
        | gt, h::t when gt >= snd h -> pick_aux (p - snd h) t
        | lt, h::t when lt < snd h -> fst h
        | _, _ -> failwith "Some error"
    pick_aux random probList
An F# solution using the principle suggested by Matajon:
let randomPick probList =
    let ps = Seq.skip 1 (Seq.scan (+) 0.0 (Seq.map snd probList))
    let random = (new Random()).NextDouble() * (Seq.fold (fun acc e -> e) 0.0 ps)
    Seq.find (fun (p, e) -> p >= random)
             (Seq.zip ps (Seq.map fst probList))
    |> snd
But I would probably also use a list-based approach in this case since the sum of the probability values needs to be precalculated anyhow...
I will provide only a Haskell version since I don't have F# on my notebook; it should be similar. The principle is to convert your sequence to a sequence like
[(0.7,"a"),(1.3,"b"),(1.8,"c"),(1.9,"d")]
where the first element of each tuple represents not a probability but something like a range boundary. Then it is easy: pick a random number between 0 and the last number (1.9) and check which range it belongs to. For example, if 0.5 is chosen, the result is "a" because 0.5 is lower than 0.7.
Haskell code -
import System.Random (randomRIO)
probabilitySeq = [("a", 0.7), ("b", 0.6), ("c", 0.5), ("d", 0.1)]
modifySeq :: [(String, Double)] -> [(Double, String)]
modifySeq seq = modifyFunction 0 seq where
    modifyFunction (_) [] = []
    modifyFunction (acc) ((a, b):xs) = (acc + b, a) : modifyFunction (acc + b) xs
pickOne :: [(Double, String)] -> IO String
pickOne seq = let max = (fst . last) seq in
    do
        random <- randomRIO (0, max)
        return $ snd $ head $ dropWhile (\(a, b) -> a < random) seq
result :: [(String, Double)] -> IO String
result = pickOne . modifySeq
Example -
*Main> result probabilitySeq
"b"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"d"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"b"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"c"
*Main> result probabilitySeq
"a"
*Main> result probabilitySeq
"c"
The way I understand it, your logic works like this:
Sum all the weights, then select a random double somewhere between 0 and the sum of all the weights. Find the item which corresponds to your probability.
In other words, you want to map your list as follows:
Item  Val  Offset  Max (Val + Offset)
----  ---  ------  ------------------
a     0.7  0.0     0.7
b     0.6  0.7     1.3
c     0.5  1.3     1.8
d     0.1  1.8     1.9
Transforming a list of (item, probability) to (item, max) is straightforward:
let probabilityMapped prob =
    [
        let offset = ref 0.0
        for (item, probability) in prob do
            yield (item, probability + !offset)
            offset := !offset + probability
    ]
Although this falls back on mutables, it's pure, deterministic, and in the spirit of readable code. If you insist on avoiding mutable state, you can use this (not tail-recursive):
let probabilityMapped prob =
    let rec loop offset = function
        | [] -> []
        | (item, prob)::xs -> (item, prob + offset)::loop (prob + offset) xs
    loop 0.0 prob
Although we're threading state through the list, we're performing a map, not a fold operation, so we shouldn't use the Seq.fold or Seq.scan methods. I started writing code using Seq.scan, and it looked hacky and strange.
Whatever method you choose, once your list is mapped, it's very easy to select a randomly weighted item in linear time:
let rnd = new System.Random()
let randomPick probSeq =
    let probMap =
        [
            let offset = ref 0.0
            for (item, probability) in probSeq do
                yield (item, probability + !offset)
                offset := !offset + probability
        ]
    let max = Seq.maxBy snd probMap |> snd
    let rndNumber = rnd.NextDouble() * max
    Seq.pick (fun (item, prob) -> if rndNumber <= prob then Some(item) else None) probMap
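The cumulative-range idea behind these answers can also be sketched with Seq.scan, keeping the random draw separate from the lookup so the lookup is deterministic and easy to test. pickAt and weightedPick are hypothetical names, and the input is enumerated twice (once to sum, once to scan), so this assumes a re-enumerable sequence:

```fsharp
/// Returns the first item whose cumulative weight is >= roll.
/// Expects 0.0 <= roll <= total weight; a larger roll makes Seq.pick throw.
let pickAt (roll: float) (probSeq: seq<'a * float>) : 'a =
    probSeq
    // Running totals: (Some item, weight of this item plus all items before it).
    |> Seq.scan (fun (_, upper: float) (item, weight) -> (Some item, upper + weight)) (None, 0.0)
    |> Seq.skip 1 // drop the initial (None, 0.0) state
    |> Seq.pick (fun (item, upper) -> if roll <= upper then item else None)

let rnd = System.Random()

/// Draws a roll uniformly from [0, total weight) and looks it up.
let weightedPick (probSeq: seq<'a * float>) : 'a =
    let total = Seq.sumBy snd probSeq
    pickAt (rnd.NextDouble() * total) probSeq
```

With the weights from the question, pickAt 0.5 returns "a" and pickAt 1.5 returns "c", which makes the range mapping from the table above easy to check without randomness.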
I would use Seq.toList to transform the input sequence into a list and then use the list-based approach. The list quoted is short enough that it shouldn't be an unreasonable overhead.
The simplest solution is to use ref to store state between calls to the iterator for any suitable function from the Seq module:
let probabilitySeq = seq [ ("a", 0.7); ("b", 0.6); ("c", 0.5); ("d", 0.1) ]
let randomPick probSeq =
    let sum = Seq.fold (fun s (_,v) -> s + v) 0.0 probSeq
    let random = ref (System.Random().NextDouble() * sum)
    let aux = function
        | _,v when !random >= v ->
            random := !random - v
            None
        | s,_ -> Some s
    match Seq.tryPick aux probSeq with
    | Some r -> r
    | _ -> fst (Seq.head probSeq)
I would use your functional, list-based version, but adapt it to use LazyList from the F# PowerPack. Using LazyList.of_seq will give you the moral equivalent of a list, but without evaluating the whole thing at once. You can even pattern match on LazyLists with the LazyList.(|Cons|Nil|) pattern.
I think that cfern's suggestion is actually the simplest (?= best) solution to this.
The entire input needs to be evaluated, so seq's advantage of yield-on-demand is lost anyway. The easiest approach seems to be to take a sequence as input and convert it to a list while computing the total sum at the same time. Then use the list for the list-based portion of the algorithm (the list will be in reverse order, but that doesn't matter for the calculation).
let rand = System.Random()
let randomPick moveList =
    let sum, items =
        moveList
        |> Seq.fold (fun (sum, items) dir -> sum + snd dir, dir :: items) (0.0, [])
    let rec pick_aux p list =
        match p, list with
        | gt, h::t when gt >= snd h -> pick_aux (p - snd h) t
        | lt, h::t when lt < snd h -> fst h
        | _, _ -> failwith "Some error"
    pick_aux (rand.NextDouble() * sum) items
Thanks for your solutions, especially Juliet and Johan (I had to read them a few times to actually get them).
:-)
