F# divide sequence up in blocks [duplicate]

I'm trying to learn F# by rewriting some C# algorithms I have into idiomatic F#.
One of the first functions I'm trying to rewrite is a batchesOf where:
[1..17] |> batchesOf 5
Which would split the sequence into batches with a max of five in each, i.e:
[[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]
My first attempt is kind of ugly: I resorted to a mutable ref cell after running into errors when trying to capture a mutable variable inside the closure. Using ref is particularly unpleasant because dereferencing requires the ! operator, which inside a condition can be counterintuitive to developers who will read it as logical not. Another problem I ran into is that Seq.skip and Seq.take, unlike their LINQ counterparts, throw an error if the requested size exceeds the length of the sequence.
let batchesOf size (sequence: _ seq) : _ list seq =
    seq {
        let s = ref sequence
        while not (!s |> Seq.isEmpty) do
            yield !s |> Seq.truncate size |> List.ofSeq
            s := System.Linq.Enumerable.Skip(!s, size)
    }
Anyway what would be the most elegant/idiomatic way to rewrite this in F#? Keeping the original behaviour but preferably without the ref mutable variable.

Implementing this function using the seq<_> type idiomatically is difficult - the type is inherently mutable, so there is no simple nice functional way. Your version is quite inefficient, because it uses Skip repeatedly on the sequence. A better imperative option would be to use GetEnumerator and just iterate over elements using IEnumerator. You can find various imperative options in this snippet: http://fssnip.net/1o
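For what it's worth, here is a minimal sketch of what such an enumerator-based version might look like. It is not taken from the linked snippet; the name batchesOfImperative and the ResizeArray buffer are my own assumptions, chosen only to show walking the source once with IEnumerator:

```fsharp
// Hypothetical sketch: walks the source exactly once using its enumerator.
let batchesOfImperative size (source: seq<'a>) : seq<'a list> =
    if size <= 0 then invalidArg "size" "must be greater than zero"
    seq {
        use e = source.GetEnumerator()
        // Buffer for the batch currently being filled.
        let batch = ResizeArray<'a>(size)
        while e.MoveNext() do
            batch.Add(e.Current)
            if batch.Count = size then
                yield List.ofSeq batch
                batch.Clear()
        // Emit the final, shorter batch if any elements are left over.
        if batch.Count > 0 then
            yield List.ofSeq batch
    }

let batches = [1..17] |> batchesOfImperative 5 |> List.ofSeq
// [[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]
```

The mutation is confined to the buffer inside the sequence expression, so callers still see an ordinary lazy seq.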
If you're learning F#, then it is better to try writing the function using the F# list type. This way, you can use idiomatic functional style. Then you can write batchesOf using pattern matching with recursion and an accumulator argument like this:
let batchesOf size input =
    // Inner function that does the actual work.
    // 'input' is the remaining part of the list, 'num' is the number of elements
    // in a current batch, which is stored in 'batch'. Finally, 'acc' is a list of
    // batches (in a reverse order)
    let rec loop input num batch acc =
        match input with
        | [] ->
            // We've reached the end - add current batch to the list of all
            // batches if it is not empty and return batch (in the right order)
            if batch <> [] then (List.rev batch)::acc else acc
            |> List.rev
        | x::xs when num = size - 1 ->
            // We've reached the end of the batch - add the last element
            // and add batch to the list of batches.
            loop xs 0 [] ((List.rev (x::batch))::acc)
        | x::xs ->
            // Take one element from the input and add it to the current batch
            loop xs (num + 1) (x::batch) acc
    loop input 0 [] []
As a footnote, the imperative version can be made a bit nicer using a computation expression for working with IEnumerator, but that's not standard and it is quite an advanced trick (for example, see http://fssnip.net/37).

A friend asked me this a while back. Here's a recycled answer. This works and is pure:
let batchesOf n =
    Seq.mapi (fun i v -> i / n, v) >>
    Seq.groupBy fst >>
    Seq.map snd >>
    Seq.map (Seq.map snd)
Or an impure version:
let batchesOf n =
    let i = ref -1
    Seq.groupBy (fun _ -> i := !i + 1; !i / n) >> Seq.map snd
These produce a seq<seq<'a>>. If you really must have an 'a list list as in your sample then just add ... |> Seq.map (List.ofSeq) |> List.ofSeq as in:
> [1..17] |> batchesOf 5 |> Seq.map (List.ofSeq) |> List.ofSeq;;
val it : int list list = [[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]
Hope that helps!

This can be done without recursion if you want (here with size bound to 5):
let size = 5
[0..20]
|> Seq.mapi (fun i elem -> (i/size),elem)
|> Seq.groupBy (fun (a,_) -> a)
|> Seq.map (fun (_,se) -> se |> Seq.map (snd));;
val it : seq<seq<int>> =
seq
[seq [0; 1; 2; 3; ...]; seq [5; 6; 7; 8; ...]; seq [10; 11; 12; 13; ...];
seq [15; 16; 17; 18; ...]; ...]
Depending on how you think, this may be easier to understand. Tomas' solution is probably more idiomatic F#, though.

Hurray, we can use List.chunkBySize, Seq.chunkBySize and Array.chunkBySize in F# 4, as mentioned by Brad Collins and Scott Wlaschin.
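A quick sketch of how the three functions behave, assuming F# 4.0 or later. The value names are just for illustration. Note that the Seq and Array versions hand back arrays rather than lists:

```fsharp
// List.chunkBySize returns a list of lists, matching the shape asked for.
let asLists = [1..17] |> List.chunkBySize 5
// [[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]

// Array.chunkBySize returns an array of arrays.
let asArrays = [|1..17|] |> Array.chunkBySize 5

// Seq.chunkBySize is lazy and yields arrays, so it also works on an infinite source.
let firstTwoChunks =
    Seq.initInfinite id
    |> Seq.chunkBySize 5
    |> Seq.take 2
    |> List.ofSeq
// [[|0; 1; 2; 3; 4|]; [|5; 6; 7; 8; 9|]]
```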

This isn't perhaps idiomatic but it works:
let batchesOf n l =
    let _, _, temp', res' =
        List.fold
            (fun (i, n, temp, res) hd ->
                if i < n then
                    (i + 1, n, hd :: temp, res)
                else
                    (1, i, [hd], (List.rev temp) :: res))
            (0, n, [], [])
            l
    (List.rev temp') :: res' |> List.rev

Here's a simple implementation for sequences:
let chunks size (items:seq<_>) =
    use e = items.GetEnumerator()
    let rec loop i acc =
        seq {
            if i = size then
                yield (List.rev acc)
                yield! loop 0 []
            elif e.MoveNext() then
                yield! loop (i+1) (e.Current::acc)
            else
                yield (List.rev acc)
        }
    if size = 0 then invalidArg "size" "must be greater than zero"
    if Seq.isEmpty items then Seq.empty else loop 0 []
let s = Seq.init 10 id
chunks 3 s
//output: seq [[0; 1; 2]; [3; 4; 5]; [6; 7; 8]; [9]]

My method involves converting the list to an array and recursively chunking the array:
let batchesOf (sz:int) lt =
    let arr = List.toArray lt
    let rec bite curr =
        if (curr + sz - 1 ) >= arr.Length then
            [Array.toList arr.[ curr .. (arr.Length - 1)]]
        else
            let curr1 = curr + sz
            (Array.toList (arr.[curr .. (curr + sz - 1)])) :: (bite curr1)
    bite 0
batchesOf 5 [1 .. 17]
[[1; 2; 3; 4; 5]; [6; 7; 8; 9; 10]; [11; 12; 13; 14; 15]; [16; 17]]

I found this to be a quite terse solution:
let partition n (stream:seq<_>) = seq {
    let enum = stream.GetEnumerator()
    let rec collect n partition =
        if n = 1 || not (enum.MoveNext()) then
            partition
        else
            collect (n-1) (partition @ [enum.Current])
    while enum.MoveNext() do
        yield collect n [enum.Current]
}
It works on a sequence and produces a sequence. The output sequence consists of lists of n elements from the input sequence.

You can solve your task with an analog of Clojure's partition library function, shown below:
let partition n step coll =
    let rec split ss =
        seq {
            yield(ss |> Seq.truncate n)
            if Seq.length(ss |> Seq.truncate (step+1)) > step then
                yield! split <| (ss |> Seq.skip step)
        }
    split coll
Used as partition 5 5, it gives you the batchesOf 5 functionality you are after:
[1..17] |> partition 5 5;;
val it : seq<seq<int>> =
seq
[seq [1; 2; 3; 4; ...]; seq [6; 7; 8; 9; ...]; seq [11; 12; 13; 14; ...];
seq [16; 17]]
As a bonus, by playing with n and step you can use it to slice overlapping batches (aka sliding windows), and even apply it to infinite sequences, as below:
Seq.initInfinite(fun x -> x) |> partition 4 1;;
val it : seq<seq<int>> =
seq
[seq [0; 1; 2; 3]; seq [1; 2; 3; 4]; seq [2; 3; 4; 5]; seq [3; 4; 5; 6];
...]
Consider it a prototype only: it performs many redundant evaluations of the source sequence and is not likely fit for production purposes.
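For the sliding-window case with a step of 1, the core library's Seq.windowed may already be enough. A small sketch for comparison (the name firstWindows is only illustrative); unlike the prototype above, it yields arrays and drops trailing windows that are shorter than n:

```fsharp
// Seq.windowed produces overlapping windows lazily, so an infinite source is fine.
let firstWindows =
    Seq.initInfinite id
    |> Seq.windowed 4
    |> Seq.take 3
    |> List.ofSeq
// [[|0; 1; 2; 3|]; [|1; 2; 3; 4|]; [|2; 3; 4; 5|]]
```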

This version passes all the tests I could think of, including ones for lazy evaluation and single evaluation of the source sequence:
let batchIn batchLength sequence =
    let padding = seq { for i in 1 .. batchLength -> None }
    let wrapped = sequence |> Seq.map Some
    Seq.concat [wrapped; padding]
    |> Seq.windowed batchLength
    |> Seq.mapi (fun i el -> (i, el))
    |> Seq.filter (fun t -> fst t % batchLength = 0)
    |> Seq.map snd
    |> Seq.map (Seq.choose id)
    |> Seq.filter (fun el -> not (Seq.isEmpty el))
I am still quite new to F# so if I'm missing anything - please do correct me, it will be greatly appreciated.

Related

Is there a way in f# to perform a cross operation on lists?

Is there a way in F# to perform an operation on all the possible element combinations of two lists?
Example
l1 = [1;2;3]
l2=[4;5;6]
let plus x y = x+y
Then fun plus l1 l2 would perform [(1+4);(1+5);(1+6);(2+4);(2+5);(2+6);(3+4);(3+5);(3+6)]
Hence the output: [5;6;7;6;7;8;7;8;9]
Note: I have tried using zip but it only takes each element once.
Yep, easiest way is to use a list comprehension.
let t1 = [1;2;3]
let t2 = [4;5;6]
[for a in t1 do for b in t2 do yield a+b] //val it : int list = [5; 6; 7; 6; 7; 8; 7; 8; 9]
//as a function
let f lst1 lst2 = [for a in lst1 do for b in lst2 do yield a+b]
Another possibility is to combine a List.collect with a List.map:
let l1 = [1;2;3]
let l2 = [4;5;6]
l1 |> List.collect (fun x -> List.map ((+) x) l2) //output: [5; 6; 7; 6; 7; 8; 7; 8; 9]
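On newer versions of FSharp.Core (4.1 and later, as far as I know) there is also List.allPairs, which builds the combinations for you. A small sketch; the helper name cross is just something I made up for illustration:

```fsharp
// Hypothetical helper: apply f to every pair drawn from xs and ys.
let cross f xs ys =
    List.allPairs xs ys
    |> List.map (fun (x, y) -> f x y)

let sums = cross (+) [1; 2; 3] [4; 5; 6]
// [5; 6; 7; 6; 7; 8; 7; 8; 9]
```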

Replicate list items n times in a F# sequence

I have a sequence in F#:
let n = 2
let seq1 = seq {
    yield "a"
    yield "b"
    yield "c"
}
I want to print every item in the sequence n times. I can do it this way:
let printx line t =
    for i = 1 to t do
        printfn "%s" line
seq1 |> Seq.iter (fun i -> printx i n)
Output of this is:
a
a
b
b
c
c
I think this is not the best solution. How to replicate the items in the sequence?
You can create a function to replicate each element of an input sequence:
let replicateAll n s = s |> Seq.collect (fun e -> Seq.init n (fun _ -> e))
then
seq1 |> replicateAll 2 |> Seq.iter (printfn "%s")
I would rather go with a sequence computation expression.
Looks cleaner:
let replicateAll n xs = seq {
    for x in xs do
        for _ in 1..n do
            yield x
}
There is actually a replicate function:
let xs = [1; 2; 3; 4; 5]
xs |> List.collect (fun x -> List.replicate 3 x)
//val it : int list = [1; 1; 1; 2; 2; 2; 3; 3; 3; 4; 4; 4; 5; 5; 5]
And you can do function composition on it, which will get rid of the lambda:
let repCol n xs = (List.replicate >> List.collect) n xs
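Since the question is about a sequence rather than a list, the same idea should carry over with Seq.replicate. A minimal sketch (replicateEach is an illustrative name, not a library function):

```fsharp
// Repeat every element n times, staying lazy.
let replicateEach n source =
    source |> Seq.collect (Seq.replicate n)

let doubled = seq ["a"; "b"; "c"] |> replicateEach 2 |> List.ofSeq
// ["a"; "a"; "b"; "b"; "c"; "c"]
```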

Infinite sequence with repeating elements

I need to create an infinite sequence containing a subsequence of elements which repeats infinitely.
[1; 2; 3; 4; 1; 2; 3; 4; 1; 2; 3; 4; ...]
So I wrote this:
let l = [1; 2; 3; 4]
let s = seq { while true do yield! l }
Is there a standard way (function) to do this?
I think that your approach is good in this scenario. There is no built-in function to implement repetition, but if you need to repeat sequences often, you can define one yourself and make it available in the Seq module:
module Seq =
    let repeat items =
        seq { while true do yield! items }
Then you can nicely write Seq.repeat [ 1 .. 4 ], as if repeat was a standard F# library function, because F# IntelliSense shows both functions from your Seq module and from the Seq module as if they were defined in a single module.
Aside from your implementation, you can also use a recursive sequence expression, which is another quite common pattern when generating sequences. Using while is in some ways imperative (although you don't need any state for simple repetitions) compared to functional recursion:
let rec repeat items =
    seq { yield! items
          yield! repeat items }
This approach is better when you want to keep some state while generating. For example, generating all numbers 1 .. using while would not be so nice, because you'd need mutable state. Using recursion, you can write the same thing as:
let rec numbersFrom n =
    seq { yield n
          yield! numbersFrom (n + 1) }
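To see the two recursive definitions in action, here is a small usage sketch. The definitions are repeated so the block runs on its own, and the value names are only illustrative. Both sequences are infinite, so always take a finite prefix:

```fsharp
let rec repeat items =
    seq { yield! items
          yield! repeat items }

let rec numbersFrom n =
    seq { yield n
          yield! numbersFrom (n + 1) }

let firstSix = repeat [1; 2; 3; 4] |> Seq.take 6 |> List.ofSeq
// [1; 2; 3; 4; 1; 2]
let firstFive = numbersFrom 1 |> Seq.take 5 |> List.ofSeq
// [1; 2; 3; 4; 5]
```

One caveat: repeat on an empty input never produces an element, so taking from it will not terminate normally.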
I don't think there's an idiom for this, and what you have is fine, but here are some alternatives.
If you change your subsequence to an array, you can do
let a = [|1; 2; 3; 4|]
let s = Seq.initInfinite (fun i -> a.[i % a.Length])
Using what you have, you could also do
let l = [1; 2; 3; 4]
let s = Seq.initInfinite (fun _ -> l) |> Seq.concat
but it's no shorter.
Similar to Daniel's answer, but encapsulating it into a function, and pretending that function is in the Seq module:
module Seq =
    let infiniteOf repeatedList =
        Seq.initInfinite (fun _ -> repeatedList)
        |> Seq.concat
// Tests
let intList = [1; 2; 3; 4]
let charList = ['a'; 'b'; 'c'; 'd']
let objList = [(new System.Object()); (new System.Object()); (new System.Object()); (new System.Object())]
do
    Seq.infiniteOf intList |> Seq.take 20 |> Seq.iter (fun item -> printfn "%A" item)
    Seq.infiniteOf charList |> Seq.take 20 |> Seq.iter (fun item -> printfn "%A" item)
    Seq.infiniteOf objList |> Seq.take 20 |> Seq.iter (fun item -> printfn "%A" item)
This will do it as a (more-or-less) one-liner, without having to create any helper objects.
let s = seq { while true do
                for i in 1 .. 4 -> i }

How can I remove duplicates in an F# sequence without using references

I have a sorted sequence and want to go through it and return the unique entries in the sequence. I can do it using the following function, but it uses reference variables and I don't think it's the correct way of solving the problem.
let takeFirstCell sectors =
    let currentRNCId = ref -1
    let currentCellId = ref -1
    seq {
        for sector in sectors do
            if sector.RNCId <> !currentRNCId || sector.CellId <> !currentCellId then
                currentRNCId := sector.RNCId
                currentCellId := sector.CellId
                yield sector
    }
How can I do this in a functional way?
[1;1;1;2;2;2;3;3;3]
|> Seq.distinctBy id
|> printfn "%A"
Seq.distinct (1::[1..5]) returns seq [1; 2; 3; 4; 5]. Is that what you meant?
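Applied to the sectors from the question, that might look like the sketch below. The Sector record and the sample data are assumptions on my part, since the question does not show the actual type; the key is the (RNCId, CellId) pair. Seq.distinctBy keeps the first element seen for each key and preserves order:

```fsharp
// Hypothetical record type standing in for the question's sector type.
type Sector = { RNCId: int; CellId: int; Name: string }

let sectors =
    [ { RNCId = 1; CellId = 1; Name = "a" }
      { RNCId = 1; CellId = 1; Name = "b" }
      { RNCId = 1; CellId = 2; Name = "c" }
      { RNCId = 2; CellId = 1; Name = "d" }
      { RNCId = 2; CellId = 1; Name = "e" } ]

// Keep the first sector for each (RNCId, CellId) combination.
let firstCells =
    sectors
    |> Seq.distinctBy (fun s -> s.RNCId, s.CellId)
    |> List.ofSeq

let names = firstCells |> List.map (fun s -> s.Name)
// ["a"; "c"; "d"]
```

Unlike the original loop, this also drops non-adjacent repeats, which makes no difference when the input is sorted.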
distinct and distinctBy both use Dictionary and therefore require hashing and a bit of memory for storing unique items. If your sequence is already sorted, you can use the following approach (similar to yours). It's nearly twice as fast and has constant memory use, making it usable for sequences of any size.
let distinctWithoutHash (items:seq<_>) =
    seq {
        use e = items.GetEnumerator()
        if e.MoveNext() then
            let prev = ref e.Current
            yield !prev
            while e.MoveNext() do
                if e.Current <> !prev then
                    yield e.Current
                    prev := e.Current
    }
let items = Seq.init 1000000 (fun i -> i / 2)
let test f = items |> f |> (Seq.length >> printfn "%d")
test Seq.distinct //Real: 00:00:01.038, CPU: 00:00:01.435, GC gen0: 47, gen1: 1, gen2: 1
test distinctWithoutHash //Real: 00:00:00.622, CPU: 00:00:00.624, GC gen0: 44, gen1: 0, gen2: 0
I couldn't figure out a way to use mutables instead of refs (short of hand-coding an enumerator), which I'm sure would speed it up considerably (I tried it--it makes no difference).
Just initialize a unique collection (like a set) with the sequence like this:
set [1; 2; 3; 3; 4; 5; 5];;
=> val it : Set<int> = set [1; 2; 3; 4; 5]
In my case I could not use Seq.distinct because I needed to preserve order of list elements.
I used the solution from http://ocaml.org/learn/tutorials/99problems.html.
I think it is quite short:
let rec compress = function
| a :: (b :: _ as t) -> if a = b then compress t else a :: compress t
| smaller -> smaller
The solution below preserves the order of elements and returns only the first occurrence of each element in a generic list. Of course, this generates a new list with the redundant items removed.
// **** Returns a list having subsequent redundant elements removed
let removeDuplicates(lst : 'a list) =
    let f item acc =
        match acc with
        | [] -> [item]
        | _ ->
            match List.exists(fun x -> x = item) acc with
            | false -> item :: acc
            | true -> acc
    lst
    |> List.rev
    |> fun x -> List.foldBack f x []
    |> List.rev
// **** END OF FUNCTION removeDuplicates
val removeDuplicates : 'a list -> 'a list when 'a : equality
val testList : int list = [1; 4; 3; 1; 2; 2; 1; 1; 3; 4; 3]
val tryAbove : int list = [1; 4; 3; 2]

F# split sequence into sub lists on every nth element

Say I have a sequence of 100 elements. Every 10th element I want a new list of the previous 10 elements. In this case I will end up with a list of 10 sublists.
Seq.take(10) looks promising, how can I repeatedly call it to return a list of lists?
Now there's Seq.chunkBySize available:
[1;2;3;4;5] |> Seq.chunkBySize 2 = seq [[|1; 2|]; [|3; 4|]; [|5|]]
This is not bad:
let splitEach n s =
    seq {
        let r = ResizeArray<_>()
        for x in s do
            r.Add(x)
            if r.Count = n then
                yield r.ToArray()
                r.Clear()
        if r.Count <> 0 then
            yield r.ToArray()
    }
let s = splitEach 5 [1..17]
for a in s do
    printfn "%A" a
(*
[|1; 2; 3; 4; 5|]
[|6; 7; 8; 9; 10|]
[|11; 12; 13; 14; 15|]
[|16; 17|]
*)
I have an evolution of three solutions. None of them preserves the ordering of the input elements, which is hopefully OK.
My first solution is quite ugly (making use of ref cells):
//[[4; 3; 2; 1; 0]; [9; 8; 7; 6; 5]; [14; 13; 12; 11; 10]; [17; 16; 15]]
let solution1 =
    let split s n =
        let i = ref 0
        let lst = ref []
        seq {
            for item in s do
                if !i = n then
                    yield !lst
                    lst := [item]
                    i := 1
                else
                    lst := item::(!lst)
                    i := !i+1
            yield !lst
        } |> Seq.toList
    split {0..17} 5
My second solution factors out the use of ref cells in the first solution, but consequently forces the use of direct IEnumerator access (push in one side, pop out the other)!
//[[17; 16; 15]; [14; 13; 12; 11; 10]; [9; 8; 7; 6; 5]; [4; 3; 2; 1; 0]]
let solution2 =
    let split (s:seq<_>) n =
        let e = s.GetEnumerator()
        let rec each lstlst lst i =
            if e.MoveNext() |> not then
                lst::lstlst
            elif i = n then
                each (lst::lstlst) [e.Current] 1
            else
                each lstlst ((e.Current)::lst) (i+1)
        each [] [] 0
    split {0..17} 5
My third solution is based on the second solution except it "cheats" by taking a list as input instead of a seq, which enables the most elegant solution using pattern matching as Tomas points out is lacking with seq (which is why we were forced to use direct IEnumerator access).
//[[17; 16; 15]; [14; 13; 12; 11; 10]; [9; 8; 7; 6; 5]; [4; 3; 2; 1; 0]]
let solution3 =
    let split inputList n =
        let rec each inputList lstlst lst i =
            match inputList with
            | [] -> (lst::lstlst)
            | cur::inputList ->
                if i = n then
                    each inputList (lst::lstlst) [cur] 1
                else
                    each inputList lstlst (cur::lst) (i+1)
        each inputList [] [] 0
    split [0..17] 5
If preserving the ordering of the elements is important, you can use List.rev for this purpose. For example, in solution2, change the last line of the split function to:
each [] [] 0 |> List.rev |> List.map List.rev
Off the top of my head:
let rec split size list =
    if List.length list < size then
        [list]
    else
        (list |> Seq.take size |> Seq.toList) :: (list |> Seq.skip size |> Seq.toList |> split size)
Perhaps this simple pure implementation might be useful:
let splitAt n xs = (Seq.truncate n xs, if Seq.length xs < n then Seq.empty else Seq.skip n xs)
let rec chunk n xs =
    if Seq.isEmpty xs then Seq.empty
    else
        let (ys,zs) = splitAt n xs
        Seq.append (Seq.singleton ys) (chunk n zs)
For example:
> chunk 10 [1..100];;
val it : seq<seq<int>> =
seq
[seq [1; 2; 3; 4; ...]; seq [11; 12; 13; 14; ...];
seq [21; 22; 23; 24; ...]; seq [31; 32; 33; 34; ...]; ...]
> chunk 5 [1..12];;
val it : seq<seq<int>> =
seq [seq [1; 2; 3; 4; ...]; seq [6; 7; 8; 9; ...]; seq [11; 12]]
If in doubt, use fold.
let split n =
    let one, append, empty = Seq.singleton, Seq.append, Seq.empty
    Seq.fold (fun (m, cur, acc) x ->
                if m = n then (1, one x, append acc (one cur))
                else (m+1, append cur (one x), acc))
             (0, empty, empty)
    >> fun (_, cur, acc) -> append acc (one cur)
This has the advantage of being fully functional while touching each element of the input sequence only once(*) (as opposed to the Seq.take + Seq.skip solutions proposed above).
(*) Assuming O(1) Seq.append. I should certainly hope so.
I found this to be easily the fastest:
let windowChunk n xs =
    let range = [0 .. Seq.length xs]
    Seq.windowed n xs |> Seq.zip range
    |> Seq.filter (fun d -> (fst d) % n = 0)
    |> Seq.map(fun x -> (snd x))
i.e. window the list, zip with a list of integers, remove all the overlapping elements, and then drop the integer portion of the tuple.
I think that the solution from Brian is probably the most reasonable simple option. A problem with sequences is that they cannot be easily processed with the usual pattern matching (like functional lists). One option to avoid that would be to use LazyList from F# PowerPack.
Another option is to define a computation builder for working with IEnumerator type. I wrote something like that recently - you can get it here. Then you can write something like:
let splitEach chunkSize (s:seq<_>) =
    Enumerator.toSeq (fun () ->
        let en = s.GetEnumerator()
        let rec loop n acc = iter {
            let! item = en
            match item with
            | Some(item) when n = 1 ->
                yield item::acc |> List.rev
                yield! loop chunkSize []
            | Some(item) ->
                yield! loop (n - 1) (item::acc)
            | None -> yield acc |> List.rev }
        loop chunkSize [] )
This enables using some functional patterns for list processing - most notably, you can write this as a usual recursive function (similar to the one you would write for lists/lazy lists), but it is imperative under the cover (the let! construct of iter takes the next element and modifies the enumerator).
