Drop duplicates except for the first occurrence with Deedle - f#

I have a table with one key with duplicate values. I would like to drop/reduce all duplicate keys but preserve the first row of each duplicate.
open System.IO
open Deedle
let data = "A;B\na;1\nb;\nb;2\nc;3"
let bytes = System.Text.Encoding.UTF8.GetBytes data
let stream = new MemoryStream(bytes)
let df =
    Frame.ReadCsv(
        stream = stream,
        separators = ";",
        hasHeaders = true
    )
df.Print()
A B
0 -> a 1
1 -> b <missing>
2 -> b 2
3 -> c 3
The result should be
A B
0 -> a 1
1 -> b <missing>
2 -> c 3
I have tried applyLevel, but it gives me the first non-missing value of each column rather than the first row:
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.applyLevel fst (fun s -> s |> Series.firstValue)
df1.Print()
A B
a -> a 1
b -> b 2 <- wrong
c -> c 3

This is essentially a duplicate of a previous SO question. The short answer is:
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.nest // convert to a series of frames
    |> Series.mapValues (Frame.take 1) // take the first row from each frame
    |> Frame.unnest // convert back to a single frame
    |> Frame.mapRowKeys snd
df1.Print()
The output is:
A B
0 -> a 1
1 -> b <missing>
3 -> c 3
I've added a call to Frame.mapRowKeys at the end to match your desired output as closely as possible. Note that the actual output differs slightly from your expected output, because row 3 -> c 3 has original index 3 instead of 2. I think this is more correct, but you can renumber the rows if necessary.
The referenced question has more details.

Using Frame.nest/Frame.unnest is a reasonable solution, but I have noticed that it is a little slow.
My solution involves putting the keys in a Map and checking:
let dropDuplicates (df:Frame<_,_>) =
    // Remember, for each group key, the original row key of its first occurrence
    let selectedMap =
        df.RowKeys
        |> Seq.fold (fun (m:Map<'A,'B>) (a,b) ->
            if m.ContainsKey a then m else m |> Map.add a b) Map.empty
    // Keep only the rows whose original row key is the remembered one
    df
    |> Frame.filterRows(fun (a,b) _ ->
        match selectedMap.TryFind a with
        | Some entry -> entry = b
        | _ -> false)
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> dropDuplicates
df1.Print()
A B
a 0 -> a 1
b 1 -> b <missing>
c 3 -> c 3
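For comparison, the "keep the first row per key" rule can be sketched without Deedle on a plain list of tuples. This only illustrates the semantics and is not a Deedle API: the `rows` value is a hypothetical stand-in for the frame, with `None` playing the role of `<missing>`. `List.distinctBy` discards later occurrences of a key, so the first row of each group survives.

```fsharp
// Hypothetical stand-in for the frame: (key column A, optional value column B)
let rows = [ "a", Some 1; "b", None; "b", Some 2; "c", Some 3 ]

// distinctBy keeps the first element seen for each key and drops the rest
let firstPerKey = rows |> List.distinctBy fst

printfn "%A" firstPerKey
```

Note that the row for "b" keeps its missing value, which is the behaviour the question asks for.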

Related

F#, Deedle and OptionalValue: Object must implement IConvertible error

I'm facing trouble when I try to create missing values in a Frame and later perform operations with them. Here is a "working" sample:
open Deedle
open System.Text.RegularExpressions
do fsi.AddPrinter(fun (printer:Deedle.Internal.IFsiFormattable) -> "\n" + (printer.Format()))
module Frame =
    let mapAddCol col f frame = frame |> Frame.addCol col (Frame.mapRowValues f frame)
[ {|Desc = "A - 1.50ml"; ``Price ($)`` = 23.|}
  {|Desc = "B - 2ml"; ``Price ($)`` = 18.5|}
  {|Desc = "C"; ``Price ($)`` = 25.|} ]
|> Frame.ofRecords
(*
Desc Price ($)
0 -> A - 1.50ml 23
1 -> B - 2ml 18.5
2 -> C 25
*)
|> Frame.mapAddCol "Volume (ml)" (fun row ->
    match Regex.Match(row.GetAs<string>("Desc"),"[\d\.]+").Value with
    | "" -> OptionalValue.Missing
    | n -> n |> float |> OptionalValue)
(*
Desc Price ($) Volume (ml)
0 -> A - 1.50ml 23 1.5
1 -> B - 2ml 18.5 2
2 -> C 25 <missing>
*)
|> fun df -> df?``Price ($/ml)`` <- df?``Price ($)`` / df?``Volume (ml)``
//error message: System.InvalidCastException: Object must implement IConvertible.
What is wrong with this approach?
Deedle internally stores a flag whether a value is present or missing. This is typically exposed via the OptionalValue type, but the internal representation is not actually using this type.
When you use a function such as mapRowValues to generate new data, Deedle needs to recognize which data is missing. This happens only in somewhat limited cases. When you return OptionalValue<float>, Deedle actually produces a series where the type of values is OptionalValue<float> rather than float (the type system does not let it do anything else).
For float values, the solution is just to return nan as your missing value:
|> Frame.mapAddCol "Volume (ml)" (fun row ->
    match Regex.Match(row.GetAs<string>("Desc"),"[\d\.]+").Value with
    | "" -> nan
    | n -> n |> float )
This will create a new series of float values, which you can then access using the ? operator.
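The reason nan works as a stand-in for a missing float can be seen in plain F#, without Deedle: nan propagates through arithmetic, so a division by a missing volume is itself missing. The sketch below uses hypothetical helpers `volumeOf` and `pricePerMl` that are not part of the question's code.

```fsharp
open System
open System.Text.RegularExpressions

// Hypothetical helper: the first number found in a description, or nan if there is none
let volumeOf (desc: string) =
    match Regex.Match(desc, @"[\d.]+").Value with
    | "" -> nan
    | n -> float n

// Hypothetical helper: nan in the divisor yields nan in the result
let pricePerMl (price: float) (desc: string) = price / volumeOf desc

printfn "%f" (pricePerMl 18.5 "B - 2ml")
printfn "%b" (Double.IsNaN (pricePerMl 25.0 "C"))
```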

Imperative to Functional

I have been doing a CodeWars exercise which can also be seen at dev.to.
The essence of it is:
There is a line for the self-checkout machines at the supermarket. Your challenge is to write a function that calculates the total amount of time required for the rest of the customers to check out!
INPUT
customers : an array of positive integers representing the line. Each integer represents a customer, and its value is the amount of time they require to check out.
n : a positive integer, the number of checkout tills.
RULES
There is only one line serving many machines, and
The order of the line never changes, and
The front person in the line (i.e. the first element in the array/list) proceeds to a machine as soon as it becomes free.
OUTPUT
The function should return an integer, the total time required.
The answer I came up with works - but it is highly imperative.
open System.Collections.Generic
open System.Linq
let getQueueTime (customerArray: int list) n =
    let mutable d = new Dictionary<string,int>()
    for i in 1..n do
        d.Add(sprintf "Line%d" <| i, 0)
    let getNextAvailableSupermarketLineName(d:Dictionary<string,int>) =
        let mutable lowestValue = -1
        let mutable lineName = ""
        for myLineName in d.Keys do
            let myValue = d.Item(myLineName)
            if lowestValue = -1 || myValue <= lowestValue then
                lowestValue <- myValue
                lineName <- myLineName
        lineName
    for x in customerArray do
        let lineName = getNextAvailableSupermarketLineName d
        let lineTotal = d.Item(lineName)
        d.Item(lineName) <- lineTotal + x
    d.Values.Max()
So my question is ... is this OK F# code or should it be written in a functional way? And if the latter, how? (I started off trying to do it functionally but didn't get anywhere).
is this OK F# code or should it be written in a functional way?
That's a subjective question, so it can't be answered. I'm assuming, however, that since you're doing an exercise, it's in order to learn. Learning functional programming takes years for most people (it did for me), but F# is a great language because it enables you to learn gradually.
You can, however, simplify the algorithm. Think of a till as a number. The number represents the instant it's ready. At the beginning, you initialise them all to 0:
let tills = List.replicate n 0
where n is the number of tills. At the beginning, they're all ready at time 0. If, for example, n is 3, the tills are:
> List.replicate 3 0;;
val it : int list = [0; 0; 0]
Now you consider the next customer in the line. For each customer, you have to pick a till. You pick the one that is available first, i.e. with the lowest number. Then you need to 'update' the list of counters.
In order to do that, you'll need a function to 'update' a list at a particular index, which isn't part of the base library. You can define it yourself, however:
module List =
    let set idx v = List.mapi (fun i x -> if i = idx then v else x)
For example, if you want to 'update' the second element to 3, you can do it like this:
> List.replicate 3 0 |> List.set 1 3;;
val it : int list = [0; 3; 0]
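As an aside, newer versions of the core library cover this case. The sketch below assumes FSharp.Core 6.0 or later, where `List.updateAt` is available; on older versions the hand-written `List.set` is still needed.

```fsharp
// Assumes FSharp.Core 6.0 or later (List.updateAt does not exist in earlier versions)
let tills = List.replicate 3 0 |> List.updateAt 1 3

printfn "%A" tills
```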
Now you can write a function that updates the set of tills given their current state and a customer (represented by a duration, which is also a number).
let next tills customer =
    let earliestTime = List.min tills
    let idx = List.findIndex (fun c -> earliestTime = c) tills
    List.set idx (earliestTime + customer) tills
First, the next function finds the earliestTime in tills by using List.min. Then it finds the index of that value. Finally, it 'updates' that till by adding its current state to the customer duration.
Imagine that you have two tills and the customers [2;3;10]:
> List.replicate 2 0;;
val it : int list = [0; 0]
> List.replicate 2 0 |> fun tills -> next tills 2;;
val it : int list = [2; 0]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3;;
val it : int list = [2; 3]
> List.replicate 2 0 |> fun tills -> next tills 2 |> fun tills -> next tills 3
|> fun tills -> next tills 10;;
val it : int list = [12; 3]
You'll notice that you can keep calling the next function for all the customers in the line. That's called a fold. This gives you the final state of the tills. The final step is to return the value of the till with the highest value, because that represents the time it finished. The overall function, then, is:
let queueTime line n =
    let next tills customer =
        let earliestTime = List.min tills
        let idx = List.findIndex (fun c -> earliestTime = c) tills
        List.set idx (earliestTime + customer) tills
    let tills = List.replicate n 0
    let finalState = List.fold next tills line
    List.max finalState
Here are some examples, taken from the original exercise:
> queueTime [5;3;4] 1;;
val it : int = 12
> queueTime [10;2;3;3] 2;;
val it : int = 10
> queueTime [2;3;10] 2;;
val it : int = 12
This solution is based entirely on immutable data, and all functions are pure, so that's a functional solution.
Here is a version that resembles your version, with all the mutability removed:
let getQueueTime (customerArray: int list) n =
    // Return a copy of the map with the value at key transformed by f
    let updateWith f key map =
        let v = Map.find key map
        map |> Map.add key (f v)
    let initialLines = [1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> Map.ofList
    let getNextAvailableSupermarketLineName(d:Map<string,int>) =
        let lowestLine = d |> Seq.minBy (fun l -> l.Value)
        lowestLine.Key
    let lines =
        customerArray
        |> List.fold (fun linesState x ->
            let lineName = getNextAvailableSupermarketLineName linesState
            linesState |> updateWith (fun l -> l + x) lineName) initialLines
    lines |> Seq.map (fun l -> l.Value) |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
Those loops with mutable "outer state" can be swapped for either recursive functions or folds/reduce; here I suspect recursive functions would be nicer.
I've swapped out Dictionary for the immutable Map, but it feels like more trouble than it's worth here.
Update - here is a compromise solution I think reads well:
open System.Collections.Generic
let getQueueTime (customerArray: int list) n =
    // dict returns a read-only dictionary, so copy it into a mutable Dictionary
    let d = Dictionary<string,int>([1..n] |> List.map (fun i -> sprintf "Line%d" i, 0) |> dict)
    let getNextAvailableSupermarketLineName(d:Dictionary<string,int>) =
        let lowestLine = d |> Seq.minBy (fun l -> l.Value)
        lowestLine.Key
    customerArray
    |> List.iter (fun x ->
        let lineName = getNextAvailableSupermarketLineName d
        // add the customer's checkout time (x) to the chosen line
        d.Item(lineName) <- d.Item(lineName) + x)
    d.Values |> Seq.max
getQueueTime [5;3;4] 1 |> printfn "%i"
I believe there is a more natural functional solution if you approach it freshly, but I wanted to evolve your current solution.
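The recursive alternative mentioned above could be sketched roughly as follows. This is one possible shape rather than a definitive version: it drops the line names and represents each till by the time at which it becomes free, and the name `getQueueTimeRec` is made up for this example.

```fsharp
// Hypothetical recursive variant: the till state is passed along as an argument
let getQueueTimeRec (customers: int list) n =
    let rec loop tills remaining =
        match remaining with
        | [] -> List.max tills                      // no customers left: the last till to finish wins
        | c :: rest ->
            let earliest = List.min tills
            let idx = List.findIndex ((=) earliest) tills
            // send the customer to the first till that becomes free
            let tills' = tills |> List.mapi (fun i t -> if i = idx then t + c else t)
            loop tills' rest
    loop (List.replicate n 0) customers

getQueueTimeRec [10; 2; 3; 3] 2 |> printfn "%i"
```

The recursive call is in tail position, so long customer lists do not grow the stack.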
This is less an attempt at answering than an extended comment on Mark Seemann's otherwise excellent answer. If we do not restrict ourselves to standard library functions, the slightly cumbersome determination of the index with List.findIndex can be avoided. Instead, we may devise a function that replaces the first occurrence of a value in a list with a new value.
The implementation of our bespoke List.replace involves recursion, with an accumulator to hold the values before we encounter the first occurrence. When found, the accumulator needs to be reversed and also to have the new value and the tail of the original list appended. Both of these can be done in one operation: List.fold is fed the new value and the tail of the original list as its initial state, while the elements of the accumulator are prepended in the loop, thereby restoring their order.
module List =
    // Replace the first occurrence of a specific object in a list
    let replace oldValue newValue source =
        let rec aux acc = function
            | [] -> List.rev acc
            | x::xs when x = oldValue ->
                (newValue::xs, acc)
                ||> List.fold (fun xs x -> x::xs)
            | x::xs -> aux (x::acc) xs
        aux [] source
let queueTime customers n =
    (List.init n (fun _ -> 0), customers)
    ||> List.fold (fun xs customer ->
        let x = List.min xs
        List.replace x (x + customer) xs )
    |> List.max
queueTime [5;3;4] 1 // val it : int = 12
queueTime [10;2;3;3] 2 // val it : int = 10
queueTime [2;3;10] 2 // val it : int = 12

Matrix transposition in F#

I'm trying to modify a matrix like this one:
/ 1 2 3 \
\ 4 5 6 /
to return:
/ 1 4 \
| 2 5 |
\ 3 6 /
Instead it is flipping my matrix by the corners. This is the code I have so far:
let rec matrixadjust = function
    | (_::_)::_ as xss -> List.map List.head xss :: matrixadjust (List.map List.tail xss)
    | _ -> [];;
I think that the best way to work with matrices is to use the Array2D data structure. You can build an Array2D from an array of arrays and then create a new Array2D to accomplish what you want:
let arrayOfArrays = [| [| 1; 2; 3 |]; [|4; 5; 6 |] |]
let array2d = Array2D.init 2 3 (fun row column -> arrayOfArrays.[row].[column])
let newArray = Array2D.init (array2d |> Array2D.length2) (array2d |> Array2D.length1) (fun r c -> array2d.[c,r])
Assuming your data structure is a list of lists where each sub-list represents a row, you could do it like this. Basically it loops once per source-list row and accumulates the result in the partial binding. Since it's doing list accumulation, it reverses the order of the values, so you have to do a List.rev on each row at the end.
let flip matrix =
    match matrix with
    | [] -> []
    | x::xs ->
        let rec loop matrix partial =
            match matrix with
            | [] -> partial
            | y::ys ->
                let newPartial = (y, partial) ||> List.map2(fun x y->x::y)
                loop ys newPartial
        let length = List.length x
        loop matrix (List.init length (fun _ -> [] ))
        |> List.map(fun x->x |> List.rev)
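If a hand-rolled function is not a requirement, the core library can also do this directly. The sketch below assumes FSharp.Core 4.0 or later, where `List.transpose` exists, and a rectangular list of lists (all rows the same length).

```fsharp
// Assumes FSharp.Core 4.0 or later and rows of equal length
let matrix = [ [1; 2; 3]; [4; 5; 6] ]
let transposed = List.transpose matrix

printfn "%A" transposed
```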

get count of numbers in an infinite sequence when it reaches condition

I want to count this in a functional way, and efficiently: I do not want to store the sequence, just go through it and count the numbers.
let conjv2 x =
    let next n =
        match n%2 with
        |0 -> n/2
        |_ -> n*3+1
    Seq.initInfinite next
    |> Seq.takeWhile(fun n -> n > 1)
    |> Seq.length
This does not work and returns 0 for any positive number. It is the 3n+1 conjecture, and I am finding it really hard to count the steps efficiently. The following code works fine, but I want to do it the functional way:
let conj x =
    let mutable ansa = x
    let mutable cycles = 1
    while ansa > 1 do
        cycles <- cycles+1
        ansa <- match ansa%2 with
                |0 -> ansa/2
                |_ -> ansa*3+1
    cycles
The key problem with the sample is that you're using Seq.initInfinite instead of Seq.unfold.
Seq.initInfinite calls the specified function with the index of the element as argument (0, 1, 2, ..)
Seq.unfold calls the specified function with the state generated by the previous iteration
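The difference between the two can be seen in a small side-by-side sketch (the names `byIndex` and `byState` are made up for this illustration):

```fsharp
// initInfinite: the function receives the element index 0, 1, 2, ...
let byIndex = Seq.initInfinite (fun i -> i * 2) |> Seq.take 4 |> List.ofSeq

// unfold: the function receives the state produced by the previous step
let byState = Seq.unfold (fun s -> Some (s, s * 2)) 1 |> Seq.take 4 |> List.ofSeq

printfn "%A %A" byIndex byState
```

Here `byIndex` is [0; 2; 4; 6], because each element depends only on its position, while `byState` is [1; 2; 4; 8], because each element is derived from the one before it.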
Note that your code also does not use the argument x and so your function ends up being 'a -> int rather than int -> int which is what you'd expect - this is a good indication that there is something wrong!
To fix this, try something like this:
let conjv2 x =
    let next n =
        match n%2 with
        |0 -> n/2
        |_ -> n*3+1
    Seq.unfold (fun st -> let n = next st in Some(n, n)) x
    |> Seq.takeWhile(fun n -> n > 1)
    |> Seq.map (fun v -> printfn "%A" v; v)
    |> Seq.length
The function passed to unfold needs to return an option with the new state and a value to emit. To generate an infinite sequence, we always return Some, and the emitted values are the intermediate states.
This returns values that are smaller by 2 than your original conj, because conj starts with 1 (rather than 0) and it also counts the last value (while here, we stop before ansa=1). So you'll need to add 2 to the result.
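Alternatively, the off-by-two adjustment can be avoided by letting unfold emit the starting value and the final 1 itself. The following is a sketch of that idea, not the code from the answer: the name `collatzLength` is made up, and the state is an `int option` so that the sequence can end right after the last element has been emitted. Like the original, it uses plain `int` and so assumes the intermediate values do not overflow.

```fsharp
// Hypothetical variant: counts x, every intermediate value, and the final 1
let collatzLength x =
    Some x
    |> Seq.unfold (Option.map (fun n ->
        let nextState =
            if n <= 1 then None                  // emit this value, then stop
            elif n % 2 = 0 then Some (n / 2)
            else Some (3 * n + 1)
        n, nextState))
    |> Seq.length

printfn "%i" (collatzLength 6)
```

For 6 the emitted values are 6, 3, 10, 5, 16, 8, 4, 2, 1, so the result is 9, which matches what the imperative conj returns.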

need help to read file with specific formatted contents

I'm using F#. I want to solve a problem that requires me to read the input from a file, and I don't know how to do it. The first line in the file consists of three numbers; the first two are the x and y dimensions of the map on the following lines. An example file:
5 5 10
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
Here 5 5 10 means that the following lines hold a 5x5 map, and 10 is just a number that I need to solve the problem. The remaining lines are the contents of the map, which I have to solve using the 10, and I want to store these numbers in a two-dimensional array. Can someone help me write code that reads all the numbers from the file so I can process them?
* Sorry, my English is bad; I hope my question can be understood :)
The answer to my own question:
Thanks for the answers from Daniel and Ankur. For my own purposes I mixed code from both of you:
let readMap2 (path:string) =
    let lines = File.ReadAllLines path
    let [|x; y; n|] = lines.[0].Split() |> Array.map int
    let data =
        [|
            for l in (lines |> Array.toSeq |> Seq.skip 1) do
                yield l.Split() |> Array.map int
        |]
    x,y,n,data
Many Thanks :D
Here's some quick and dirty code. It returns a tuple of the last number in the header (10 in this case) and a two-dimensional array of the values.
open System.IO
let readMap (path:string) =
    use reader = new StreamReader(path)
    match reader.ReadLine() with
    | null -> failwith "empty file"
    | line ->
        match line.Split() with
        | [|_; _; _|] as hdr ->
            let [|x; y; n|] = hdr |> Array.map int
            let vals = Array2D.zeroCreate y x
            for i in 0..(y-1) do
                match reader.ReadLine() with
                | null -> failwith "unexpected end of file"
                | line ->
                    let arr = line.Split() |> Array.map int
                    if arr.Length <> x then failwith "wrong number of fields"
                    else for j in 0..(x-1) do vals.[i, j] <- arr.[j]
            n, vals
        | _ -> failwith "bad header"
If the file contains nothing more than this (no further data to process) and is always in the correct format (no need to handle missing data etc.), then it can be as simple as the following, which returns the map as an array of arrays:
let readMap (path:string) =
    let lines = File.ReadAllLines path
    let [|_; _; n|] = lines.[0].Split() |> Array.map int
    [|
        for l in (lines |> Array.toSeq |> Seq.skip 1) do
            yield l.Split() |> Array.map int
    |]
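Since the question asks for a two-dimensional array, the array-of-arrays result can be converted with the built-in `array2D` function. The sketch below is one way to combine the ideas above; the name `parseMap` is hypothetical, it works on already-read lines so that it can be tried without a file, and it assumes (as in the StreamReader answer) that x is the number of columns and y the number of rows.

```fsharp
open System

// Hypothetical helper: parses the header and map from lines that were already read
let parseMap (lines: string[]) =
    let ints (s: string) =
        s.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries) |> Array.map int
    match ints lines.[0] with
    | [| x; y; n |] ->
        // Assumption: y rows follow the header, each holding x numbers
        let grid = lines |> Array.skip 1 |> Array.take y |> Array.map ints |> array2D
        if Array2D.length2 grid <> x then failwith "wrong number of fields"
        x, y, n, grid
    | _ -> failwith "bad header"

let (x, y, n, grid) = parseMap [| "3 2 10"; "1 2 3"; "4 5 6" |]
printfn "%d %d %d %A" x y n grid
```

With a real file this could be called as `System.IO.File.ReadAllLines path |> parseMap`.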

Resources