Grouping Columns in Deedle


Given a frame of type Frame<(string * int * int), string>:
let df =
    [ (("N1", 100,1), "C1", 1.0); (("N2",100,2), "C1", 3.1)
      (("N3",100,3), "C1", 4.0); (("N4",100,4), "C1", -6.4);
      (("N1", 200,5), "C2", 1.0); (("N2",200,6), "C2", 7.1)
      (("N3",200,7), "C2", 4.0); (("N4",200,8), "C2", -2.4);
      (("N1", 100,1), "C2", 1.0); (("N2",100,2), "C2", 5.1)
      (("N3",100,3), "C2", 4.0); (("N4",100,4), "C2", -8.4);
      (("N1", 200,5), "C1", 1.0); (("N2",200,6), "C1", 1.1)
      (("N3",200,7), "C1", 4.0); (("N4",200,8), "C1", -9.4)]
    |> Frame.ofValues
I'd like to group the columns by the second item in the row key tuple (so by 100 and 200) and then change the frame to Frame<(string * int), (int * string)>.
It seems like I have to use Frame.transpose and then Frame.groupRowsUsing to group the columns, but I'm at a loss as to how to get the 100/200 in the selector function.
Output should look like:
        (100,C1) (100,C2) (200,C1) (200,C2)
N1 1 -> 1        1        1        1
N2 2 -> 3.1      5.1      1.1      7.1
N3 3 -> 4        4        4        4
N4 4 -> -6.4     -8.4     -9.4     -2.4

It was not clear to me if the intent was to keep the column keys as they are and change the row values to tuples or if the columns were supposed to be tuples and the values to remain as floats.
Assuming the first option:
// Helper added to the Frame module: recomputes a column from the row data and replaces it
module Frame =
    let mapiReplaceCol col f frame = frame |> Frame.replaceCol col (Frame.mapRows f frame)

(df,df.ColumnKeys)
||> Seq.fold (fun acc elem ->
    acc |> Frame.mapiReplaceCol elem (fun (_,k,_) row -> k,row.GetAs<float>(elem)))
|> Frame.mapRowKeys (fun (a,_,c) -> a,c)
(*output:
            C1          C2
N1 100 1 -> (100, 1)    (100, 1)
N2 100 2 -> (100, 3.1)  (100, 5.1)
N3 100 3 -> (100, 4)    (100, 4)
N4 100 4 -> (100, -6.4) (100, -8.4)
N1 200 5 -> (200, 1)    (200, 1)
N2 200 6 -> (200, 1.1)  (200, 7.1)
N3 200 7 -> (200, 4)    (200, 4)
N4 200 8 -> (200, -9.4) (200, -2.4)
*)
Assuming the second option:
Step 1: deconstructing the frame into (row * col * value) triples and reconstructing it
let step1 =
    df |> Frame.mapRows (fun (a,b,c) row ->
        df.ColumnKeys |> Seq.map (fun col ->(a,c),(b,col),row.GetAs<float>(col)))
    |> Series.values |> Seq.concat |> Frame.ofValues
(*
output:
        100                 200
        C1        C2        C1        C2
N1 1 -> 1         1         <missing> <missing>
   5 -> <missing> <missing> 1         1
N2 2 -> 3,1       5,1       <missing> <missing>
   6 -> <missing> <missing> 1,1       7,1
N3 3 -> 4         4         <missing> <missing>
   7 -> <missing> <missing> 4         4
N4 4 -> -6,4      -8,4      <missing> <missing>
   8 -> <missing> <missing> -9,4      -2,4
*)
Step 2: reducing levels
let step2 = step1 |> Frame.reduceLevel fst (fun (a : float) b -> a + b)
(*
output:
      100       200
      C1   C2   C1   C2
N1 -> 1    1    1    1
N2 -> 3,1  5,1  1,1  7,1
N3 -> 4    4    4    4
N4 -> -6,4 -8,4 -9,4 -2,4
*)
Step 3 (optional): recreating the tuples in the index
let step3 = step2 |> Frame.mapRowKeys (fun k -> k,k.Replace("N","") |> int)
(*
output:
        100       200
        C1   C2   C1   C2
N1 1 -> 1    1    1    1
N2 2 -> 3,1  5,1  1,1  7,1
N3 3 -> 4    4    4    4
N4 4 -> -6,4 -8,4 -9,4 -2,4
*)
Basically, we reconstructed the frame from scratch. Maybe a better approach would be to change how the original frame is being constructed instead of doing all this.

Related

Drop duplicates except for the first occurrence with Deedle

I have a table with one key with duplicate values. I would like to drop/reduce all duplicate keys but preserve the first row of each duplicate.
open System.IO

let data = "A;B\na;1\nb;\nb;2\nc;3"
let bytes = System.Text.Encoding.UTF8.GetBytes data
let stream = new MemoryStream( bytes )
let df=
    Frame.ReadCsv(
        stream = stream,
        separators = ";",
        hasHeaders = true
    )
df.Print()
     A B
0 -> a 1
1 -> b <missing>
2 -> b 2
3 -> c 3
The result should be:
     A B
0 -> a 1
1 -> b <missing>
2 -> c 3
I have tried applyLevel, but I only get the first value, not the first entry:
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.applyLevel fst (fun s -> s |> Series.firstValue)
df1.Print()
     A B
a -> a 1
b -> b 2   <- wrong
c -> c 3

This is essentially a duplicate of a previous SO question. The short answer is:
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.nest // convert to a series of frames
    |> Series.mapValues (Frame.take 1) // take the first row from each frame
    |> Frame.unnest // convert back to a single frame
    |> Frame.mapRowKeys snd
df1.Print()
The output is:
     A B
0 -> a 1
1 -> b <missing>
3 -> c 3
I've added a call to Frame.mapRowKeys at the end to match your desired output as closely as possible. Note that the actual output differs slightly from your expected output, because row 3 -> c 3 has original index 3 instead of 2. I think this is more correct, but you can renumber the rows if necessary.
The referenced question has more details.

Using Frame.nest/Frame.unnest is a reasonable solution, but I have noticed that it is a little slow.
My solution puts the first index seen for each key into a Map and keeps only the rows that match it:
let dropDuplicates (df:Frame<_,_>) =
    let selectedMap =
        df.RowKeys
        |> Seq.fold (fun (m:Map<'A,'B>) (a,b) ->
            if m.ContainsKey a then m else m |> Map.add a b) Map.empty
    df
    |> Frame.filterRows(fun (a,b) _ ->
        match selectedMap.TryFind a with
        | Some entry -> entry = b
        | _ -> false)
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> dropDuplicates
df1.Print()
       A B
a 0 -> a 1
b 1 -> b <missing>
c 3 -> c 3
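The key-selection idea behind dropDuplicates can be tried without Deedle. This is only a minimal sketch, assuming the grouped row keys are plain (group, index) tuples as in the output above. The names rowKeys and firstPerGroup are hypothetical, and List.distinctBy stands in for the explicit Map fold:

```fsharp
// Row keys as they appear after grouping by column "A": (group, original index).
let rowKeys = [ ("a", 0); ("b", 1); ("b", 2); ("c", 3) ]

// List.distinctBy keeps the first element seen for each projected key,
// which is the same "first occurrence wins" rule the Map fold implements.
let firstPerGroup = rowKeys |> List.distinctBy fst

printfn "%A" firstPerGroup
// [("a", 0); ("b", 1); ("c", 3)]
```

A row would then be kept only if its full key appears in firstPerGroup, which is what the filterRows predicate checks through the Map lookup.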

F# Please explain how the result is calculated (List.fold (+) (-))

The result is 4, but I do not understand why. Also, where is the 1 fed into? I thought it was being fed into ((-) 2) as the second parameter, but it's not. Please explain what is happening step by step.
1 |> List.fold (fun x y -> (+) (x y)) ((-) 2) [3;4]

Given is the expression in the question:
1 |> List.fold (fun x y -> (+) (x y)) ((-) 2) [3;4]
To make things a bit simpler, let's throw away the 1 |> part for the moment and focus on the fold.
List.fold a ??? [2;3;4;5] is turned into (a (a (a (a ??? 2) 3) 4) 5). Since we need something to start with, we need to supply something for ???. This is the initial state. So, for example, List.fold a 1 [2;3;4;5] is turned into (a (a (a (a 1 2) 3) 4) 5). Note that this has nothing to do with the 1 we threw away earlier.
To translate this to List.fold (fun x y -> (+) (x y)) ((-) 2) [3;4], it's easier to replace the folder (the first parameter) with something named. The same goes for the second parameter, the initial state. We end up with:
let applyAdd x y a = x y + a
let twoMinusN n = 2 - n
List.fold applyAdd twoMinusN [3;4]
If we expand the last expression like we did before, we'd end up with (applyAdd (applyAdd twoMinusN 3) 4).
Reducing it down further (I hope I brace this the right way):
(applyAdd (applyAdd twoMinusN 3) 4)
(applyAdd (applyAdd (fun n -> 2 - n) 3) 4)
(applyAdd ((fun x y a -> x y + a) (fun n -> 2 - n) 3) 4)
(applyAdd (fun a -> (fun n -> 2 - n) 3 + a) 4)
(applyAdd (fun a -> (-1) + a) 4)
((fun x y a -> x y + a) (fun a -> (-1) + a) 4)
(fun a -> (-1) + 4 + a)
(fun a -> 3 + a)
Isn't that strange? Had we started with List.fold (+) 1 [2;3;4;5], we would have ended up with:
((+) ((+) ((+) ((+) 1 2) 3) 4) 5)
(* or, if that's easier to read: *)
1 + 2 + 3 + 4 + 5
which is 15, a single value. Here, however, all we're left with is a function, not a single value. That's where the 1 |> part comes into play:
1 |> (fun a -> 3 + a)
(fun a -> 3 + a) 1
3 + 1
4
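The reduction can also be checked by running it. The following is a small sketch that reuses the applyAdd and twoMinusN names from above; the type annotations and the names folded, result and original are added here for illustration only:

```fsharp
// Named versions of the folder and the initial state from the question.
let applyAdd (x: int -> int) (y: int) : int -> int = fun a -> x y + a
let twoMinusN (n: int) = 2 - n

// Folding over [3; 4] yields a function (the state type is int -> int), not a number.
let folded : int -> int = List.fold applyAdd twoMinusN [3; 4]

// Feeding 1 into that function computes (2 - 3) + 4 + 1.
let result = 1 |> folded

// The original expression from the question, for comparison.
let original : int = 1 |> List.fold (fun x y -> (+) (x y)) ((-) 2) [3; 4]

printfn "%d %d" result original
```

Both values come out as 4, matching the step-by-step expansion.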

Combine Observables

Let's say I have
A: IObservable<int>
B: IObservable<int>
How can I combine these two into
C: IObservable<int>
whose emitted value is the product of the last observed values of A and B?
E.g.
A = [ 2 3 1 ]
B = [ 2 5 6 ]
then
C = [ 4 6 15 18 6 ]

I'm not terribly good at F# (more like a novice), but this seems to work:
// Requires the System.Reactive package
open System
open System.Reactive.Linq
open System.Reactive.Subjects

let a = new Subject<int>()
let b = new Subject<int>()
let c = Observable.CombineLatest(a, b, Func<_,_,_>(fun x y -> x * y))
c.Subscribe(fun x -> printfn "%i" x) |> ignore
a.OnNext(2)
b.OnNext(2)
a.OnNext(3)
b.OnNext(5)
b.OnNext(6)
a.OnNext(1)
I get:
4
6
15
18
6

More Efficient Recursive Tetranacci function in F#

I am trying to write a Tetranacci function in F# that is as efficient as possible. The first solution I came up with is really inefficient. Can you help me come up with a better one? How could I implement this in linear time?
let rec tetra n =
    match n with
    | 0 -> 0
    | 1 -> 1
    | 2 -> 1
    | 3 -> 2
    | _ -> tetra (n - 1) + tetra (n - 2) + tetra (n - 3) + tetra (n - 4)

You could economise by devising a function that computes the state for the next iteration from a 4-tuple. The sequence generator function Seq.unfold can then be used to build a sequence containing the first element of each state quadruple. This operation is lazy: the elements of the sequence are only computed on demand as they are consumed.
let tetranacci (a3, a2, a1, a0) = a2, a1, a0, a3 + a2 + a1 + a0
(0, 1, 1, 2)
|> Seq.unfold (fun (a3, _, _, _ as a30) -> Some(a3, tetranacci a30))
|> Seq.take 10
|> Seq.toList
// val it : int list = [0; 1; 1; 2; 4; 8; 15; 29; 56; 108]
Note that the standard Tetranacci sequence (OEIS A000078) would usually be generated with the start state of (0, 0, 0, 1):
// val it : int list = [0; 0; 0; 1; 1; 2; 4; 8; 15; 29]
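To get a single term rather than a prefix of the sequence, the same unfold can be indexed with Seq.item. This is a sketch under a few assumptions: the names next, tetraSeq and tetraNth are made up for illustration, and bigint is used instead of int so that larger indices do not overflow:

```fsharp
// One step of the recurrence on a 4-tuple state (same shape as tetranacci above).
let next (a3: bigint, a2: bigint, a1: bigint, a0: bigint) =
    a2, a1, a0, a3 + a2 + a1 + a0

// Lazy sequence of terms for a given start state.
let tetraSeq start =
    start |> Seq.unfold (fun (a3, _, _, _ as state) -> Some(a3, next state))

// n-th term (0-based) using the start state from the question.
// Seq.item walks the sequence once, so this takes O(n) steps.
let tetraNth n = tetraSeq (0I, 1I, 1I, 2I) |> Seq.item n

printfn "%A" (tetraNth 9)
// 108
```

Passing (0I, 0I, 0I, 1I) to tetraSeq instead gives the OEIS variant.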

kaefer's answer is good, but why stop at linear time? It turns out that you can actually achieve logarithmic time instead, by noting that the recurrence can be expressed as a matrix multiplication:
[T_n+1]   [0; 1; 0; 0] [T_n  ]
[T_n+2] = [0; 0; 1; 0] [T_n+1]
[T_n+3]   [0; 0; 0; 1] [T_n+2]
[T_n+4]   [1; 1; 1; 1] [T_n+3]
T_n can then be obtained by applying the recurrence n times, i.e. as the first entry of M^n*[T_0; T_1; T_2; T_3]. With the standard start values [0; 0; 0; 1] this is just the upper right entry of M^n, and M^n can be computed with O(log n) matrix multiplications by repeated squaring:
type Mat =
    | Mat of bigint[][]
    static member (*)(Mat arr1, Mat arr2) =
        Array.init arr1.Length (fun i -> Array.init arr2.[0].Length (fun j -> Array.sum [| for k in 0 .. arr2.Length - 1 -> arr1.[i].[k]*arr2.[k].[j] |]))
        |> Mat
    static member Pow(m, n) =
        match n with
        | 0 ->
            let (Mat arr) = m
            Array.init arr.Length (fun i -> Array.init arr.Length (fun j -> if i = j then 1I else 0I))
            |> Mat
        | 1 -> m
        | _ ->
            let m2 = m ** (n/2)
            if n % 2 = 0 then m2 * m2
            else m2 * m2 * m

let tetr =
    let m = Mat [| [|0I; 1I; 0I; 0I|]
                   [|0I; 0I; 1I; 0I|]
                   [|0I; 0I; 0I; 1I|]
                   [|1I; 1I; 1I; 1I|]|]
    fun n ->
        let (Mat m') = m ** n
        m'.[0].[3]

for i in 0 .. 50 do
    printfn "%A" (tetr i)

Here is a tail recursive version, which compiles to mostly loops (and its complexity should be O(n)):
let tetr n =
    let rec t acc4 acc3 acc2 acc1 = function
        | n when n = 0 -> acc4
        | n when n = 1 -> acc3
        | n when n = 2 -> acc2
        | n when n = 3 -> acc1
        | n -> t acc3 acc2 acc1 (acc1 + acc2 + acc3 + acc4) (n - 1)
    t 0 1 1 2 n
acc1 corresponds to tetra (n - 1),
acc2 corresponds to tetra (n - 2),
acc3 corresponds to tetra (n - 3),
acc4 corresponds to tetra (n - 4)
Based on the Fibonacci example

Further optimizing Number to Roman Numeral function in F#

I'm new to F# and I'm curious whether this can be optimized further. I'm also not sure that I've done this correctly. I'm particularly curious about the last line, as it looks really long and hideous.
I've searched on Google, but only Roman numeral to number solutions show up, so I'm having a hard time comparing.
type RomanDigit = I | IV | V | IX
let rec romanNumeral number =
    let values = [ 9; 5; 4; 1 ]
    let capture number values =
        values
        |> Seq.find ( fun x -> number >= x )
    let toRomanDigit x =
        match x with
        | 9 -> IX
        | 5 -> V
        | 4 -> IV
        | 1 -> I
    match number with
    | 0 -> []
    | int -> Seq.toList ( Seq.concat [ [ toRomanDigit ( capture number values ) ]; romanNumeral ( number - ( capture number values ) ) ] )
Thanks to anyone who can help with this problem.

A slightly shorter way of recursively finding the largest digit representation that can be subtracted from the value (using List.find):
let units =
    [1000, "M"
     900, "CM"
     500, "D"
     400, "CD"
     100, "C"
     90, "XC"
     50, "L"
     40, "XL"
     10, "X"
     9, "IX"
     5, "V"
     4, "IV"
     1, "I"]
let rec toRomanNumeral = function
    | 0 -> ""
    | n ->
        let x, s = units |> List.find (fun (x,s) -> x <= n)
        s + toRomanNumeral (n-x)

If I had to use a discriminated union to represent the Roman letters, I would not include IV and IX.
type RomanDigit = I|V|X
let numberToRoman n =
    let (r, diff) =
        if n > 8 then [X], n - 10
        elif n > 3 then [V], n - 5
        else [], n
    if diff < 0 then I::r
    else r @ (List.replicate diff I)
Then, based on this solution, you can go further and extend it to all numbers.
Here's my first attempt, using fold and partial application:
type RomanDigit = I|V|X|L|C|D|M
let numberToRoman n i v x =
    let (r, diff) =
        if n > 8 then [x], n - 10
        elif n > 3 then [v], n - 5
        else [], n
    if diff < 0 then i::r
    else r @ (List.replicate diff i)
let allDigits (n:int) =
    let (_, f) =
        [(I,V); (X,L); (C,D)]
        |> List.fold (fun (n, f) (i, v) ->
            (n / 10, fun x -> (numberToRoman (n % 10) i v x) @ f i)) (n, (fun _ -> []))
    f M

Here's a tail-recursive version of @Philip Trelford's answer:
let toRomanNumeral n =
    let rec iter acc n =
        match n with
        | 0 -> acc
        | n ->
            let x, s = units |> List.find (fun (x, _) -> x <= n)
            iter (acc + s) (n-x)
    iter "" n
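For comparison, the same units table can be consumed without explicit recursion. The sketch below is an alternative, not taken from any of the answers: toRoman and step are hypothetical names, and the fold appends each symbol as many times as its value fits into what remains, then carries the remainder forward:

```fsharp
// Value/symbol pairs in descending order, as in the units list above.
let units =
    [ 1000, "M"; 900, "CM"; 500, "D"; 400, "CD"
      100, "C"; 90, "XC"; 50, "L"; 40, "XL"
      10, "X"; 9, "IX"; 5, "V"; 4, "IV"; 1, "I" ]

let toRoman n =
    // State is (text built so far, amount still to convert).
    let step (acc, remaining) (value, symbol) =
        acc + String.replicate (remaining / value) symbol, remaining % value
    units |> List.fold step ("", n) |> fst

printfn "%s" (toRoman 1994)
// MCMXCIV
```

Each table entry is visited exactly once, so the work is bounded by the table size plus the length of the result. Like the answers above, the sketch assumes a non-negative input.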
