Deedle - what is most efficient (fastest) way to replace an item in a column based on value of another item in another column on the same row - f#

I have this data frame
     AutoStat_1 AutoStat_2 Mode_1 Mode_2 Setpoint_1 Setpoint_2
0 -> 0          0          1      1      23         24
1 -> 0          1          1      0      23         27
2 -> 1          1          3      0      26         27
3 -> 1          0          3      1      26         24
4 -> 0          0          1      2      24         24
5 -> 0          0          1      2      24         24
6 -> 2          3          0      4      24         26
7 -> 2          3          0      4      25         26
The requirement: if AutoStat_i is not 0, then Mode_i and Setpoint_i should take their values from the nearest preceding row in which AutoStat_i is 0.
The result should be (notice that the Setpoint_i and Mode_i columns differ from the frame above):
     AutoStat_1 AutoStat_2 Mode_1 Mode_2 Setpoint_1 Setpoint_2
0 -> 0          0          1      1      23         24
1 -> 0          1          1      1      23         24
2 -> 1          1          1      1      23         24
3 -> 1          0          1      1      23         24
4 -> 0          0          1      2      24         24
5 -> 0          0          1      2      24         24
6 -> 2          3          1      2      24         24
7 -> 2          3          1      2      24         24
What I've tried:
My idea is, for each set i of (AutoStat_i, Mode_i, Setpoint_i), to scan each row and, if AutoStat_i <> 0, set the other two values to NaN. After that I just call fillMissing with Direction.Forward. Below is the implementation:
let calculateNonSFi (df:Frame<_,string>) idx =
    let autoStatusName = sprintf "AutoStat_%d" idx
    let setpointName = sprintf "Setpoint_%d" idx
    let modeName = sprintf "Mode_%d" idx
    let setMissingOnMode (s:ObjectSeries<string>) =
        let s2 = s.As<float>()
        if s2.[autoStatusName] <> 0. then
            Series.replaceArray [|setpointName;modeName|] Double.NaN s2
        else
            s2
    df.Rows
    |> Series.mapValues setMissingOnMode
    |> Frame.ofRows
    |> Frame.fillMissing Direction.Forward
    |> Frame.fillMissing Direction.Backward
// for each set i do the folding
[0..150]
|> List.fold calculateNonSFi df
It gave me the expected results; however, for 150 sets of 8000 rows it took more than 30 minutes to complete. I can see where it goes wrong: for every set it processes the whole dataset. But I cannot think of a better way.
The logic is quite simple, so I believe there should be a better way. Please advise, thanks.
UPDATE
Here is the code for reproduction
open Deedle
open System
let df =
    [
        {| AutoStat_1=0;Setpoint_1=23;Mode_1=1;AutoStat_2=0;Setpoint_2=24;Mode_2=1|}
        {| AutoStat_1=0;Setpoint_1=23;Mode_1=1;AutoStat_2=1;Setpoint_2=24;Mode_2=1|}
        {| AutoStat_1=1;Setpoint_1=23;Mode_1=1;AutoStat_2=1;Setpoint_2=24;Mode_2=1|}
        {| AutoStat_1=1;Setpoint_1=23;Mode_1=1;AutoStat_2=0;Setpoint_2=24;Mode_2=1|}
        {| AutoStat_1=0;Setpoint_1=24;Mode_1=1;AutoStat_2=0;Setpoint_2=24;Mode_2=2|}
        {| AutoStat_1=0;Setpoint_1=24;Mode_1=1;AutoStat_2=0;Setpoint_2=24;Mode_2=2|}
        {| AutoStat_1=2;Setpoint_1=24;Mode_1=1;AutoStat_2=3;Setpoint_2=24;Mode_2=2|}
        {| AutoStat_1=2;Setpoint_1=24;Mode_1=1;AutoStat_2=3;Setpoint_2=24;Mode_2=2|}
    ] |> Frame.ofRecords
df.Print()
let calculateNonSFi (df:Frame<_,string>) idx =
    let autoStatusName = sprintf "AutoStat_%d" idx
    let setpointName = sprintf "Setpoint_%d" idx
    let modeName = sprintf "Mode_%d" idx
    let setMissingOnMode (s:ObjectSeries<string>) =
        let s2 = s.As<float>()
        if s2.[autoStatusName] <> 0. then
            Series.replaceArray [|setpointName;modeName|] Double.NaN s2
        else
            s2
    df.Rows
    |> Series.mapValues setMissingOnMode
    |> Frame.ofRows
    |> Frame.fillMissing Direction.Forward
let df1 =
    [1..2]
    |> List.fold calculateNonSFi df
df1.Print()
Advice/Answer from Tomas
df
|> Frame.mapRows (fun _ o ->
    [ for i in 0 .. 150 do
        let au = o.GetAs<float>("AutoStat_" + string i)
        yield "AutoStat_" + string i, au
        yield "Mode_" + string i, if au <> 0. then nan else o.GetAs("Mode_" + string i)
        yield "Setpoint_" + string i, if au <> 0. then nan else o.GetAs("Setpoint_" + string i) ]
    |> series )
|> Frame.ofRows
|> Frame.fillMissing Direction.Forward
which yields the correct result, but with the columns in a different order (hence my mistake in the earlier edit):
     AutoStat_1 Mode_1 Setpoint_1 AutoStat_2 Mode_2 Setpoint_2
0 -> 0          1      23         0          1      24
1 -> 0          1      23         1          1      24
2 -> 1          1      23         1          1      24
3 -> 1          1      23         0          1      24
4 -> 0          1      24         0          2      24
5 -> 0          1      24         0          2      24
6 -> 2          1      24         3          2      24
7 -> 2          1      24         3          2      24

First of all, I think your strategy of setting Mode_i and Setpoint_i to NA when AutoStat_i is not 0 and then filling the missing values is a nice approach.
You can certainly make it a bit faster by moving the fillMissing call outside of the calculateNonSFi function: the fillMissing operation runs on the whole frame, so you only need to run it once, at the end.
The second thing would be to find a way of setting the NA values that only iterates over the frame once. One option (I have not tested this) would be to use Frame.mapRows and, inside the function, iterate over all the columns (rather than iterating over all the columns and calling mapRows repeatedly). Something like:
df
|> Frame.mapRows (fun _ o ->
    [ for i in 0 .. 150 do
        let au = o.GetAs<float>("AutoStat_" + string i)
        yield "AutoStat_" + string i, au
        // blank out the values on rows where AutoStat_i is NOT 0
        yield "Mode_" + string i, if au <> 0. then nan else o.GetAs("Mode_" + string i)
        yield "Setpoint_" + string i, if au <> 0. then nan else o.GetAs("Setpoint_" + string i) ]
    |> series )
|> Frame.ofRows
|> Frame.fillMissing Direction.Forward
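To make the "blank out, then fill forward" rule concrete, here is a minimal sketch on plain arrays rather than on a Deedle frame. The helper name holdLastWhenAuto is made up for this illustration, and it assumes one column group has already been extracted into two arrays. Unlike the version in the question that also fills backward, rows before the first AutoStat = 0 row simply keep their own value here.

```fsharp
// Plain-array sketch (no Deedle): for one column group, carry forward the
// last value seen on a row where AutoStat was 0.
// `holdLastWhenAuto` is a hypothetical helper, not a Deedle function.
let holdLastWhenAuto (autoStat: int[]) (values: int[]) =
    let result = Array.copy values
    let mutable last = None            // last value seen while AutoStat = 0
    for i in 0 .. values.Length - 1 do
        if autoStat.[i] = 0 then
            last <- Some values.[i]    // remember this value
        else
            match last with
            | Some v -> result.[i] <- v   // overwrite with the remembered value
            | None -> ()                  // nothing seen yet: keep the original
    result

// AutoStat_2 / Mode_2 / Setpoint_2 from the first frame in the question
let autoStat2 = [| 0; 1; 1; 0; 0; 0; 3; 3 |]
let mode2     = holdLastWhenAuto autoStat2 [| 1; 0; 0; 1; 2; 2; 4; 4 |]
let setpoint2 = holdLastWhenAuto autoStat2 [| 24; 27; 27; 24; 24; 24; 26; 26 |]
// mode2     = [|1; 1; 1; 1; 2; 2; 2; 2|]
// setpoint2 = [|24; 24; 24; 24; 24; 24; 24; 24|]
```

Each column is visited once, so the cost grows with rows times columns instead of re-scanning the whole frame for every set.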

Related

Combine Observables

Let's say I have
A: IObservable<int>
B: IObservable<int>
How can I combine these two into
C: IObservable<int>
whose emitted value is the product of the last observed values of A and B?
E.g.
A = [ 2 3 1 ]
B = [ 2 5 6 ]
then
C = [ 4 6 15 18 6 ]
I'm not terribly good at F# (more like a novice), but this seems to work (using Rx):
open System
open System.Reactive.Linq
open System.Reactive.Subjects
let a = new Subject<int>()
let b = new Subject<int>()
let c = Observable.CombineLatest(a, b, Func<_,_,_>(fun x y -> x * y))
c.Subscribe(fun x -> printfn "%i" x) |> ignore
a.OnNext(2)
b.OnNext(2)
a.OnNext(3)
b.OnNext(5)
b.OnNext(6)
a.OnNext(1)
I get:
4
6
15
18
6
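For readers without Rx at hand, the "combine latest" rule itself can be sketched as a pure fold over an interleaved event list. This is only an illustration of the semantics, not how Rx is implemented; the Event type and combineLatestProduct are names invented for the sketch.

```fsharp
// Pure sketch of "combine latest": after each event, if both sources have
// produced at least one value, emit the product of their latest values.
// `Event` and `combineLatestProduct` are hypothetical names.
type Event =
    | A of int
    | B of int

let combineLatestProduct (events: Event list) =
    let step (lastA, lastB, out) ev =
        // update whichever side the event belongs to
        let lastA, lastB =
            match ev with
            | A x -> Some x, lastB
            | B y -> lastA, Some y
        match lastA, lastB with
        | Some x, Some y -> lastA, lastB, (x * y) :: out
        | _ -> lastA, lastB, out          // one side has not fired yet
    let _, _, out = List.fold step (None, None, []) events
    List.rev out

// Same interleaving as the OnNext calls above
let c = combineLatestProduct [ A 2; B 2; A 3; B 5; B 6; A 1 ]
// c = [4; 6; 15; 18; 6]
```

Note that the output depends on how the two streams interleave, which is why the question's C has five elements rather than three.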

A straightforward functional way to rename columns of a Deedle data frame

Is there a concise functional way to rename columns of a Deedle data frame f?
f.RenameColumns(...) is usable, but mutates the data frame it is applied to, so it's a bit of a pain to make the renaming operation idempotent. I have something like f.RenameColumns (fun c -> ( if c.IndexOf( "_" ) < 0 then c else c.Substring( 0, c.IndexOf( "_" ) ) ) + "_renamed"), which is ugly.
What would be nice is something that creates a new frame from the input frame, like this: Frame( f |> Frame.cols |> Series.keys |> Seq.map someRenamingFunction, f |> Frame.cols |> Series.values ) but this gets tripped up by the second part -- the type of f |> Frame.cols |> Series.values is not what is required by the Frame constructor.
How can I concisely transform f |> Frame.cols |> Series.values so that its result is edible by the Frame constructor?
You can pass the renaming function directly to RenameColumns:
df.RenameColumns someRenamingFunction
You can also use the function Frame.mapColKeys.
Builds a new data frame whose columns are the results of applying the
specified function on the columns of the input data frame. The
function is called with the column key and object series that
represents the column data.
Source
Example:
type Record = {Name:string; ID:int ; Amount:int}
let data =
    [|
        {Name = "Joe"; ID = 51; Amount = 50};
        {Name = "Tomas"; ID = 52; Amount = 100};
        {Name = "Eve"; ID = 65; Amount = 20};
    |]
let df = Frame.ofRecords data
let someRenamingFunction s =
    sprintf "%s(%i)" s s.Length
df.Format() |> printfn "%s"
let ndf = df |> Frame.mapColKeys someRenamingFunction
ndf.Format() |> printfn "%s"
df.RenameColumns someRenamingFunction
df.Format() |> printfn "%s"
Print:
     Name  ID Amount
0 -> Joe   51 50
1 -> Tomas 52 100
2 -> Eve   65 20
     Name(4) ID(2) Amount(6)
0 -> Joe     51    50
1 -> Tomas   52    100
2 -> Eve     65    20
     Name(4) ID(2) Amount(6)
0 -> Joe     51    50
1 -> Tomas   52    100
2 -> Eve     65    20
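On the idempotency concern from the question: one option is to make the key function itself idempotent and pass it to Frame.mapColKeys, which returns a new frame. The sketch below shows only the string function (no Deedle needed to try it); renameKey is a made-up name that restates the asker's lambda.

```fsharp
// Idempotent key-renaming function: strips anything from the first '_'
// onwards before appending the suffix, so applying it twice changes nothing.
// `renameKey` is a hypothetical name restating the lambda from the question.
let renameKey (c: string) =
    let i = c.IndexOf '_'
    let stem = if i < 0 then c else c.Substring(0, i)
    stem + "_renamed"

let once  = renameKey "Amount"             // "Amount_renamed"
let twice = renameKey (renameKey "Amount") // "Amount_renamed"
```

With Deedle loaded, this would be used as df |> Frame.mapColKeys renameKey, leaving df itself untouched.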

deedle aggregate/group based on running numbers in a column of Frame

say I have a Frame, which looks like below,
      Name    ID Amount
0  -> Joe     51 50
1  -> Tomas   52 100
2  -> Eve     65 20
3  -> Suzanne 67 10
4  -> Suassss 69 10
5  -> Suzanne 70 10
6  -> Suzanne 78 1
7  -> Suzanne 79 10
8  -> Suzanne 80 12
9  -> Suzanne 85 10
10 -> Suzanne 87 10
...
What I would like to achieve is to group or aggregate based on the ID column, such that if a run of consecutive numbers is encountered, those rows are grouped together; otherwise, the row itself is a group.
I believe a recursive function is your friend here.
Feed a list of tuples
let data = [("Joe", 51, 50);
            ("Tomas", 52, 100);
            ("Eve", 65, 20);
            ("Suzanne", 67, 10)]
to a function
let groupBySequencialId data =
    let rec group result acc data lastId =
        match data with
        | [] -> acc :: result
        | (name, id, amount) :: tail ->
            if lastId + 1 = id then
                group result ((name, id, amount) :: acc) tail id
            else
                group (acc :: result) ([(name, id, amount)]) tail id
    group [] [] data 0
and you'll get the result you are looking for.
This should get the job done, save for three caveats:
You need to parse your string into the tuples required.
There's an empty list in the result set, because the first recursion won't match and appends the empty accumulator to the result set.
The list will come out reversed.
Also note that this is a highly specialized function.
If I was you, I'd try to make this more general, if you ever plan on reusing it.
Have fun.
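As a possible follow-up, here is a sketch of a fold-based variant that avoids the last two caveats (no empty group, original order preserved). It is an alternative formulation, not the answer's code; groupRuns is a name made up for the sketch, and it still assumes the rows are already parsed into tuples.

```fsharp
// Fold-based sketch: rows whose IDs are consecutive end up in the same group.
// `groupRuns` is a hypothetical name; rows are (name, id, amount) tuples.
let groupRuns (rows: (string * int * int) list) =
    let folder groups row =
        let (_, rowId, _) = row
        match groups with
        // the most recent group continues if its newest row has id = rowId - 1
        | ((_, lastId, _) :: _ as current) :: rest when lastId + 1 = rowId ->
            (row :: current) :: rest
        // otherwise start a new group
        | _ -> [ row ] :: groups
    rows
    |> List.fold folder []
    |> List.map List.rev   // rows inside each group back in input order
    |> List.rev            // groups back in input order

let groups =
    groupRuns [ ("Joe", 51, 50); ("Tomas", 52, 100); ("Eve", 65, 20); ("Suzanne", 67, 10) ]
// groups =
//   [ [("Joe", 51, 50); ("Tomas", 52, 100)]; [("Eve", 65, 20)]; [("Suzanne", 67, 10)] ]
```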

How to join frames using F#'s Deedle where one of the frame has a composite key?

Say I have two frames, firstFrame (Frame<(int * int),string>) and secondFrame (Frame<int,string>). I'd like to find a way to join the frames such that the values from the first part of the composite key from firstFrame match the values from the key in secondFrame.
The following is an example of the frames that I'm working with:
val firstFrame : Deedle.Frame<(int * int),string> =
       Premia
1 1 -> 125
2 1 -> 135
3 1 -> 169
1 2 -> 231
2 2 -> 876
3 2 -> 24
val secondFrame : Deedle.Frame<int,string> =
     year month
1 -> 2014 Apr
2 -> 2014 May
3 -> 2014 Jun
Code used to generate the sample above:
#I @"w:\\\dev\packages\Deedle.0.9.12"
#load "Deedle.fsx"
open Deedle
open System
let periodMembers =[(1,1);(2,1);(3,1);(1,2);(2,2);(3,2);]
let premia =[125;135;169;231;876;24;]
let firstSeries = Series(periodMembers,premia)
let firstFrame = Frame.ofColumns["Premia"=>firstSeries]
let projectedYears = series([1=>2014;2=>2014;3=>2014;])
let projectedMonths = series([1=>"Apr";2=>"May";3=>"Jun"])
let secondFrame = Frame(["year";"month"],[projectedYears;projectedMonths;])
Unfortunately, the issue is still open. I think Tomas' solution does not work well with missing values and it changes the order of rows.
My solution:
// 1. Make the key available
let k1 =
    firstFrame
    |> Frame.mapRows (fun (k,_) _ -> k)
let first =
    firstFrame
    |> Frame.addCol "A" k1
// 2. Create a combined column via lookup
let combined =
    first.Columns.["A"].As<int>()
    |> Series.mapAll (fun _ vOpt ->
        vOpt
        |> Option.bind (fun v ->
            secondFrame.TryGetRow<string> v |> OptionalValue.asOption)
    )
// 3. Split into single columns and append
let result =
    secondFrame.ColumnKeys
    |> Seq.fold (fun acc key ->
        let col =
            combined
            |> Series.mapAll (fun _ sOpt ->
                sOpt
                |> Option.bind (fun s ->
                    s.TryGet key |> OptionalValue.asOption
                )
            )
        acc
        |> Frame.addCol key col) first
result.Print()
result.Print()
       Premia A year month
1 1 -> 125    1 2014 Apr
2 1 -> 135    2 2014 May
3 1 -> 169    3 2014 Jun
1 2 -> 231    1 2014 Apr
2 2 -> 876    2 2014 May
3 2 -> 24     3 2014 Jun
Great question! This is not as easy as it should be (and it is probably related to another question about joining frames that we recorded as an issue). I think there should be a nicer way to do this and I'll add a link to this question to the issue.
That said, you can use the fact that joining can align keys that do not match exactly to do this. You can first add zero as the second element of the key in the second frame:
> let m = secondFrame |> Frame.mapRowKeys (fun k -> k, 0);;
val m : Frame<(int * int),string> =
       year month
1 0 -> 2014 Apr
2 0 -> 2014 May
3 0 -> 2014 Jun
Now, the key in the second frame is always smaller than the matching keys in the first frame (assuming the numbers are positive). So, for example, we want to align a value in the second frame with key (1, 0) to values in the first frame with keys (1, 1), (1, 2), ... You can use Lookup.NearestSmaller to tell Deedle that you want to find a value with the nearest smaller key (which will be (1, 0) for any key (1, k)).
To use this, you first need to sort the first frame, but then it works nicely:
> firstFrame.SortByRowKey().Join(m, JoinKind.Left, Lookup.NearestSmaller);;
val it : Frame<(int * int),string> =
       Premia year month
1 1 -> 125    2014 Apr
  2 -> 231    2014 Apr
2 1 -> 135    2014 May
  2 -> 876    2014 May
3 1 -> 169    2014 Jun
  2 -> 24     2014 Jun
This is not particularly obvious, but it does the trick. Although, I hope we can come up with a nicer way!
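Why the (k, 0) trick lines up the keys can be seen with ordinary F# tuples, which compare lexicographically. The sketch below models a "nearest smaller key" lookup on plain lists; nearestSmallerOrEqual is a made-up helper, and treating an exact match as acceptable is an assumption of the sketch rather than a statement about Deedle's implementation.

```fsharp
// Sketch of a "nearest smaller key" lookup on plain tuples (no Deedle).
// `nearestSmallerOrEqual` is a hypothetical helper: it returns the largest
// candidate key that is <= k, or None when every candidate is larger.
let nearestSmallerOrEqual (keys: (int * int) list) (k: int * int) =
    match keys |> List.filter (fun c -> c <= k) with
    | [] -> None
    | candidates -> Some (List.max candidates)

let secondKeys = [ (1, 0); (2, 0); (3, 0) ]                      // keys of m
let firstKeys  = [ (1, 1); (1, 2); (2, 1); (2, 2); (3, 1); (3, 2) ]

let matches = firstKeys |> List.map (fun k -> k, nearestSmallerOrEqual secondKeys k)
// matches =
//   [ ((1, 1), Some (1, 0)); ((1, 2), Some (1, 0)); ((2, 1), Some (2, 0));
//     ((2, 2), Some (2, 0)); ((3, 1), Some (3, 0)); ((3, 2), Some (3, 0)) ]
```

In this model a key such as (4, 1), which has no partner (4, 0), would still pick up (3, 0). Whether the Deedle join behaves the same way is worth checking before relying on the trick with incomplete data.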

Pretty print a tree

Let's say I have a binary tree data structure defined as follows
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil
I have an instance of a tree as follows:
let x =
    Node
        (Node (Node (Nil,35,Node (Nil,40,Nil)),48,Node (Nil,52,Node (Nil,53,Nil))),
         80,Node (Node (Nil,82,Node (Nil,83,Nil)),92,Node (Nil,98,Nil)))
I'm trying to pretty-print the tree into something easy to interpret. Preferably, I'd like to print the tree in a console window like this:
         _______ 80 _______
        /                  \
    _ 48 _              _ 92 _
   /      \            /      \
 35       52         82       98
   \       \        /
    40      53    83
What's an easy way to get my tree to output in that format?
If you want it to be very pretty, you could steal about 25 lines of code from this blog entry to draw it with WPF.
But I'll code up an ascii solution shortly too, probably.
EDIT
Ok, wow, that was hard.
I'm not certain it's entirely correct, and I can't help but think there's probably a better abstraction. But anyway... enjoy!
(See the end of the code for a large example that is rather pretty.)
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil
(*
For any given tree
     ddd
    /   \
  lll   rrr
we think about it as these three sections, left|middle|right (L|M|R):
     d | d | d
    /  |   |  \
  lll  |   |  rrr
M is always exactly one character.
L will be as wide as either (d's width / 2) or L's width, whichever is more (and always at least one)
R will be as wide as either ((d's width - 1) / 2) or R's width, whichever is more (and always at least one)
(above two lines mean 'dddd' of even length is slightly off-center left)
We want the '/' to appear directly above the rightmost character of the direct left child.
We want the '\' to appear directly above the leftmost character of the direct right child.
If the width of 'ddd' is not long enough to reach within 1 character of the slashes, we widen 'ddd' with
underscore characters on that side until it is wide enough.
*)
// PrettyAndWidthInfo : 'a tree -> string[] * int * int * int
// strings are all the same width (space padded if needed)
// first int is that total width
// second int is the column the root node starts in
// third int is the column the root node ends in
// (assumes d.ToString() never returns empty string)
let rec PrettyAndWidthInfo t =
    match t with
    | Nil ->
        [], 0, 0, 0
    | Node(Nil,d,Nil) ->
        let s = d.ToString()
        [s], s.Length, 0, s.Length-1
    | Node(l,d,r) ->
        // compute info for string of this node's data
        let s = d.ToString()
        let sw = s.Length
        let swl = sw/2
        let swr = (sw-1)/2
        assert(swl+1+swr = sw)
        // recurse
        let lp,lw,_,lc = PrettyAndWidthInfo l
        let rp,rw,rc,_ = PrettyAndWidthInfo r
        // account for absent subtrees
        let lw,lb = if lw=0 then 1," " else lw,"/"
        let rw,rb = if rw=0 then 1," " else rw,"\\"
        // compute full width of this tree
        let totalLeftWidth = (max (max lw swl) 1)
        let totalRightWidth = (max (max rw swr) 1)
        let w = totalLeftWidth + 1 + totalRightWidth
        (*
        A suggestive example:
             dddd | d | dddd__
                / |   |       \
              lll |   |       rr
                  |   |      ...
                  |   | rrrrrrrrrrr
             ----       ----           swl, swr (left/right string width (of this node) before any padding)
              ---       -----------    lw, rw (left/right width (of subtree) before any padding)
             ----                      totalLeftWidth
                        -----------    totalRightWidth
             ----   -   -----------    w (total width)
        *)
        // get right column info that accounts for left side
        let rc2 = totalLeftWidth + 1 + rc
        // make left and right tree same height
        let lp = if lp.Length < rp.Length then lp @ List.init (rp.Length-lp.Length) (fun _ -> "") else lp
        let rp = if rp.Length < lp.Length then rp @ List.init (lp.Length-rp.Length) (fun _ -> "") else rp
        // widen left and right trees if necessary (in case parent node is wider, and also to fix the 'added height')
        let lp = lp |> List.map (fun s -> if s.Length < totalLeftWidth then (nSpaces (totalLeftWidth - s.Length)) + s else s)
        let rp = rp |> List.map (fun s -> if s.Length < totalRightWidth then s + (nSpaces (totalRightWidth - s.Length)) else s)
        // first part of line1
        let line1 =
            if swl < lw - lc - 1 then
                (nSpaces (lc + 1)) + (nBars (lw - lc - swl)) + s
            else
                (nSpaces (totalLeftWidth - swl)) + s
        // line1 right bars
        let line1 =
            if rc2 > line1.Length then
                line1 + (nBars (rc2 - line1.Length))
            else
                line1
        // line1 right padding
        let line1 = line1 + (nSpaces (w - line1.Length))
        // first part of line2
        let line2 = (nSpaces (totalLeftWidth - lw + lc)) + lb
        // pad rest of left half
        let line2 = line2 + (nSpaces (totalLeftWidth - line2.Length))
        // add right content
        let line2 = line2 + " " + (nSpaces rc) + rb
        // add right padding
        let line2 = line2 + (nSpaces (w - line2.Length))
        let resultLines = line1 :: line2 :: ((lp,rp) ||> List.map2 (fun l r -> l + " " + r))
        for x in resultLines do
            assert(x.Length = w)
        resultLines, w, lw-swl, totalLeftWidth+1+swr
and nSpaces n =
    String.replicate n " "
and nBars n =
    String.replicate n "_"
let PrettyPrint t =
    let sl,_,_,_ = PrettyAndWidthInfo t
    for s in sl do
        printfn "%s" s
let y = Node(Node (Node (Nil,35,Node (Node(Nil,1,Nil),88888888,Nil)),48,Node (Nil,777777777,Node (Nil,53,Nil))),
             80,Node (Node (Nil,82,Node (Nil,83,Nil)),1111111111,Node (Nil,98,Nil)))
let z = Node(y,55555,y)
let x = Node(z,4444,y)
PrettyPrint x
(*
___________________________4444_________________
/ \
________55555________________ ________80
/ \ / \
________80 ________80 _______48 1111111111
/ \ / \ / \ / \
_______48 1111111111 _______48 1111111111 35 777777777 82 98
/ \ / \ / \ / \ \ \ \
35 777777777 82 98 35 777777777 82 98 88888888 53 83
\ \ \ \ \ \ /
88888888 53 83 88888888 53 83 1
/ /
1 1
*)
If you don't mind turning your head sideways, you can print the tree depth first, one node to a line, recursively passing the depth down the tree, and printing depth*N spaces on the line before the node.
Here's Lua code:
tree={{{nil,35,{nil,40,nil}},48,{nil,52,{nil,53,nil}}},
      80,{{nil,82,{nil,83,nil}},92,{nil,98,nil}}}
function pptree (t,depth)
  if t ~= nil
  then pptree(t[3], depth+1)
    print(string.format("%s%d",string.rep(" ",depth), t[2]))
    pptree(t[1], depth+1)
  end
end
Test:
> pptree(tree,4)
      98
     92
       83
      82
    80
       53
      52
     48
       40
      35
>
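The same sideways idea can be sketched in F# against the tree type from the question. The function name sideways is invented here; it returns the lines instead of printing them, which keeps the sketch easy to test.

```fsharp
// F# sketch of the sideways printer: right subtree first, then the node,
// then the left subtree, each line indented by its depth.
// `sideways` is a hypothetical name; the type matches the question's definition.
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

let rec sideways depth t =
    match t with
    | Nil -> []
    | Node (l, d, r) ->
        sideways (depth + 1) r
        @ [ String.replicate depth " " + string d ]
        @ sideways (depth + 1) l

let small = Node (Node (Nil, 1, Nil), 2, Node (Nil, 3, Nil))
let lines = sideways 0 small
// lines = [" 3"; "2"; " 1"]
lines |> List.iter (printfn "%s")
```

Tilting your head to the left, the root is at the left edge and the right subtree sits above it.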
Maybe this can help: Drawing Trees in ML
Although it's not exactly the right output, I found an answer at http://www.christiankissig.de/cms/files/ocaml99/problem67.ml :
(* A string representation of binary trees
Somebody represents binary trees as strings of the following type (see example opposite):
a(b(d,e),c(,f(g,)))
a) Write a Prolog predicate which generates this string representation, if the tree
is given as usual (as nil or t(X,L,R) term). Then write a predicate which does this
inverse; i.e. given the string representation, construct the tree in the usual form.
Finally, combine the two predicates in a single predicate tree_string/2 which can be
used in both directions.
b) Write the same predicate tree_string/2 using difference lists and a single
predicate tree_dlist/2 which does the conversion between a tree and a difference
list in both directions.
For simplicity, suppose the information in the nodes is a single letter and there are
no spaces in the string.
*)
type bin_tree =
    Leaf of string
  | Node of string * bin_tree * bin_tree
;;
let rec tree_to_string t =
  match t with
    Leaf s -> s
  | Node (s,tl,tr) ->
      String.concat ""
        [s;"(";tree_to_string tl;",";tree_to_string tr;")"]
;;
This is just an intuition; I'm sure someone like Knuth has had the idea already, but I'm too lazy to check.
If you look at your tree as a one-dimensional structure, you get an array (or vector) of length L.
This is easy to build with an in-order recursive tree traversal: left, root, right.
Some calculation is needed to fill the gaps when the tree is unbalanced.
2 dimension
         _______ 80 _______
        /                  \
    _ 48 _              _ 92 _
   /      \            /      \
 35       52         82       98
   \       \        /
    40      53    83
1 dimension :
   35 40 48    52 53 80 83 82    92    98
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
The pretty-printed tree can be built from this array
(maybe with something recursive):
first use the value at position L/2; its X position is
L/2 * the default cell length (here it is 2 characters)
80
then (L/2) - (L/4) and (L/2) + (L/4)
48 92
then L/2-L/4-L/8, L/2-L/4+L/8, L/2+L/4-L/8 and L/2+L/4+L/8
35 52 82 98
...
Adding pretty branches requires more positional arithmetic, but it's trivial here.
You can concatenate the values in a string instead of using an array; concatenation will
de facto calculate the best X position and will allow values of different sizes,
making a more compact tree.
In this case you will have to count the words in the string to extract
the values, e.g. for the first element, use the (L/2)th word of the string instead
of the (L/2)th element of the array. The X position in the string is the same as in the tree.
N 35 40 48 N 52 53 80 83 82 N 92 N 98 N
80
48 92
35 52 82 98
40 53 83
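A simplified variant of this idea can be sketched in F#: use each node's rank in the in-order traversal as its column and its depth as its row. This is not the L/2, L/4 halving scheme described above (it skips the gap filling), and it draws no branches. The names inOrder and renderByInOrder are made up for the sketch, which also assumes every value fits within cellWidth characters.

```fsharp
// Sketch: column = in-order rank, row = depth. No branches are drawn.
// `inOrder` and `renderByInOrder` are hypothetical names.
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

// In-order traversal that also records each node's depth
let rec inOrder depth t =
    match t with
    | Nil -> []
    | Node (l, d, r) -> inOrder (depth + 1) l @ [ (string d, depth) ] @ inOrder (depth + 1) r

let renderByInOrder (cellWidth: int) t =
    let cells = inOrder 0 t
    let maxDepth = cells |> List.fold (fun m (_, depth) -> max m depth) 0
    let blank = String.replicate cellWidth " "
    [ for row in 0 .. maxDepth ->
        let line =
            cells
            |> List.map (fun (s: string, depth) ->
                if depth = row then s.PadRight(cellWidth) else blank)
            |> String.concat ""
        line.TrimEnd() ]

// The tree from the question
let x =
    Node (Node (Node (Nil, 35, Node (Nil, 40, Nil)), 48, Node (Nil, 52, Node (Nil, 53, Nil))),
          80,
          Node (Node (Nil, 82, Node (Nil, 83, Nil)), 92, Node (Nil, 98, Nil)))

renderByInOrder 3 x |> List.iter (printfn "%s")
```

Because every node gets its own column, parents always land between their left and right subtrees, at the cost of a wider picture than the hand-drawn one.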
