deedle aggregate/group based on running numbers in a column of Frame - f#

Say I have a Frame that looks like this:
Name ID Amount
0 -> Joe 51 50
1 -> Tomas 52 100
2 -> Eve 65 20
3 -> Suzanne 67 10
4 -> Suassss 69 10
5 -> Suzanne 70 10
6 -> Suzanne 78 1
7 -> Suzanne 79 10
8 -> Suzanne 80 12
9 -> Suzanne 85 10
10 -> Suzanne 87 10
...
What I would like to achieve is to group or aggregate based on the ID column, such that if a sequence of running numbers is encountered, those rows are grouped together; otherwise, the row itself is a group.

I believe a recursive function is your friend here.
Feed a list of tuples
let data = [("Joe", 51, 50);
            ("Tomas", 52, 100);
            ("Eve", 65, 20);
            ("Suzanne", 67, 10)]
to a function
let groupBySequencialId list =
    let rec group result acc data lastId =
        match data with
        | [] -> acc :: result
        | (name, id, amount) :: tail ->
            if lastId + 1 = id then
                group result ((name, id, amount) :: acc) tail id
            else
                group (acc :: result) ([(name, id, amount)]) tail id
    group [] [] list 0
and you'll get the result you are looking for.
This should get the job done, save three caveats:
You need to parse your string into the tuples required.
There's an empty list in the result set, because the first recursion won't match and appends the empty accumulator to the result set.
The list will come out reversed.
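The last two caveats can be avoided with a fold. The following is only a sketch under a few assumptions: the rows are already parsed into (name, id, amount) tuples, and groupByConsecutiveId is a hypothetical name that does not come from Deedle or from the function above.

```fsharp
// Hypothetical helper: groups rows whose IDs form a run of consecutive integers.
// Rows are assumed to be (name, id, amount) tuples, already parsed.
let groupByConsecutiveId (rows: (string * int * int) list) =
    let step (groups, current) ((_, id, _) as row) =
        match current with
        | (_, lastId, _) :: _ when id = lastId + 1 ->
            groups, row :: current              // extend the current run
        | [] ->
            groups, [row]                       // the very first row starts a run
        | _ ->
            List.rev current :: groups, [row]   // close the run, start a new one
    let groups, current = List.fold step ([], []) rows
    match current with
    | [] -> []                                  // no rows, no groups
    | _ -> List.rev (List.rev current :: groups)

let sample =
    [ ("Joe", 51, 50); ("Tomas", 52, 100); ("Eve", 65, 20); ("Suzanne", 67, 10) ]

let grouped = groupByConsecutiveId sample
// grouped = [ [Joe; Tomas]; [Eve]; [Suzanne] ], in the original row order
```

Because each run is reversed when it is closed, both the groups and the rows inside them keep the input order, and no empty group is emitted.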
Also note that this is a highly specialized function.
If I were you, I'd try to make this more general if you ever plan on reusing it.
Have fun.

Related

Deedle Row based calculation

I am trying to use Deedle to do some row-based calculations. However, most of the examples are column-based. For example, I have this simple structure:
let tt = Series.ofObservations[ 1=>10.0; 3=>20.0;5=> 30.0 ]
let tt2 = Series.ofObservations[1=> 10.0; 3=> Double.NaN; 6=>30.0 ]
let f1 = frame ["cola" => tt; "colb"=>tt2]
val f1 : Frame<int,string> =
cola colb
1 -> 10 10
3 -> 20 <missing>
5 -> 30 <missing>
6 -> <missing> 30
I want to calculate the mean of cola and colb. if I do
f1.Rows |> Series.mapValues(fun r -> (r.GetAs<float>("cola") + r.GetAs<float>("colb") )/2.0)
val it : Series<int,float> =
1 -> 10
3 -> <missing>
5 -> <missing>
6 -> <missing>
I know I can match on each column to handle the mean, but this will not be practical if there are lots of columns.
Each row returned by f1.Rows is an ObjectSeries. Can this be converted into a float Series so that Stats.mean can be applied to a row?
thanks
casbby
Update:
I think I might have found one of the ways to do this (reference: https://github.com/BlueMountainCapital/Deedle/issues/100):
folding operation:
f1.Rows |> Series.mapValues(fun v -> v.As<float>() |> Series.foldValues (fun acc elem -> elem + acc) 0.0 )
mean (it properly skips missing values):
f1.Rows |> Series.mapValues(fun v -> v.As<float>() |> Stats.mean )
count:
f1.Rows |> Series.mapValues(fun v -> v.As<float>() |> Stats.count )
If there is a different way, please let me know. Hopefully this can be useful to newcomers like myself.
Your approach using f1.Rows, casting each row to a numerical series and then applying Stats functions is exactly what I was going to suggest as an answer, so I think that approach makes perfect sense.
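To make the intended semantics concrete, here is a small model in plain F# without Deedle. It assumes a row can be represented as a list of float option values, with None standing in for a missing cell; rowMean is a hypothetical helper, not a Deedle function.

```fsharp
// Each row is modelled as a list of optional cells; None stands for <missing>.
let rowMean (cells: float option list) =
    match List.choose id cells with
    | [] -> None                               // every cell missing: no mean
    | present -> Some (List.average present)   // average only the present cells

// The sample frame from the question, keyed by row index (cola, colb).
let rows =
    [ 1, [ Some 10.0; Some 10.0 ]
      3, [ Some 20.0; None ]
      5, [ Some 30.0; None ]
      6, [ None; Some 30.0 ] ]

let means = rows |> List.map (fun (key, cells) -> key, rowMean cells)
// means = [ (1, Some 10.0); (3, Some 20.0); (5, Some 30.0); (6, Some 30.0) ]
```

This is the behaviour the question describes for the Stats.mean version: missing cells are skipped rather than turning the whole row into a missing value.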
Another option that I can think of is to turn the frame into a de-normalized representation and then group the rows by the cola and colb values (so, you'll have all data as rows, but grouped by the other attribute):
let byCol =
f1
|> Frame.stack
|> Frame.groupRowsByString "Column";;
This gives you:
Row Column Value
cola 0 -> 1 cola 10
2 -> 3 cola 20
3 -> 5 cola 30
colb 1 -> 1 colb 10
4 -> 6 colb 30
Now you can use functions working with hierarchical indices to do the calculations. For example, to compute mean of Value for the two groups, you can write:
byCol?Value |> Stats.levelMean fst
I'm not sure which approach I'd recommend at the moment; it probably depends on the other operations that you need to do with the data. But it's good to keep the alternative in mind.

How to join frames using F#'s Deedle where one of the frame has a composite key?

Say I have two frames, firstFrame (Frame<(int * int),string>) and secondFrame (Frame<int,string>). I'd like to find a way to join the frames such that the values from the first part of the composite key from firstFrame match the values from the key in secondFrame.
The following is an example of the frames that I'm working with:
val firstFrame : Deedle.Frame<(int * int),string> =
Premia
1 1 -> 125
2 1 -> 135
3 1 -> 169
1 2 -> 231
2 2 -> 876
3 2 -> 24
val secondFrame : Deedle.Frame<int,string> =
year month
1 -> 2014 Apr
2 -> 2014 May
3 -> 2014 Jun
Code used to generate the sample above:
#I @"w:\\\dev\packages\Deedle.0.9.12"
#load "Deedle.fsx"
open Deedle
open System
let periodMembers =[(1,1);(2,1);(3,1);(1,2);(2,2);(3,2);]
let premia =[125;135;169;231;876;24;]
let firstSeries = Series(periodMembers,premia)
let firstFrame = Frame.ofColumns["Premia"=>firstSeries]
let projectedYears = series([1=>2014;2=>2014;3=>2014;])
let projectedMonths = series([1=>"Apr";2=>"May";3=>"Jun"])
let secondFrame = Frame(["year";"month"],[projectedYears;projectedMonths;])
Unfortunately, the issue is still open. I think Tomas' solution does not work well with missing values and it changes the order of rows.
My solution:
// 1. Make the key available
let k1 =
    firstFrame
    |> Frame.mapRows (fun (k,_) _ -> k)
let first =
    firstFrame
    |> Frame.addCol "A" k1
// 2. Create a combined column via lookup
let combined =
    first.Columns.["A"].As<int>()
    |> Series.mapAll (fun _ vOpt ->
        vOpt
        |> Option.bind (fun v ->
            secondFrame.TryGetRow<string> v |> OptionalValue.asOption)
    )
// 3. Split into single columns and append
let result =
    secondFrame.ColumnKeys
    |> Seq.fold (fun acc key ->
        let col =
            combined
            |> Series.mapAll (fun _ sOpt ->
                sOpt
                |> Option.bind (fun s ->
                    s.TryGet key |> OptionalValue.asOption
                )
            )
        acc
        |> Frame.addCol key col) first
result.Print()
Premia A year month
1 1 -> 125 1 2014 Apr
2 1 -> 135 2 2014 May
3 1 -> 169 3 2014 Jun
1 2 -> 231 1 2014 Apr
2 2 -> 876 2 2014 May
3 2 -> 24 3 2014 Jun
Great question! This is not as easy as it should be (and it is probably related to another question about joining frames that we recorded as an issue). I think there should be a nicer way to do this and I'll add a link to this question to the issue.
That said, you can use the fact that joining can align keys that do not match exactly to do this. You can first add zero as the second element of the key in the second frame:
> let m = secondFrame |> Frame.mapRowKeys (fun k -> k, 0);;
val m : Frame<(int * int),string> =
year month
1 0 -> 2014 Apr
2 0 -> 2014 May
3 0 -> 2014 Jun
Now, the key in the second frame is always smaller than the matching keys in the first frame (assuming the numbers are positive). So, for example, we want to align a value in the second frame with key (1, 0) to values in the first frame with keys (1, 1), (1, 2), ... You can use Lookup.NearestSmaller to tell Deedle that you want to find a value with the nearest smaller key (which will be (1, 0) for any key (1, k)).
To use this, you first need to sort the first frame, but then it works nicely:
> firstFrame.SortByRowKey().Join(m, JoinKind.Left, Lookup.NearestSmaller);;
val it : Frame<(int * int),string> =
Premia year month
1 1 -> 125 2014 Apr
2 -> 231 2014 Apr
2 1 -> 135 2014 May
2 -> 876 2014 May
3 1 -> 169 2014 Jun
2 -> 24 2014 Jun
This is not particularly obvious, but it does the trick. Although, I hope we can come up with a nicer way!
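For comparison, the join that both answers are after can be modelled in plain F# without Deedle: a left join on the first component of the composite key. This is only a sketch; premia, periods and joined are illustrative names, and the data is copied from the question.

```fsharp
// First frame: composite (period, run) key with a premium value.
let premia =
    [ (1, 1), 125; (2, 1), 135; (3, 1), 169
      (1, 2), 231; (2, 2), 876; (3, 2), 24 ]

// Second frame: simple period key with (year, month).
let periods = Map.ofList [ 1, (2014, "Apr"); 2, (2014, "May"); 3, (2014, "Jun") ]

// Left join on the first key component; a missing period yields None.
let joined =
    premia
    |> List.map (fun ((period, run), premium) ->
        (period, run), premium, Map.tryFind period periods)
// e.g. the row with key (1, 2) becomes ((1, 2), 231, Some (2014, "Apr"))
```

Unlike the NearestSmaller approach, this keeps the original row order and represents an unmatched key explicitly, which are the two concerns raised about it above.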

Can I sort a Deedle frame?

From what I can tell a Deedle frame is only sorted by the index. Is there any way to apply a custom sorting function or sort by a given series (and define ascending/descending order)?
Sticking to a "standard" frame of type Frame<int,string> (row index of integers and column names of strings) it is easy to implement a function capable of reordering the frame based on any single column contents in ascending or descending order:
let reorder isAscending sortColumnName (frame:Frame<int,string>) =
    let result =
        frame
        |> Frame.indexRows sortColumnName
        |> Frame.orderRows
        |> Frame.indexRowsOrdinally
    if isAscending then result
    else
        result
        |> Frame.mapRowKeys ((-) 0)
        |> Frame.orderRows
        |> Frame.indexRowsOrdinally
A smoke test over peopleList sample frame:
Name Age Countries
0 -> Joe 51 [UK; US; UK]
1 -> Tomas 28 [CZ; UK; US; CZ]
2 -> Eve 2 [FR]
3 -> Suzanne 15 [US]
reorder false "Name" peopleList returns the frame where Name is sorted in descending order
Name Age Countries
0 -> Tomas 28 [CZ; UK; US; CZ]
1 -> Suzanne 15 [US]
2 -> Joe 51 [UK; US; UK]
3 -> Eve 2 [FR]
while reorder true "Age" peopleList returns the frame where Age is sorted in ascending order
Name Age Countries
0 -> Eve 2 [FR]
1 -> Suzanne 15 [US]
2 -> Tomas 28 [CZ; UK; US; CZ]
3 -> Joe 51 [UK; US; UK]
Nevertheless, this approach requires that the column being sorted contains no duplicate values, which might be considered a showstopper for ordering Deedle frames this way.
You can sort a Deedle frame based on the values in a named column, like so:
myFrame |> Frame.sortRowsBy "columnName" (fun v -> -v) // descending
myFrame |> Frame.sortRowsBy "columnName" (fun v -> v) // ascending
Deedle 1.0 has additional sorting features for rows & cols
Frame.sortRows
Frame.sortRowsWith
Frame.sortRowsBy
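The duplicate-value limitation of the reindexing trick does not apply to a key-selector sort. As a Deedle-free illustration of the same idea, here is a sketch using List.sortBy on records; the Person type and the extra "Adam" row are assumptions added only to create a duplicate Age.

```fsharp
type Person = { Name: string; Age: int }

let people =
    [ { Name = "Joe"; Age = 51 }; { Name = "Tomas"; Age = 28 }
      { Name = "Eve"; Age = 2 }; { Name = "Suzanne"; Age = 15 }
      { Name = "Adam"; Age = 28 } ]   // duplicate Age, added for illustration

// Sorting by a key selector works even when keys repeat.
let byAgeAscending = people |> List.sortBy (fun p -> p.Age)
let byAgeDescending = people |> List.sortByDescending (fun p -> p.Age)
```

List.sortBy is a stable sort, so rows with equal keys (Tomas and Adam here) keep their relative order in the ascending result.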

help me explain this F# recursive example program

let rec aggregateList (f:int->int->int) init list =
    match list with
    | [] -> init
    | hd::tl ->
        let rem = aggregateList f init tl
        f rem hd
let add a b = a + b
let mul a b = a * b
//to use in F# Interactive:
//aggregateList add 0 [1..5];;
I got this example from "Real-World Functional Programming" by Tomas Petricek.
I don't understand the second branch of that pattern match: f rem hd.
Could somebody help me?
Let's break down the aggregateList function declaration first. The function takes three parameters:
A function, named f, that takes two ints and returns a third int.
The initial value to start aggregating with.
A list of values.
The function then matches the list it is supplied with one of two possibilities:
The list is empty, in which case it returns the value of init.
The list is not empty, in which case it takes the first item and assigns it to hd (or head) and the rest of the list and assigns it to tl (or tail). Then it performs the recursive call aggregateList f init tl. When that returns, it takes the result and assigns it to rem. Then it calls f on rem and hd.
As other people have pointed out, this does the same thing as the List.foldBack function in the F# core library.
Be careful, of course, to choose the init value properly because if you executed aggregateList mul 0 somelist;; you'll just get 0 no matter what list you supply.
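A quick check of both points, as a sketch. Note that List.foldBack takes its arguments in a different order (function, list, initial value) and passes the element before the accumulator, so the lambda below swaps them; viaRecursion, viaFoldBack and zeroed are just illustrative names.

```fsharp
let rec aggregateList (f: int -> int -> int) init list =
    match list with
    | [] -> init
    | hd :: tl ->
        let rem = aggregateList f init tl
        f rem hd

let mul a b = a * b

// Multiplication needs 1 as the neutral starting value.
let viaRecursion = aggregateList mul 1 [1..5]
// List.foldBack calls its function with (element, accumulator).
let viaFoldBack = List.foldBack (fun x acc -> mul acc x) [1..5] 1
// Starting from 0 collapses every product to 0.
let zeroed = aggregateList mul 0 [1..5]
// viaRecursion = 120, viaFoldBack = 120, zeroed = 0
```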
It calls the function f (one of the parameters) giving it the result of the recursive call and the next item.
rem is the remainder, or in this case the result of the remainder of the values.
hd is the next item, as seen in the | hd::tl -> part of the pattern matching.
Effectively this aggregate function takes a function, a starting point, and a list. A way of representing the example line is:
(1 + (2 + (3 + (4 + (5 + 0)))))
Just for fun, let's do some printf style debugging:
> aggregateList (fun acc x -> printf "%i " x; acc + x) 0 [1..10];;
10 9 8 7 6 5 4 3 2 1 val it : int = 55
It looks like the function is equivalent to List.foldBack (or fold_right in other languages): it walks each item in the list from right to left and invokes a function f on them.
Let's re-write the function in a few different ways:
// functional version
let rec foldBack f seed = function
    | [] -> seed
    | x::xs -> let res = foldBack f seed xs in f res x
// imperative version
let foldBack f seed xs =
    let mutable result = seed
    for x in List.rev xs do
        result <- f result x
    result
// C# equivalent (f takes the accumulator and the element, so it is a Func<U, T, U>)
public static U FoldBack<T, U>(Func<U, T, U> f, U seed, IEnumerable<T> xs) {
    foreach(T x in xs.Reverse())
        seed = f(seed, x);
    return seed;
}
You'd use the function like this:
let sum = foldBack (+) 0 [1..10] // returns 55
let sumOfSquares = foldBack (fun acc x -> acc + x * x) 0 [1..10];; // 385
I don't understand in second branch in
that pattern matching: f rem hd. Could
somebody help me?
So let's start with what we already know about F# functions:
f is a function with the type int -> int -> int. You pass functions around as if they were any other variable like ints or strings.
You call functions by passing a space-separated list of arguments. f rem hd invokes the function f with two arguments, rem and hd.
The last expression evaluated in a function is treated as the function's return value.
So going back to the original function:
let rec aggregateList (f:int->int->int) init list =
    match list with
    | [] -> init
    | hd::tl ->
        let rem = aggregateList f init tl // 1
        f rem hd // 2
In line 1, we call aggregateList recursively with tl. Since the list gets smaller and smaller, we're eventually going to hit the nil case, which returns init.
In line 2, f rem hd is the function's return value. However, since we recursed down the stack as we made our way to the end of the list, we're going to call this function once for each element (in right-to-left order) as we walk back up the call stack.
Given aggregateList (+) 0 [1..10], the nil case returns 0, so we call:
return value = f rem hd = f 0 10 = 0 + 10 = 10
return value = f rem hd = f 10 9 = 10 + 9 = 19
return value = f rem hd = f 19 8 = 19 + 8 = 27
return value = f rem hd = f 27 7 = 27 + 7 = 34
return value = f rem hd = f 34 6 = 34 + 6 = 40
return value = f rem hd = f 40 5 = 40 + 5 = 45
return value = f rem hd = f 45 4 = 45 + 4 = 49
return value = f rem hd = f 49 3 = 49 + 3 = 52
return value = f rem hd = f 52 2 = 52 + 2 = 54
return value = f rem hd = f 54 1 = 54 + 1 = 55
No more items in the list, so the whole function returns 55.
As you can imagine, the nested calls in aggregateList evaluate like this for a list of length n:
f (f (f (f (f (f (f (f init hdn) hdn-1) hdn-2) hdn-3) ... hd2) hd1) hd0
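The nesting can be made visible by folding with a function that builds a string instead of a number. This is a sketch; aggregate is a generic copy of aggregateList without the int annotation, introduced here only so it can be used at both types.

```fsharp
// Generic copy of aggregateList (no int -> int -> int annotation).
let rec aggregate f init list =
    match list with
    | [] -> init
    | hd :: tl -> f (aggregate f init tl) hd

// Record each call as text to show the order of evaluation.
let nesting = aggregate (fun acc x -> sprintf "(f %s %d)" acc x) "init" [1; 2; 3]
// nesting = "(f (f (f init 3) 2) 1)"

// With a non-commutative operator the right-to-left order matters.
let difference = aggregate (-) 0 [1; 2; 3]
// difference = ((0 - 3) - 2) - 1 = -6
```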

Pretty print a tree

Let's say I have a binary tree data structure defined as follows
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil
I have an instance of a tree as follows:
let x =
    Node
        (Node (Node (Nil,35,Node (Nil,40,Nil)),48,Node (Nil,52,Node (Nil,53,Nil))),
         80,Node (Node (Nil,82,Node (Nil,83,Nil)),92,Node (Nil,98,Nil)))
I'm trying to pretty-print the tree into something easy to interpret. Preferably, I'd like to print the tree in a console window like this:
         _______ 80 _______
        /                  \
    _ 48 _              _ 92 _
   /      \            /      \
 35        52        82        98
   \         \      /
    40        53  83
What's an easy way to get my tree to output in that format?
If you want it to be very pretty, you could steal about 25 lines of code from this blog entry to draw it with WPF.
But I'll code up an ascii solution shortly too, probably.
EDIT
Ok, wow, that was hard.
I'm not certain it's entirely correct, and I can't help but think there's probably a better abstraction. But anyway... enjoy!
(See the end of the code for a large example that is rather pretty.)
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil
(*
For any given tree
       ddd
      /   \
   lll     rrr
we think about it as these three sections, left|middle|right (L|M|R):
     d | d | d
     / |   | \
   lll |   | rrr
M is always exactly one character.
L will be as wide as either (d's width / 2) or L's width, whichever is more (and always at least one)
R will be as wide as either ((d's width - 1) / 2) or R's width, whichever is more (and always at least one)
(above two lines mean 'dddd' of even length is slightly off-center left)
We want the '/' to appear directly above the rightmost character of the direct left child.
We want the '\' to appear directly above the leftmost character of the direct right child.
If the width of 'ddd' is not long enough to reach within 1 character of the slashes, we widen 'ddd' with
underscore characters on that side until it is wide enough.
*)
// PrettyAndWidthInfo : 'a tree -> string[] * int * int * int
// strings are all the same width (space padded if needed)
// first int is that total width
// second int is the column the root node starts in
// third int is the column the root node ends in
// (assumes d.ToString() never returns empty string)
let rec PrettyAndWidthInfo t =
    match t with
    | Nil ->
        [], 0, 0, 0
    | Node(Nil,d,Nil) ->
        let s = d.ToString()
        [s], s.Length, 0, s.Length-1
    | Node(l,d,r) ->
        // compute info for string of this node's data
        let s = d.ToString()
        let sw = s.Length
        let swl = sw/2
        let swr = (sw-1)/2
        assert(swl+1+swr = sw)
        // recurse
        let lp,lw,_,lc = PrettyAndWidthInfo l
        let rp,rw,rc,_ = PrettyAndWidthInfo r
        // account for absent subtrees
        let lw,lb = if lw=0 then 1," " else lw,"/"
        let rw,rb = if rw=0 then 1," " else rw,"\\"
        // compute full width of this tree
        let totalLeftWidth = (max (max lw swl) 1)
        let totalRightWidth = (max (max rw swr) 1)
        let w = totalLeftWidth + 1 + totalRightWidth
        (*
        A suggestive example:
        dddd | d | dddd__
           / |   |       \
         lll |   |       rr
             |   |      ...
             |   | rrrrrrrrrrr
        ----       ----           swl, swr (left/right string width (of this node) before any padding)
         ---       -----------    lw, rw (left/right width (of subtree) before any padding)
        ----                      totalLeftWidth
                   -----------    totalRightWidth
        ----   -   -----------    w (total width)
        *)
        // get right column info that accounts for left side
        let rc2 = totalLeftWidth + 1 + rc
        // make left and right tree same height
        let lp = if lp.Length < rp.Length then lp @ List.init (rp.Length-lp.Length) (fun _ -> "") else lp
        let rp = if rp.Length < lp.Length then rp @ List.init (lp.Length-rp.Length) (fun _ -> "") else rp
        // widen left and right trees if necessary (in case parent node is wider, and also to fix the 'added height')
        let lp = lp |> List.map (fun s -> if s.Length < totalLeftWidth then (nSpaces (totalLeftWidth - s.Length)) + s else s)
        let rp = rp |> List.map (fun s -> if s.Length < totalRightWidth then s + (nSpaces (totalRightWidth - s.Length)) else s)
        // first part of line1
        let line1 =
            if swl < lw - lc - 1 then
                (nSpaces (lc + 1)) + (nBars (lw - lc - swl)) + s
            else
                (nSpaces (totalLeftWidth - swl)) + s
        // line1 right bars
        let line1 =
            if rc2 > line1.Length then
                line1 + (nBars (rc2 - line1.Length))
            else
                line1
        // line1 right padding
        let line1 = line1 + (nSpaces (w - line1.Length))
        // first part of line2
        let line2 = (nSpaces (totalLeftWidth - lw + lc)) + lb
        // pad rest of left half
        let line2 = line2 + (nSpaces (totalLeftWidth - line2.Length))
        // add right content
        let line2 = line2 + " " + (nSpaces rc) + rb
        // add right padding
        let line2 = line2 + (nSpaces (w - line2.Length))
        let resultLines = line1 :: line2 :: ((lp,rp) ||> List.map2 (fun l r -> l + " " + r))
        for x in resultLines do
            assert(x.Length = w)
        resultLines, w, lw-swl, totalLeftWidth+1+swr
and nSpaces n =
    String.replicate n " "
and nBars n =
    String.replicate n "_"
let PrettyPrint t =
    let sl,_,_,_ = PrettyAndWidthInfo t
    for s in sl do
        printfn "%s" s
let y = Node(Node (Node (Nil,35,Node (Node(Nil,1,Nil),88888888,Nil)),48,Node (Nil,777777777,Node (Nil,53,Nil))),
             80,Node (Node (Nil,82,Node (Nil,83,Nil)),1111111111,Node (Nil,98,Nil)))
let z = Node(y,55555,y)
let x = Node(z,4444,y)
PrettyPrint x
(*
___________________________4444_________________
/ \
________55555________________ ________80
/ \ / \
________80 ________80 _______48 1111111111
/ \ / \ / \ / \
_______48 1111111111 _______48 1111111111 35 777777777 82 98
/ \ / \ / \ / \ \ \ \
35 777777777 82 98 35 777777777 82 98 88888888 53 83
\ \ \ \ \ \ /
88888888 53 83 88888888 53 83 1
/ /
1 1
*)
If you don't mind turning your head sideways, you can print the tree depth first, one node to a line, recursively passing the depth down the tree, and printing depth*N spaces on the line before the node.
Here's Lua code:
tree={{{nil,35,{nil,40,nil}},48,{nil,52,{nil,53,nil}}},
80,{{nil,82,{nil,83,nil}},92,{nil,98,nil}}}
function pptree (t,depth)
if t ~= nil
then pptree(t[3], depth+1)
print(string.format("%s%d",string.rep(" ",depth), t[2]))
pptree(t[1], depth+1)
end
end
Test:
> pptree(tree,4)
      98
     92
       83
      82
    80
       53
      52
     48
       40
      35
>
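The same sideways idea can be sketched in F# against the tree type from the question. To keep it easy to check, this version returns the lines instead of printing them and indents two spaces per level; both choices, and the name sidewaysLines, are assumptions rather than part of the Lua answer.

```fsharp
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

// Right subtree first, then the node, then the left subtree,
// so the output reads as the tree rotated 90 degrees counter-clockwise.
let rec sidewaysLines depth (t: int tree) : string list =
    match t with
    | Nil -> []
    | Node (l, d, r) ->
        [ yield! sidewaysLines (depth + 1) r
          yield String.replicate (2 * depth) " " + string d
          yield! sidewaysLines (depth + 1) l ]

let small = Node (Node (Nil, 35, Nil), 48, Node (Nil, 52, Nil))
let lines = sidewaysLines 0 small
// lines = [ "  52"; "48"; "  35" ]
```

Printing is then just lines |> List.iter (printfn "%s").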
Maybe this can help: Drawing Trees in ML
Although it's not exactly the right output, I found an answer at http://www.christiankissig.de/cms/files/ocaml99/problem67.ml :
(* A string representation of binary trees
Somebody represents binary trees as strings of the following type (see example opposite):
a(b(d,e),c(,f(g,)))
a) Write a Prolog predicate which generates this string representation, if the tree
is given as usual (as nil or t(X,L,R) term). Then write a predicate which does this
inverse; i.e. given the string representation, construct the tree in the usual form.
Finally, combine the two predicates in a single predicate tree_string/2 which can be
used in both directions.
b) Write the same predicate tree_string/2 using difference lists and a single
predicate tree_dlist/2 which does the conversion between a tree and a difference
list in both directions.
For simplicity, suppose the information in the nodes is a single letter and there are
no spaces in the string.
*)
type bin_tree =
Leaf of string
| Node of string * bin_tree * bin_tree
;;
let rec tree_to_string t =
match t with
Leaf s -> s
| Node (s,tl,tr) ->
String.concat ""
[s;"(";tree_to_string tl;",";tree_to_string tr;")"]
;;
This is an intuition; I'm sure someone like Knuth had the idea, but I'm too lazy to check.
If you look at your tree as a one-dimensional structure, you get an array (or vector) of length L.
This is easy to build with an in-order recursive tree traversal: left, root, right.
Some calculation is needed to fill the gaps when the tree is unbalanced.
2 dimensions:
         _______ 80 _______
        /                  \
    _ 48 _              _ 92 _
   /      \            /      \
 35        52        82        98
   \         \      /
    40        53  83
1 dimension:
   35 40 48    52 53 80 83 82    92    98
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
The pretty-printed tree can be built from this array (maybe with something recursive).
First use the value at position L/2; its X position is L/2 times the default cell length (here 2 characters):
80
then (L/2) - (L/4) and (L/2) + (L/4)
48 92
then L/2-L/4-L/8, L/2-L/4+L/8, L/2+L/4-L/8 and L/2+L/4+L/8
35 52 82 98
...
Adding pretty branches requires more positional arithmetic, but it's trivial here.
You can concatenate the values into a string instead of using an array; concatenation will de facto calculate the best X position and allow values of different sizes, making a more compact tree.
In this case you will have to count the words in the string to extract the values, e.g. for the first element, use the L/2th word of the string instead of the L/2th element of the array. The X position in the string is the same as in the tree.
N 35 40 48 N 52 53 80 83 82 N 92 N 98 N
80
48 92
35 52 82 98
40 53 83
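The first step, flattening the tree with an in-order traversal, can be sketched as follows. It uses the tree type and the value x from the question; inOrder and flat are illustrative names, and the gap-filling for unbalanced trees described above is deliberately left out.

```fsharp
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

// In-order traversal: left subtree, root, right subtree.
let rec inOrder t =
    match t with
    | Nil -> []
    | Node (l, d, r) -> inOrder l @ [d] @ inOrder r

// The tree from the question.
let x =
    Node
        (Node (Node (Nil,35,Node (Nil,40,Nil)),48,Node (Nil,52,Node (Nil,53,Nil))),
         80,Node (Node (Nil,82,Node (Nil,83,Nil)),92,Node (Nil,98,Nil)))

let flat = inOrder x
// flat = [35; 40; 48; 52; 53; 80; 82; 83; 92; 98]
```

In the tree as defined in the question's code, 83 is the right child of 82, so it comes after 82 here. For this particular tree the root also happens to sit at index L/2 of the flattened list, but that only holds in general once the gaps have been filled in.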

Resources