Deedle - Weighted Average after filtering FilterRowValues


I am new to F#. I am attempting to calculate a weighted average after filtering my Frame by two timestamps and an instrument_id.
example data:
| trade_qty | trade_price | trade_timestamp | instrument_id
| 1000 | 100.59 | 1/26/2018 16:00:00 | 1
| 2000 | 105.10 | 1/26/2018 15:59:30 | 1
| 3000 | 97.59 | 1/26/2018 15:59:00 | 1
I found that filtering is easy, e.g. instrument 1 between two times:
frameVolume
|> Frame.filterRowValues (fun c -> c.GetAs<DateTime>("trade_timestamp") > DateTime(2018,1,27,15,31,0))
|> Frame.filterRowValues (fun c -> c.GetAs<DateTime>("trade_timestamp") < DateTime(2018,1,27,16,00,0))
|> Frame.filterRowValues (fun c -> c.GetAs<int>("instrument_id") = 1)
I am stuck here. I haven't figured out how to compute 1/sum(trade_qty) * sum(trade_price * trade_qty).
I have tried:
|>Frame.GetColumn<float>("trade_qty") *
Frame.GetColumn<float>("trade_price")
For context, I'd like to use this as a function to be fed into another function in order to calculate the weighted average price over several intervals.
Any Thoughts? Thank you!

It's nice that Deedle provides higher-order functions similar to the built-in higher-order functions for F# lists, arrays, and seqs. Knowing this makes the task simpler. Here is an implementation of the function you are describing:
#I "..\packages\Deedle.1.2.5"
#load "Deedle.fsx"
open System
open Deedle
let weightedAverage after before frame: float =
    // Keep only the rows for instrument 1 strictly between the two timestamps.
    let filteredFrame =
        frame
        |> Frame.filterRowValues (fun r -> r.GetAs<DateTime>("trade_timestamp") < before)
        |> Frame.filterRowValues (fun r -> r.GetAs<DateTime>("trade_timestamp") > after)
        |> Frame.filterRowValues (fun r -> r.GetAs<int>("instrument_id") = 1)
    let quantities: Series<int, float> = filteredFrame |> Frame.getCol "trade_qty"
    let tradePrices: Series<int, float> = filteredFrame |> Frame.getCol "trade_price"
    // Sum of quantity * price over the filtered rows.
    let weightedSum =
        (quantities, tradePrices)
        ||> Series.zip
        |> Series.mapValues (fun (q, p) -> (OptionalValue.get q * OptionalValue.get p))
        |> Series.reduceValues (fun acc curr -> acc + curr)
    // Sum of quantities over the filtered rows.
    let total =
        quantities
        |> Series.reduceValues (fun acc curr -> acc + curr)
    weightedSum / total
let path = __SOURCE_DIRECTORY__ + "\data.csv"
let df = Frame.ReadCsv(path, separators = "|")
let ans = df |> weightedAverage (DateTime(2017, 1, 1)) (DateTime(2019, 1, 1))
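To check the arithmetic independently of Deedle, here is a minimal plain-F# sketch of the same formula, sum(qty * price) / sum(qty), over the three example rows. The Trade record and the weightedAveragePrice name are assumptions made for illustration and are not part of Deedle or of the answer above. Unlike the Deedle version, the instrument id is a parameter and an empty selection yields None.

```fsharp
open System

// Hypothetical record mirroring the columns of the example data.
type Trade =
    { Qty: float
      Price: float
      Timestamp: DateTime
      InstrumentId: int }

// Quantity-weighted average price for one instrument strictly inside (after, before).
// Returns None when no trade matches, so the caller never divides by zero.
let weightedAveragePrice instrumentId (after: DateTime) (before: DateTime) (trades: Trade list) =
    let selected =
        trades
        |> List.filter (fun t ->
            t.InstrumentId = instrumentId && t.Timestamp > after && t.Timestamp < before)
    let totalQty = selected |> List.sumBy (fun t -> t.Qty)
    if totalQty = 0.0 then None
    else Some ((selected |> List.sumBy (fun t -> t.Qty * t.Price)) / totalQty)

let trades =
    [ { Qty = 1000.0; Price = 100.59; Timestamp = DateTime(2018, 1, 26, 16, 0, 0); InstrumentId = 1 }
      { Qty = 2000.0; Price = 105.10; Timestamp = DateTime(2018, 1, 26, 15, 59, 30); InstrumentId = 1 }
      { Qty = 3000.0; Price = 97.59; Timestamp = DateTime(2018, 1, 26, 15, 59, 0); InstrumentId = 1 } ]

// 603560 / 6000, i.e. roughly 100.593
let vwap =
    trades
    |> weightedAveragePrice 1 (DateTime(2018, 1, 26, 15, 0, 0)) (DateTime(2018, 1, 26, 16, 30, 0))
```

Because the interval bounds are ordinary arguments, the function can be partially applied and mapped over a list of intervals, which is the use the question describes.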

Related

F# list group by running total?

I have the following list of tuples ordered by the first item. I want to cluster the items by the following rules:
If the second item of the tuple is greater than 50, it will be in its own cluster.
Otherwise, cluster the items whose sum is less than 50.
The order cannot be changed.
Code:
let values =
    [("ACE", 78);
     ("AMR", 3);
     ("Aam", 6);
     ("Acc", 1);
     ("Adj", 23);
     ("Aga", 12);
     ("All", 2);
     ("Ame", 4);
     ("Amo", 60);
     //....
    ]
values |> Seq.groupBy(fun (k,v) -> ???)
The expected value will be
[["ACE"] // 78
["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"] // 47
["Ame"] // 4
["Amo"] // 60
....]
Ideally, I want to evenly distribute the second group (["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"] which got the sum of 47) and the third one (["Ame"] which has only 4).
How to implement it in F#?
I had the following solution. It uses a mutable variable. Is that not idiomatic F#? Is for ... do imperative in F#, or is it syntactic sugar for some functional construct?
seq {
    let mutable c = []
    for v in values |> Seq.sortBy(fun (k, _) -> k) do
        let sum = c |> Seq.map(fun (_, v) -> v) |> Seq.sum
        if not(c = []) && sum + (snd v) > 50
        then
            yield c
            c <- [v]
        else
            c <- List.append c [v]
}

I think I got it. Not the nicest code ever, but it works and is immutable.
let foldFn (acc:(string list * int) list) (name, value) =
    let addToLast last =
        let withoutLast = acc |> List.filter ((<>) last)
        let newLast = [((fst last) @ [name]), (snd last) + value]
        newLast |> List.append withoutLast
    match acc |> List.tryLast with
    | None -> [[name],value]
    | Some l ->
        if (snd l) + value <= 50 then addToLast l
        else [[name], value] |> List.append acc
values |> List.fold foldFn [] |> List.map fst
Update: Since append can be quite an expensive operation, I added a prepend-only version (it still fulfills the original requirement to keep the order).
let foldFn (acc:(string list * int) list) (name, value) =
    let addToLast last =
        let withoutLast = acc |> List.filter ((<>) last) |> List.rev
        let newLast = ((fst last) @ [name]), (snd last) + value
        (newLast :: withoutLast) |> List.rev
    match acc |> List.tryLast with
    | None -> [[name],value]
    | Some l ->
        if (snd l) + value <= 50 then addToLast l
        else ([name], value) :: (List.rev acc) |> List.rev
Note: There is still an @ operator on line 4 (when creating the new list of names in the cluster), but since the theoretical maximum number of names in a cluster is 50 (if all of them were equal to 1), the performance cost here is negligible.
If you remove List.map fst on the last line, you get the sum for each cluster in the list.
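As a rough sketch of the "clusters together with their sums" variant mentioned above, the fold below keeps the cluster under construction separate from the finished ones, so it only ever prepends and reverses once at the end. The name clusterWithSums and the limit parameter are assumptions introduced for illustration. Like the answer above, it lets an item join a cluster while the running sum stays at or below the limit.

```fsharp
// Hypothetical helper: groups consecutive (name, value) pairs so that each
// cluster's running total stays at or below `limit`; an item larger than
// `limit` ends up alone. Returns each cluster's names together with its sum.
let clusterWithSums (limit: int) (items: (string * int) list) : (string list * int) list =
    let step (finished, current, total) (name, value) =
        if List.isEmpty current || total + value <= limit then
            // Still room (or nothing started yet): extend the current cluster.
            finished, name :: current, total + value
        else
            // Close the current cluster and start a new one with this item.
            (List.rev current, total) :: finished, [ name ], value
    let finished, current, total = List.fold step ([], [], 0) items
    let all = if List.isEmpty current then finished else (List.rev current, total) :: finished
    List.rev all

let clustered =
    clusterWithSums 50
        [ "ACE", 78; "AMR", 3; "Aam", 6; "Acc", 1; "Adj", 23
          "Aga", 12; "All", 2; "Ame", 4; "Amo", 60 ]
// [(["ACE"], 78); (["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"], 47); (["Ame"], 4); (["Amo"], 60)]
```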

Append operations are expensive. A straightforward fold with prepended intermediate results is cheaper, even if the lists need to be reversed after processing.
["ACE", 78; "AMR", 3; "Aam", 6; "Acc", 1; "Adj", 23; "Aga", 12; "All", 2; "Ame", 4; "Amd", 6; "Amo", 60]
|> List.fold (fun (r, s1, s2) (t1, t2) ->
    if t2 > 50 then [t1]::s1::r, [], 0
    elif s2 + t2 > 50 then s1::r, [t1], t2
    else r, t1::s1, s2 + t2 ) ([], [], 0)
|> fun (r, s1, _) -> s1::r
|> List.filter (not << List.isEmpty)
|> List.map List.rev
|> List.rev
// val it : string list list =
//   [["ACE"]; ["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"]; ["Ame"; "Amd"];
//    ["Amo"]]

Here is a recursive version that works much the same way as the fold versions:
let groupBySums data =
    let rec group cur sum acc lst =
        match lst with
        // Flush the cluster still being built before returning the result.
        | [] -> ((cur |> List.rev)::acc) |> List.where (not << List.isEmpty) |> List.rev
        | (name, value)::tail when value > 50 -> group [] 0 ([(name, value)]::(cur |> List.rev)::acc) tail
        | (name, value)::tail ->
            match sum + value with
            // The new cluster starts with this item, so its running sum is value.
            | x when x > 50 -> group [(name, value)] value ((cur |> List.rev)::acc) tail
            | _ -> group ((name, value)::cur) (sum + value) acc tail
    (data |> List.sortBy (fun (name, _) -> name)) |> group [] 0 []
values |> groupBySums |> List.iter (printfn "%A")

Yet another solution using Seq.mapFold and Seq.groupBy:
let group values =
    values
    |> Seq.mapFold (fun (group, total) (name, count) ->
        let newTotal = count + total
        let newGroup = group + if newTotal > 50 then 1 else 0
        (newGroup, name), (newGroup, if newGroup = group then newTotal else count)
    ) (0, 0)
    |> fst
    |> Seq.groupBy fst
    |> Seq.map (snd >> Seq.map snd >> Seq.toList)
Invoke it like this:
[ "ACE", 78
  "AMR", 3
  "Aam", 6
  "Acc", 1
  "Adj", 23
  "Aga", 12
  "All", 2
  "Ame", 4
  "Amo", 60
]
|> group
|> Seq.iter (printfn "%A")
// ["ACE"]
// ["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"]
// ["Ame"]
// ["Amo"]

Matrix transposition in F#

I'm trying to modify a matrix like this one:
/ 1 2 3 \
\ 4 5 6 /
to return:
/ 1 4 \
| 2 5 |
\ 3 6 /
Instead it is flipping my matrix by the corners. This is the code I have so far:
let rec matrixadjust = function
    | (_::_)::_ as xss -> List.map List.head xss :: matrixadjust (List.map List.tail xss)
    | _ -> [];;

I think that the best way to work with a matrix is to use the Array2D data structure. You can build an Array2D from an array of arrays and then create a new Array2D to accomplish what you want:
let arrayOfArrays = [| [| 1; 2; 3 |]; [|4; 5; 6 |] |]
let array2d = Array2D.init 2 3 (fun row column -> arrayOfArrays.[row].[column])
let newArray = Array2D.init (array2d |> Array2D.length2) (array2d |> Array2D.length1) (fun r c -> array2d.[c,r])
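The same Array2D.init trick can be wrapped in a small reusable function. This is only a sketch: transpose2D is a hypothetical name, and the input is built with the standard array2D function instead of an array of arrays.

```fsharp
// Hypothetical helper: transposes any rectangular 2D array by swapping
// the two dimensions and reading each element with swapped indices.
let transpose2D (m: 'a[,]) : 'a[,] =
    Array2D.init (Array2D.length2 m) (Array2D.length1 m) (fun r c -> m.[c, r])

let original = array2D [ [ 1; 2; 3 ]; [ 4; 5; 6 ] ]
let flipped = transpose2D original
// flipped has 3 rows and 2 columns; flipped.[0, 1] is the old original.[1, 0]
```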

Assuming your data structure is a list of lists where each sub-list represents a row, you could do it like this. Basically, it loops once per source-list row and accumulates the result in the partial binding. Since it's doing list accumulation, it reverses the order of the values, so you have to do a List.rev on each row at the end.
let flip matrix =
    match matrix with
    | [] -> []
    | x::xs ->
        let rec loop matrix partial =
            match matrix with
            | [] -> partial
            | y::ys ->
                let newPartial = (y, partial) ||> List.map2(fun x y->x::y)
                loop ys newPartial
        let length = List.length x
        loop matrix (List.init length (fun _ -> [] ))
        |> List.map(fun x->x |> List.rev)
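For comparison, and assuming FSharp.Core 4.1 or later is available, the core library's List.transpose does the same job for a list of equally long rows. The snippet below is a minimal usage sketch; as far as I know the function raises an exception when the rows have different lengths, so ragged input would still need its own handling.

```fsharp
// Each inner list is one row of the matrix.
let rows = [ [ 1; 2; 3 ]; [ 4; 5; 6 ] ]

// List.transpose turns rows into columns.
let columns = List.transpose rows
// columns = [[1; 4]; [2; 5]; [3; 6]]
```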

How do I write a ZipN-like function in F#?

I want to create a function with the signature seq<#seq<'a>> -> seq<seq<'a>> that acts like a Zip method, taking a sequence of an arbitrary number of input sequences (instead of 2 or 3 as in Zip2 and Zip3) and returning a sequence of sequences instead of tuples as a result.
That is, given the following input:
[[1;2;3];
[4;5;6];
[7;8;9]]
it will return the result:
[[1;4;7];
[2;5;8];
[3;6;9]]
except with sequences instead of lists.
I am very new to F#, but I have created a function that does what I want, but I know it can be improved. It's not tail recursive and it seems like it could be simpler, but I don't know how yet. I also haven't found a good way to get the signature the way I want (accepting, e.g., an int list list as input) without a second function.
I know this could be implemented using enumerators directly, but I'm interested in doing it in a functional manner.
Here's my code:
let private Tail seq = Seq.skip 1 seq
let private HasLengthNoMoreThan n = Seq.skip n >> Seq.isEmpty
let rec ZipN_core = function
    | seqs when seqs |> Seq.isEmpty -> Seq.empty
    | seqs when seqs |> Seq.exists Seq.isEmpty -> Seq.empty
    | seqs ->
        let head = seqs |> Seq.map Seq.head
        let tail = seqs |> Seq.map Tail |> ZipN_core
        Seq.append (Seq.singleton head) tail
// Required to change the signature of the parameter from seq<seq<'a>> to seq<#seq<'a>>
let ZipN seqs = seqs |> Seq.map (fun x -> x |> Seq.map (fun y -> y)) |> ZipN_core

let zipn items = items |> Matrix.Generic.ofSeq |> Matrix.Generic.transpose
Or, if you really want to write it yourself:
let zipn items =
    let rec loop items =
        seq {
            match items with
            | [] -> ()
            | _ ->
                match zipOne ([], []) items with
                | Some(xs, rest) ->
                    yield xs
                    yield! loop rest
                | None -> ()
        }
    and zipOne (acc, rest) = function
        | [] -> Some(List.rev acc, List.rev rest)
        | []::_ -> None
        | (x::xs)::ys -> zipOne (x::acc, xs::rest) ys
    loop items

Since this seems to be the canonical answer for writing a zipn in F#, I wanted to add a "pure" seq solution that preserves laziness and doesn't force us to load the full source sequences into memory at once, as the Matrix.transpose function does. There are scenarios where this is very important because it is a) faster and b) works with sequences that contain hundreds of MBs of data!
This is probably the most un-idiomatic F# code I've written in a while, but it gets the job done (and hey, why would there be sequence expressions in F# if you couldn't use them for writing procedural code in a functional language?).
let seqdata = seq {
    yield Seq.ofList [ 1; 2; 3 ]
    yield Seq.ofList [ 4; 5; 6 ]
    yield Seq.ofList [ 7; 8; 9 ]
}
let zipnSeq (src:seq<seq<'a>>) = seq {
    let enumerators = src |> Seq.map (fun x -> x.GetEnumerator()) |> Seq.toArray
    if (enumerators.Length > 0) then
        try
            while(enumerators |> Array.forall(fun x -> x.MoveNext())) do
                yield enumerators |> Array.map( fun x -> x.Current)
        finally
            enumerators |> Array.iter (fun x -> x.Dispose())
}
zipnSeq seqdata |> Seq.toArray
val it : int [] [] = [|[|1; 4; 7|]; [|2; 5; 8|]; [|3; 6; 9|]|]
By the way, the traditional matrix transpose is much more terse than @Daniel's answer. However, it requires a list or LazyList, both of which will eventually hold the full sequence in memory.
let rec transpose =
    function
    | (_ :: _) :: _ as M -> List.map List.head M :: transpose (List.map List.tail M)
    | _ -> []
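To illustrate the laziness claim, the sketch below repeats the enumerator-based idea under a different, hypothetical name (zipnLazy) so that it runs on its own, and feeds it two infinite sequences. Only as many elements as the consumer asks for are ever produced, which a list-based transpose could not manage.

```fsharp
// Same enumerator-based approach as zipnSeq, repeated here so the snippet is self-contained.
let zipnLazy (src: seq<seq<'a>>) : seq<'a[]> =
    seq {
        let enumerators = src |> Seq.map (fun s -> s.GetEnumerator()) |> Seq.toArray
        if enumerators.Length > 0 then
            try
                // Stop as soon as any input runs out of elements.
                while enumerators |> Array.forall (fun e -> e.MoveNext()) do
                    yield enumerators |> Array.map (fun e -> e.Current)
            finally
                enumerators |> Array.iter (fun e -> e.Dispose())
    }

// Two infinite inputs: 0, 1, 2, ... and 0, 1, 4, ...
let naturals = Seq.initInfinite id
let squares = Seq.initInfinite (fun i -> i * i)

// Only the first three rows are ever computed.
let firstThree = zipnLazy [ naturals; squares ] |> Seq.take 3 |> Seq.toList
// [[|0; 0|]; [|1; 1|]; [|2; 4|]]
```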

To handle having sub-lists of different lengths, I've used option types to spot if we've run out of elements.
let split = function
    | [] -> None, []
    | h::t -> Some(h), t
let rec zipN listOfLists =
    seq { let splitted = listOfLists |> List.map split
          let anyMore = splitted |> Seq.exists (fun (f, _) -> f.IsSome)
          if anyMore then
              yield splitted |> List.map fst
              let rest = splitted |> List.map snd
              yield! rest |> zipN }
This would map
let ll = [ [ 1; 2; 3 ];
[ 4; 5; 6 ];
[ 7; 8; 9 ] ]
to
seq
[seq [Some 1; Some 4; Some 7]; seq [Some 2; Some 5; Some 8];
seq [Some 3; Some 6; Some 9]]
and
let ll = [ [ 1; 2; 3 ];
[ 4; 5; 6 ];
[ 7; 8 ] ]
to
seq
[seq [Some 1; Some 4; Some 7]; seq [Some 2; Some 5; Some 8];
seq [Some 3; Some 6; null]]
This takes a different approach to yours, but avoids using some of the operations that you had before (e.g. Seq.skip, Seq.append), which you should be careful with.

I realize that this answer is not very efficient, but I do like its succinctness:
[[1;2;3]; [4;5;6]; [7;8;9]]
|> Seq.collect Seq.indexed
|> Seq.groupBy fst
|> Seq.map (snd >> Seq.map snd);;

Another option:
let zipN ls =
    let rec loop (a,b) =
        match b with
        |l when List.head l = [] -> a
        |l ->
            let x1,x2 =
                (([],[]),l)
                ||> List.fold (fun acc elem ->
                    match acc,elem with
                    |(ah,at),eh::et -> ah@[eh],at@[et]
                    |_ -> acc)
            loop (a@[x1],x2)
    loop ([],ls)

More volatile sequence than "classical"

For the Cartesian product there is a good enough function, sequence, which is defined like this:
let rec sequence = function
    | [] -> Seq.singleton []
    | (l::ls) -> seq { for x in l do for xs in sequence ls do yield (x::xs) }
But look at its result:
sequence [[1..2];[1..10000]] |> Seq.skip 1000 ;;
val it : seq<int list> = seq [[1; 1001]; [1; 1002]; [1; 1003]; [1; 1004]; ...]
As we can see, the first "coordinate" of the product changes very slowly: it changes value only when the second list is exhausted.
I wrote my own sequence as follows (comments below):
/// Sum of all produced indices = indexsum
let rec hyper'plane'indices indexsum maxlengths =
    match maxlengths with
    | [x] -> if indexsum < x then [[indexsum]] else []
    | (i::is) -> [for x in [0 .. min indexsum (i-1)] do for xs in hyper'plane'indices (indexsum-x) is do yield (x::xs)]
    | [] -> [[]]
let finite'sequence = function
    | [] -> Seq.singleton []
    | ns ->
        let ars = [ for n in ns -> Seq.toArray n ]
        let length'list = List.map Array.length ars
        let nmax = List.max length'list
        seq {
            for n in [0 .. nmax] do
                for ixs in hyper'plane'indices n length'list do
                    yield (List.map2 (fun (a:'a[]) i -> a.[i]) ars ixs)
        }
The key idea is to treat the (two) lists as (two) orthogonal dimensions, where every element is marked by its index in its list. We can then enumerate all elements of the Cartesian product by enumerating the elements in each section of the product cut by a hyperplane (in the 2D case this is a line). In other words, imagine an Excel sheet whose first column contains the values from [1;1] to [1;10000] and whose second column contains the values from [2;1] to [2;10000]. The "hyperplane" with number 1 is then the line that connects cell A2 and cell B1. For our example:
hyper'plane'indices 0 [2;10000];; val it : int list list = [[0; 0]]
hyper'plane'indices 1 [2;10000];; val it : int list list = [[0; 1]; [1; 0]]
hyper'plane'indices 2 [2;10000];; val it : int list list = [[0; 2]; [1; 1]]
hyper'plane'indices 3 [2;10000];; val it : int list list = [[0; 3]; [1; 2]]
hyper'plane'indices 4 [2;10000];; val it : int list list = [[0; 4]; [1; 3]]
Now that we have the indices and the arrays built from the given lists, we can define the sequence as {all elements in plane 0; then all elements in plane 1; ... and so on} and get a more volatile function than the original sequence.
But finite'sequence turned out to be a very gluttonous function. So now the question: how can I improve it?
With best wishes, Alexander. (and sorry for poor English)
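For the two-list case, the hyperplane idea described above reduces to walking the anti-diagonals of a grid. The following is only a sketch of that special case: diagonalPairs is a hypothetical name, and it works on arrays directly instead of generating index lists first. Note that the outer bound here is the sum of the largest indices, xs.Length + ys.Length - 2.

```fsharp
// Hypothetical 2D helper: yields every pair (xs.[i], ys.[j]) ordered by i + j,
// i.e. one anti-diagonal ("hyperplane") after another.
let diagonalPairs (xs: 'a[]) (ys: 'b[]) : seq<'a * 'b> =
    seq {
        for n in 0 .. xs.Length + ys.Length - 2 do
            // Keep both i and n - i inside their array bounds.
            for i in max 0 (n - ys.Length + 1) .. min n (xs.Length - 1) do
                yield xs.[i], ys.[n - i]
    }

let pairs = diagonalPairs [| 1; 2 |] [| 10; 20; 30 |] |> Seq.toList
// [(1, 10); (1, 20); (2, 10); (1, 30); (2, 20); (2, 30)]
```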

Can you explain what exactly is the problem - time or space complexity or performance? Do you have a specific benchmark in mind? I am not sure how to improve on the time complexity here, but I edited your code a bit to remove the intermediate lists, which might help a bit with memory allocation behavior.
Do not do this:
for n in [0 .. nmax] do
Do this instead:
for n in 0 .. nmax do
Here is the code:
let rec hyper'plane'indices indexsum maxlengths =
    match maxlengths with
    | [] -> Seq.singleton []
    | [x] -> if indexsum < x then Seq.singleton [indexsum] else Seq.empty
    | i :: is ->
        seq {
            for x in 0 .. min indexsum (i - 1) do
                for xs in hyper'plane'indices (indexsum - x) is do
                    yield x :: xs
        }
let finite'sequence xs =
    match xs with
    | [] -> Seq.singleton []
    | ns ->
        let ars = [ for n in ns -> Seq.toArray n ]
        let length'list = List.map Array.length ars
        let nmax = List.max length'list
        seq {
            for n in 0 .. nmax do
                for ixs in hyper'plane'indices n length'list do
                    yield List.map2 Array.get ars ixs
        }
Does this fare any better? Beautiful problem by the way.
UPDATE: Perhaps you are more interested in mixing the sequences fairly than in maintaining the exact formula in your algorithm. Here is some Haskell code that mixes a finite number of possibly infinite sequences fairly, where fairness means that for every input element there is a finite prefix of the output sequence that contains it. You mention in the comment that you have a 2D incremental solution that is hard to generalize to N dimensions, and the Haskell code does exactly that:
merge :: [a] -> [a] -> [a]
merge [] y = y
merge x [] = x
merge (x:xs) (y:ys) = x : y : merge xs ys
prod :: (a -> b -> c) -> [a] -> [b] -> [c]
prod _ [] _ = []
prod _ _ [] = []
prod f (x:xs) (y:ys) = f x y : a `merge` b `merge` prod f xs ys where
    a = [f x y | x <- xs]
    b = [f x y | y <- ys]
prodN :: [[a]] -> [[a]]
prodN [] = [[]]
prodN (x:xs) = prod (:) x (prodN xs)
I have not ported this to F# yet. It requires some thought, as sequences do not map onto head/tail very well.
UPDATE 2:
A fairly mechanical translation to F# follows.
type Node<'T> =
    | Nil
    | Cons of 'T * Stream<'T>
and Stream<'T> = Lazy<Node<'T>>
let ( !! ) (x: Lazy<'T>) = x.Value
let ( !^ ) x = Lazy.CreateFromValue(x)
let rec merge (xs: Stream<'T>) (ys: Stream<'T>) : Stream<'T> =
    lazy
        match !!xs, !!ys with
        | Nil, r | r, Nil -> r
        | Cons (x, xs), Cons (y, ys) -> Cons (x, !^ (Cons (y, merge xs ys)))
let rec map (f: 'T1 -> 'T2) (xs: Stream<'T1>) : Stream<'T2> =
    lazy
        match !!xs with
        | Nil -> Nil
        | Cons (x, xs) -> Cons (f x, map f xs)
let ( ++ ) = merge
let rec prod f xs ys =
    lazy
        match !!xs, !!ys with
        | Nil, _ | _, Nil -> Nil
        | Cons (x, xs), Cons (y, ys) ->
            let a = map (fun x -> f x y) xs
            let b = map (fun y -> f x y) ys
            Cons (f x y, a ++ b ++ prod f xs ys)
let ofSeq (s: seq<'T>) =
    lazy
        let e = s.GetEnumerator()
        let rec loop () =
            lazy
                if e.MoveNext()
                then Cons (e.Current, loop ())
                else e.Dispose(); Nil
        !! (loop ())
let toSeq stream =
    stream
    |> Seq.unfold (fun stream ->
        match !!stream with
        | Nil -> None
        | Cons (x, xs) -> Some (x, xs))
let empty<'T> : Stream<'T> = !^ Nil
let cons x xs = !^ (Cons (x, xs))
let singleton x = cons x empty
let rec prodN (xs: Stream<Stream<'T>>) : Stream<Stream<'T>> =
    match !!xs with
    | Nil -> singleton empty
    | Cons (x, xs) -> prod cons x (prodN xs)
let test () =
    ofSeq [
        ofSeq [1; 2; 3]
        ofSeq [4; 5; 6]
        ofSeq [7; 8; 9]
    ]
    |> prodN
    |> toSeq
    |> Seq.iter (fun xs ->
        toSeq xs
        |> Seq.map string
        |> String.concat ", "
        |> stdout.WriteLine)

Swapping every pair of items in an F# list

I'm positive that there is a better way to swap items in a list by pairs ([1;2;3;4] -> [2;1;4;3]), as I'm doing too many appends for my liking, but I'm not sure how best to do it.
let swapItems lst =
    let f acc item =
        match acc with
        | [] -> [item]
        | hd :: next :: tl when tl <> [] -> [next] @ tl @ [item;hd]
        | _ -> item :: acc
    List.fold f [] lst
How can I improve this? This only works on lists that have an even length.

Simplest possible solution:
let rec swapItems = function
    | a::b::xs -> b::a::swapItems xs
    | xs -> xs
I like to give variables that hold sequences, such as lists, "plural" names, e.g. xs instead of x.
Note that this is not tail recursive so it will stack overflow if you give it a very long list.
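If very long lists are a concern, one possible tail-recursive variant threads an accumulator through the recursion and reverses it once at the end. This is a sketch, not the answer's original code, and swapPairs is a hypothetical name.

```fsharp
// Tail-recursive pair swap: the accumulator is built in reverse order
// and reversed once when the input is exhausted.
let swapPairs (items: 'a list) : 'a list =
    let rec loop acc rest =
        match rest with
        | a :: b :: tail -> loop (a :: b :: acc) tail // reads as b, a after the final reverse
        | [ last ] -> loop (last :: acc) []           // odd length: the last item stays in place
        | [] -> List.rev acc
    loop [] items

let swapped = swapPairs [ 1; 2; 3; 4 ]
// [2; 1; 4; 3]
```

The recursive call is in tail position in every branch, so the stack does not grow with the length of the list.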

What about this?
let rec swapItems = function
    | []
    | _::[] as l -> l
    | a::b::t ->
        b::a::(swapItems t)

Using higher-order functions, this can be done as:
let swapItems l =
    l |> List.toSeq |> Seq.pairwise
    |> Seq.mapi (fun i (a,b) -> if i % 2 = 0 then seq [b;a] else Seq.empty)
    |> Seq.concat |> Seq.toList
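Another higher-order formulation, sketched here under the assumption that List.chunkBySize is available (FSharp.Core 4.0 or later), splits the list into chunks of two and reverses each chunk. The name swapByChunks is hypothetical. A trailing single element forms its own chunk, so odd-length lists keep their last item.

```fsharp
// Split into chunks of at most two items, reverse each chunk, then flatten.
let swapByChunks (items: 'a list) : 'a list =
    items
    |> List.chunkBySize 2
    |> List.collect List.rev

let swappedChunks = swapByChunks [ 1; 2; 3; 4; 5 ]
// [2; 1; 4; 3; 5]
```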
