Deedle, multiplying columns of a frame by scalar values - f#

Suppose you have a dictionary of weights that need to be multiplied by the respective columns of a Deedle frame. In Python it would be something like this:
>>> df = pd.DataFrame(np.ones((3,3)),columns=list("abc"))
>>> d = {"a":1, "b":2, "c":3}
>>> df * pd.Series(d)
a b c
0 1.0 2.0 3.0
1 1.0 2.0 3.0
2 1.0 2.0 3.0
How to go about this in Deedle?

Well, I can do it in three lines, but it is definitely not as succinct as pandas:
let df = Array2D.create 3 3 1. |> Frame.ofArray2D |> Frame.indexColsWith("abc" |> Seq.toArray)
let multi = dict ['a',1.;'b',2.;'c',3.;]
df |> Frame.getNumericCols |> Series.map (fun idx v -> multi.[idx] * v) |> Frame.ofColumns
Note that "abc" |> Seq.toArray turns the string into an array of chars, hence the 'a' keys in the dictionary.
I vaguely recall that the operators may be overloaded to work on the data frame in pandas. You can do the same thing with Deedle as well: you can write df * df', where df' contains the scalars to multiply by.
Something like this:
let df' = Array2D.init 3 3 (fun i j -> float j + 1.) |> Frame.ofArray2D |> Frame.indexColsWith("abc" |> Seq.toArray)
df * df'
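For readers without Deedle at hand, the per-column scaling itself can be sketched with plain F# collections. This is only an illustration of the idea, not Deedle code: the `columns` map standing in for the frame and the `scaleColumns` helper are hypothetical names.

```fsharp
// Library-free sketch: a "frame" modelled as a map from column name to values.
// `columns`, `weights` and `scaleColumns` are illustrative names, not Deedle API.
let columns =
    Map.ofList
        [ "a", [| 1.0; 1.0; 1.0 |]
          "b", [| 1.0; 1.0; 1.0 |]
          "c", [| 1.0; 1.0; 1.0 |] ]

let weights = dict [ "a", 1.0; "b", 2.0; "c", 3.0 ]

// Multiply every value in a column by the weight stored under the column's name.
let scaleColumns (weights: System.Collections.Generic.IDictionary<string, float>) cols =
    cols |> Map.map (fun name values -> values |> Array.map (fun v -> weights.[name] * v))

let scaled = scaleColumns weights columns
printfn "%A" scaled
```

The shape is the same as the Deedle answer above: look up the weight by column key, then scale the whole column.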

Related

Plotting scatter chart with a linear regression

I am trying to display a scatter chart for two columns in a Deedle data frame, ideally grouped by a third column.
And I would like to show a linear regression line on the same chart.
In Python this can be done with seaborn.lmplot https://seaborn.pydata.org/generated/seaborn.lmplot.html
sns.lmplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
I was hoping to do something like that with Plotly.Net, but so far I only got a simple scatterplot:
(
    df.["rating"].Values,
    df.["calories"].Values
)
||> Seq.zip
|> Chart.Point
How do I add a linear regression line similar to seaborn? Do I need to do it manually somehow?
How do I group the points by a third column? This one I may be able to figure out myself, but I wonder if there is a more elegant solution.
Thanks to Brian Berns' comment, pointing me to this example, I was able to create a helper function that works similarly to Python's seaborn.lmplot function.
Here is the code if anyone wants to use it:
// helpers
let getColVector col (df: Frame<'a, 'b>) =
    vector <| df.[col].Values
let filterByKey fn (df: Frame<'a, 'b>) =
    df.Where(fun (KeyValue(k, _)) -> fn k)
let singleGroupLmplot xCol yCol valuesName df =
    let y = df |> getColVector yCol
    let x = df |> getColVector xCol
    let coefs = OrdinaryLeastSquares.Linear.Univariable.coefficient x y
    let fittinFunc x = OrdinaryLeastSquares.Linear.Univariable.fit coefs x
    let xRange = [for i in Seq.min(x)..Seq.max(x) -> i]
    let yPredicted = [for x in xRange -> fittinFunc x]
    let xy = Seq.zip xRange yPredicted
    [
        Chart.Point(x, y, ShowLegend=true, Name=valuesName)
        |> Chart.withXAxisStyle(TitleText=xCol)
        |> Chart.withYAxisStyle(TitleText=yCol)
        Chart.Line(xy, ShowLegend=true, Name=($"Reg. {valuesName}"))
    ]
    |> Chart.combine
let lmplot xCol yCol hue df =
    match hue with
    | None ->
        [ singleGroupLmplot xCol yCol ($"{xCol} vs {yCol}") df ]
    | Some h ->
        let groupedDf = df |> Frame.groupRowsByString h
        groupedDf.RowKeys
        |> Seq.map (fun (g, _) -> g)
        |> Seq.distinct
        |> List.ofSeq
        |> List.map (fun k ->
            groupedDf
            |> filterByKey (fun (g, _) -> g = k)
            |> singleGroupLmplot xCol yCol k
        )
    |> Chart.combine
    |> Chart.withLegendStyle(Orientation=StyleParam.Orientation.Horizontal)
Example of using it to render a scatter plot with regression line for a dataframe:
df |> lmplot "rating" "calories" None
Example of using it to render scatter plots with regression lines for a dataframe grouped by a row value:
df |> lmplot "rating" "calories" (Some "healthiness")
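To make the regression step less of a black box, here is a minimal sketch of the closed-form least-squares line that such a regression plot draws. It uses only the F# core library; `fitLine` is a hypothetical helper name and is not part of FSharp.Stats or Plotly.NET.

```fsharp
// Closed-form simple linear regression: y ≈ intercept + slope * x.
// `fitLine` is an illustrative helper, not a library function.
let fitLine (xs: float list) (ys: float list) =
    let meanX = List.average xs
    let meanY = List.average ys
    // Sum of cross-deviations and sum of squared deviations of x.
    let sxy = List.zip xs ys |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
    let sxx = xs |> List.sumBy (fun x -> (x - meanX) * (x - meanX))
    let slope = sxy / sxx
    let intercept = meanY - slope * meanX
    intercept, slope

// Points lying exactly on y = 1 + 2x.
let intercept, slope = fitLine [ 1.0; 2.0; 3.0; 4.0 ] [ 3.0; 5.0; 7.0; 9.0 ]
printfn "intercept = %f, slope = %f" intercept slope
```

Evaluating `intercept + slope * x` at the smallest and largest x gives the two end points of the line, which is all a line chart needs.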

How to cumulate (scan) Deedle data frame values

I'm loading a sequence of records into a deedle data frame (from a database table). Is it possible to accumulate (for example sum cumulatively) the values, and get back a data frame? For example there is Series.scanValues but there is no Frame.scanValues. There is Frame.map, but it didn't do what I expected, it left all values as they were.
#if INTERACTIVE
#r @"Fsharp.Charting"
#load @"..\..\Deedle.fsx"
#endif
open System
open FSharp.Charting
open FSharp.Charting.ChartTypes
open Deedle
type SeriesX = {
    DataDate:DateTime
    Series1:float
    Series2:float
    Series3:float
    }
let rnd = new System.Random()
rnd.NextDouble() - 0.5
let data =
    [for i in [100..-1..1] ->
        {SeriesX.DataDate = DateTime.Now.AddDays(float -i)
         SeriesX.Series1 = rnd.NextDouble() - 0.5
         SeriesX.Series2 = rnd.NextDouble() - 0.5
         SeriesX.Series3 = rnd.NextDouble() - 0.5
        }
    ]
// now comes the deedle frame:
let df = data |> Frame.ofRecords
let df = df.IndexRows<DateTime>("DataDate")
df.["Series1"] |> Chart.Line
df.["Series1"].ScanValues((fun acc x -> acc + x),0.0) |> Chart.Line
let df' = df |> Frame.mapValues (Seq.scan (fun acc x -> acc + x) 0.0)
df'.["Series1"] |> Chart.Line
The last two lines just give me back the original values, while I would like to have the accumulated values, as in df.["Series1"].ScanValues, for Series1, Series2, and Series3.
For filtering and projection, series provides Where and Select methods
and corresponding Series.map and Series.filter functions (there is
also Series.mapValues and Series.mapKeys if you only want to transform
one aspect).
So you just apply your function to each Series:
let allSum =
    df.Columns
    |> Series.mapValues(Series.scanValues(fun acc v -> acc + (v :?> float)) 0.0)
    |> Frame.ofColumns
and then use Frame.ofColumns to convert the result back to a frame.
Edit:
If you need to select only the numeric columns, you can use Frame.getNumericCols:
let allSum =
    df
    |> Frame.getNumericCols
    |> Series.mapValues(Series.scanValues (+) 0.0)
    |> Frame.ofColumns
Without the explicit type cast, the code is cleaner :)
There is a Series.scanValues function. You can obtain a series from every column in your data frame like this: frame.["column"], which gets you a Series.
If you need all the columns at once to do the scan, you could first map each row into a single value (a tuple, for example) and then apply Series.scanValues to that new column.
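As a reminder of what a scan produces, here is a small sketch using only the F# core library (no Deedle), so the names `values`, `withSeed` and `runningTotals` are purely illustrative. It shows the one detail that tends to surprise: the core `scan` functions emit the initial state as the first element.

```fsharp
let values = [ 1.0; 2.0; 3.0; 4.0 ]

// List.scan emits the initial state first, so the result has one extra element.
let withSeed = List.scan (+) 0.0 values      // [0.0; 1.0; 3.0; 6.0; 10.0]

// Dropping the seed leaves exactly one running total per input value.
let runningTotals = List.tail withSeed       // [1.0; 3.0; 6.0; 10.0]

printfn "%A" runningTotals
```

One running total per input value is the shape needed to put the result back into a column with the same row keys.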

How to generate an array of exponential weights?

I am trying to do an "unfold" - (I think), by starting with an initial value, applying some function to it repeatedly, and then getting a sequence as a result.
In this example, I'm trying to start with 1.0, multiply it by .80, and do it 4 times, such that I end up with an array = [| 1.0; 0.80; 0.64; 0.512 |]
VS 2010 says I'm using "i" in an invalid way, and that mutable values cannot be captured by closures - so this function does not compile. Can anyone possibly suggest a clean approach that actually works? Thank you.
let expSeries seed fade n =
    //take seed and repeatedly multiply it by the fade factor n times...
    let mutable i = 0;
    let mutable weight = seed;
    [| while(i < n) do
        yield weight;
        weight <- weight * fade |]
let testWeights = expSeries 1.0 0.80 4
let exp_series seed fade n =
    Array.init (n) (fun i -> seed * fade ** (float i))
I think this recursive version should work.
let expSeries seed fade n =
    let rec buildSeq i weight = seq {
        if i < n then
            yield weight;
            yield! buildSeq (i + 1) (weight * fade)
    }
    buildSeq 0 seed
    |> Seq.toArray
Based on the answer to this question, you can create an unfold, and take a number of values from it:
let weighed startvalue factor =
    startvalue |> Seq.unfold (fun x -> Some (x, factor * x))
let fivevalues = weighed 1.0 0.8 |> Seq.take 5
If you want to explicitly use an unfold, here's how:
let expSeries seed fade n =
    Seq.unfold
        (fun (weight,k) ->
            if k > n then None
            else Some(weight,(weight*fade, k+1)))
        (seed,1)
    |> Array.ofSeq
let arr = expSeries 1.0 0.80 4
Note that the reason your original code won't work is that mutable bindings can't be captured by closures, and sequence, list, and array expressions implicitly use closures.
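If you would rather keep the imperative loop from the question, one possible workaround (a sketch, not the only fix) is to replace the `let mutable` bindings with `ref` cells, which closures are allowed to capture. The name `expSeriesRef` is made up for this example; note that it also increments the counter, which the original loop never did.

```fsharp
// Sketch: same loop as in the question, but with ref cells instead of `let mutable`.
// `expSeriesRef` is a hypothetical name used only for this illustration.
let expSeriesRef (seed: float) (fade: float) (n: int) =
    let i = ref 0
    let weight = ref seed
    [| while i.Value < n do
         yield weight.Value
         weight.Value <- weight.Value * fade
         i.Value <- i.Value + 1 |]   // the question's loop was missing this increment

let halves = expSeriesRef 1.0 0.5 4
printfn "%A" halves
```

The `Array.init` and `Seq.unfold` versions above are still the more idiomatic choices; this variant only shows why the closure restriction bites and how ref cells sidestep it.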

What's the style for immutable set and map in F#

I have just solved Problem 23 in Project Euler, in which I need a set to store all abundant numbers. F# has an immutable set; I can use Set.empty.Add(i) to create a new set containing the number i. But I don't know how to use an immutable set to do more complicated things.
For example, in the following code, I need to see if a number 'x' could be written as the sum of two numbers in a set. I resort to a sorted array and array's binary search algorithm to get the job done.
Please also comment on my style of the following program. Thanks!
let problem23 =
    let factorSum x =
        let mutable sum = 0
        for i=1 to x/2 do
            if x%i=0 then
                sum <- sum + i
        sum
    let isAbundant x = x < (factorSum x)
    let abuns = {1..28123} |> Seq.filter isAbundant |> Seq.toArray
    let inAbuns x = Array.BinarySearch(abuns, x) >= 0
    let sumable x =
        abuns |> Seq.exists (fun a -> inAbuns (x-a))
    {1..28123} |> Seq.filter (fun x -> not (sumable x)) |> Seq.sum
The updated version:
let problem23b =
    let factorSum x =
        {1..x/2} |> Seq.filter (fun i->x%i=0) |> Seq.sum
    let isAbundant x = x < (factorSum x)
    let abuns = Set( {1..28123} |> Seq.filter isAbundant )
    let inAbuns x = Set.contains x abuns
    let sumable x =
        abuns |> Seq.exists (fun a -> inAbuns (x-a))
    {1..28123} |> Seq.filter (fun x -> not (sumable x)) |> Seq.sum
This version runs in about 27 seconds, while the first takes about 23 seconds (I've run both several times). So the immutable, tree-based set is not much slower than a sorted array with binary search. The total number of elements in the set/array is 6965.
Your style looks fine to me. The different steps in the algorithm are clear, which is the most important part of making something work. This is also the tactic I use for solving Project Euler problems. First make it work, and then make it fast.
As already remarked, replacing Array.BinarySearch by Set.contains makes the code even more readable. I find that in almost all PE solutions I've written, I only use arrays for lookups. I've found that using sequences and lists as data structures is more natural within F#. Once you get used to them, that is.
I don't think using mutability inside a function is necessarily bad. I've optimized problem 155 from almost 3 minutes down to 7 seconds with some aggressive mutability optimizations. In general though, I'd save that as an optimization step and start out writing it using folds/filters etc. In the example case of problem 155, I did start out using immutable function composition, because it made testing and most importantly, understanding, my approach easy.
Picking the wrong algorithm is much more detrimental to a solution than using a somewhat slower immutable approach first. A good algorithm is still fast even if it's slower than the mutable version (cough hello captain obvious! cough).
Edit: let's look at your version
Your problem23b() took 31 seconds on my PC.
Optimization 1: use new algorithm.
//useful optimization: if m divides n, (n/m) divides n also
//you now only have to check m up to sqrt(n)
let factorSum2 n =
    let rec aux acc m =
        match m with
        | m when m*m = n -> acc + m
        | m when m*m > n -> acc
        | m -> aux (acc + (if n%m=0 then m + n/m else 0)) (m+1)
    aux 1 2
This is still very much in functional style, but using this updated factorSum in your code, the execution time went from 31 seconds to 8 seconds.
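A quick way to gain confidence in such a rewrite is to compare it with the naive version over a range of inputs. The sketch below repeats both definitions from this thread so that it is self-contained; `mismatches` is just an illustrative name. The comparison starts at 2 because `factorSum2 1` returns 1 while the naive sum for 1 is 0, which is harmless here since 1 is not abundant either way.

```fsharp
// Naive divisor sum from the question: test every candidate up to x/2.
let factorSum x = seq { 1 .. x / 2 } |> Seq.filter (fun i -> x % i = 0) |> Seq.sum

// Square-root version from the answer: each divisor m pairs with n/m.
let factorSum2 n =
    let rec aux acc m =
        match m with
        | m when m * m = n -> acc + m
        | m when m * m > n -> acc
        | m -> aux (acc + (if n % m = 0 then m + n / m else 0)) (m + 1)
    aux 1 2

// Every n in 2..2000 for which the two versions disagree (expected: none).
let mismatches = [ 2 .. 2000 ] |> List.filter (fun n -> factorSum n <> factorSum2 n)
printfn "mismatches: %A" mismatches
```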
Everything's still in immutable style, but let's see what happens when an array lookup is used instead of a set:
Optimization 2: use an array for lookup:
let absums() =
    //create abundant numbers as an array for (very) fast lookup
    let abnums = [|1..28128|] |> Array.filter (fun n -> factorSum2 n > n)
    //create a second lookup:
    //a boolean array where arr.[x] = true means x is a sum of two abundant numbers
    let arr = Array.zeroCreate 28124
    for x in abnums do
        for y in abnums do
            if x+y<=28123 then arr.[x+y] <- true
    arr
let euler023() =
    absums() //the array lookup
    |> Seq.mapi (fun i isAbsum -> if isAbsum then 0 else i) //mapi: i is the position in the sequence
    |> Seq.sum
//I always write a test once I've solved a problem.
//In this way, I can easily see if changes to the code breaks stuff.
let test() = euler023() = 4179871
Execution time: 0.22 seconds (!).
This is what I like so much about F#, it still allows you to use mutable constructs to tinker under the hood of your algorithm. But I still only do this after I've made something more elegant work first.
You can easily create a Set from a given sequence of values.
let abuns = Set (seq {1..28123} |> Seq.filter isAbundant)
inAbuns would therefore be rewritten to
let inAbuns x = abuns |> Set.contains x
Seq.exists would be changed to Set.exists
But the array implementation is fine too ...
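To get a feel for the immutable set API on something small, here is a sketch that applies the same idea to the numbers up to 30 only. The bound of 30 and the names `extended` and `isSumOfTwo` are choices made for this illustration and are not taken from the solutions above.

```fsharp
let factorSum x = seq { 1 .. x / 2 } |> Seq.filter (fun i -> x % i = 0) |> Seq.sum
let isAbundant x = x < factorSum x

// Build an immutable set from a sequence: the abundant numbers up to 30.
let abuns = seq { 1 .. 30 } |> Seq.filter isAbundant |> Set.ofSeq

// "Adding" returns a new set; abuns itself is unchanged.
let extended = abuns |> Set.add 36

// Membership test and existential query, both directly on the set.
let isSumOfTwo x = abuns |> Set.exists (fun a -> Set.contains (x - a) abuns)

printfn "%A" abuns
printfn "24 -> %b, 20 -> %b" (isSumOfTwo 24) (isSumOfTwo 20)
```

Functions such as `Set.add`, `Set.union` and `Set.filter` all return new sets, so "more complicated things" are usually expressed as pipelines of these rather than as in-place updates.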
Note that there is no need to use mutable values in factorSum, apart from the fact that it's incorrect since you compute the number of divisors instead of their sum:
let factorSum x = seq { 1..x/2 } |> Seq.filter (fun i -> x % i = 0) |> Seq.sum
Here is a simple functional solution that is shorter than your original and over 100× faster:
let problem23 =
    let rec isAbundant i t x =
        if i > x/2 then x < t else
        if x % i = 0 then isAbundant (i+1) (t+i) x else
        isAbundant (i+1) t x
    let xs = Array.Parallel.init 28124 (isAbundant 1 0)
    let ys = Array.mapi (fun i b -> if b then Some i else None) xs |> Array.choose id
    let f x a = x-a < 0 || not xs.[x-a]
    Array.init 28124 (fun x -> if Array.forall (f x) ys then x else 0)
    |> Seq.sum
The first trick is to record which numbers are abundant in an array indexed by the number itself rather than using a search structure. The second trick is to notice that all the time is spent generating that array and, therefore, to do it in parallel.

Get a random subset from a set in F#

I am trying to think of an elegant way of getting a random subset from a set in F#
Any thoughts on this?
Perhaps this would work: say we have a set of 2^x elements and we need to pick a subset of y elements. Then if we could generate an x sized bit random number that contains exactly y 2^n powers we effectively have a random mask with y holes in it. We could keep generating new random numbers until we get the first one satisfying this constraint but is there a better way?
If you don't want to convert to an array you could do something like this. This is O(n*m) where m is the size of the set.
open System
let rnd = Random(0)
let set = Array.init 10 (fun i -> i) |> Set.ofArray
let randomSubSet n set =
    // pick one random element, then recurse on the remaining set
    let rec pick k (s: Set<_>) =
        seq {
            if k > 0 && not s.IsEmpty then
                let i = s |> Set.toSeq |> Seq.item (rnd.Next(s.Count))
                yield i
                yield! pick (k - 1) (Set.remove i s)
        }
    pick n set |> Set.ofSeq
let result = set |> randomSubSet 3
for x in result do
    printfn "%A" x
Agree with @JohannesRossel. There's an F# shuffle-an-array algorithm here you can modify suitably. Convert the Set into an array, and then loop until you've selected enough random elements for the new subset.
Not having a really good grasp of F# and what might be considered elegant there, you could just do a shuffle on the list of elements and select the first y. A Fisher-Yates shuffle even helps you in this respect as you also only need to shuffle y elements.
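The partial shuffle described above could look roughly like the following sketch. It copies the set into an array, runs only the first k steps of a Fisher-Yates shuffle, and keeps the first k slots. The function name `randomSubsetOfSize` and the choice to clamp k to the set size are assumptions made for this example.

```fsharp
open System

// Sketch of a partial Fisher-Yates shuffle: only the first k positions are shuffled.
// `randomSubsetOfSize` is a hypothetical name; k is clamped to the size of the set.
let randomSubsetOfSize (rnd: Random) (k: int) (s: Set<'a>) : Set<'a> =
    let arr = Set.toArray s
    let k = max 0 (min k arr.Length)
    for i in 0 .. k - 1 do
        // Pick a random index from the not-yet-fixed tail [i, Length) and swap it into slot i.
        let j = rnd.Next(i, arr.Length)
        let tmp = arr.[i]
        arr.[i] <- arr.[j]
        arr.[j] <- tmp
    arr |> Array.take k |> Set.ofArray

let source = Set.ofList [ 1 .. 10 ]
let sample = randomSubsetOfSize (Random()) 3 source
printfn "%A" sample
```

Passing the `Random` instance in, rather than creating it inside the function, avoids reseeding on every call.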
rnd must be defined outside the subset function.
let rnd = new Random()
let rec subset xs =
    let removeAt n xs = ( Seq.item (n-1) xs, Seq.append (Seq.take (n-1) xs) (Seq.skip n xs) )
    match xs with
    | [] -> []
    | _ -> let (rem, left) = removeAt (rnd.Next( List.length xs ) + 1) xs
           let next = subset (List.ofSeq left)
           if rnd.Next(2) = 0 then rem :: next else next
Do you mean a random subset of any size?
For the case of a random subset of a specific size, there's a very elegant answer here:
Select N random elements from a List<T> in C#
Here it is in pseudocode:
RandomKSubset(list, k):
    n = len(list)
    needed = k
    result = {}
    for i = 0 to n-1:
        if rand() < needed / (n-i)
            push(list[i], result)
            needed--
    return result
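The pseudocode translates fairly directly into F#. The following is a sketch under the assumption that `rand()` means a uniform draw from [0, 1); the name `randomKSubset` and the accumulator-based recursion are choices made for this example.

```fsharp
open System

// Sketch of the selection-sampling pseudocode: walk the list once and keep item i
// with probability needed / (n - i). `randomKSubset` is a hypothetical name.
let randomKSubset (rnd: Random) (k: int) (items: 'a list) : 'a list =
    let n = List.length items
    let rec loop i needed remaining acc =
        match remaining with
        | [] -> List.rev acc
        | _ when needed <= 0 -> List.rev acc
        | x :: rest ->
            // NextDouble() is in [0, 1), so an item is always taken once needed = n - i.
            if rnd.NextDouble() < float needed / float (n - i) then
                loop (i + 1) (needed - 1) rest (x :: acc)
            else
                loop (i + 1) needed rest acc
    loop 0 k items []

let picked = randomKSubset (Random()) 3 [ 1 .. 10 ]
printfn "%A" picked
```

Because the list is traversed in order, the chosen elements keep their original relative order, and exactly k elements are returned whenever k does not exceed the length of the list.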
Using Seq.fold to construct a random subset with lazy evaluation:
let rnd = new Random()
let subset2 xs =
    let insertAt n xs x = Seq.concat [Seq.take n xs; seq [x]; Seq.skip n xs]
    let randomInsert xs = insertAt (rnd.Next( (Seq.length xs) + 1 )) xs
    xs |> Seq.fold randomInsert Seq.empty |> Seq.take (rnd.Next( Seq.length xs ) + 1)
