Plotting scatter chart with a linear regression - f#

I am trying to display a scatter chart for two columns in a Deedle data frame, ideally grouped by a third column.
And I would like to show a linear regression line on the same chart.
In Python this can be done with seaborn.lmplot https://seaborn.pydata.org/generated/seaborn.lmplot.html
sns.lmplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
I was hoping to do something like that with Plotly.Net, but so far I only got a simple scatterplot:
(
    df.["rating"].Values,
    df.["calories"].Values
)
||> Seq.zip
|> Chart.Point
How do I add a linear regression line similar to seaborn? Do I need to do it manually somehow?
How do I group the points by a third column? This one I may be able to figure out myself, but I wonder if there is a more elegant solution.

Thanks to Brian Berns' comment, pointing me to this example, I was able to create a helper function that works similarly to Python's seaborn.lmplot function.
Here is the code if anyone wants to use it:
// assumed opens: Deedle, Plotly.NET and FSharp.Stats (for vector and OrdinaryLeastSquares)
open Deedle
open Plotly.NET
open FSharp.Stats
open FSharp.Stats.Fitting.LinearRegression
// helpers
let getColVector col (df: Frame<'a, 'b>) =
    vector <| df.[col].Values
let filterByKey fn (df: Frame<'a, 'b>) =
    df.Where(fun (KeyValue(k, _)) -> fn k)
// scatter plot plus fitted regression line for a single group
let singleGroupLmplot xCol yCol valuesName df =
    let y = df |> getColVector yCol
    let x = df |> getColVector xCol
    let coefs = OrdinaryLeastSquares.Linear.Univariable.coefficient x y
    let fittingFunc x = OrdinaryLeastSquares.Linear.Univariable.fit coefs x
    let xRange = [for i in Seq.min(x)..Seq.max(x) -> i]
    let yPredicted = [for x in xRange -> fittingFunc x]
    let xy = Seq.zip xRange yPredicted
    [
        Chart.Point(x, y, ShowLegend=true, Name=valuesName)
        |> Chart.withXAxisStyle(TitleText=xCol)
        |> Chart.withYAxisStyle(TitleText=yCol)
        Chart.Line(xy, ShowLegend=true, Name=($"Reg. {valuesName}"))
    ]
    |> Chart.combine
// one chart per group when a hue column is given, otherwise a single chart
let lmplot xCol yCol hue df =
    match hue with
    | None ->
        [ singleGroupLmplot xCol yCol ($"{xCol} vs {yCol}") df ]
    | Some h ->
        let groupedDf = df |> Frame.groupRowsByString h
        groupedDf.RowKeys
        |> Seq.map (fun (g, _) -> g)
        |> Seq.distinct
        |> List.ofSeq
        |> List.map (fun k ->
            groupedDf
            |> filterByKey (fun (g, _) -> g = k)
            |> singleGroupLmplot xCol yCol k
        )
    |> Chart.combine
    |> Chart.withLegendStyle(Orientation=StyleParam.Orientation.Horizontal)
Example of using it to render a scatter plot with a regression line for a data frame:
df |> lmplot "rating" "calories" None
Example of using it to render scatter plots with regression lines for a data frame grouped by a column value:
df |> lmplot "rating" "calories" (Some "healthiness")
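For anyone who wants to see what the regression step computes, or who does not want the extra dependency, here is a minimal sketch of the least-squares fit done by hand. The name fitLine and the sample points are made up for illustration; this is not the FSharp.Stats implementation, just the textbook slope and intercept formulas.

```fsharp
// Ordinary least squares for y = intercept + slope * x, computed by hand.
// `fitLine` is a hypothetical helper, not part of Deedle, Plotly.NET or FSharp.Stats.
let fitLine (points: (float * float) list) =
    let meanX = points |> List.averageBy fst
    let meanY = points |> List.averageBy snd
    // Sum of cross deviations and sum of squared x deviations
    let sxy = points |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
    let sxx = points |> List.sumBy (fun (x, _) -> (x - meanX) * (x - meanX))
    let slope = sxy / sxx
    let intercept = meanY - slope * meanX
    intercept, slope

// Made-up points lying exactly on y = 1 + 2x
let intercept, slope = fitLine [ (0.0, 1.0); (1.0, 3.0); (2.0, 5.0); (3.0, 7.0) ]
let predict x = intercept + slope * x
```

The predict function can then be evaluated over a range of x values to get the points for the regression line, the same way fit is used in the helper above.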

Related

F#, loop control until n-2

I am currently learning functional programming and F#, and I want to do a loop control until n-2. For example:
Given a list of doubles, find the pairwise average,
e.g. pairwiseAverage [1.0; 2.0; 3.0; 4.0; 5.0] will give [1.5; 2.5; 3.5; 4.5]
After doing some experimenting and searching, I have a few ways to do it:
Method 1:
let pairwiseAverage (data: List<double>) =
    [for j in 0 .. data.Length-2 do
        yield (data.[j]+data.[j+1])/2.0]
Method 2:
let pairwiseAverage (data: List<double>) =
    let averageWithNone acc next =
        match acc with
        | (_,None) -> ([],Some(next))
        | (result,Some prev) -> (((prev+next)/2.0)::result,Some(next))
    let resultTuple = List.fold averageWithNone ([],None) data
    match resultTuple with
    | (x,_) -> List.rev x
Method 3:
let pairwiseAverage (data: List<double>) =
    // Get elements from 1 .. n-1
    let after = List.tail data
    // Get elements from 0 .. n-2
    let before =
        data |> List.rev
        |> List.tail
        |> List.rev
    List.map2 (fun x y -> (x+y)/2.0) before after
I just like to know if there are other ways to approach this problem. Thank you.
Using only built-ins:
list |> Seq.windowed 2 |> Seq.map Array.average
Seq.windowed n gives you sliding windows of n elements each.
One simple other way is to use Seq.pairwise
something like
list |> Seq.pairwise |> Seq.map (fun (a,b) -> (a+b)/2.0)
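A quick sketch of both built-in approaches on the list from the question, so the results can be compared side by side. The binding names are only for illustration.

```fsharp
let data = [1.0; 2.0; 3.0; 4.0; 5.0]

// Sliding windows of two elements, each averaged as an array
let viaWindowed =
    data |> Seq.windowed 2 |> Seq.map Array.average |> Seq.toList

// Adjacent pairs, averaged explicitly
let viaPairwise =
    data |> Seq.pairwise |> Seq.map (fun (a, b) -> (a + b) / 2.0) |> Seq.toList
```

Both give the [1.5; 2.5; 3.5; 4.5] asked for in the question.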
The approaches suggested above are appropriate for short windows, like the one in the question. For windows longer than 2, pairwise cannot be used. The answer by hlo generalizes to wider windows and is a clean and fast approach if the window length is not too large. For very wide windows the code below runs faster, because for each step it only adds one number to, and subtracts one number from, the sum obtained for the previous window. Note that Seq.map2 stops at the end of the shorter sequence, so the two inputs may have different lengths.
let movingAverage (n: int) (xs: float List) =
    // sum of the first window
    let init = xs |> (Seq.take n) |> Seq.sum
    // change in the window sum at each step: element entering minus element leaving
    let additions = Seq.map2 (fun x y -> x - y) (Seq.skip n xs) xs
    Seq.fold (fun m x -> ((List.head m) + x)::m) [init] additions
    |> List.rev
    |> List.map (fun (x: float) -> x/(float n))
let xs = [1.0..1000000.0]
movingAverage 1000 xs
// Real: 00:00:00.265, CPU: 00:00:00.265, GC gen0: 10, gen1: 10, gen2: 0
For comparison, the function above runs about 60 times faster than the windowed equivalent:
let windowedAverage (n: int) (xs: float List) =
    xs
    |> Seq.windowed n
    |> Seq.map Array.average
    |> Seq.toList
windowedAverage 1000 xs
// Real: 00:00:15.634, CPU: 00:00:15.500, GC gen0: 74, gen1: 74, gen2: 71
I tried to eliminate List.rev using foldBack but did not succeed.
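One possible way to avoid List.rev is to use Seq.scan instead of folding into a reversed list, since scan already emits the running sums in order. This is only a sketch of that idea (the name movingAverageScan is made up), and I have not benchmarked it against the version above.

```fsharp
// Same running-sum idea as movingAverage, but Seq.scan yields the
// window sums in order, so no List.rev is needed.
let movingAverageScan (n: int) (xs: float list) =
    // Sum of the first window
    let init = xs |> Seq.take n |> Seq.sum
    // Element entering the window minus element leaving it
    let deltas = Seq.map2 (fun entering leaving -> entering - leaving) (Seq.skip n xs) xs
    deltas
    |> Seq.scan (+) init          // init, init + d1, init + d1 + d2, ...
    |> Seq.map (fun s -> s / float n)
    |> Seq.toList

let scanResult = movingAverageScan 2 [1.0; 2.0; 3.0; 4.0; 5.0]
```

Like the original, it assumes the list has at least n elements, because Seq.take n throws otherwise.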
A point-free approach:
let pairwiseAverage = List.pairwise >> List.map ((<||) (+) >> (*) 0.5)
Usually not a better way, but another way regardless... ;-]
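In case the point-free version is hard to read: (<||) (+) applies + to the two halves of a tuple, and (*) 0.5 then halves the sum. A sketch of the same function next to a spelled-out equivalent (the names here are only for illustration):

```fsharp
// Point-free version, as in the answer above
let pairwiseAveragePointFree : float list -> float list =
    List.pairwise >> List.map ((<||) (+) >> (*) 0.5)

// The same thing with the tuple and the arithmetic written out
let pairwiseAverageExplicit (data: float list) =
    data
    |> List.pairwise
    |> List.map (fun (a, b) -> (a + b) * 0.5)

let sample = [1.0; 2.0; 3.0; 4.0; 5.0]
```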

How to cumulate (scan) Deedle data frame values

I'm loading a sequence of records into a Deedle data frame (from a database table). Is it possible to accumulate the values (for example, sum them cumulatively) and get back a data frame? For example, there is Series.scanValues but there is no Frame.scanValues. There is Frame.map, but it didn't do what I expected: it left all values as they were.
#if INTERACTIVE
#r @"Fsharp.Charting"
#load @"..\..\Deedle.fsx"
#endif
open System
open FSharp.Charting
open FSharp.Charting.ChartTypes
open Deedle
type SeriesX = {
    DataDate:DateTime
    Series1:float
    Series2:float
    Series3:float
}
let rnd = new System.Random()
rnd.NextDouble() - 0.5
let data =
    [for i in [100 .. -1 .. 1] ->
        {SeriesX.DataDate = DateTime.Now.AddDays(float -i)
         SeriesX.Series1 = rnd.NextDouble() - 0.5
         SeriesX.Series2 = rnd.NextDouble() - 0.5
         SeriesX.Series3 = rnd.NextDouble() - 0.5
        }
    ]
// now comes the Deedle frame:
let df = data |> Frame.ofRecords
let df = df.IndexRows<DateTime>("DataDate")
df.["Series1"] |> Chart.Line
df.["Series1"].ScanValues((fun acc x -> acc + x),0.0) |> Chart.Line
let df' = df |> Frame.mapValues (Seq.scan (fun acc x -> acc + x) 0.0)
df'.["Series1"] |> Chart.Line
The last two lines just give me back the original values, while I would like to have the accumulated values, as in df.["Series1"].ScanValues, for Series1, Series2, and Series3.
For filtering and projection, series provides Where and Select methods
and corresponding Series.map and Series.filter functions (there is
also Series.mapValues and Series.mapKeys if you only want to transform
one aspect).
So you just apply your function to each Series:
let allSum =
    df.Columns
    |> Series.mapValues(Series.scanValues(fun acc v -> acc + (v :?> float)) 0.0)
    |> Frame.ofColumns
and use Frame.ofColumns to convert the result back to a Frame.
Edit:
If you need to select only numeric columns, you can use Frame.getNumericCols:
let allSum =
    df
    |> Frame.getNumericCols
    |> Series.mapValues(Series.scanValues (+) 0.0)
    |> Frame.ofColumns
Without the explicit type cast the code is also nicer :)
There is a Series.scanValues function. You can obtain a series from every column in your data frame like this: frame?column, which gets you a Series.
If you need all the columns at once to do the scan, you could first map each row into a single value (a tuple, for example) and then apply Series.scanValues to that new column.

computing prime factors using same code produces different results?

I am trying to compute the prime factors of a BigInteger. I have two simple factorization functions which, the way I use them below, look like they should produce the same result, but they do not. Can someone explain what is happening?
let lookupTable = new HashSet<int>(primes)
let isPrime x = lookupTable.Contains x
let factors (n:bigint) =
    Seq.filter (fun x -> n % x = 0I) [1I..n]
let factors' (n:bigint) =
    Seq.filter (fun x -> n % x = 0I) [1I..bigint(sqrt(float(n)))]
600851475143I
|> fun n -> bigint(sqrt(float(n)))
|> factors
|> Seq.map int
|> Seq.filter isPrime
|> Seq.max // produces 137
600851475143I
|> factors'
|> Seq.map int
|> Seq.filter isPrime
|> Seq.max // produces 6857 (the right answer)
Your functions are not equivalent. In the first function, the list of candidates goes up to n, and the filter function also uses n for the remainder calculation (and in your pipeline that n is already the square root). The second function also uses n for the remainder calculation, but its list of candidates only goes up to sqrt(n).
To make the second function equivalent, you need to reformulate it like this:
let factors' (n:bigint) =
    let k = bigint(sqrt(float(n)))
    Seq.filter (fun x -> k % x = 0I) [1I..k]
Update, to clarify this somewhat:
In the above code, notice how k is used in two places: to produce the initial list of candidates and to calculate remainder within the filter function? This is precisely the change I made to your code: my code uses k in both places, but your code uses k in one place, but n in the other.
This is how your original function would look with k:
let factors' (n:bigint) =
    let k = bigint(sqrt(float(n)))
    Seq.filter (fun x -> n % x = 0I) [1I..k]
Notice how it uses k in one place, but n in the other.
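To make the difference concrete, here is a small sketch with n = 100 instead of the large number from the question. The helper divisorsUpTo is made up for this illustration; it just separates the number being divided from the upper bound of the candidate list.

```fsharp
// Hypothetical helper: divisors of `number` among the candidates 1 .. upTo
let divisorsUpTo (number: bigint) (upTo: bigint) =
    [1I .. upTo] |> List.filter (fun x -> number % x = 0I)

let n = 100I
let k = bigint (sqrt (float n))      // 10

// What the first pipeline computes: divisors of sqrt(n)
let firstPipeline = divisorsUpTo k k

// What factors' computes: divisors of n that are at most sqrt(n)
let secondPipeline = divisorsUpTo n k
```

Here firstPipeline is [1; 2; 5; 10] while secondPipeline is [1; 2; 4; 5; 10], so the two already disagree on a small input.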

Can I put let statements in an F# list comprehension?

I am trying to write a list comprehension in F# and can't get it to compile:
[for x in xs do
let y = f(x)
when g(y) -> y]
Is there any way to save an intermediate computation in the middle of a list comprehension? How can I rework this list comprehension so that it compiles?
I would just skip the list comprehension.
let ys = xs |> List.map f |> List.filter g
However it is simple enough to get your code working.
let ys = [ for x in xs do
             let y = f(x)
             if g(y) then yield y ]
To expand on @ChaosPandion's solution, you could also write this using List.choose -- think of it as a combination of List.map and List.filter which avoids creating an extra list (i.e., instead of creating a list with List.map just to pass it to List.filter).
let ys =
    xs
    |> List.choose (fun x ->
        let y = f x
        if g y then Some y else None)
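For completeness, a runnable sketch comparing the three variants from the answers. The functions f and g and the input list are placeholders I picked for the example.

```fsharp
// Placeholder functions for illustration: square, then keep even results
let f x = x * x
let g y = y % 2 = 0
let xs = [1 .. 6]

// map then filter
let viaMapFilter = xs |> List.map f |> List.filter g

// list comprehension with an intermediate let binding
let viaComprehension =
    [ for x in xs do
        let y = f x
        if g y then yield y ]

// List.choose: map and filter in a single pass
let viaChoose =
    xs |> List.choose (fun x ->
        let y = f x
        if g y then Some y else None)
```

All three produce [4; 16; 36] for this input.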

How to make this code more compact and idiomatic?

Hullo all.
I am a C# programmer, exploring F# in my free time. I have written the following little program for image convolution in 2D.
open System
let convolve y x =
    y |> List.map (fun ye -> x |> List.map ((*) ye))
      |> List.mapi (fun i l -> [for q in 1..i -> 0] @ l @ [for q in 1..(l.Length - i - 1) -> 0])
      |> List.reduce (fun r c -> List.zip r c |> List.map (fun (a, b) -> a + b))
let y = [2; 3; 1; 4]
let x = [4; 1; 2; 3]
printfn "%A" (convolve y x)
My question is: Is the above code idiomatic F#? Can it be made more concise? (E.g., is there a shorter way to generate a list filled with 0's? I have used a list comprehension for this in my code.) Are there any changes that would improve its performance?
Any help would be greatly appreciated. Thanks.
EDIT:
Thanks Brian. I didn't get your first suggestion. Here's how my code looks after applying your second suggestion. (I also abstracted out the list-fill operation.)
open System
let listFill howMany withWhat = [for i in 1..howMany -> withWhat]
let convolve y x =
    y |> List.map (fun ye -> x |> List.map ((*) ye))
      |> List.mapi (fun i l -> (listFill i 0) @ l @ (listFill (l.Length - i - 1) 0))
      |> List.reduce (List.map2 (+))
let y = [2; 3; 1; 4]
let x = [4; 1; 2; 3]
printfn "%A" (convolve y x)
Anything else can be improved? Awaiting more suggestions...
As Brian mentioned, the use of @ is generally problematic, because the operator cannot be efficiently implemented for (simple) functional lists - it needs to copy the entire first list.
I think Brian's suggestion was to write a sequence generator that would generate the list at once, but that's a bit more complicated. You'd have to convert the list to array and then write something like:
let convolve y x =
    y |> List.map (fun ye -> x |> List.map ((*) ye) |> Array.ofList)
      |> List.mapi (fun i l -> Array.init (2 * l.Length - 1) (fun n ->
            if n < i || n - i >= l.Length then 0 else l.[n - i]))
      |> List.reduce (Array.map2 (+))
In general, if performance is an important concern, then you'll probably need to use arrays anyway (because this kind of problem can be best solved by accessing elements by index). Using arrays is a bit more difficult (you need to get the indexing right), but it is a perfectly fine approach in F#.
Anyway, if you want to write this using lists, then here are some options. You could use sequence expressions everywhere, which would look like this:
let convolve y (x:_ list) =
    [ for i, v1 in x |> List.zip [ 0 .. x.Length - 1] ->
        [ yield! listFill i 0
          for v2 in y do yield v1 * v2
          yield! listFill (x.Length - i - 1) 0 ] ]
    |> List.reduce (List.map2 (+))
... or you can also combine the two options and use a nested sequence expression (with yield! to generate zeros and lists) in the lambda function that you're passing to List.mapi:
let convolve y x =
    y |> List.map (fun ye -> x |> List.map ((*) ye))
      |> List.mapi (fun i l ->
            [ for _ in 1 .. i do yield 0
              yield! l
              for _ in 1 .. (l.Length - i - 1) do yield 0 ])
      |> List.reduce (List.map2 (+))
The idiomatic solution would be to use arrays and loops just as you would in C. However, you may be interested in the following alternative solution that uses pattern matching instead:
let dot xs ys =
    Seq.map2 (*) xs ys
    |> Seq.sum
let convolve xs ys =
    let rec loop vs xs ys zs =
        match xs, ys with
        | x::xs, ys -> loop (dot ys (x::zs) :: vs) xs ys (x::zs)
        | [], _::(_::_ as ys) -> loop (dot ys zs :: vs) [] ys zs
        | _ -> List.rev vs
    loop [] xs ys []
convolve [2; 3; 1; 4] [4; 1; 2; 3]
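The arrays-and-loops version mentioned at the start of this answer is not shown, so here is a minimal sketch of what it might look like. The name convolveArrays is made up, and the sketch assumes both inputs are non-empty integer lists.

```fsharp
// Full discrete convolution using a mutable result array and two loops.
// Assumes both lists are non-empty (otherwise the result length would be negative).
let convolveArrays (y: int list) (x: int list) =
    let ya = Array.ofList y
    let xa = Array.ofList x
    let result = Array.zeroCreate<int> (ya.Length + xa.Length - 1)
    for i in 0 .. ya.Length - 1 do
        for j in 0 .. xa.Length - 1 do
            // Each product ya.[i] * xa.[j] contributes to output position i + j
            result.[i + j] <- result.[i + j] + ya.[i] * xa.[j]
    List.ofArray result

let convolved = convolveArrays [2; 3; 1; 4] [4; 1; 2; 3]
```

For the lists from the question this gives [8; 14; 11; 29; 15; 11; 12], without building any intermediate padded lists.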
Regarding the zeroes, how about e.g.
[for q in 0..l.Length-1 -> if q=i then l else 0]
(I haven't tested to verify that is exactly right, but hopefully the idea is clear.) In general, any use of @ is a code smell.
Regarding overall performance, for small lists this is probably fine; for larger ones, you might consider using Seq rather than List for some of the intermediate computations, to avoid allocating as many temporary lists along the way.
It looks like maybe the final zip-then-map could be replaced by just a call to map2, something like
... fun r c -> (r,c) ||> List.map2 (+)
or possibly even just
... List.map2 (+)
but I'm away from a compiler so haven't double-checked it.
(fun ye -> x |> List.map ((*) ye))
Really ?
I'll admit |> is pretty, but you could just have written:
(fun ye -> List.map ((*) ye) x)
Another thing that you could do is fuse the first two maps. l |> List.map f |> List.mapi g = l |> List.mapi (fun i x -> g i (f x)), so incorporating Tomas and Brian's suggestions, you can get something like:
let convolve y x =
    let N = List.length x
    y
    |> List.mapi (fun i ye ->
        [for _ in 1..i -> 0
         yield! List.map ((*) ye) x
         for _ in 1..(N-i-1) -> 0])
    |> List.reduce (List.map2 (+))

Resources