What would be an example for Frame.mapCols? - f#

I am trying to create examples for all the methods in Deedle, with mixed success. I was able to provide an example for Frame.filterColValues but not for Frame.mapCols. Stealing from one of Tomas Petricek's numerous writeups, I defined a DataFrame as follows:
open System
open Deedle
let dates =
    [ DateTime(2013,1,1);
      DateTime(2013,1,4);
      DateTime(2013,1,8) ]
let values = [ 10.0; 20.0; 30.0 ]
let first = Series(dates, values)
/// Generate date range from 'first' with 'count' days
let dateRange (first:System.DateTime) count =
    seq { for i in 0..(count - 1) -> first.AddDays(float i) }
/// Generate 'count' number of random doubles
let rand count =
    let rnd = System.Random()
    seq { for i in 0..(count - 1) -> rnd.NextDouble() }
// A series with values for 10 days
let second = Series(dateRange (DateTime(2013,1,1)) 10, rand 10)
// df1 has two columns
let df1 = Frame(["first"; "second"], [first; second])
Next I was able to provide an example for Frame.filterColValues:
let df2 = Frame.filterColValues (fun s -> (s.GetAtAs<double>(0) > 5.0)) df1
// df2 has only one column
Frame.toArray2D(df2)
But I could not (and I tried hard) create an example for Frame.mapCols. The best I could come up with was:
let df3 = Frame.mapCols (fun k x -> x.GetAtAs<double>(0)) df1
error FS0001: The type 'double' is not compatible with the type 'ISeries<'a>'
What am I doing wrong? Can someone post an example? Or, even better, point to a place where there are examples for the Deedle methods?

The Frame.mapCols function lets you transform a column series into another column series.
The most basic example is just to return the column series unchanged:
df1
|> Frame.mapCols (fun k col -> col)
As Foggy Finder mentioned in a comment, you can fill all missing values in a column using this - the body of the lambda has to return a series:
df1
|> Frame.mapCols (fun k v -> Series.fillMissingWith 0.0 v)
If you wanted, you could return a series with just a single element (this turns your frame into a frame with one row - containing data from the first row - and original number of columns):
df1
|> Frame.mapCols (fun k col -> series [ 0 => col.GetAtAs<float>(0) ])
In your code snippet, it looks like you wanted just a series (with a single value for each column), which can be done by getting the columns and then using Series.map:
df1.Columns
|> Series.map (fun k col -> col.GetAtAs<float>(0))
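As one more sketch of the same idea (not from the original answer), the lambda can cast each column to a typed series and return a transformed series with the same keys. The frame `small` and its column names are made up for illustration, and the `#r "nuget: Deedle"` line assumes an F# script environment that supports NuGet references.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical two-column frame with integer row keys.
let small =
    frame [ "a" => series [ 1 => 1.0; 2 => 2.0 ]
            "b" => series [ 1 => 10.0; 2 => 20.0 ] ]

// mapCols hands each column over as an ObjectSeries. As<float>() gives a typed
// view of it, and the lambda has to return a series rather than a single value.
let doubled =
    small
    |> Frame.mapCols (fun _ col ->
        col.As<float>() |> Series.mapValues (fun v -> v * 2.0))

let a = doubled.GetColumn<float>("a")
printfn "%A" (a.[1], a.[2])
```

The point to take away is the return type: a lambda that returns a plain `double`, as in the question, cannot satisfy the series-valued signature.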

Related

Recursive function in F# that determines in a list of n elements of type int, the greater of two adjacent values

I have recently started learning f# and I have a problem with a task like the one in the subject line. I managed to solve this task but not using a recursive function. I have tried to convert my function to a recursive function but it does not work because in the function I create arrays which elements I then change. Please advise me how to convert my function to a recursive function or how else to perform this task.
let list = [8;4;3;3;5;9;-7]
let comp (a,b) = if a>b then a elif b = a then a else b
let maks (b: _ list) =
    let x = b.Length
    if x % 2 = 0 then
        let tab = Array.create ((x/2)) 0
        for i = 0 to (x/2)-1 do
            tab.[i] <- (comp(b.Item(2*i),b.Item(2*i+1)))
        let newlist = tab |> Array.toList
        newlist
    else
        let tab = Array.create (((x-1)/2)+1) 0
        tab.[(((x-1)/2))] <- b.Item(x-1)
        for i = 0 to ((x-1)/2)-1 do
            tab.[i] <- (comp(b.Item(2*i),b.Item(2*i+1)))
        let newlist = tab |> Array.toList
        newlist
It is worth noting that, if you were doing this not for learning purposes, there is a nice way of doing this using the chunkBySize function:
list
|> List.chunkBySize 2
|> List.map (fun l -> comp(l.[0], l.[l.Length-1]))
This splits the list into chunks of size at most 2. For each chunk, you can then compare the first element with the last element and that is the result you wanted.
If this is a homework question, I don't want to give away the answer, so consider this pseudocode solution instead:
If the list contains at least two elements:
    Answer a new list consisting of:
        The greater of the first two elements, followed by
        Recursively applying the function to the rest of the list
Else the list contains fewer than two elements:
    Answer the list unchanged
Hint: F#'s pattern matching ability makes this easy to implement.
Thanks to your guidance I managed to create the following function:
let rec maks2 (b: _ list,newlist: _ list,i:int) =
    let x = b.Length
    if x >= 2 then
        if x % 2 = 0 then
            if i < ((x/2)-1)+1 then
                let d = (porownaj(b.Item(2*i),b.Item(2*i+1)))
                let list2 = d::newlist
                maks2(b,list2,i+1)
            else
                newlist
        else
            if i < ((x/2)-1)+1 then
                let d = (porownaj(b.Item(2*i),b.Item(2*i+1)))
                let list2 = d::newlist
                maks2(b,list2,i+1)
            else
                let list3 = b.Item(x-1)::newlist
                list3
    else
        b
The function works correctly, it takes as arguments list, empty list and index.
The only problem is that the returned list is reversed, i.e. values that should be at the end are at the beginning. How to add items to the end of the list?
You can use pattern matching to check the shape of a list and extract its elements in one step. A typical recursive function would look like this:
let rec adjGreater xs =
    match xs with
    | [] -> []
    | [x] -> [x]
    | x::y::rest -> (if x >= y then x else y) :: adjGreater rest
It checks whether the list is empty, has one element, or has at least two elements followed by the remaining list in rest.
Then it builds a new list by using either x or y as the first element, and computes the result for the remaining rest recursively.
This version is not tail-recursive. A tail-recursive version does not use the result of the recursive call; instead, it passes the values computed so far to the recursive call as an accumulator. This is usually done with an inner recursive loop function.
Because you can only add values to the front of a list, you then need to reverse the accumulated result, like this:
let adjGreater xs =
    let rec loop xs result =
        match xs with
        | [] -> result
        | [x] -> x :: result
        | x::y::rest -> loop rest ((if x >= y then x else y) :: result)
    List.rev (loop xs [])
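To check that the approaches in this thread agree, here is a small self-contained sketch that runs the chunk-based version and the tail-recursive version on the list from the question. It also includes a third variant, named `adjGreaterAppend` purely for illustration, that appends with `@` instead of reversing. Appending copies the accumulated list on every step, which is why prepending and reversing once at the end is usually preferred.

```fsharp
let comp (a, b) = if a >= b then a else b

// Chunk-based version: pairs of adjacent elements, last chunk may hold one item.
let viaChunks xs =
    xs
    |> List.chunkBySize 2
    |> List.map (fun l -> comp (l.[0], l.[l.Length - 1]))

// Tail-recursive version: prepend to the accumulator, reverse once at the end.
let adjGreater xs =
    let rec loop xs result =
        match xs with
        | [] -> result
        | [x] -> x :: result
        | x :: y :: rest -> loop rest (comp (x, y) :: result)
    List.rev (loop xs [])

// Illustrative variant: append with @ so no reversal is needed.
// Each append copies 'result', so this does more work on long lists.
let adjGreaterAppend xs =
    let rec loop xs result =
        match xs with
        | [] -> result
        | [x] -> result @ [x]
        | x :: y :: rest -> loop rest (result @ [comp (x, y)])
    loop xs []

let sample = [8; 4; 3; 3; 5; 9; -7]
printfn "%A" (adjGreater sample)   // [8; 3; 9; -7]
```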

How do I do an efficient temporal join of records in an array?

I would like to join a record to the next record at least X days/minutes/seconds into the future. I need to do this with arrays with a few hundred thousand records. I am open to sequences/lists/arrays but I believe arrays are likely to be fastest.
I can do this quickly in Deedle with Frame.joinAlign JoinKind.Left Lookup.ExactOrGreater, but I have an easier time reasoning about transformations using standard arrays/sequences/lists.
The following example is fine with 1000 records but very slow when 100k. A comment here suggests a binary search but I do not see how to do that here where the search is based on an inequality.
open System
type Test1 = {
    Date : DateTime
    Value : float
}
type Test2 = {
    Date1 : DateTime
    Value1 : float
    Date2 : DateTime
    Value2 : float
}
let rng = System.Random()
let rng2 = System.Random()
let rs =
    [| for i = 1 to 1000 do
           let baseDay = DateTime(2016,1,1).AddDays(float i)
           let actualDay = baseDay.AddDays(float (rng2.Next(7)))
           yield {Date = actualDay; Value = rng.NextDouble() } |]
[| for r in rs do
       let futureDay = r.Date.AddDays(float 4)
       let r2 =
           rs
           |> Array.filter (fun x -> x.Date > futureDay)
           |> Array.tryHead
       let nr =
           match r2 with
           | Some x -> Some {Date1 = r.Date;Value1 = r.Value; Date2=x.Date;Value2 = x.Value}
           | None -> None
       if nr.IsSome then yield nr.Value |]
The problem is this expression:
let r2 =
    rs
    |> Array.filter (fun x -> x.Date > futureDay)
    |> Array.tryHead
This filters the entire array and creates a new array with all the matching items, when you really just want the first matching item. And this is happening for every r. Try this instead:
let r2 = rs |> Array.tryFind (fun x -> x.Date > futureDay)
N.b. your logic would have been fine if you were dealing with sequences rather than arrays as the filter would have been evaluated lazily, but of course sequences are going to be slower than arrays in general. The thing to keep in mind is that whereas the Seq module is lazy (with some exceptions), when using the Array and List (and Set and Map, etc.) modules, every step in the chain/pipeline will eagerly allocate a new list/array and consequently can be very expensive when working with large collections.
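The difference between the eager and lazy pipelines can be made visible by counting predicate calls. The sketch below uses made-up integer data and a hypothetical helper, `countCalls`, instead of the records from the question.

```fsharp
let data = [| 1 .. 1000 |]

// Runs a pipeline with a counting predicate and returns
// the pipeline's result together with the number of predicate calls.
let countCalls run =
    let calls = ref 0
    let predicate x =
        calls.Value <- calls.Value + 1
        x > 10
    let result = run predicate
    result, calls.Value

// Array.filter tests every element before Array.tryHead sees anything.
let eager = countCalls (fun p -> data |> Array.filter p |> Array.tryHead)
// Seq.filter is lazy, so Seq.tryHead stops at the first match.
let lazySeq = countCalls (fun p -> data |> Seq.filter p |> Seq.tryHead)
// Array.tryFind also stops at the first match, without a sequence wrapper.
let shortCircuit = countCalls (fun p -> data |> Array.tryFind p)

printfn "%A %A %A" eager lazySeq shortCircuit
```

All three find the same element, but only the eager pipeline evaluates the predicate for the whole array.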
If sorting rs doesn't affect your logic or expected output, a further improvement can be made by using Array.FindIndex to start searching at r's index rather than from the beginning of the array each time:
Array.sortInPlace rs
rs
|> Seq.mapi (fun i r ->
    let futureDay = r.Date.AddDays 4.0
    let r2Index = Array.FindIndex (rs, i, (fun x -> x.Date > futureDay))
    match r2Index with
    | -1 -> None
    | i' ->
        let x = rs.[i']
        Some { Date1=r.Date; Value1=r.Value; Date2=x.Date; Value2=x.Value })
|> Seq.choose id
|> Array.ofSeq
This should offer a significant improvement over even the Array.tryFind approach as only a handful of array elements will need to be scanned each time.
Here are FSI timings from my ageing tablet with the system under otherwise nil load:
10k elements:
unsorted + Array.filter (original code): 00:00:08.783
unsorted + Array.tryFind: 00:00:03.844
Array.sort + Seq.mapi: 00:00:00.027
100k elements:
unsorted + Array.filter: I didn't bother.
unsorted + Array.tryFind: 00:06:14.288
Array.sort + Seq.mapi: 00:00:00.305
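On the binary search mentioned in the question: once the array is sorted by date, the condition "date is later than the cutoff" is false for a prefix of the array and true for the rest, so the first matching index can be found by bisection even though the test is an inequality. The following is a minimal sketch under that assumption; `firstAfter` and the sample days are hypothetical names and data, not part of the original answer.

```fsharp
open System

type Test1 = { Date : DateTime; Value : float }

/// Index of the first element whose Date is strictly after 'cutoff',
/// or -1 if there is none. Assumes 'rs' is sorted by Date ascending.
let firstAfter (rs : Test1[]) (cutoff : DateTime) =
    let mutable lo = 0
    let mutable hi = rs.Length          // the answer lies in [lo, hi]
    while lo < hi do
        let mid = lo + (hi - lo) / 2
        if rs.[mid].Date > cutoff then hi <- mid   // mid matches; look left
        else lo <- mid + 1                         // mid fails; look right
    if lo < rs.Length then lo else -1

// Made-up, already sorted sample: days of January 2016.
let rs =
    [| 1; 2; 2; 5; 6; 9 |]
    |> Array.map (fun d -> { Date = DateTime(2016, 1, d); Value = float d })

// Join each record to the first record more than 4 days later.
let joined =
    rs
    |> Array.choose (fun r ->
        match firstAfter rs (r.Date.AddDays 4.0) with
        | -1 -> None
        | j -> Some (r.Date.Day, rs.[j].Date.Day))

printfn "%A" joined   // [|(1, 6); (2, 9); (2, 9)|]
```

Each lookup takes a logarithmic number of comparisons regardless of how far away the match is, whereas the scan from r's index depends on how densely the dates are packed.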

Fill a list with unique random values with F#

I'm trying to learn F# but I'm stuck with a very simple thing.
I would like to create a list of unique random values that I can display on a console. Let's say random values from 1 to 100, with 10 elements in the list.
I found this code in the question "F# getting a list of random numbers":
let genRandomNumbers count =
    let rnd = System.Random()
    List.init count (fun _ -> rnd.Next (1, 100))
let l = genRandomNumbers 10
printfn "%A" l
But how can I make these numbers different? This is not exactly a duplicate question, because I can't find a way to be sure that each number is unique; Random.Next can generate the same number more than once.
Here's a very simple solution:
let genRandomNumbers count =
    let rnd = System.Random()
    let initial = Seq.initInfinite (fun _ -> rnd.Next (1, 100))
    initial
    |> Seq.distinct
    |> Seq.take(count)
    |> Seq.toList
Note that Seq.distinct does exactly what you want: it keeps only the unique values. Also note that asking for a count larger than 99 will never finish, because rnd.Next (1, 100) can only produce the 99 distinct values from 1 to 99, so Seq.take keeps waiting for a value that never arrives.
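An alternative that avoids drawing repeatedly is to shuffle the pool of candidate values and take the first few. The sketch below swaps in a partial Fisher-Yates shuffle, which is a different technique from the answer above; the function name `genUniqueRandomNumbers` and the explicit argument check are assumptions added for illustration.

```fsharp
let genUniqueRandomNumbers count =
    let rnd = System.Random()
    // Same value range as rnd.Next (1, 100): the integers 1 to 99.
    let pool = [| 1 .. 99 |]
    if count > pool.Length then
        invalidArg "count" "cannot draw more unique values than the pool contains"
    // Partial Fisher-Yates shuffle: only the first 'count' slots need to be settled.
    for i in 0 .. count - 1 do
        let j = rnd.Next(i, pool.Length)   // random index in [i, pool.Length)
        let tmp = pool.[i]
        pool.[i] <- pool.[j]
        pool.[j] <- tmp
    pool |> Array.take count |> Array.toList

let l = genUniqueRandomNumbers 10
printfn "%A" l
```

This does a fixed amount of work for any valid count and fails fast, rather than looping, when more values are requested than exist.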

F# deedle concat string columns

I have a Frame with two columns of String,
let first = Series.ofValues(["a";"b";"c"])
let second = Series.ofValues(["d";"e";"f"])
let df = Frame(["first"; "second"], [first; second])
How do I produce a third column as the concatenation of the two columns?
In Python's pandas, this can be achieved with the simple + operator, but Deedle gives an error if I do that:
error FS0043: No overloads match for method 'op_Addition'.
It sounds like what you want is to have something that returns something like:
Series.ofValues(["ad"; "be"; "cf"])
Then I think you need to define an addition operator with something like this:
let additionOperator = (fun (a:string) (b:string) -> (a + b))
And then you can add them like this:
Series.zipInto additionOperator first second
I get as the result:
val it : Series<int,string> = series [ 0 => ad; 1 => be; 2 => cf]
However if you are alright with tuples as your result, you can just use:
Series.zip first second
I came across this after facing the same issue. The trick is to get the values as a seq and use Seq.map2 to concatenate the two seqs. My solution is:
let first = Series.ofValues(["a";"b";"c"])
let second = Series.ofValues(["d";"e";"f"])
let df =
    Seq.map2 (fun x y -> x+y) first.Values second.Values
    |> Series.ofValues
    |> (fun x -> Frame.addCol "third" x (Frame(["first"; "second"], [first; second])))
Result:
df.Print()
     first second third
0 -> a     d      ad
1 -> b     e      be
2 -> c     f      cf
I believe this would work... Clearly not the most beautiful way to write it but... Will try to do some time testing later.
let df3c =
    df
    |> Frame.mapRows (fun _ b -> b.GetAt(0).ToString() + b.GetAt(1).ToString())
    |> (fun a -> Frame.addCol "test" a df)

How to cumulate (scan) Deedle data frame values

I'm loading a sequence of records into a deedle data frame (from a database table). Is it possible to accumulate (for example sum cumulatively) the values, and get back a data frame? For example there is Series.scanValues but there is no Frame.scanValues. There is Frame.map, but it didn't do what I expected, it left all values as they were.
#if INTERACTIVE
#r @"Fsharp.Charting"
#load @"..\..\Deedle.fsx"
#endif
open System
open FSharp.Charting
open FSharp.Charting.ChartTypes
open Deedle
type SeriesX = {
    DataDate:DateTime
    Series1:float
    Series2:float
    Series3:float
}
let rnd = new System.Random()
rnd.NextDouble() - 0.5
let data =
    [for i in [100..-1..1] ->
        {SeriesX.DataDate = DateTime.Now.AddDays(float -i)
         SeriesX.Series1 = rnd.NextDouble() - 0.5
         SeriesX.Series2 = rnd.NextDouble() - 0.5
         SeriesX.Series3 = rnd.NextDouble() - 0.5
        }
    ]
// now comes the deedle frame:
let df = data |> Frame.ofRecords
let df = df.IndexRows<DateTime>("DataDate")
df.["Series1"] |> Chart.Line
df.["Series1"].ScanValues((fun acc x -> acc + x),0.0) |> Chart.Line
let df' = df |> Frame.mapValues (Seq.scan (fun acc x -> acc + x) 0.0)
df'.["Series1"] |> Chart.Line
The last two lines just give me back the original values, whereas I would like the accumulated values, as in df.["Series1"].ScanValues, for Series1, Series2, and Series3.
For filtering and projection, series provides Where and Select methods
and corresponding Series.map and Series.filter functions (there is
also Series.mapValues and Series.mapKeys if you only want to transform
one aspect).
So you just apply your function to each Series:
let allSum =
    df.Columns
    |> Series.mapValues(Series.scanValues(fun acc v -> acc + (v :?> float)) 0.0)
    |> Frame.ofColumns
and then use Frame.ofColumns to convert the result back to a Frame.
Edit:
If you need to select only numerics columns, you can use the Frame.getNumericCols:
let allSum =
    df
    |> Frame.getNumericCols
    |> Series.mapValues(Series.scanValues (+) 0.0)
    |> Frame.ofColumns
without an explicit type cast code has become more beautiful :)
There is a Series.scanValues function. You can obtain a series from every column in your data frame like this: frame?column (or frame.["column"], as in the question), which gets you a Series.
If you need all the columns at once to do the scan, you could first map each row into a single value (a tuple, for example) and then apply Series.scanValues to that new column.
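Independently of Deedle, the per-column computation being asked for is a running sum. One detail worth noting is that `List.scan` (like `Seq.scan`) also emits the initial state, so it returns one more element than its input, while the question wants one output per input row. The sketch below uses plain F# collections; the column names and values are made up.

```fsharp
// Running sum with one output per input: drop the initial state that scan emits.
let cumulativeSum (xs : float list) =
    xs |> List.scan (+) 0.0 |> List.tail

// Hypothetical columns, keyed by name.
let columns =
    Map.ofList [ "Series1", [ 1.0; 2.0; 0.5 ]
                 "Series2", [ -1.0; 0.25; 0.25 ] ]

// Apply the running sum to every column, keeping the column names.
let accumulated = columns |> Map.map (fun _ values -> cumulativeSum values)
printfn "%A" accumulated
```

This mirrors the shape of the Deedle answers above: take the columns, transform each one with a scan, and put them back together under the same names.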

Resources