Need to take the earliest date of each month - F#

I have a dataset with dates shown below. I need to take the earliest date from each month for all years. How would I go about doing this?
[(1/2/2004 12:00:00 AM);(1/5/2004 12:00:00 AM);
(1/6/2004 12:00:00 AM);(1/7/2004 12:00:00 AM);
(1/8/2004 12:00:00 AM);(1/9/2004 12:00:00 AM);
(1/12/2004 12:00:00 AM);(1/13/2004 12:00:00 AM);
(1/14/2004 12:00:00 AM);(1/15/2004 12:00:00 AM);
(1/16/2004 12:00:00 AM);(1/19/2004 12:00:00 AM);
(1/20/2004 12:00:00 AM);(1/21/2004 12:00:00 AM);
(1/22/2004 12:00:00 AM);(1/23/2004 12:00:00 AM);
(1/26/2004 12:00:00 AM);(1/27/2004 12:00:00 AM);
(1/28/2004 12:00:00 AM);(1/29/2004 12:00:00 AM);
(1/30/2004 12:00:00 AM);(2/2/2004 12:00:00 AM)]
Dataset continues on. Too large to paste here.
EDITED:
let data =
    Datacsv.GetSample().Rows
    |> Seq.map (fun ((yr,mon),(name,price))
For (name,price) on the last line, I get the error:
TermStructure.fsx(33,36): error FS0001: This expression was expected to have type
'CsvProvider<...>.Row'
but here has type
''a * 'b'

You're almost there. But your last line, with (name, price), isn't quite right. When you call Seq.head data, what is data at that point? Answer: a list of rows. So Seq.head will give you one CSV row. You can't match a CSV row to a (name, price) tuple, and that's what the error message is telling you. Since the input is a row, it was expecting a function that takes ((yr, mon), row) and you gave it ((yr, mon), (name, price)).
In your case, I'd probably back up a step and instead of just doing Seq.head data and feeding that into another Seq.map, I'd do it in one operation. I'll show you my suggestions one step at a time. First, here's the code that you wrote, which is getting the error:
let data =
    Datacsv.GetSample().Rows
    |> Seq.groupBy (fun row -> row.DATE.Year,row.DATE.Month)
    |> Seq.map (fun ((yr,mon),data) -> ((yr, mon), Seq.head data))
    |> Seq.map (fun ((yr,mon),(name,price))
My first thought is that having two different things named data in your code is going to be confusing. So let's change the variable name in the first Seq.map. The thing you've named data in that function is a sequence of rows, so let's call it rows instead:
let data =
    Datacsv.GetSample().Rows
    |> Seq.groupBy (fun row -> row.DATE.Year,row.DATE.Month)
    |> Seq.map (fun ((yr,mon),rows) -> ((yr, mon), Seq.head rows))
    |> Seq.map (fun ((yr,mon),(name,price))
Now we'll fix the second Seq.map by removing it and merging its operation into the first Seq.map, as follows:
let data =
    Datacsv.GetSample().Rows
    |> Seq.groupBy (fun row -> row.DATE.Year,row.DATE.Month)
    |> Seq.map (fun ((yr,mon),rows) ->
        let row = Seq.head rows
        ((yr, mon), (row.NAME, row.PRICE)))
Note that I'm assuming that since your CSV file has a column called DATE in all caps (which has translated to a DATE property on your row objects in F#), it also contains NAME and PRICE columns in all caps too. If those columns are named something else, adjust the row.NAME and row.PRICE parts of the code accordingly.
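One caveat: Seq.head only returns the earliest row of each month if the rows are already in date order within each group. As a minimal sketch of the same grouping idea that does not rely on ordering, here is a version over plain DateTime values (the sample dates are made up for illustration and stand in for the CSV rows). With real rows, Seq.minBy (fun row -> row.DATE) would play the role of Seq.min.

```fsharp
open System

// Hypothetical sample dates, deliberately out of order.
let dates =
    [ DateTime(2004, 1, 5); DateTime(2004, 1, 2); DateTime(2004, 1, 30)
      DateTime(2004, 2, 10); DateTime(2004, 2, 2) ]

// Group by (year, month), then take the minimum date of each group.
let earliestPerMonth =
    dates
    |> Seq.groupBy (fun d -> d.Year, d.Month)
    |> Seq.map (fun (_, ds) -> Seq.min ds)
    |> Seq.sort
    |> Seq.toList

earliestPerMonth |> List.iter (fun d -> printfn "%s" (d.ToString "yyyy-MM-dd"))
```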

Related

Convert Nested Dictionaries to Deedle Frame

A nested dictionary contains the following data: {names, {dates, prices}}
The structure is so:
type Dict<'T, 'U> = System.Collections.Generic.Dictionary<'T, 'U>
let masterDict = Dict<string, Dict<DateTime, float>> ()
The raw data looks like:
> masterDict.Keys |> printfn "%A"
seq ["Corn Future"; "Wheat Future"]
> masterDict.["Corn Future"] |> printfn "%A"
seq [[2009-09-01, 316.69]; [2009-09-02, 316.09]; [2009-09-03, 316.33]; ...]
> masterDict.["Wheat Future"] |> printfn "%A"
seq [[2009-09-01, 214.4]; [2009-09-02, 223.86]; [2009-09-03, 234.11]; [2009-09-04, 224.62]; ...]
I'm trying to full outer join the data above into a Deedle frame like so:
            Corn Future  Wheat Future
2009-09-01  316.69       214.4
2009-09-02  316.09       223.86
2009-09-03  316.33       234.11
2009-09-04  NaN          224.62        // in case a point is not available
The mechanics of Deedle are still alien to me. Any help would be appreciated.
There are some extension methods in the Deedle library (mainly to make it friendly to C# too), which work with KeyValuePair as opposed to tuples (which are the default for F#).
So you should be able to simplify the answer that Foggy Finder posted a little (assuming you have open Deedle at the top):
let frame =
    masterDict
    |> Seq.map (fun kv -> kv.Key, kv.Value.ToSeries())
    |> Frame.ofColumns
frame.Format() |> printfn "%s"
Not sure what you are going to join, but you can just transform:
let frame =
    masterDict
    |> Seq.map (fun kv ->
        let series =
            kv.Value
            |> Seq.map (fun nkv -> nkv.Key, nkv.Value)
            |> Series.ofObservations
        kv.Key, series)
    |> Frame.ofColumns
frame.Format() |> printfn "%s"
This gives:
                      Corn Future  Wheat Future
01.09.2009 0:00:00 -> 316,69       214,4
02.09.2009 0:00:00 -> 316,09       223,86
03.09.2009 0:00:00 -> 316,33       234,11
04.09.2009 0:00:00 -> <missing>    224,62
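To make the outer-join semantics concrete without depending on Deedle, here is a rough sketch using plain F# maps. The two series and their values are hypothetical stand-ins for the nested dictionaries, and None plays the role of Deedle's missing value. This is only an illustration of what the frame alignment does, not how Deedle implements it.

```fsharp
open System

// Hypothetical sample series keyed by date.
let corn = Map.ofList [ DateTime(2009, 9, 1), 316.69; DateTime(2009, 9, 2), 316.09 ]
let wheat = Map.ofList [ DateTime(2009, 9, 1), 214.4; DateTime(2009, 9, 4), 224.62 ]

// Union of all dates that appear in either series.
let allDates =
    Seq.append (Map.toSeq corn) (Map.toSeq wheat)
    |> Seq.map fst
    |> Seq.distinct
    |> Seq.sort
    |> Seq.toList

// One row per date; None marks a date with no observation in that series.
let joined =
    allDates |> List.map (fun d -> d, Map.tryFind d corn, Map.tryFind d wheat)

joined |> List.iter (printfn "%A")
```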

GroupBy Year then take Pairwise diffs except for the head value then Flatten Using Deedle and F#

I have the following variable:
data:seq<(DateTime*float)>
and I want to do something like the following F# code but using Deedle:
data
|> Seq.groupBy (fun (k,v) -> k.Year)
|> Seq.map (fun (k,v) ->
    let vals = v |> Seq.pairwise
    let first = seq { yield v |> Seq.head }
    let diffs = vals |> Seq.map (fun ((t0,v0),(t1,v1)) -> (t1, v1 - v0))
    (k, diffs |> Seq.append first))
|> Seq.collect snd
This works fine using F# sequences but I want to do it using Deedle series. I know I can do something like:
(data: Series<DateTime, float>) |> Series.groupBy (fun k v -> k.Year)...
But then I need to take the within-year diffs, except for the head value, which should just be the value itself, and then flatten the results into one series. I am a bit confused by the Deedle syntax.
Thanks!
I think the following might be doing what you need:
ts
|> Series.groupInto
    (fun k _ -> k.Year)
    (fun y s ->
        let first = series [ fst s.KeyRange => s.[fst s.KeyRange] ]
        Series.merge first (Series.diff 1 s))
|> Series.values
|> Series.mergeAll
The groupInto function lets you specify a function that should be called on each of the groups.
For each group, we create a series of the differences using Series.diff and prepend a series containing the first value using Series.merge.
At the end, we get all the nested series and flatten them using Series.mergeAll.
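For comparison, the same three steps (keep the first value of each group, take differences for the rest, flatten) can be traced on a small example with plain F# sequences. The observations below are made up for illustration; this sketch follows the question's Seq-based code rather than Deedle.

```fsharp
open System

// Hypothetical observations spanning two years.
let data =
    [ DateTime(2020, 1, 1), 10.0
      DateTime(2020, 6, 1), 13.0
      DateTime(2020, 12, 1), 12.0
      DateTime(2021, 1, 1), 20.0
      DateTime(2021, 6, 1), 25.0 ]

let diffsByYear =
    data
    |> Seq.groupBy (fun (t, _) -> t.Year)
    |> Seq.collect (fun (_, group) ->
        // The first observation of the year is kept as-is.
        let first = Seq.head group
        // Every later observation becomes the difference to its predecessor.
        let diffs =
            group
            |> Seq.pairwise
            |> Seq.map (fun ((_, v0), (t1, v1)) -> t1, v1 - v0)
        Seq.append [ first ] diffs)
    |> Seq.toList

diffsByYear |> List.iter (fun (t, v) -> printfn "%s %g" (t.ToString "yyyy-MM-dd") v)
```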

Joining two lists of records and calculating a result

I have two lists of records with the following types:
type AverageTempType = {Date: System.DateTime; Year: int64; Month: int64; AverageTemp: float}
type DailyTempType = {Date: System.DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float}
I want to get a new list which is made up of the DailyTempType "joined" with the AverageTempType. Ultimately though for each daily record I want the Daily Temp - Average temp for the matching month.
I think I can do this with loops as per below and massage this into a reasonable output:
let MatchLoop =
    for i in DailyData do
        for j in AverageData do
            if (i.Year = j.Year && i.Month = j.Month)
            then printfn "%A %A %A %A %A" i.Year i.Month i.Day i.DailyTemp j.AverageTemp
            else printfn "NOMATCH"
I have also tried to do this with pattern matching, but I can't quite get there (I'm not sure how to define the list correctly in the input type and then iterate to get a result; I'm also not sure whether this approach even makes sense):
let MatchPattern (x:DailyTempType) (y:AverageTempType) =
    match (x,y) with
    | (x,y) when (x.Year = y.Year && x.Month = y.Month) ->
        printfn "match"
    | (_,_) -> printfn "nomatch"
I have looked into Deedle, which I think can do this relatively easily, but I am keen to understand how to do it at a lower level.
What you can do is to create a map of the monthly average data. You can think of a map as a read-only dictionary:
let averageDataMap =
    averageData
    |> Seq.map (fun x -> ((x.Year, x.Month), x))
    |> Map.ofSeq
This particular map is a Map<(int64 * int64), AverageTempType>, which, in plainer words, means that the keys into the map are tuples of year and month, and the value associated with each key is an AverageTempType record.
This enables you to find all the matching month data, based on the daily data:
let matches =
    dailyData
    |> Seq.map (fun x -> (x, averageDataMap |> Map.tryFind (x.Year, x.Month)))
Here, matches has the data type seq<DailyTempType * AverageTempType option>. Again, in plainer words, this is a sequence of tuples, where the first element of each tuple is the original daily observation, and the second element is the corresponding monthly average, if a match was found, or None if no matching monthly average was found.
If you want to print the values as in the OP, you can do this:
matches
|> Seq.map snd
|> Seq.map (function | Some _ -> "Match" | None -> "No match")
|> Seq.iter (printfn "%s")
This expression starts with the matches; then pulls out the second element of each tuple; then again maps a Some value to the string "Match", and a None value to the string "No match"; and finally prints each string.
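The question ultimately asks for the daily temperature minus the matching monthly average. Building on the map lookup above, a minimal sketch of that calculation could look like the following. The record types are copied from the question; the sample values and the names averageByMonth and deviations are made up for illustration.

```fsharp
open System

type AverageTempType = { Date: DateTime; Year: int64; Month: int64; AverageTemp: float }
type DailyTempType = { Date: DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float }

// Hypothetical sample data.
let averageData : AverageTempType list =
    [ { Date = DateTime(2020, 1, 1); Year = 2020L; Month = 1L; AverageTemp = 5.0 } ]

let dailyData : DailyTempType list =
    [ { Date = DateTime(2020, 1, 3); Year = 2020L; Month = 1L; Day = 3L; DailyTemp = 7.5 }
      { Date = DateTime(2020, 2, 1); Year = 2020L; Month = 2L; Day = 1L; DailyTemp = 4.0 } ]

// Map from (year, month) to that month's average temperature.
let averageByMonth =
    averageData
    |> Seq.map (fun a -> (a.Year, a.Month), a.AverageTemp)
    |> Map.ofSeq

// For each daily record: Some (daily - average), or None if the month has no average.
let deviations =
    dailyData
    |> List.map (fun d ->
        let deviation =
            averageByMonth
            |> Map.tryFind (d.Year, d.Month)
            |> Option.map (fun avg -> d.DailyTemp - avg)
        d.Date, deviation)

deviations |> List.iter (printfn "%A")
```

Returning an option keeps the "no matching month" case explicit, so the caller decides whether to skip, default, or report it.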
I would first convert the AverageTempType seq to a Map (reducing the cost of the join):
let toMap (avg:AverageTempType seq) = avg |> Seq.groupBy (fun a -> a.Year, a.Month) |> Map.ofSeq
Then you can join and return an option, so the consuming code can do whatever it wants (print, store, error, etc.):
let join (avg:AverageTempType seq) (dly:DailyTempType seq) =
    let avgMap = toMap avg
    dly |> Seq.map (fun d -> d.Year, d.Month, d.Day, d.DailyTemp, Map.tryFind (d.Year, d.Month) avgMap)

Merge multiple arrays in f#

I have three sets of information that I need to join together into one array so I can calculate a payment.
Dataset 1:
FromDate, ToDate
2013-04-10, 2013-04-16
(I'm currently creating a 2D array of the dates between these two dates using the following)
let CalculatedLOS : int = ToDate.Value.Subtract(FromDate.Value).Days
let internalArray = Array2D.init CalculatedDays, 3, (fun x -> (AdmissionDateValue.AddDays(x),0,0))
Dataset 2: These are separated as: code, date | code, date
87789,2013-04-10|35444,2013-04-14
Dataset 3: These are separated as date, differentcode | date, differentcode
2013-04-10,SE|2013-04-15,EA
What I need to do is somehow match up the dates with the relevant index in the array that is created from the FromDate and ToDate and update the 2nd and 3rd position with the code and differentcode that match to that date.
So I would hopefully end up with a dataset that looked like this
[2013-04-10; 87789; SE][2013-04-11;;][2013-04-12;;][2013-04-13;;][2013-04-14;35444;][2013-04-15;;EA][2013-04-16;;]
I would then iterate over this array to lookup some values and assign a payment based on each day.
I've tried Array.find within a loop to update 2D arrays but I'm not sure how to do it (code below which did not work) but I'm really stuck about how to do this, or even if this is the best way.
let differentCodeArray = MyLongString.Value.Split('|')
for i in 0 .. bedStaysArray.Length - 1 do
    Array.find(fun elem -> bedStaysArray.[0].ToString() elem) internalArray
Also happy to be directed away from arrays if there's a better way!
Here is one way of doing it, provided I understand your question. The code depends on the 'correct' date format being used.
Full example, dataset1, dataset2, dataset3 are your given inputs.
// Given data
let dataset1 = "2013-04-10, 2013-04-16"
let dataset2 = "87789,2013-04-10|35444,2013-04-14"
let dataset3 = "2013-04-10,SE|2013-04-15,EA"
// Extract data
let keyValuePair (c:char) (str:string) = let [|a;b|] = str.Split(c) in a,b
let mapTuple fn a = fn (fst a), fn (snd a)
let date1,date2 = keyValuePair ',' dataset1 |> mapTuple System.DateTime.Parse
let data2 =
    dataset2.Split('|')
    |> Seq.map (keyValuePair ',')
    |> Seq.map (fun (code, date) -> System.DateTime.Parse date, code)
    |> Map.ofSeq
let data3 =
    dataset3.Split('|')
    |> Seq.map (keyValuePair ',')
    |> Seq.map (fun (date, code) -> System.DateTime.Parse date, code)
    |> Map.ofSeq
let rec dateSeq (a:System.DateTime) (b:System.DateTime) =
    seq {
        yield a.Date
        if a < b then yield! dateSeq (a.AddDays(1.0)) b
    }
// Join data
let getCode data key = match data |> Map.tryFind key with | Some v -> v | None -> ""
let result =
    dateSeq date1 date2
    |> Seq.map (fun d -> d, getCode data2 d, getCode data3 d)
    |> Seq.toList
// Format result
result |> List.iter (fun (date, code1, code2) -> printfn "[%s;%s;%s]" (date.ToShortDateString()) code1 code2)
Console output:
[2013-04-10;87789;SE]
[2013-04-11;;]
[2013-04-12;;]
[2013-04-13;;]
[2013-04-14;35444;]
[2013-04-15;;EA]
[2013-04-16;;]
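The output above assumes the machine's culture formats short dates as yyyy-MM-dd, which is the date-format dependency mentioned earlier. A small sketch of culture-independent helpers follows; parseIsoDate and formatIsoDate are hypothetical names, not part of the original answer. They could replace System.DateTime.Parse and ToShortDateString in the code above.

```fsharp
open System
open System.Globalization

// Parse a date written as yyyy-MM-dd, ignoring surrounding whitespace.
let parseIsoDate (s: string) =
    DateTime.ParseExact(s.Trim(), "yyyy-MM-dd", CultureInfo.InvariantCulture)

// Format a date as yyyy-MM-dd regardless of the current culture.
let formatIsoDate (d: DateTime) =
    d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)

printfn "[%s;%s;%s]" (formatIsoDate (parseIsoDate " 2013-04-10")) "87789" "SE"
```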

FP - Condense and 'nice' code

Writing code in F# in most cases results in very condensed and intuitive code. This piece of code looks somewhat imperative and inconvenient to me.
times is an array of float values
Lines inside the file times.csv always look like that:
Mai 06 2011 05:43:45 nachm.,00:22.99
Mai 04 2011 08:59:12 nachm.,00:22.73
Mai 04 2011 08:58:27 nachm.,00:19.38
Mai 04 2011 08:57:54 nachm.,00:18.00
average generates an average of the values, dropping the lowest and highest time
getAllSubsetsOfLengthN creates a sequence of all consecutive subsets of length n. Is there a 'nicer' solution to that? Or does something like that already exist in the F# core library?
bestAverageOfN finds the lowest average of all the subsets
open System.IO
let times =
    File.ReadAllLines "times.csv"
    |> Array.map (fun l -> float (l.Substring((l.LastIndexOf ':') + 1)))
let average set =
    (Array.sum set - Array.min set - Array.max set) / float (set.Length - 2)
let getAllSubsetsOfLengthN n (set:float list) =
    seq { for i in [0 .. set.Length - n] -> set |> Seq.skip i |> Seq.take n }
let bestAverageOfN n =
    times
    |> Array.toList
    |> getAllSubsetsOfLengthN n
    |> Seq.map (fun t -> t |> Seq.toArray |> average)
    |> Seq.min
What I am looking for are nicer, shorter or easier solutions. Every useful post will be upvoted, of course :)
I guess getAllSubsetsOfLengthN can be replaced with Seq.windowed, so bestAverageOfN will look like:
let bestAverageOfN n =
    times
    |> Seq.windowed n
    |> Seq.map average
    |> Seq.min
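To see what Seq.windowed produces, here is a small self-contained sketch. The sampleTimes values are made up and stand in for the data parsed from times.csv; average is the function from the question with a type annotation added.

```fsharp
// Hypothetical times standing in for the parsed file contents.
let sampleTimes = [| 5.0; 1.0; 3.0; 2.0; 4.0; 6.0 |]

// Average after dropping the lowest and highest value, as in the question.
let average (set: float[]) =
    (Array.sum set - Array.min set - Array.max set) / float (set.Length - 2)

// Seq.windowed yields every run of n consecutive elements as an array.
let windows = sampleTimes |> Seq.windowed 3 |> Seq.toList

let bestAverageOf3 = windows |> Seq.map average |> Seq.min

windows |> List.iter (printfn "%A")
printfn "best average of 3: %g" bestAverageOf3
```

Because each window is already an array, the Seq.toArray step from the original pipeline is no longer needed.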
Without much thinking, there are some basic functional refactorings you can make. For example, in the calculation of bestAverageOfN, you can use function composition:
let bestAverageOfN n =
    times
    |> Array.toList
    |> getAllSubsetsOfLengthN n
    |> Seq.map (Seq.toArray >> average)
    |> Seq.min
Other than this and the suggestion by desco, I don't think there is anything I would change. If you don't use your special average function anywhere in the code, you could write it inline as a lambda function, but that really depends on your personal preferences.
Just for the sake of generality, I would probably make times an argument of bestAverageOfN:
let bestAverageOfN n times =
    times
    |> Seq.windowed n
    |> Seq.map (fun set ->
        (Array.sum set - Array.min set - Array.max set) / float (set.Length - 2))
    |> Seq.min
Since you mentioned regex for parsing your input, I thought I'd show you such a solution. It may well be overkill, but it is arguably a more functional solution, since regular expressions are declarative while substring manipulation is more imperative. Regex is also nice because it is easier to extend if the structure of your input changes; index-based substring code can get messy, and I try to avoid it completely.
First, a couple of active patterns:
open System.Text.RegularExpressions
let (|Groups|_|) pattern input =
    let m = Regex.Match(input, pattern)
    if m.Success then
        Some([for g in m.Groups -> g.Value] |> List.tail)
    else
        None
open System
let (|Float|_|) (input: string) =
    match Double.TryParse(input) with
    | true, value -> Some(value)
    | _ -> None
Adopting @ildjarn's times implementation:
let times =
    File.ReadAllLines "times.csv"
    |> Array.map (function Groups @",.*?:(.*)$" [Float(value)] -> value)
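As a quick check of the pattern against one of the sample lines from the question, here is a self-contained variant. It differs from the code above in three ways that are my own assumptions: parseTime is a hypothetical helper that returns an option instead of failing on a non-matching line, the groups are collected by index, and the number is parsed with the invariant culture so that "22.99" does not depend on the machine's locale.

```fsharp
open System
open System.Globalization
open System.Text.RegularExpressions

// Matches the pattern and returns the captured groups (excluding the whole match).
let (|Groups|_|) (pattern: string) (input: string) =
    let m = Regex.Match(input, pattern)
    if m.Success then Some [ for i in 1 .. m.Groups.Count - 1 -> m.Groups.[i].Value ]
    else None

// Parses a float using the invariant culture ('.' as decimal separator).
let (|Float|_|) (input: string) =
    match Double.TryParse(input, NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, value -> Some value
    | _ -> None

// Hypothetical helper: None for lines that do not match, instead of an exception.
let parseTime (line: string) =
    match line with
    | Groups @",.*?:(.*)$" [ Float value ] -> Some value
    | _ -> None

printfn "%A" (parseTime "Mai 06 2011 05:43:45 nachm.,00:22.99")
```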
Since bestAverageOfN has already been covered, here's an alternative implementation of times:
let times =
    File.ReadAllLines "times.csv"
    |> Array.map (fun l -> l.LastIndexOf ':' |> (+) 1 |> l.Substring |> float)
