I have a dataset that contains [City, Dealership, Total Cars Sold]. How would I get the top dealer in each city and the number of cars they sold?
The results should look like
City1 Dealership A 2000
City2 Dealership X 1000
etc.
I'm sure it's possible, but I'm not having any luck, and it might be because I'm approaching the problem the wrong way.
Currently I'm grouping by Dealership and City, which creates a Frame<(string*string*int), int> and gets me
City1 Dealership A 1 -> 2000
City1 Dealership B 2 -> 1000
City2 Dealership X 3 -> 1000
City2 Dealership Y 4 -> 500
etc.
But getting the dealership that does the most deals from there is where I'm stumped.
Thanks.
I adapted Tomas's answer and output the type as Series<string, (string * int)>
let data = series [
    ("City1", "Dealership A") => 2000
    ("City1", "Dealership B") => 1000
    ("City2", "Dealership X") => 1000
    ("City2", "Dealership Y") => 500 ]
data
|> Series.groupBy (fun k _ -> fst k)
|> Series.mapValues (fun sr ->
    let sorted = sr |> Series.sortBy (fun x -> -x)
    let key = sorted |> Series.firstKey |> snd
    let value = sorted |> Series.firstValue
    key, value )
The output looks like
City1 -> (Dealership A, 2000)
City2 -> (Dealership X, 1000)
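For comparison, the same grouping can be sketched without Deedle, using only the core list functions. This assumes the data is already in memory as (city, dealership, sold) tuples; the names sales and topDealers are made up for the example.

```fsharp
// Hypothetical in-memory data: (city, dealership, total cars sold)
let sales =
    [ "City1", "Dealership A", 2000
      "City1", "Dealership B", 1000
      "City2", "Dealership X", 1000
      "City2", "Dealership Y", 500 ]

// Group the rows by city, then keep the row with the highest sales in each group
let topDealers =
    sales
    |> List.groupBy (fun (city, _, _) -> city)
    |> List.map (fun (city, rows) ->
        let (_, dealer, sold) = rows |> List.maxBy (fun (_, _, sold) -> sold)
        city, (dealer, sold))

topDealers
|> List.iter (fun (city, (dealer, sold)) -> printfn "%s -> (%s, %d)" city dealer sold)
```

List.maxBy keeps the first row it sees when two dealers tie, so a tie-breaking rule would have to be added if that matters.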
EDITED
I assume you have a csv file like this
City,Dealership,TotalCarsSold
City1,Dealership A,2000
City1,Dealership B,1000
City2,Dealership X,1000
City2,Dealership Y,500
This is how I would do it: read the file as a Frame, get the column as a Series, and apply the same code as above to get the result.
let df =
    Frame.ReadCsv("C:/Temp/dealership.csv")
    |> Frame.indexRowsUsing (fun r -> r.GetAs<string>("City"), r.GetAs<string>("Dealership"))
df?TotalCarsSold
|> Series.groupBy (fun k _ -> fst k)
|> Series.mapValues (fun sr ->
    let sorted = sr |> Series.sortBy (fun x -> -x)
    let key = sorted |> Series.firstKey |> snd
    let value = sorted |> Series.firstValue
    key, value )
You can do this using the Series.applyLevel function. It takes a series together with a key selector and applies a given aggregation to all rows that share the same key. In your case, the key selector just needs to project the city from the composed key of the series, so that the aggregation runs over all dealerships in that city. Given your sample data:
let data = series [
    ("City1", "Dealership A") => 2000
    ("City1", "Dealership B") => 1000
    ("City2", "Dealership X") => 1000
    ("City2", "Dealership Y") => 500 ]
You can get the highest sales figure per city by using:
data
|> Series.applyLevel (fun (c, d) -> c) Stats.max
Note that Stats.max returns option (which is None for empty series). You can get a series with just numbers using:
data
|> Series.applyLevel (fun (c, d) -> c) (Stats.max >> Option.get)
Related
My data is below. There are three columns that I use, and I want to weight the income by how many people make that income. There are multiple rows per State, because each income is in a different band. For example:
State Income Pop
AL 45000 8500
AL 78000 7800
AL 80000 1200
TX 500000 500
TX 100000 700
TX 40000 8000
MO 100000 7000
MO 780000 1000
MO 79000 1500
I want to weight income by the number of people out of the population that is in the band of income.
So for AL, I need:
45000 * 8500/(8500+7800+1200) +
78000 * 7800/(8500+7800+1200) +
80000 * 1200/(8500+7800+1200) = The Total <- this is the number I need, PER State
Any suggestions?
Maybe something like this...
type Data =
    { State : string
      Income : float
      Pop : float }
let data =
    [ {State="AL"; Income=45000.; Pop=8500.};
      {State="AL"; Income=78000.; Pop=7800.};
      {State="AL"; Income=80000.; Pop=1200.};
      {State="TX"; Income=500000.;Pop= 500.};
      {State="TX"; Income=100000.;Pop= 700.};
      {State="TX"; Income=40000.; Pop=8000.};
      {State="MO"; Income=100000.;Pop= 7000.};
      {State="MO"; Income=780000.;Pop= 1000.};
      {State="MO"; Income=79000.; Pop=1500.} ]
data
|> List.map (fun r -> r.State)
|> List.distinct
|> List.map (fun state ->
    let stateRecords = data |> List.filter (fun r -> r.State = state)
    let statePopulation = stateRecords |> List.map (fun r -> r.Pop) |> List.sum
    let avg = stateRecords |> List.map (fun r -> r.Income * r.Pop / statePopulation) |> List.sum
    (state, avg))
Another option
data
|> List.groupBy (fun x -> x.State)
|> List.map
    (fun (state, grp) ->
        let n, d =
            List.fold
                (fun (n, d) v ->
                    n + v.Pop * v.Income, d + v.Pop)
                (0.0, 0.0) grp
        state, n / d)
If your data is sorted by state, it may be better for performance to use a fold directly instead of calling groupBy first.
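A rough sketch of that single-pass idea, assuming the rows really are sorted by state (otherwise a state would show up more than once in the result). The tuple layout and the names rows and weightedBySortedState are made up for the example.

```fsharp
// Hypothetical rows: (state, income, population), assumed to be sorted by state
let rows =
    [ "AL", 45000., 8500.
      "AL", 78000., 7800.
      "AL", 80000., 1200.
      "TX", 500000., 500.
      "TX", 100000., 700.
      "TX", 40000., 8000. ]

let weightedBySortedState rows =
    // Turn the running sums of the current state into a finished (state, average) entry
    let finish acc current =
        match current with
        | Some (state, n: float, d: float) -> (state, n / d) :: acc
        | None -> acc
    // Either extend the running sums or close the previous state and start a new one
    let step (acc, current) (state, income: float, pop: float) =
        match current with
        | Some (s, n, d) when s = state -> acc, Some (s, n + income * pop, d + pop)
        | _ -> finish acc current, Some (state, income * pop, pop)
    let acc, current = List.fold step ([], None) rows
    finish acc current |> List.rev

let weighted = weightedBySortedState rows
weighted |> List.iter (fun (state, avg) -> printfn "%s %.2f" state avg)
```

Whether this is actually faster than groupBy depends on the data size, so it is worth measuring before committing to it.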
I have the following type:
type Multiset<'a when 'a: comparison> = MSet of Map<'a, int>
I want to declare a function for this type that subtracts two MSets.
Let's say I have the following two Multisets:
let f = MSet (Map.ofList [("a",1);("b",2);("c",1)])
let g = MSet (Map.ofList [("a",1);("b",3);("c",1)])
I have now tried to create this subtract function which takes two Multisets.
let subtract fms sms =
    match fms with
    | MSet fs ->
        match sms with
        | MSet ss ->
            let toList ms = Map.fold (fun keys key value -> keys @ [for i = 1 to value do yield key]) [] ms
            let fromList l =
                match l with
                | [] -> MSet(Map.ofList [])
                | x::xs -> MSet(Map.ofList (x::xs |> Seq.countBy id |> Seq.toList))
            let sfList = toList fs
            let ssList = toList ss
            fromList (List.filter (fun n -> not (List.contains n sfList)) ssList)
If I run :
subtract f g
It returns :
MSet (map [])
Which is not what I wanted. g contains one more b than f, so I would want it to return:
MSet(map [("b", 1)])
My implementation doesn't account for multiple occurrences of the same key. I am not quite sure how I can fix this, so I get the wanted functionality?
I suspect you just have your arguments reversed, that's all. Try subtract g f.
That said, your solution seems way more complicated than it needs to be. How about just updating the values in the first map by subtracting the counts in the second, then removing non-positive counts?
let sub (MSet a) (MSet b) =
    let bCount key = match Map.tryFind key b with | Some c -> c | None -> 0
    let positiveCounts, _ =
        a
        |> Map.map (fun key value -> value - (bCount key))
        |> Map.partition (fun _ value -> value > 0)
    MSet positiveCounts
Also, the nested match in your implementation doesn't need to be there. If you wanted to match on both arguments, you can just do:
match fms, sms with
| MSet fs, MSet ss -> ...
But even that is overkill - you can just include the pattern in the parameter declarations, like in my implementation above.
As for duplicate keys - in this case, there is no reason to worry: neither of the arguments can have duplicate keys (because they're both Maps), and the algorithm will never produce any.
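To see the argument order at work, here is a small variant sketch that walks the second map once and lowers (or drops) the matching counts in the first. The name subtractCounts is made up; the type and the sample values are the ones from the question.

```fsharp
type Multiset<'a when 'a : comparison> = MSet of Map<'a, int>

// Subtract b from a: for every key in b, lower the count in a or remove the key
let subtractCounts (MSet a) (MSet b) =
    let step acc key count =
        match Map.tryFind key acc with
        | Some c when c > count -> Map.add key (c - count) acc
        | Some _ -> Map.remove key acc   // count would drop to zero or below
        | None -> acc                    // key is not in a, nothing to subtract from
    MSet (Map.fold step a b)

let f = MSet (Map.ofList [ "a", 1; "b", 2; "c", 1 ])
let g = MSet (Map.ofList [ "a", 1; "b", 3; "c", 1 ])

let gMinusF = subtractCounts g f   // one "b" is left over
let fMinusG = subtractCounts f g   // nothing is left over
```

Since g holds the extra "b", it has to be the first argument for the difference to be non-empty.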
The underlying issue, also evident in your other question, seems to be the unification of identical keys. This requires an equality constraint and can be easily effected by the high-level function Seq.groupBy. Since comparison isn't strictly necessary, I propose using a dictionary, but the approach would work also with maps.
Given a type
type MultiSet<'T> = MultiSet of System.Collections.Generic.IDictionary<'T, int>
and a helper which maps the keys, sums their values and validates the result:
let internal mapSum f =
    Seq.groupBy (fun (KeyValue(k, _)) -> f k)
    >> Seq.map (fun (k, kvs) -> k, Seq.sumBy (fun (KeyValue(_, v)) -> v) kvs)
    >> Seq.filter (fun (_, v) -> v > 0)
    >> dict
    >> MultiSet
your operations become:
let map f (MultiSet s) =
    mapSum f s
let add (MultiSet fms) (MultiSet sms) =
    Seq.append fms sms
    |> mapSum id
let subtract (MultiSet fms) (MultiSet sms) =
    Seq.map (fun (KeyValue(k, v)) ->
        System.Collections.Generic.KeyValuePair(k, -v)) sms
    |> Seq.append fms
    |> mapSum id
let f = MultiSet(dict["a", 1; "b", 2; "c", 1])
let g = MultiSet(dict["a", 1; "b", 3; "c", 1])
subtract f g
// val it : MultiSet<string> = MultiSet (seq [])
subtract g f
// val it : MultiSet<string> = MultiSet (seq [[b, 1] {Key = "b";
//                                                    Value = 1;}])
I have two lists of records with the following types:
type AverageTempType = {Date: System.DateTime; Year: int64; Month: int64; AverageTemp: float}
type DailyTempType = {Date: System.DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float}
I want to get a new list which is made up of the DailyTempType "joined" with the AverageTempType. Ultimately though for each daily record I want the Daily Temp - Average temp for the matching month.
I think I can do this with loops as per below and massage this into a reasonable output:
let MatchLoop =
    for i in DailyData do
        for j in AverageData do
            if (i.Year = j.Year && i.Month = j.Month)
            then printfn "%A %A %A %A %A" i.Year i.Month i.Day i.DailyTemp j.AverageTemp
            else printfn "NOMATCH"
I have also tried to do this with pattern matching, but I can't quite get there. I'm not sure how to define the list correctly in the input type and then iterate to get a result, and I'm not sure this approach even makes sense:
let MatchPattern (x:DailyTempType) (y:AverageTempType) =
    match (x,y) with
    |(x,y) when (x.Year = y.Year && x.Month = y.Month) ->
        printfn "match"
    |(_,_) -> printfn "nomatch"
I have looked into Deedle, which I think can do this relatively easily, but I am keen to understand how to do it at a lower level.
What you can do is to create a map of the monthly average data. You can think of a map as a read-only dictionary:
let averageDataMap =
    averageData
    |> Seq.map (fun x -> ((x.Year, x.Month), x))
    |> Map.ofSeq
This particular map is a Map<(int64 * int64), AverageTempType>, which, in plainer words, means that the keys into the map are tuples of year and month, and the value associated with each key is an AverageTempType record.
This enables you to find all the matching month data, based on the daily data:
let matches =
    dailyData
    |> Seq.map (fun x -> (x, averageDataMap |> Map.tryFind (x.Year, x.Month)))
Here, matches has the data type seq<DailyTempType * AverageTempType option>. Again, in plainer words, this is a sequence of tuples, where the first element of each tuple is the original daily observation, and the second element is the corresponding monthly average, if a match was found, or None if no matching monthly average was found.
If you want to print the values as in the OP, you can do this:
matches
|> Seq.map snd
|> Seq.map (function | Some _ -> "Match" | None -> "No match")
|> Seq.iter (printfn "%s")
This expression starts with the matches; then pulls out the second element of each tuple; then again maps a Some value to the string "Match", and a None value to the string "No match"; and finally prints each string.
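The question ultimately asks for the daily temperature minus the average for the matching month. A minimal sketch of that last step, using the record types from the question and some made-up sample values (the names averageByMonth and differences are also just for illustration):

```fsharp
open System

type AverageTempType = { Date: DateTime; Year: int64; Month: int64; AverageTemp: float }
type DailyTempType = { Date: DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float }

// Made-up sample data: one monthly average, two daily readings (the second has no match)
let averageData : AverageTempType list =
    [ { Date = DateTime(2013, 4, 1); Year = 2013L; Month = 4L; AverageTemp = 10.0 } ]

let dailyData : DailyTempType list =
    [ { Date = DateTime(2013, 4, 10); Year = 2013L; Month = 4L; Day = 10L; DailyTemp = 12.5 }
      { Date = DateTime(2013, 5, 1); Year = 2013L; Month = 5L; Day = 1L; DailyTemp = 15.0 } ]

// Look up the monthly average by (year, month)
let averageByMonth =
    averageData
    |> Seq.map (fun a -> (a.Year, a.Month), a.AverageTemp)
    |> Map.ofSeq

// Keep only the days that have a monthly average and pair each with DailyTemp - AverageTemp
let differences =
    dailyData
    |> List.choose (fun d ->
        averageByMonth
        |> Map.tryFind (d.Year, d.Month)
        |> Option.map (fun avg -> d, d.DailyTemp - avg))
```

Days without a matching month are silently dropped here; keeping them as an option, as in matches above, is the alternative if the gaps need to be reported.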
I would first convert the AverageTempType seq to a Map keyed by (year, month), which reduces the cost of the join:
let toMap (avg:AverageTempType seq) = avg |> Seq.groupBy (fun a -> a.Year, a.Month) |> Map.ofSeq
Then you can join and return an option, so the consuming code can do whatever it wants (print, store, error, etc.):
let join (avg:AverageTempType seq) (dly:DailyTempType seq) =
    let avgMap = toMap avg
    dly |> Seq.map (fun d -> d.Year, d.Month, d.Day, d.DailyTemp, Map.tryFind (d.Year, d.Month) avgMap)
I have three sets of information that I need to join together into one array so I can calculate a payment.
Dataset 1:
FromDate, ToDate
2013-04-10, 2013-04-16
(I'm currently creating a 2D array of the dates between these two dates using the following)
let CalculatedLOS : int = ToDate.Value.Subtract(FromDate.Value).Days
let internalArray = Array2D.init CalculatedDays, 3, (fun x -> (AdmissionDateValue.AddDays(x),0,0))
Dataset 2: These are separated as: code, date | code, date
87789,2013-04-10|35444,2013-04-14
Dataset 3: These are separated as date, differentcode | date, differentcode
2013-04-10,SE|2013-04-15,EA
What I need to do is somehow match up the dates with the relevant index in the array that is created from the FromDate and ToDate and update the 2nd and 3rd position with the code and differentcode that match to that date.
So I would hopefully end up with a dataset that looked like this
[2013-04-10;87789;SE][2013-04-11;;][2013-04-12;;][2013-04-13;;][2013-04-14;35444;][2013-04-15;;EA][2013-04-16;;]
I would then iterate over this array to lookup some values and assign a payment based on each day.
I've tried Array.find within a loop to update the 2D array, but I'm not sure how to do it (the code below did not work). I'm really stuck on how to do this, or even whether this is the best way.
let differentCodeArray = MyLongString.Value.Split('|')
for i in 0 .. bedStaysArray.Length - 1 do
Array.find(fun elem -> bedStaysArray.[0].ToString() elem) internalArray
Also happy to be directed away from arrays if there's a better way!
Here is one way of doing it, assuming I understand your question. The code depends on the 'correct' date format being used.
Full example, dataset1, dataset2, dataset3 are your given inputs.
//Given data
let dataset1 = "2013-04-10, 2013-04-16"
let dataset2 = "87789,2013-04-10|35444,2013-04-14"
let dataset3 = "2013-04-10,SE|2013-04-15,EA"
//Extract data
let keyValuePair (c:char) (str:string) = let [|a;b|] = str.Split(c) in a,b
let mapTuple fn a = fn (fst a), fn (snd a)
let date1,date2 = keyValuePair ',' dataset1 |> mapTuple System.DateTime.Parse
let data2 =
    dataset2.Split('|')
    |> Seq.map (keyValuePair ',')
    |> Seq.map (fun (code, date) -> System.DateTime.Parse date, code)
    |> Map.ofSeq
let data3 =
    dataset3.Split('|')
    |> Seq.map (keyValuePair ',')
    |> Seq.map (fun (date, code) -> System.DateTime.Parse date, code)
    |> Map.ofSeq
let rec dateSeq (a:System.DateTime) (b:System.DateTime) =
    seq {
        yield a.Date
        if a < b then yield! dateSeq (a.AddDays(1.0)) b
    }
//join data
let getCode data key = match data |> Map.tryFind key with |Some v -> v |None -> ""
let result =
    dateSeq date1 date2
    |> Seq.map (fun d -> d, getCode data2 d, getCode data3 d)
    |> Seq.toList
//Format result (explicit format string, so the output does not depend on the machine's culture)
result |> List.iter (fun (date, code1, code2) -> printfn "[%s;%s;%s]" (date.ToString("yyyy-MM-dd")) code1 code2)
Console output:
[2013-04-10;87789;SE]
[2013-04-11;;]
[2013-04-12;;]
[2013-04-13;;]
[2013-04-14;35444;]
[2013-04-15;;EA]
[2013-04-16;;]
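The recursive dateSeq is not the only way to enumerate the days. A possible alternative, shown here only as a sketch, computes the number of days up front and builds the list with List.init; the name daysBetween is made up.

```fsharp
open System

// All calendar days from first to last, inclusive; empty if last is before first
let daysBetween (first: DateTime) (last: DateTime) =
    let count = (last.Date - first.Date).Days + 1
    List.init (max count 0) (fun i -> first.Date.AddDays(float i))

let days = daysBetween (DateTime(2013, 4, 10)) (DateTime(2013, 4, 16))
```

The resulting list can be fed into the same Seq.map step as dateSeq date1 date2 above.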
I have a first matrix that should record, for each user, which products they like (the first row holds the user ids, the first column holds the product ids).
Let's take 3 users and 5 products.
No user has liked a product yet, so my matrix ILike is a zero matrix:
let matrixILike = [[0.;1.;2.;3.]
                   [1.;0.;0.;0.]
                   [2.;0.;0.;0.]
                   [3.;0.;0.;0.]
                   [4.;0.;0.;0.]
                   [5.;0.;0.;0.]]
Now user 1 likes product 2 and user 3 likes product 5 which can be summarized in the following matrix:
let matrixAction = [[1.;2.]
                    [3.;5.]]
So I would like to update matrixILike using matrixAction to obtain a new matrixILike like this:
let matrixILike = [[0.;1.;2.;3.]
                   [1.;0.;0.;0.]
                   [2.;1.;0.;0.]
                   [3.;0.;0.;0.]
                   [4.;0.;0.;0.]
                   [5.;0.;0.;1.]]
I try to do this with a "match with" code but it is not working.
for k = 0 to matrixAction.NumRows - 1 do
match (matrixAction.[k,0] , matrixAction.[k,1]) with
| (matrixILike.[x,0] , matrixILike.[0,y]) -> (matrixILike.[x,y] <- 1.)
| _ -> (matrixILike.[x,y] <- 0.)
matrixILike
If you have any suggestions, I'll take them.
This is trivial if you change matrixILike to an array.
let matrixILike = [|
    [|0.;1.;2.;3.|]
    [|1.;0.;0.;0.|]
    [|2.;0.;0.;0.|]
    [|3.;0.;0.;0.|]
    [|4.;0.;0.;0.|]
    [|5.;0.;0.;0.|]
|]
let matrixAction = [
    (1., 2.)
    (3., 5.)
]
matrixAction
|> List.iter (fun (u, p) -> matrixILike.[int p].[int u] <- 1.)
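The indexing above relies on the ids being equal to the row and column positions. If that might not hold, a variant sketch can look the positions up in the header row and header column instead. The names likes and markLiked are made up for the example.

```fsharp
// Row 0 holds the user ids, column 0 holds the product ids (same layout as above)
let likes =
    [| [| 0.; 1.; 2.; 3. |]
       [| 1.; 0.; 0.; 0. |]
       [| 2.; 0.; 0.; 0. |]
       [| 3.; 0.; 0.; 0. |]
       [| 4.; 0.; 0.; 0. |]
       [| 5.; 0.; 0.; 0. |] |]

// Find the column whose header is the user id and the row whose header is the product id
let markLiked (matrix: float[][]) (user: float, product: float) =
    let col = matrix.[0] |> Array.tryFindIndex ((=) user)
    let row = matrix |> Array.tryFindIndex (fun r -> r.[0] = product)
    match row, col with
    | Some r, Some c when r > 0 && c > 0 -> matrix.[r].[c] <- 1.
    | _ -> ()   // unknown user or product: leave the matrix unchanged

[ (1., 2.); (3., 5.) ] |> List.iter (markLiked likes)
```

The r > 0 && c > 0 guard keeps the header row and header column from ever being overwritten.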
Without changing your input parameters, this function will do the job.
let update actions =
    let mapiTail f = function
        | [] -> []
        | h::t -> h :: List.mapi (f h) t
    mapiTail (fun matHead _ ->
        mapiTail (fun rowHead i x ->
            if List.exists ((=) [matHead.[i+1];rowHead]) actions then 1. else x))
Usage:
update matrixAction matrixILike
It uses List.mapi, which is the same as List.map but with an additional parameter: the index.