How to weight income by population in F#?

My data is below. I use three columns, and I want to weight the income by how many people earn that income. There are multiple rows per State, because each income is in a different band. For example:
State Income Pop
AL 45000 8500
AL 78000 7800
AL 80000 1200
TX 500000 500
TX 100000 700
TX 40000 8000
MO 100000 7000
MO 780000 1000
MO 79000 1500
I want to weight each income by the share of the state's population that falls in that income band.
So for AL, I need:
45000 * 8500/(8500+7800+1200) +
78000 * 7800/(8500+7800+1200) +
80000 * 1200/(8500+7800+1200) = the total (this is the number I need, per state)
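Put differently, the number wanted for each state is a population-weighted mean of income. Below, $I_i$ and $P_i$ are the income and population of band $i$ (symbols introduced here for illustration). For a state $s$, the formula is followed by the AL value worked out:

```latex
\bar{I}_s = \sum_{i \in s} I_i \,\frac{P_i}{\sum_{j \in s} P_j}
          = \frac{\sum_{i \in s} I_i P_i}{\sum_{i \in s} P_i}

\bar{I}_{AL} = \frac{45000 \cdot 8500 + 78000 \cdot 7800 + 80000 \cdot 1200}{8500 + 7800 + 1200}
             = \frac{1\,086\,900\,000}{17\,500} \approx 62\,108.57
```

The second form is what the fold-based answers compute: a running sum of Income * Pop divided by a running sum of Pop.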
Any suggestions?

Maybe something like this...
```fsharp
type Data =
    { State : string
      Income : float
      Pop : float }

let data =
    [ { State = "AL"; Income = 45000.; Pop = 8500. }
      { State = "AL"; Income = 78000.; Pop = 7800. }
      { State = "AL"; Income = 80000.; Pop = 1200. }
      { State = "TX"; Income = 500000.; Pop = 500. }
      { State = "TX"; Income = 100000.; Pop = 700. }
      { State = "TX"; Income = 40000.; Pop = 8000. }
      { State = "MO"; Income = 100000.; Pop = 7000. }
      { State = "MO"; Income = 780000.; Pop = 1000. }
      { State = "MO"; Income = 79000.; Pop = 1500. } ]

data
|> List.map (fun r -> r.State)
|> List.distinct
|> List.map (fun state ->
    let stateRecords = data |> List.filter (fun r -> r.State = state)
    let statePopulation = stateRecords |> List.map (fun r -> r.Pop) |> List.sum
    let avg = stateRecords |> List.map (fun r -> r.Income * r.Pop / statePopulation) |> List.sum
    (state, avg))
```

Another option
```fsharp
data
|> List.groupBy (fun x -> x.State)
|> List.map
    (fun (state, grp) ->
        let n, d =
            List.fold
                (fun (n, d) v ->
                    n + v.Pop * v.Income, d + v.Pop)
                (0.0, 0.0) grp
        state, n / d)
```
If your data is sorted by state, I guess it may be better for performance to use a fold right away instead of calling groupBy first.
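Here is a minimal sketch of that single-pass idea. It assumes that all rows for a state are contiguous, as in the sample data; a state that reappears later would start a second run. `weightedByState` is a hypothetical name, and the `Data` record is repeated so the snippet stands alone:

```fsharp
type Data = { State : string; Income : float; Pop : float }

// Single pass, no groupBy: accumulate (state, sum of Income*Pop, sum of Pop)
// for each run of consecutive rows with the same State.
let weightedByState (rows: Data list) =
    rows
    |> List.fold (fun acc r ->
        match acc with
        // same state as the current run: extend its sums
        | (s, num, den) :: rest when s = r.State ->
            (s, num + r.Income * r.Pop, den + r.Pop) :: rest
        // a new state: start a new run (runs are kept in reverse order)
        | _ -> (r.State, r.Income * r.Pop, r.Pop) :: acc) []
    |> List.rev
    |> List.map (fun (s, num, den) -> s, num / den)
```

For the sample data this yields one `(state, weightedIncome)` pair per state, in the order the states first appear.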

Related

F# list group by running total?

I have the following list of tuples, ordered by the first item. I want to cluster the items as follows:
If the second item of the tuple is greater than 50, it will be in its own cluster.
Otherwise, cluster the items whose sum is less than 50.
The order cannot be changed.
Code:
```fsharp
let values =
    [ ("ACE", 78);
      ("AMR", 3);
      ("Aam", 6);
      ("Acc", 1);
      ("Adj", 23);
      ("Aga", 12);
      ("All", 2);
      ("Ame", 4);
      ("Amo", 60);
      //....
    ]

values |> Seq.groupBy (fun (k, v) -> ???)
```
The expected value will be
```
[ ["ACE"]                                      // 78
  ["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"]   // 47
  ["Ame"]                                      // 4
  ["Amo"]                                      // 60
  .... ]
```
Ideally, I want to evenly distribute the second group (["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"] which got the sum of 47) and the third one (["Ame"] which has only 4).
How to implement it in F#?
I have the following solution. It uses a mutable variable. Is that not idiomatic F#? Is for ... do imperative in F#, or is it syntactic sugar for some functional construct?
```fsharp
seq {
    let mutable c = []
    for v in values |> Seq.sortBy (fun (k, _) -> k) do
        let sum = c |> Seq.map (fun (_, v) -> v) |> Seq.sum
        if not (c = []) && sum + (snd v) > 50 then
            yield c
            c <- [v]
        else
            c <- List.append c [v]
    // emit the last cluster, which the loop itself never yields
    if not (List.isEmpty c) then yield c
}
```
I think I got it. Not the nicest code ever, but it works and is immutable.
```fsharp
let foldFn (acc: (string list * int) list) (name, value) =
    let addToLast last =
        let withoutLast = acc |> List.filter ((<>) last)
        let newLast = [ ((fst last) @ [name]), (snd last) + value ]
        newLast |> List.append withoutLast
    match acc |> List.tryLast with
    | None -> [ [name], value ]
    | Some l ->
        if (snd l) + value <= 50 then addToLast l
        else [ [name], value ] |> List.append acc

values |> List.fold foldFn [] |> List.map fst
```
Update: Since append can be quite an expensive operation, I added a prepend-only version (it still fulfills the original requirement to keep the order).
```fsharp
let foldFn (acc: (string list * int) list) (name, value) =
    let addToLast last =
        let withoutLast = acc |> List.filter ((<>) last) |> List.rev
        let newLast = ((fst last) @ [name]), (snd last) + value
        (newLast :: withoutLast) |> List.rev
    match acc |> List.tryLast with
    | None -> [ [name], value ]
    | Some l ->
        if (snd l) + value <= 50 then addToLast l
        else ([name], value) :: (List.rev acc) |> List.rev
```
Note: There is still an @ operator on line 4 (when creating the new list of names in a cluster), but since the theoretical maximum number of names in a cluster is 50 (if all of them were equal to 1), the performance cost here is negligible.
If you remove List.map fst from the last line, you get the sum value for each cluster in the list.
Append operations are expensive. A straightforward fold with prepended intermediate results is cheaper, even if the lists need to be reversed after processing.
```fsharp
["ACE", 78; "AMR", 3; "Aam", 6; "Acc", 1; "Adj", 23; "Aga", 12; "All", 2; "Ame", 4; "Amd", 6; "Amo", 60]
|> List.fold (fun (r, s1, s2) (t1, t2) ->
    if t2 > 50 then [t1]::s1::r, [], 0
    elif s2 + t2 > 50 then s1::r, [t1], t2
    else r, t1::s1, s2 + t2) ([], [], 0)
|> fun (r, s1, _) -> s1::r
|> List.filter (not << List.isEmpty)
|> List.map List.rev
|> List.rev
// val it : string list list =
//   [["ACE"]; ["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"]; ["Ame"; "Amd"];
//    ["Amo"]]
```
Here is a recursive version, working much the same way as the fold versions:
```fsharp
let groupBySums data =
    let rec group cur sum acc lst =
        match lst with
        // end of input: flush the current cluster too, then drop empty clusters
        | [] -> (cur |> List.rev)::acc |> List.where (not << List.isEmpty) |> List.rev
        // large item: close the current cluster and put this item in its own cluster
        | (name, value)::tail when value > 50 -> group [] 0 ([(name, value)]::(cur |> List.rev)::acc) tail
        | (name, value)::tail ->
            match sum + value with
            // start a new cluster whose running sum is this item's value
            | x when x > 50 -> group [(name, value)] value ((cur |> List.rev)::acc) tail
            | _ -> group ((name, value)::cur) (sum + value) acc tail
    (data |> List.sortBy (fun (name, _) -> name)) |> group [] 0 []

values |> groupBySums |> List.iter (printfn "%A")
```
Yet another solution using Seq.mapFold and Seq.groupBy:
```fsharp
let group values =
    values
    |> Seq.mapFold (fun (group, total) (name, count) ->
        let newTotal = count + total
        let newGroup = group + if newTotal > 50 then 1 else 0
        (newGroup, name), (newGroup, if newGroup = group then newTotal else count)
    ) (0, 0)
    |> fst
    |> Seq.groupBy fst
    |> Seq.map (snd >> Seq.map snd >> Seq.toList)
```
Invoke it like this:
```fsharp
[ "ACE", 78
  "AMR", 3
  "Aam", 6
  "Acc", 1
  "Adj", 23
  "Aga", 12
  "All", 2
  "Ame", 4
  "Amo", 60
]
|> group
|> Seq.iter (printfn "%A")
// ["ACE"]
// ["AMR"; "Aam"; "Acc"; "Adj"; "Aga"; "All"]
// ["Ame"]
// ["Amo"]
```
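On the side question of whether for ... do is imperative: inside a seq { } expression it is syntactic sugar. The F# language specification translates `for x in e do ...` in a sequence expression, roughly, into a Seq.collect over `e`. In the mutable version above, the mutable `c` is what makes the solution stateful; the loop itself is not the problem. A small illustration, where `viaSeqExpr` and `viaCollect` are illustrative names:

```fsharp
let xs = [1; 2; 3]

// A sequence expression using for ... in ... do
let viaSeqExpr = seq { for x in xs do yield x * 10 }

// Roughly the translation: each iteration contributes a sequence,
// and Seq.collect concatenates them in order
let viaCollect = xs |> Seq.collect (fun x -> Seq.singleton (x * 10))

printfn "%A" (List.ofSeq viaSeqExpr = List.ofSeq viaCollect) // true
```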

How to split a sequence in F# based on another sequence in an idiomatic way

I have, in F#, 2 sequences, each containing distinct integers, strictly in ascending order: listMaxes and numbers.
If not Seq.isEmpty numbers, then it is guaranteed that not Seq.isEmpty listMaxes and Seq.last listMaxes >= Seq.last numbers.
I would like to implement in F# a function that returns a list of list of integers, whose List.length equals Seq.length listMaxes, containing the elements of numbers divided in lists, where the elements of listMaxes limit each group.
For example: called with the arguments
listMaxes = seq [ 25; 56; 65; 75; 88 ]
numbers = seq [ 10; 11; 13; 16; 20; 25; 31; 38; 46; 55; 65; 76; 88 ]
this function should return
[ [10; 11; 13; 16; 20; 25]; [31; 38; 46; 55]; [65]; List.empty; [76; 88] ]
I can implement this function, iterating over numbers only once:
```fsharp
let groupByListMaxes listMaxes numbers =
    if Seq.isEmpty numbers then
        List.replicate (Seq.length listMaxes) List.empty
    else
        List.ofSeq (seq {
            use nbe = numbers.GetEnumerator ()
            ignore (nbe.MoveNext ())
            for lmax in listMaxes do
                yield List.ofSeq (seq {
                    if nbe.Current <= lmax then
                        yield nbe.Current
                    while nbe.MoveNext () && nbe.Current <= lmax do
                        yield nbe.Current
                })
        })
```
But this code feels unclean, ugly, imperative, and very un-F#-y.
Is there any functional / F#-idiomatic way to achieve this?
Here's a version based on lists, which is quite functional in style. You can use Seq.toList to convert the input sequences whenever you want. You could also use List.scan in conjunction with List.partition ((>=) max) if you want to use only library functions (FSharp.Core has no Seq.partition), but beware that it's very, very easy to introduce quadratic complexity in either computation or memory when doing that.
This is linear in both:
```fsharp
let splitAt value lst =
    let rec loop l1 = function
        | [] -> List.rev l1, []
        | h :: t when h > value -> List.rev l1, (h :: t)
        | h :: t -> loop (h :: l1) t
    loop [] lst

let groupByListMaxes listMaxes numbers =
    let rec loop acc lst = function
        | [] -> List.rev acc
        | h :: t ->
            let out, lst' = splitAt h lst
            loop (out :: acc) lst' t
    loop [] numbers listMaxes
```
It can be done like this with pattern matching and tail recursion:
```fsharp
let groupByListMaxes listMaxes numbers =
    let rec inner acc numbers =
        function
        | [] -> acc |> List.rev
        | max::tail ->
            let taken = numbers |> Seq.takeWhile ((>=) max) |> List.ofSeq
            let n = taken |> List.length
            inner (taken::acc) (numbers |> Seq.skip n) tail
    inner [] numbers (listMaxes |> List.ofSeq)
```
Update: I was also inspired by fold and came up with the following solution, which strictly refrains from converting the input sequences.
```fsharp
let groupByListMaxes maxes numbers =
    let rec inner (acc, (cur, numbers)) max =
        match numbers |> Seq.tryHead with
        // Add n to the current list of n's less
        // than the local max
        | Some n when n <= max ->
            let remaining = numbers |> Seq.tail
            inner (acc, (n::cur, remaining)) max
        // Complete the current list by adding it
        // to the accumulated result and prepare
        // the next list for fold.
        | _ ->
            (List.rev cur)::acc, ([], numbers)
    maxes |> Seq.fold inner ([], ([], numbers)) |> fst |> List.rev
```
I have found a better implementation myself. Tips for improvements are still welcome.
Dealing with two sequences is really a pain. I really do want to iterate over numbers only once without turning that sequence into a list. But then I realized that turning listMaxes (generally the shorter of the two sequences) into a list is less costly. That way only one sequence remains, and I can use Seq.fold over numbers.
What should the state be that we keep and update while folding over numbers with Seq.fold? First, it should include the remaining part of listMaxes; the previous maxes that we have already passed are no longer of interest. Second, it should hold the lists accumulated so far, although, as in the other answers, these can be kept in reverse order. More precisely: the state is a pair whose second element is a reversed list of reversed lists of the numbers so far.
```fsharp
let groupByListMaxes listMaxes numbers =
    let rec folder state number =
        match state with
        | m :: maxes, groups when number > m ->
            // close the group for m (creating an empty one if m got no numbers yet)
            // and open an empty group for the next max
            let closed = if List.isEmpty groups then [ List.empty ] else groups
            folder (maxes, List.empty :: closed) number
        | _ :: _, [] ->
            fst state, List.singleton (List.singleton number)
        // prepend number to the current (reversed) group
        | _ :: _, h :: t ->
            fst state, (number :: h) :: t
        | [], _ ->
            failwith "Guaranteed not to happen"
    let listMaxesList = List.ofSeq listMaxes
    let initialState = listMaxesList, List.empty
    let reversed = snd (Seq.fold folder initialState numbers)
    let temp = List.rev (List.map List.rev reversed)
    let extraLength = List.length listMaxesList - List.length temp
    let extra = List.replicate extraLength List.empty
    List.concat [temp; extra]
```
I know this is an old question but I had a very similar problem and I think this is a simple solution:
```fsharp
let groupByListMaxes cs xs =
    List.scan (fun (_, xs) c -> List.partition (fun x -> x <= c) xs)
              ([], xs)
              cs
    |> List.skip 1
    |> List.map fst
```
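That scan/partition version works on lists, while the question's inputs are seqs. Here is a minimal sketch with the conversions done up front; `groupByListMaxesSeq` is a hypothetical name. Each List.partition call walks the whole remaining list, so on long inputs the cost grows with the product of the two lengths. That is the quadratic behaviour warned about in an earlier answer.

```fsharp
// Sketch: scan/partition grouping that accepts the seq inputs from the question.
let groupByListMaxesSeq (listMaxes: seq<int>) (numbers: seq<int>) : int list list =
    List.scan
        // state = (group just split off, numbers still left to distribute)
        (fun (_, rest) c -> List.partition (fun x -> x <= c) rest)
        ([], List.ofSeq numbers)
        (List.ofSeq listMaxes)
    |> List.skip 1   // drop the initial state
    |> List.map fst  // keep only the split-off groups

let result =
    groupByListMaxesSeq
        (seq [ 25; 56; 65; 75; 88 ])
        (seq [ 10; 11; 13; 16; 20; 25; 31; 38; 46; 55; 65; 76; 88 ])
// [[10; 11; 13; 16; 20; 25]; [31; 38; 46; 55]; [65]; []; [76; 88]]
```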

F#, loop control until n-2

I am currently learning functional programming and F#, and I want to write a loop that runs until n-2. For example:
Given a list of doubles, find the pairwise average,
e.g. pairwiseAverage [1.0; 2.0; 3.0; 4.0; 5.0] will give [1.5; 2.5; 3.5; 4.5]
After doing some experimenting and searching, I have a few ways to do it:
Method 1:
```fsharp
let pairwiseAverage (data: List<double>) =
    [ for j in 0 .. data.Length-2 do
        yield (data.[j]+data.[j+1])/2.0 ]
```
Method 2:
```fsharp
let pairwiseAverage (data: List<double>) =
    let averageWithNone acc next =
        match acc with
        | (_, None) -> ([], Some(next))
        | (result, Some prev) -> (((prev+next)/2.0)::result, Some(next))
    let resultTuple = List.fold averageWithNone ([], None) data
    match resultTuple with
    | (x, _) -> List.rev x
```
Method 3:
```fsharp
let pairwiseAverage (data: List<double>) =
    // Get elements from 1 .. n-1
    let after = List.tail data
    // Get elements from 0 .. n-2
    let before =
        data |> List.rev
             |> List.tail
             |> List.rev
    List.map2 (fun x y -> (x+y)/2.0) before after
```
I'd just like to know if there are other ways to approach this problem. Thank you.
Using only built-ins:
```fsharp
list |> Seq.windowed 2 |> Seq.map Array.average
```
Seq.windowed n gives you sliding windows of n elements each.
Another simple way is to use Seq.pairwise, something like:
```fsharp
list |> Seq.pairwise |> Seq.map (fun (a, b) -> (a + b) / 2.0)
```
The approaches suggested above are appropriate for short windows, like the one in the question. For windows longer than 2, one cannot use pairwise. The answer by hlo generalizes to wider windows and is a clean and fast approach if the window length is not too large. For very wide windows, the code below runs faster, as it only adds one number and subtracts another from the value obtained for the previous window. Notice that Seq.map2 automatically deals with sequences of different lengths: it stops at the end of the shorter one.
```fsharp
let movingAverage (n: int) (xs: float List) =
    let init = xs |> (Seq.take n) |> Seq.sum
    let additions = Seq.map2 (fun x y -> x - y) (Seq.skip n xs) xs
    Seq.fold (fun m x -> ((List.head m) + x)::m) [init] additions
    |> List.rev
    |> List.map (fun (x: float) -> x/(float n))

let xs = [1.0..1000000.0]
movingAverage 1000 xs
// Real: 00:00:00.265, CPU: 00:00:00.265, GC gen0: 10, gen1: 10, gen2: 0
```
For comparison, the function above performs the calculation about 60 times faster than the windowed equivalent:
```fsharp
let windowedAverage (n: int) (xs: float List) =
    xs
    |> Seq.windowed n
    |> Seq.map Array.average
    |> Seq.toList

windowedAverage 1000 xs
// Real: 00:00:15.634, CPU: 00:00:15.500, GC gen0: 74, gen1: 74, gen2: 71
```
I tried to eliminate List.rev using foldBack but did not succeed.
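One way to avoid the List.rev is Seq.scan, which emits the running window sums in order, so nothing needs reversing. This sketch is not taken from any of the answers. `movingAverageScan` is a hypothetical name, and the sketch assumes `xs` has at least `n` elements (List.take throws otherwise):

```fsharp
// Sketch: running-sum moving average with Seq.scan instead of fold + List.rev.
let movingAverageScan (n: int) (xs: float list) =
    // sum of the first window
    let init = xs |> List.take n |> List.sum
    // each step adds the element entering the window and subtracts the one leaving it;
    // Seq.map2 stops when the shorter input (List.skip n xs) runs out
    let diffs = Seq.map2 (fun entering leaving -> entering - leaving) (List.skip n xs) xs
    Seq.scan (+) init diffs
    |> Seq.map (fun s -> s / float n)
    |> Seq.toList
```

For example, `movingAverageScan 2 [1.0; 2.0; 3.0; 4.0; 5.0]` gives `[1.5; 2.5; 3.5; 4.5]`, the same result as the pairwise versions.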
A point-free approach:
```fsharp
let pairwiseAverage = List.pairwise >> List.map ((<||) (+) >> (*) 0.5)
```
Usually not a better way, but another way regardless... ;-]

finding average with list.fold

I'm just trying to write a really simple average function of type float list -> float that finds the average of a list.
This is my code:
```fsharp
let list ls = List.fold (fun acc float -> acc + float) 0.0 ls;;
list [1.60; 2.30; 5.0; 2.30];;
```
The output is 11.2
I need to add something to the function that divides 11.2 by the number of elements to find the average.
Any help?
If you want to iterate the list only once, you can count its length in parallel (using a tuple `(sum, len)` as the accumulator) and then divide afterwards:
```fsharp
let avg xs =
    xs
    |> List.fold (fun (sum, len) x -> (sum + x, len + 1.0)) (0.0, 0.0)
    |> (fun (sum, len) -> sum / len)
```
which can be written in point-free style if you like:
```fsharp
let avg =
    List.fold (fun (sum, len) x -> (sum + x, len + 1.0)) (0.0, 0.0)
    >> (fun (sum, len) -> sum / len)
```
This gives you:
```
> avg [1.0;2.0;3.0];;
val it : float = 2.0
```
But of course, the easiest way to extend your function is to just get the length of the list and divide by it:
```fsharp
let avg ls =
    List.fold (fun acc float -> acc + float) 0.0 ls / (float ls.Length)
```
For a list this makes no real difference, but you can easily extend the single-pass version to seqs, and there it does matter (the seq is iterated once instead of twice):
```fsharp
let avg : float seq -> float =
    Seq.fold (fun (sum, len) x -> (sum + x, len + 1.0)) (0.0, 0.0)
    >> (fun (sum, len) -> sum / len)
```
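For completeness, FSharp.Core already ships List.average, Seq.average and List.averageBy, so in practice the fold is mainly useful as an exercise. A short sketch using the values from the question follows. One difference: these functions throw an ArgumentException on an empty input. The fold versions above instead return nan (0.0 / 0.0).

```fsharp
// Built-in averages from FSharp.Core, applied to the question's values.
let xs = [1.60; 2.30; 5.0; 2.30]

let viaListAverage = List.average xs              // sum of the elements divided by their count
let viaSeqAverage = Seq.average (Seq.ofList xs)   // works on any seq<float>

// averageBy projects each element first, e.g. the mean of the squares
let avgOfSquares = List.averageBy (fun x -> x * x) xs
```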

F# stream of armstrong numbers

I am seeking help, mainly because I am very new to the F# environment. I need to use an F# stream to generate an infinite stream of Armstrong numbers. Can anyone help with this? I have written some mumbo jumbo, but I have no clue where I'm going.
```fsharp
type 'a stream = | Cons of 'a * (unit -> 'a stream)

let rec take n (Cons(x, xsf)) =
    if n = 0 then []
    else x :: take (n-1) (xsf());;

//to test if two integers are equal
let test x y =
    match (x,y) with
    | (x,y) when x < y -> false
    | (x,y) when x > y -> false
    | _ -> true

//to check for armstrong number
let check n =
    let mutable m = n
    let mutable r = 0
    let mutable s = 0
    while m <> 0 do
        r <- m%10
        s <- s+r*r*r
        m <- m/10
    if (test n s) then true else false

let rec armstrong n =
    Cons (n, fun () -> if check (n+1) then armstrong (n+1) else armstrong (n+2))

let pos = armstrong 0
take 5 pos
```
To be honest your code seems a bit like a mess.
The most basic version I could think of is this:
```fsharp
let isArmstrong (a,b,c) =
    a*a*a + b*b*b + c*c*c = (a*100+b*10+c)

let armstrongs =
    seq {
        for a in [0..9] do
            for b in [0..9] do
                for c in [0..9] do
                    if isArmstrong (a,b,c) then yield (a*100+b*10+c)
    }
```
Of course, this assumes that an Armstrong number is a 3-digit number where the sum of the cubes of the digits is the number itself.
this will yield you:
```
> Seq.toList armstrongs;;
val it : int list = [0; 1; 153; 370; 371; 407]
```
but it should be easy to add a wider range or remove the one-digit numbers (think about it).
general case
The problem seemed so interesting that I chose to implement the general case too:
```fsharp
let numbers =
    let rec create n =
        if n = 0 then [(0,[])] else
        [
            for x in [0..9] do
                for (_,xs) in create (n-1) do
                    yield (n, x::xs)
        ]
    Seq.initInfinite create |> Seq.concat

let toNumber (ds : int list) =
    ds |> List.fold (fun s d -> s*10I + bigint d) 0I

let armstrong (m : int, ds : int list) =
    ds |> List.map (fun d -> bigint d ** m) |> List.sum

let leadingZero =
    function
    | 0::_ -> true
    | _ -> false

let isArmstrong (m : int, ds : int list) =
    if leadingZero ds then false else
    let left = armstrong (m, ds)
    let right = toNumber ds
    left = right

let armstrongs =
    numbers
    |> Seq.filter isArmstrong
    |> Seq.map (snd >> toNumber)
```
But the numbers get sparse really quickly, and this will soon run out of memory (create n eagerly builds the list of all 10^n digit combinations). The first 20 are:
```
> Seq.take 20 armstrongs |> Seq.map string |> Seq.toList;;
val it : string list =
  ["0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9"; "153"; "370"; "371";
   "407"; "1634"; "8208"; "9474"; "54748"; "92727"; "93084"]
```
remark/disclaimer
This is the most basic version; you can get a big speed-up if you just enumerate all numbers and use basic arithmetic to extract and exponentiate the digits ;) ... I'm sure you can figure it out.
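Since the question specifically asks for the custom `'a stream` type, here is a minimal sketch of that simpler approach. It keeps the question's `stream` and `take` definitions, enumerates all natural numbers, and extracts the digits arithmetically. The helpers `from`, `filter`, `digitsOf` and `isArmstrongNumber` are hypothetical names introduced here. The sketch assumes the general definition used in the general-case answer (each digit raised to the number of digits), not the 3-digit cube version.

```fsharp
// The lazy stream type and take from the question.
type 'a stream = Cons of 'a * (unit -> 'a stream)

let rec take n (Cons (x, xsf)) =
    if n = 0 then [] else x :: take (n - 1) (xsf ())

// Hypothetical helper: the infinite stream n, n+1, n+2, ...
let rec from n = Cons (n, fun () -> from (n + 1))

// Hypothetical helper: lazily keep only the elements satisfying p.
// It keeps searching forever if no later element satisfies p.
let rec filter p (Cons (x, xsf)) =
    if p x then Cons (x, fun () -> filter p (xsf ()))
    else filter p (xsf ())

// Decimal digits of a non-negative number, most significant first.
let digitsOf n =
    let rec go n acc = if n < 10 then n :: acc else go (n / 10) (n % 10 :: acc)
    go n []

// Armstrong test: the sum of each digit raised to the digit count equals n.
let isArmstrongNumber n =
    let ds = digitsOf n
    let k = List.length ds
    List.sumBy (fun d -> pown d k) ds = n

let armstrongs = filter isArmstrongNumber (from 0)

take 17 armstrongs
// [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 153; 370; 371; 407; 1634; 8208; 9474]
```

Because `take` evaluates `xsf ()` before checking its counter, taking k elements also searches ahead to the (k+1)-th Armstrong number. With this `filter` that is cheap for small k.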

Resources