F# How to Percentile Rank an Array of Doubles?

I am trying to take a numeric array in F#, and rank all the elements so that ties get the same rank. Basically I'm trying to replicate the algorithm I have below in C#, but just for an array of doubles. Help?
rankMatchNum = 0;
rankMatchSum = 0;
previousScore = -999999999;
for (int i = 0; i < factorStocks.Count; i++)
{
    // The first time through, it won't ever match the previous score...
    if (factorStocks[i].factors[factorName + "_R"] == previousScore)
    {
        rankMatchNum = rankMatchNum + 1; // The count of matching ranks
        rankMatchSum = rankMatchSum + i + 1; // The rank itself...
        for (int j = 0; j <= rankMatchNum; j++)
        {
            factorStocks[i - j].factors[factorName + "_WR"] = rankMatchSum / (rankMatchNum + 1);
        }
    }
    else
    {
        rankMatchNum = 0;
        rankMatchSum = i + 1;
        previousScore = factorStocks[i].factors[factorName + "_R"];
        factorStocks[i].factors[factorName + "_WR"] = i + 1;
    }
}

Here's how I would do it, although this isn't a direct translation of your code. I've done things in a functional style, piping results from one transformation to another.
let rank seq =
    seq
    |> Seq.countBy (fun x -> x) // count repeated numbers
    |> Seq.sortBy (fun (k,v) -> k) // order by key
    |> Seq.fold (fun (r,l) (_,n) -> // accumulate the number of items seen and the list of grouped average ranks
        let r'' = r + n // get the rank after this group is processed
        let avg = List.averageBy float [r+1 .. r''] // average ranks for this group
        r'', ([for _ in 1 .. n -> avg]) :: l) // add a list with avg repeated
        (0,[]) // seed the fold with rank 0 and an empty list
    |> snd // get the final list component, ignoring the component storing the final rank
    |> List.rev // reverse the list
    |> List.collect (fun l -> l) // merge individual lists into final list
Or to copy Mehrdad's style:
let rank arr =
    let lt item = arr |> Seq.filter (fun x -> x < item) |> Seq.length
    let lte item = arr |> Seq.filter (fun x -> x <= item) |> Seq.length
    let avgR item = [(lt item) + 1 .. (lte item)] |> List.averageBy float
    Seq.map avgR arr

I think that you'll probably find this problem far easier to solve in F# if you rewrite the above in a declarative manner rather than in an imperative manner. Here's my off-the-top-of-my-head approach to rewriting the above declaratively:
First we need a wrapper class to decorate our items with a property carrying the rank.
class Ranked<T> {
    public T Value { get; private set; }
    public double Rank { get; private set; }
    public Ranked(T value, double rank) {
        this.Value = value;
        this.Rank = rank;
    }
}
Here, then, is your algorithm in a declarative manner. Note that elements is your input sequence and the resulting sequence is in the same order as elements. The delegate func is the value that you want to rank elements by.
static class IEnumerableExtensions {
    public static IEnumerable<Ranked<T>> Rank<T, TRank>(
        this IEnumerable<T> elements,
        Func<T, TRank> func
    ) {
        // Groups sorted by key, so that group i lines up with ranks[i] below
        var groups = elements.GroupBy(x => func(x)).OrderBy(g => g.Key).ToList();
        var ranks = groups.Aggregate(
                        (IEnumerable<double>)new List<double>(),
                        (x, g) =>
                            x.Concat(
                                Enumerable.Repeat(
                                    Enumerable.Range(x.Count() + 1, g.Count()).Sum() / (double)g.Count(),
                                    g.Count()
                                )
                            )
                    )
                    .GroupBy(r => r)
                    .Select(r => r.Key)
                    .ToArray();
        var dict = groups.Select((g, i) => new { g.Key, Index = i })
                         .ToDictionary(x => x.Key, x => ranks[x.Index]);
        foreach (T element in elements) {
            yield return new Ranked<T>(element, dict[func(element)]);
        }
    }
}
Usage:
class MyClass {
    public double Score { get; private set; }
    public MyClass(double score) { this.Score = score; }
}
List<MyClass> list = new List<MyClass>() {
    new MyClass(1.414),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(2.718),
    new MyClass(1.414),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(3.141),
    new MyClass(1.618)
};
foreach(var item in list.Rank(x => x.Score)) {
    Console.WriteLine("Score = {0}, Rank = {1}", item.Value.Score, item.Rank);
}
Output:
Score = 1.414, Rank = 1.5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 2.718, Rank = 5
Score = 1.414, Rank = 1.5
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 3.141, Rank = 8
Score = 1.618, Rank = 3
Note that I do not require the input sequence to be ordered. The resulting code is simpler if you enforce such a requirement on the input sequence. Note further that we do not mutate the input sequence, nor do we mutate the input items. This makes F# happy.
From here you should be able to rewrite this in F# easily.

This is not a very efficient algorithm (O(n²)), but it's quite short and readable:
let percentile arr =
    let rank item =
        let below = arr |> Seq.filter (fun i -> i < item) |> Seq.length
        (float below + 1.0) / float (Array.length arr) * 100.0
    Array.map rank arr
You might adjust the expression fun i -> i < item (or the + 1.0 term) to achieve your desired way of ranking results:
let arr = [|1.0;2.0;2.0;4.0;3.0;3.0|]
percentile arr |> printfn "%A";;
[|16.66666667; 33.33333333; 33.33333333; 100.0; 66.66666667; 66.66666667|]

Mehrdad's solution is very nice but a bit slow for my purposes. The initial sorting can be done once. Rather than traversing the list each time to count the items < or <= the target, we can keep a running counter. This is more imperative (it could also have been written as a fold):
let GetRanks2 ( arr ) =
    let tupleList = arr |> Seq.countBy( fun x -> x ) |> Seq.sortBy( fun (x,count) -> x )
    let map = new System.Collections.Generic.Dictionary<float,float>()
    let mutable index = 1
    for (item, count) in tupleList do
        let c = count
        let avgRank =
            let mutable s = 0
            for i = index to index + c - 1 do
                s <- s + i
            float s / float c
        map.Add( item, avgRank )
        index <- index + c
    map
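To get the ranks back in the order of the original array, the lookup idea above can be combined with a final map over the input. This is only a sketch under my own assumptions: averageRanks and rankOf are hypothetical names, and ties receive the mean of the 1-based positions they occupy after sorting.

```fsharp
// Hypothetical helper: tie-averaged ranks, returned in the input's order.
let averageRanks (arr: float[]) : float[] =
    let sorted = Array.sort arr
    // Map each distinct value to the mean of the 1-based positions it occupies.
    let rankOf =
        sorted
        |> Array.mapi (fun i x -> x, float (i + 1))
        |> Array.groupBy fst
        |> Array.map (fun (x, positions) -> x, positions |> Array.averageBy snd)
        |> dict
    arr |> Array.map (fun x -> rankOf.[x])

printfn "%A" (averageRanks [| 1.0; 2.0; 2.0; 4.0; 3.0; 3.0 |])
// [|1.0; 2.5; 2.5; 6.0; 4.5; 4.5|]
```

The cost is dominated by the sort, so this is O(n log n) rather than O(n²). Dividing each rank by the array length and multiplying by 100.0 would turn it into a percentile.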

Related

F# Equivalent of ++ operator

I'm converting an array to a record type. Something like:
let value = [|"1";"2";"3";"Not a number";"5"|]
type ValueRecord = {
    One: int32
    Two: int32
    Three: int32
    Four: string
    Five: int32 }
let convertArrayToRecord (x: string array) =
    { One = int x.[0]
      Two = int x.[1]
      Three = int x.[2]
      Four = x.[3]
      Five = int x.[4] }
let recordValue = convertArrayToRecord value
This works, but has the drawback that adding a value to the middle of the array results in manual editing of all index references thereafter like this:
let value = [|"1";"Not a number - 6";"2";"3";"Not a number";"5"|]
type ValueRecord = {
    One: int32
    Six: string
    Two: int32
    Three: int32
    Four: string
    Five: int32 }
let convertArrayToRecord (x: string array) =
    { One = int x.[0]
      Six = x.[1]
      Two = int x.[2] //<--updated index
      Three = int x.[3] //<--updated index
      Four = x.[4] //<--updated index
      Five = int x.[5] } //<--updated index
let recordValue = convertArrayToRecord value
Additionally, it's easy to accidentally get the indexes wrong.
The solution I came up with is:
let convertArrayToRecord (x: string array) =
    let index = ref 0
    let getIndex () =
        let result = !index
        index := result + 1
        result
    { One = int x.[getIndex ()]
      Six = x.[getIndex ()]
      Two = int x.[getIndex ()]
      Three = int x.[getIndex ()]
      Four = x.[getIndex ()]
      Five = int x.[getIndex ()] }
This works, but I really dislike the ref cell for something which isn't concurrent. Is there a better/cleaner way to accomplish this?
You could use pattern matching.
let convertArrayToRecord = function
    | [|one; two; three; four; five|] ->
        {
            One = int one
            Two = int two
            Three = int three
            Four = four
            Five = int five
        }
    | _ ->
        failwith "How do you want to deal with arrays of a different length"
When adding another entry to the array you'd adjust it by editing the first match to [|one; six; two; three; four; five|].
By the way, for a mutable index like the one you're using in your current example, you can avoid ref by using the mutable keyword instead (at module level), like so:
let mutable index = -1
let getIndex () =
    index <- index + 1
    index
If you want to hide the counter inside getIndex, note that a closure cannot capture a let mutable local, so a ref cell is still needed there:
let getIndex =
    let index = ref (-1)
    fun () ->
        index.Value <- index.Value + 1
        index.Value
You could let the indexes be handled with pattern matching, and add an active pattern, like this:
let (|PInt32|_|) (s:string) =
    let ok, i = System.Int32.TryParse(s)
    if ok then Some i else None
let foo() =
    match [|"1"; "2"; "Some str"|] with
    | [|PInt32(x); PInt32(y); mystr|] ->
        printfn "Yup"
    | _ -> printfn "Nope"
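Putting the two suggestions together, the array pattern can name the fields by position while the active pattern does the parsing, so no index counter is needed at all. A minimal sketch, assuming the five-field record from the question; the names Int and tryConvert are my own, and returning an option instead of throwing is just one possible choice:

```fsharp
open System

type ValueRecord = { One: int; Two: int; Three: int; Four: string; Five: int }

// Partial active pattern: matches only strings that parse as an int.
let (|Int|_|) (s: string) =
    match Int32.TryParse s with
    | true, i -> Some i
    | _ -> None

// Hypothetical converter: None if the length or any numeric field is wrong.
let tryConvert = function
    | [| Int one; Int two; Int three; four; Int five |] ->
        Some { One = one; Two = two; Three = three; Four = four; Five = five }
    | _ -> None

printfn "%A" (tryConvert [| "1"; "2"; "3"; "Not a number"; "5" |])
```

Adding a field in the middle then means editing the pattern and the record in one place each, with the compiler checking that every name is bound.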

Second Taxicab Number Generator

I am attempting to generate a series of guesses for the second Taxicab number. What I want to do is call the Attempt function on a series of integers in a finite sequence. I have put my two questions about the implementation in the comments.
A taxicab number, in case you're wondering, is the smallest number that can be written as the sum of two positive cubes in n distinct ways. Ta(2) is 1729.
[<EntryPoint>]
let main argv =
    let Attempt (start : int) =
        let stop = start+20
        let integerList = [start..stop]
        let list = List.init 3 (fun x -> integerList.[x])
        //Is there a simple way to initialize the list with random indices of integerList?
        let Cube x = x*x*x
        let newlist = list |> List.map (fun x -> Cube x)
        let partitionList (x : List<int>) (y : int) = List.sum [x.[y];x.[y+1]]
        let intLIST = [0..2]
        let partitionList' = [for i in intLIST do yield partitionList newlist i]
        let x = Set.ofList partitionList'
        let y = Set.ofList partitionList'
        //I was going to try to use some kind of equality operator to determine whether the two sets were equal, which could tell me whether we had actually found a Taxicab number by the weakened definition.
        System.Console.Write(list)
        System.Console.Write(newlist)
    let rnd = System.Random()
    //My primary question is how can I convert a random to an integer to use in start for the function Attempt?
    System.Console.ReadKey() |> ignore
    printfn("%A") argv
    0
Dirty way to initialize list with random indexes of another list:
let randomIndexes count myList =
    let rand = System.Random()
    seq {
        for n = 1 to count do
            yield rand.Next(List.length myList) }
    |> Seq.distinct
    //|> Seq.sort // if you need them sorted
    |> List.ofSeq
let result = randomIndexes 5 [3;2;4;5]
printfn "%A" result
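On the primary question in the code comments (getting an integer out of a Random to use as start): System.Random.Next already returns an int. A minimal sketch follows. The brute-force search for Ta(2) is my own illustration, not the asker's approach, and the names cube, limit and taxicab2 are hypothetical.

```fsharp
let rnd = System.Random()
// Next(min, max) returns an int with min <= value < max
let start = rnd.Next(1, 100)

let cube x = x * x * x
let limit = 20 // assumed search bound; 20 cubed is well above 1729

// Smallest number that is a sum of two positive cubes in at least two ways
let taxicab2 =
    [ for a in 1 .. limit do
        for b in a .. limit do
            yield cube a + cube b ]
    |> List.countBy id
    |> List.filter (fun (_, ways) -> ways >= 2)
    |> List.map fst
    |> List.min

printfn "start = %d, Ta(2) = %d" start taxicab2
```

Restricting b to a .. limit means each unordered pair is generated once, so a count of two really does mean two different decompositions.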

Combine data into smaller discrete intervals

Suppose we have a pair of input arrays, or a list of (key, value) tuples if you prefer. What's an elegant and performant way to combine values that have indices falling in a certain interval? For example, if the interval (or 'bin') size is 10 then the values of all indices from 0 < x <= 10 would be combined, as would the values of indices from 10 < x <= 20 and so on. I want:
let interval = 10
let index = [| 6; 12; 18; 24 |]
let value = [| a; b; c; d |]
result = [| a; b + c; d |]
The crudest way to do this would be to use a whole lot of if, else if statements (the index range has a defined upper limit). I got close with
for i = 0 to index.Length - 1 do
    let bin = index.[i] / interval
    result.[bin] <- result.[bin] + value.[i]
but this is doing 0 <= x < 10, not 0 < x <= 10.
I also tried assuming the indices are ordered and evenly spaced, with
for i = 1 : ( index.Length - 1 ) / valuesPerBin
valueRange = ((i-1)*valuesPerBin + 1) : i*valuesPerBin )
result(i) = sum(value(valueRange))
which is nice but obviously breaks if there is a non integer number of values per bin.
What's the best way of doing this in F#? Is there a name or an existing function for what I'm trying to do?
let interval = 10
let index = [6;12;18;24]
let value = [101;102;103;104]
let intervals = List.map (fun e -> e/interval) index
let keys = List.map2 (fun e1 e2 -> (e1,e2)) intervals value
let skeys = Seq.ofList keys
let result =
    skeys
    |> Seq.groupBy (fun p -> fst p)
    |> Seq.map (fun p -> snd p)
    |> Seq.map (fun s -> Seq.sumBy (fun p -> snd p) s)
result will be [101;205;104] (as a Seq).
If you want to convert to an array, apply Seq.toArray.
Is that what you wanted?
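Note that plain integer division puts an index of exactly 10 into the second bin (0 <= x < 10). If the bins really have to be 0 < x <= 10, one option is to subtract 1 before dividing. A sketch under the assumption that all indices are positive integers; binSums is a hypothetical name, and bins with no entries are simply absent from the result:

```fsharp
// Hypothetical helper: sums values whose index falls in (k*interval, (k+1)*interval].
let binSums (interval: int) (index: int list) (value: int list) =
    List.zip index value
    |> List.groupBy (fun (i, _) -> (i - 1) / interval) // 1..10 -> bin 0, 11..20 -> bin 1
    |> List.sortBy fst                                  // bins in ascending order
    |> List.map (fun (_, pairs) -> pairs |> List.sumBy snd)

printfn "%A" (binSums 10 [6; 12; 18; 24] [101; 102; 103; 104]) // [101; 205; 104]
printfn "%A" (binSums 10 [10; 11; 20] [1; 2; 4])               // [1; 6]
```

The second call shows the boundary behaviour: index 10 lands in the first bin and index 20 in the second.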
Adapt the surrounding code to use 0 <= x < 10 instead of 0 < x <= 10. In my case this was just a simple definition change in another function, allowing me to use
for i = 0 to index.Length - 1 do
    let bin = index.[i] / interval
    result.[bin] <- result.[bin] + value.[i]
which is much simpler and terser than the alternatives.

F# solution for Store Credit

I want to solve this exercise: http://code.google.com/codejam/contest/351101/dashboard#s=p0 using F#.
I am new to functional programming and F#, but I like the concept and the language a lot. I love the Code Jam exercise too: it looks easy, yet it is close to real life. Could somebody point me to a solution?
At the moment I have written this code, which is plainly imperative and looks ugly from a functional perspective:
(*
C - Credit
L - Items
I - List of Integer, where P is single integer
What the data looks like inside the file
N
[...
* Money
* Items in store
...]
*)
let lines = System.IO.File.ReadAllLines("../../../../data/A-small-practice.in")
let CBounds c = c >= 5 && c <= 1000
let PBounds p = p >= 1 && p <= 1000
let entries = int(lines.[0]) - 1
let mutable index = 1 (* First index is how many entries*)
let mutable case = 1
for i = 0 to entries do
    let index = (i*3) + 1
    let C = int(lines.[index])
    let L = int(lines.[index+1])
    let I = lines.[index+2]
    let items = I.Split([|' '|]) |> Array.map int
    // C must be the sum of some items
    // Ugly imperative way which contains duplicates
    let mutable nIndex = 0
    for n in items do
        nIndex <- nIndex + 1
        let mutable mIndex = nIndex
        for m in items.[nIndex..] do
            mIndex <- mIndex + 1
            if n + m = C then
                printfn "Case #%A: %A %A" case nIndex mIndex
                case <- case + 1
I would like to find the items which add up to the value C, not in the usual imperative way but with a functional approach.
You don't specify how you would solve the problem, so it's hard to give advice.
Regarding reading the input, you can express it as a series of transformations on a Seq. The higher-order functions in the Seq module are very handy:
let data =
    "../../../../data/A-small-practice.in"
    |> System.IO.File.ReadLines
    |> Seq.skip 1
    |> Seq.chunkBySize 3 // non-overlapping groups of 3 lines, one group per case
    |> Seq.map (fun lines ->
        let C = int(lines.[0])
        let L = int(lines.[1])
        let items = lines.[2].Split([|' '|]) |> Array.map int
        (C, L, items))
UPDATE:
For the rest of your example, you could use sequence expression. It is functional enough and easy to express nested computations:
let results =
    seq {
        for (C, _, items) in data do
            for j in 1..items.Length-1 do
                for i in 0..j-1 do
                    if items.[j] + items.[i] = C then yield (i, j)
    }
// Case numbers and item positions are 1-based in the expected output
Seq.iteri (fun case (i, j) -> printfn "Case #%d: %d %d" (case + 1) (i + 1) (j + 1)) results
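If you prefer to handle one case at a time, the search itself can be isolated in a small pure function. This is only a sketch: findPair is a hypothetical name, it returns 1-based positions to match the expected output format, and it stops at the first pair found.

```fsharp
// Hypothetical helper: first pair of 1-based positions whose prices sum to credit.
let findPair (credit: int) (items: int[]) =
    seq {
        for j in 1 .. items.Length - 1 do
            for i in 0 .. j - 1 do
                if items.[i] + items.[j] = credit then
                    yield (i + 1, j + 1) }
    |> Seq.tryHead // None when no two items add up to credit

printfn "%A" (findPair 100 [| 5; 75; 25 |]) // Some (2, 3)
```

Because the sequence is lazy, Seq.tryHead stops the nested loops as soon as a match is produced.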

f# sequence of running total

Ok, this looks like it should be easy, but I'm just not getting it. If I have a sequence of numbers, how do I generate a new sequence made up of the running totals? eg for a sequence [1;2;3;4], I want to map it to [1;3;6;10]. In a suitably functional way.
Use List.scan:
let runningTotal = List.scan (+) 0 >> List.tail
[1; 2; 3; 4]
|> runningTotal
|> printfn "%A"
Seq.scan-based implementation:
let runningTotal seq' = (Seq.head seq', Seq.skip 1 seq') ||> Seq.scan (+)
{ 1..4 }
|> runningTotal
|> printfn "%A"
Another variation using Seq.scan (Seq.skip 1 gets rid of the leading zero):
> {1..4} |> Seq.scan (+) 0 |> Seq.skip 1;;
val it : seq<int> = seq [1; 3; 6; 10]
> Seq.scan (fun acc n -> acc + n) 0 [1;2;3;4];;
val it : seq<int> = seq [0; 1; 3; 6; ...]
With lists:
> [1;2;3;4] |> List.scan (fun acc n -> acc + n) 0 |> List.tail;;
val it : int list = [1; 3; 6; 10]
Edit: Another way with sequences:
let sum s = seq {
    let x = ref 0
    for i in s do
        x := !x + i
        yield !x
}
Yes, there's a mutable variable, but I find it more readable (if you want to get rid of the leading 0).
Figured it was worthwhile to share how to do this with Record Types in case that's also what you came here looking for.
Below is a fictitious example demonstrating the concept using runner laps around a track.
type Split = double
type Lap = { Num : int; Split : Split }
type RunnerLap = { Lap : Lap; TotalTime : double }
let lap1 = { Num = 1; Split = 1.23 }
let lap2 = { Num = 2; Split = 1.13 }
let lap3 = { Num = 3; Split = 1.03 }
let laps = [lap1;lap2;lap3]
let runnerLapsAccumulator =
    Seq.scan
        (fun rl l -> { rl with Lap = l; TotalTime = rl.TotalTime + l.Split }) // accumulator
        { Lap = { Num = 0; Split = 0.0 }; TotalTime = 0.0 } // initial state
let runnerLaps = laps |> runnerLapsAccumulator
printfn "%A" runnerLaps
Not sure this is the best way, but it should do the trick:
let input = [1; 2; 3; 4]
let runningTotal =
    (input, 0)
    |> Seq.unfold (fun (list, total) ->
        match list with
        | [] ->
            None
        | h::t ->
            let total = total + h
            total, (t, total) |> Some)
    |> List.ofSeq

Resources