F# further requirement for finding unique array of tuples

I have 4 arrays of different data. I want to delete the duplicate elements found in the first array (of strings) and get back an array of unique 4-element tuples.
For example, let's say the arrays are:
let dupA1 = [| "A"; "B"; "C"; "D"; "A" |]
let dupA2 = [| 1; 2; 3; 4; 1 |]
let dupA3 = [| 1.0M; 2.0M; 3.0M; 4.0M; 1.0M |]
let dupA4 = [| 1L; 2L; 3L; 4L; 1L |]
I want the result to be:
let uniqueArray = [| ("A", 1, 1.0M, 1L); ("B", 2, 2.0M, 2L); ("C", 3, 3.0M, 3L); ("D", 4, 4.0M, 4L) |]
Thanks to abatishchev for his great code, now I have this answer:
let zip4 a (b : _ []) (c : _ []) (d : _ []) =
    Array.init (Array.length a) (fun i -> a.[i], b.[i], c.[i], d.[i])
let uniqueArray = zip4 dupA1 dupA2 dupA3 dupA4 |> Seq.distinct |> Seq.toArray
However, I now also want to recover each of the four arrays from the unique array.
let uniqueArray = [| ("A", 1, 1.0M, 1L); ("B", 2, 2.0M, 2L); ("C", 3, 3.0M, 3L); ("D",4, 4.0M, 4L) |]
I want the following 4 arrays from uniqueArray:
let uniqA1 = [| "A"; "B"; "C"; "D" |]
let uniqA2 = [| 1; 2; 3; 4 |]
let uniqA3 = [| 1.0M; 2.0M; 3.0M; 4.0M |]
let uniqA4 = [| 1L; 2L; 3L; 4L |]
I tried the following code:
let [| uniqA1, uniqA2, uniqA3, uniqA4 |] = uniqueArray
First, I get the compiler warning:
Warning 1
Incomplete pattern matches on this expression. For example, the value '[|_; _|]' may indicate a case not covered by the pattern(s).
After I used #nowarn "25", I got the following error at run time:
Microsoft.FSharp.Core.MatchFailureException was unhandled
Message: An unhandled exception of type 'Microsoft.FSharp.Core.MatchFailureException' occurred in Program.exe
Please help me with the further requirements.

You now need unzip4:
let unzip4 arr =
    let a = Array.zeroCreate (Array.length arr)
    let b = Array.zeroCreate (Array.length arr)
    let c = Array.zeroCreate (Array.length arr)
    let d = Array.zeroCreate (Array.length arr)
    arr
    |> Array.iteri (fun i (x, y, z, w) ->
        a.[i] <- x
        b.[i] <- y
        c.[i] <- z
        d.[i] <- w)
    a, b, c, d
Then you can do:
> let uniqA1, uniqA2, uniqA3, uniqA4 = unzip4 uniqueArray;;
val uniqA4 : int64 [] = [|1L; 2L; 3L; 4L|]
val uniqA3 : decimal [] = [|1.0M; 2.0M; 3.0M; 4.0M|]
val uniqA2 : int [] = [|1; 2; 3; 4|]
val uniqA1 : string [] = [|"A"; "B"; "C"; "D"|]
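For comparison, here is a minimal sketch of the same split written with one Array.map projection per tuple position. The name unzip4ByMap is hypothetical and is not part of FSharp.Core. The comment also notes why the pattern tried in the question fails: it describes an array that holds exactly one 4-tuple.

```fsharp
let uniqueArray =
    [| ("A", 1, 1.0M, 1L); ("B", 2, 2.0M, 2L); ("C", 3, 3.0M, 3L); ("D", 4, 4.0M, 4L) |]

// The pattern [| a, b, c, d |] only matches an array containing exactly ONE
// 4-tuple, so it raises MatchFailureException on the four-element uniqueArray.

// Hypothetical helper: project each tuple position into its own array.
let unzip4ByMap arr =
    Array.map (fun (x, _, _, _) -> x) arr,
    Array.map (fun (_, y, _, _) -> y) arr,
    Array.map (fun (_, _, z, _) -> z) arr,
    Array.map (fun (_, _, _, w) -> w) arr

let uniqA1, uniqA2, uniqA3, uniqA4 = unzip4ByMap uniqueArray
```

This walks the input four times instead of once, so the unzip4 above is likely the better choice for large arrays; the projection version is mainly shorter to read.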

Related

How to get a sublist or a subsequence in F#

I am trying to truncate this sequence like you can do with arrays in F#
let sublist sequ (i:int) (n:int) =
    let item = Seq.item(n-i) sequ
    let start = Seq.item i sequ
    let ending = Seq.item n sequ
    Seq.truncate(item) (seq{start..ending})
sublist [|25..92|] 5 10
like it can be done here
Array.sub [|5..20|] 3 10
You forgot to write the expected results.
You can use take and skip as in the linked answer in the comments:
let sublist sequ (i:int) (n:int) =
    // i and n are treated as inclusive indices, like the array slices below
    sequ |> Seq.skip i |> Seq.take (n - i + 1)
Notice that if you are dealing with arrays you can use array slices:
[|25..92|].[5..10]
>
val it : int [] = [|30; 31; 32; 33; 34; 35|]
[|5..20|].[3..10]
>
val it : int [] = [|8; 9; 10; 11; 12; 13; 14; 15|]
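Since the question does not say whether the second number is an end index or a count, here is a small sketch of both readings. The helper names sublistInclusive and sublistCount are hypothetical; only Seq.skip and Seq.take come from the core library.

```fsharp
// Hypothetical helper: i and n are inclusive indices, like arr.[i..n].
let sublistInclusive (i : int) (n : int) (sequ : seq<'a>) =
    sequ |> Seq.skip i |> Seq.take (n - i + 1)

// Hypothetical helper: start index plus element count, like Array.sub.
let sublistCount (start : int) (count : int) (sequ : seq<'a>) =
    sequ |> Seq.skip start |> Seq.take count

let byIndex = seq { 25 .. 92 } |> sublistInclusive 5 10 |> Seq.toArray
let byCount = seq { 5 .. 20 } |> sublistCount 3 10 |> Seq.toArray
```

Seq.take raises an exception when the sequence runs out of elements early; Seq.truncate can be swapped in if a shorter result is acceptable.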

F# suitable container for (string, float, float) triads?

I have the following problem and I hope somebody can help me.
Short description of the problem: I need to store a (string A, float B, float C) triad in a suitable container. The triad originates from a double "for" loop.
But the essential point is that I will need to slice this container when the loops are over to perform other operations.
An example that can be executed from the .fsx shell (using Deedle frames) follows. The triad is what is being printed on the screen.
open Deedle
let categorical_variable = [| "A"; "B"; "C"; "A"; "B"; "C"; |]
let vec_1 = [| 15.5; 14.3; 15.5; 14.3; 15.5; 14.3; |]
let vec_2 = [| 114.3; 17.5; 9.3; 88.7; 115.5; 12.3; |]
let dframe = frame ["cat" =?> Series.ofValues categorical_variable
                    "v1" =?> Series.ofValues vec_1
                    "v2" =?> Series.ofValues vec_2 ]
let distinct_categorical_variables = categorical_variable |> Array.toSeq |> Seq.distinct |> Seq.toArray
let mutable frame_slice : Frame<int, string> = Frame.ofRows []
let mutable frame_slice_vec_1 : float[] = Array.empty
let mutable frame_slice_vec_1_distinct : float[] = Array.empty
for cat_var in distinct_categorical_variables do
    frame_slice <- (dframe |> Frame.filterRowValues (fun row -> row.GetAs "cat" = cat_var))
    frame_slice_vec_1 <- (frame_slice?v1).Values |> Seq.toArray
    frame_slice_vec_1_distinct <- (frame_slice_vec_1 |> Array.toSeq |> Seq.distinct |> Seq.toArray)
    for vec_1_iter in frame_slice_vec_1_distinct do
        printfn "%s, %f, %f \n" cat_var vec_1_iter (Array.average ((frame_slice?v2).Values |> Seq.toArray) ) |> ignore
So, is there a suitable object in which to store this triad? I saw Array3D objects, but I don't think they are the right solution because A, B and C of my triad have different types.
Many thanks in advance.
You probably want a sequence expression with tuples:
let mySequence =
    seq { for cat_var in distinct_categorical_variables do
            ...
            for vec_1_iter in ... do
                yield cat_var, vec_1_iter, Array.average ... }
// then use it like
for cat_var, vec_1_iter, result in mySequence do
    ...
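To make the idea concrete without filling in the elided parts above, here is a self-contained sketch that uses plain arrays instead of Deedle. It is an assumption about what the loops compute (per category, each distinct v1 value paired with the category's average v2); the names rowsByCategory, triads and onlyA are hypothetical.

```fsharp
let categorical_variable = [| "A"; "B"; "C"; "A"; "B"; "C" |]
let vec_1 = [| 15.5; 14.3; 15.5; 14.3; 15.5; 14.3 |]
let vec_2 = [| 114.3; 17.5; 9.3; 88.7; 115.5; 12.3 |]

// Zip the three columns into rows and group the rows by category.
let rowsByCategory =
    Array.zip3 categorical_variable vec_1 vec_2
    |> Array.groupBy (fun (cat, _, _) -> cat)

// One (category, distinct v1 value, average v2 of the category) tuple
// per iteration of the inner loop.
let triads =
    [|
        for (cat, rows) in rowsByCategory do
            let avgV2 = rows |> Array.averageBy (fun (_, _, v2) -> v2)
            let distinctV1 = rows |> Array.map (fun (_, v1, _) -> v1) |> Array.distinct
            for v1 in distinctV1 do
                yield cat, v1, avgV2
    |]

// "Slicing" afterwards is an ordinary filter on the tuple fields.
let onlyA = triads |> Array.filter (fun (cat, _, _) -> cat = "A")
```

An array of tuples keeps each field's own type, which is what rules out Array3D here. If the fields need names, a record type would serve the same purpose.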

How to execute a function, that creates a lot of objects, in parallel?

I am using Array.Parallel.map on a function but find that it is not executing at anywhere near full processor capacity. I am assuming this is because the function creates a lot of objects when running List.map and List.map2. Would this be causing a synchronization issue and is there a more appropriate way of doing this? At the moment the only way I can think of getting around this is by running each process as a separate executable using something like xargs under Linux.
I put together the script below to demonstrate the problem. It is a very basic data categorizer which relies on a field having a certain value as a rule to determine if this will predict a category:
open System
type CategoryAssessment =
    { fieldIndex: int
      value: int
      ruleAssessments: list<int> }
let InitAssessment categorizeFields rules =
    let ruleAssessments = List.init (List.length rules) (fun x -> 0)
    List.map (fun categorizeField ->
                let fieldIndex, categoryValue = categorizeField
                { CategoryAssessment.fieldIndex = fieldIndex;
                  value = categoryValue;
                  ruleAssessments = ruleAssessments })
             categorizeFields
let AssessCategory ruleMatches (row : int[]) categoryAssessment =
    let fieldIndex = categoryAssessment.fieldIndex
    let categoryValue = categoryAssessment.value
    let categoryMatch = categoryValue = row.[fieldIndex]
    let newRuleAssessments =
        List.map2 (fun ruleAssessment ruleMatch ->
                    if ruleMatch = categoryMatch then
                        ruleAssessment + 1
                    else
                        ruleAssessment)
                  categoryAssessment.ruleAssessments
                  ruleMatches
    { categoryAssessment with ruleAssessments = newRuleAssessments }
let MatchRule (row : int[]) rule =
    let fieldIndex, eqVal = rule
    row.[fieldIndex] = eqVal
let Assess categorizeFields rules input =
    printfn "START - Assess"
    let d =
        Array.fold (fun categoryAssessment row ->
                        let ruleMatches = List.map (MatchRule row) rules
                        List.map (AssessCategory ruleMatches row) categoryAssessment)
                   (InitAssessment categorizeFields rules)
                   input
    printfn "END - Assess"
    d
let JoinAssessments assessments =
    let numAssessments = Array.length assessments
    Array.fold (fun accAssessment assessment ->
                    List.map2 (fun accCategory category ->
                                let newRuleAssessments =
                                    List.map2 (+)
                                              accCategory.ruleAssessments
                                              category.ruleAssessments
                                { accCategory with
                                    ruleAssessments = newRuleAssessments })
                              accAssessment
                              assessment)
               assessments.[0]
               assessments.[1..(numAssessments-1)]
let numRecords = 10000
let numFields = 20
let numSplits = 10
let numRules = 10000
let inputs = Array.create numSplits
                [| for i in 1 .. (numRecords / numSplits) ->
                    [| for j in 1 .. numFields ->
                        (i % 10) + j |] |]
let categorizeFields = [ (1, 6); (2, 3); (2, 4); (3, 2) ]
let rules = [ for i in 1 .. numRules -> (i % numFields, i) ]
let assessments =
    Array.Parallel.map (Assess categorizeFields rules) inputs
    |> JoinAssessments
printfn "Assessments: %A" assessments
0
After a fair bit of investigation, the ultimate answer to my question seems to be to find a way of not creating lots of objects. The easiest change to do this is moving to using arrays instead of lists. I have written up my findings more fully in an article: Beware of Immutable Lists for F# Parallel Processing.
When altered as follows, the above program parallelizes better across threads and runs much quicker even on a single thread. Further improvements can be made by making the ruleAssessments field mutable as demonstrated in the referenced article.
open System
type CategoryAssessment =
    { fieldIndex: int
      value: int
      ruleAssessments: int[] }
let InitAssessment categorizeFields rules =
    let ruleAssessments = Array.create (Array.length rules) 0
    Array.map (fun categorizeField ->
                let fieldIndex, categoryValue = categorizeField
                { CategoryAssessment.fieldIndex = fieldIndex;
                  value = categoryValue;
                  ruleAssessments = ruleAssessments })
              categorizeFields
let AssessCategory ruleMatches (row : int[]) categoryAssessment =
    let fieldIndex = categoryAssessment.fieldIndex
    let categoryValue = categoryAssessment.value
    let categoryMatch = categoryValue = row.[fieldIndex]
    let newRuleAssessments =
        Array.map2 (fun ruleAssessment ruleMatch ->
                        if ruleMatch = categoryMatch then
                            ruleAssessment + 1
                        else
                            ruleAssessment)
                   categoryAssessment.ruleAssessments
                   ruleMatches
    { categoryAssessment with ruleAssessments = newRuleAssessments }
let MatchRule (row : int[]) rule =
    let fieldIndex, eqVal = rule
    row.[fieldIndex] = eqVal
let Assess categorizeFields rules input =
    printfn "START - Assess"
    let d =
        Array.fold (fun categoryAssessment row ->
                        let ruleMatches = Array.map (MatchRule row) rules
                        Array.map (AssessCategory ruleMatches row) categoryAssessment)
                   (InitAssessment categorizeFields rules)
                   input
    printfn "END - Assess"
    d
let JoinAssessments assessments =
    let numAssessments = Array.length assessments
    Array.fold (fun accAssessment assessment ->
                    Array.map2 (fun accCategory category ->
                                    let newRuleAssessments =
                                        Array.map2 (+)
                                                   accCategory.ruleAssessments
                                                   category.ruleAssessments
                                    { accCategory with
                                        ruleAssessments = newRuleAssessments })
                               accAssessment
                               assessment)
               assessments.[0]
               assessments.[1..(numAssessments-1)]
let numRecords = 10000
let numFields = 20
let numSplits = 10
let numRules = 10000
let inputs = Array.create numSplits
                [| for i in 1 .. (numRecords / numSplits) ->
                    [| for j in 1 .. numFields ->
                        (i % 10) + j |] |]
let categorizeFields = [| (1, 6); (2, 3); (2, 4); (3, 2) |]
let rules = [| for i in 1 .. numRules -> (i % numFields, i) |]
let assessments =
    Array.Parallel.map (Assess categorizeFields rules) inputs
    |> JoinAssessments
printfn "Assessments: %A" assessments
0
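The mutable variant mentioned above is not shown here, so the following is only a sketch of what an in-place update could look like; it is not taken from the referenced article. The helper addMatchesInPlace is hypothetical. It replaces the Array.map2 call in AssessCategory with a loop that increments an existing counter array, so no new array is allocated per row.

```fsharp
// Hypothetical helper: increment the counter of every rule whose match
// result agrees with the category match, without allocating a new array.
let addMatchesInPlace (counts : int[]) (ruleMatches : bool[]) (categoryMatch : bool) =
    for i in 0 .. counts.Length - 1 do
        if ruleMatches.[i] = categoryMatch then
            counts.[i] <- counts.[i] + 1

let counts = Array.zeroCreate<int> 3
addMatchesInPlace counts [| true; false; true |] true
addMatchesInPlace counts [| true; true; false |] true
// counts is now [| 2; 1; 1 |]
```

One caveat if this were wired into the program above: InitAssessment hands the same ruleAssessments array to every category, which is harmless while each update creates a new array but would make all categories share counters under in-place mutation. Each category, and each parallel split, would need its own copy.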
This is a version of your program that doesn't require mutability and uses nearly all of the 4 cpus on my iMac.
To pull it off, it's driven by assessing each rule in parallel rather than by processing records. That also required transposing the input array so that it is indexed by field first and by record second.
open System
type CategoryAssessment =
    { fieldIndex: int
      value: int
      ruleAssessments: list<int> }
let MatchRule rVal fVal =
    rVal = fVal
let AssessRule cMatches (inputs:int[][]) (rIndex, rVal) =
    // printfn "START - Assess" // uses more cpu than the code itself
    let matches = inputs.[rIndex] |>
                    Array.map2 (fun cVal fVal -> (MatchRule rVal fVal) = cVal) cMatches
    let assessment = matches |>
                        Array.map ( fun v -> if v then 1 else 0 ) |>
                        Array.sum
    // printfn "END - Assess"
    assessment
let Assess categorizeFields rules (inputs:int[][]) =
    categorizeFields |> List.map (fun (catIndex, catValue) ->
        let catMatches = inputs.[catIndex] |> Array.map( fun v -> v = catValue )
        let assessments = rules |> Array.Parallel.map
                                      (AssessRule catMatches inputs)
                                |> Array.toList
        { CategoryAssessment.fieldIndex = catIndex;
          value = catValue;
          ruleAssessments = assessments }
    )
let numRecords = 10000
let numFields = 20
let numRules = 10000
let inputs = [| for j in 1 .. numFields ->
                [| for i in 1 .. numRecords -> (i % 10) + j |] |]
let categorizeFields = [ (1, 6); (2, 3); (2, 4); (3, 2) ]
let rules = [| for i in 1 .. numRules -> (i % numFields, i) |]
let assessments = Assess categorizeFields rules inputs
printfn "Assessments: %A" assessments
Assessing by rule allowed the summing of a single integer across all records for a given rule, avoiding mutable state and extra memory allocations.
I used a lot of array iteration to get the speed up but didn't remove all the lists.
I fear I changed the functionality while refactoring or made assumptions that can't be applied to your actual problem, however I do hope it's a useful example.
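The program above builds its input already in fields-by-records layout. If the data arrives as records-by-fields, as in the question, a transposition step along the following lines would be needed first. This is a minimal sketch; transpose is a hypothetical helper that assumes every record has the same number of fields.

```fsharp
// Hypothetical helper: turn records-by-fields into fields-by-records.
// Assumes every row has the same length as the first one.
let transpose (rows : int[][]) : int[][] =
    if rows.Length = 0 then
        [||]
    else
        Array.init rows.[0].Length (fun j ->
            Array.init rows.Length (fun i -> rows.[i].[j]))

let byRecord = [| [| 1; 2; 3 |]; [| 4; 5; 6 |] |]
let byField = transpose byRecord
// byField is [| [| 1; 4 |]; [| 2; 5 |]; [| 3; 6 |] |]
```

After transposing, inputs.[fieldIndex] gives every record's value for one field as a contiguous array, which is the access pattern AssessRule relies on.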

f# sequence of running total

Ok, this looks like it should be easy, but I'm just not getting it. If I have a sequence of numbers, how do I generate a new sequence made up of the running totals? eg for a sequence [1;2;3;4], I want to map it to [1;3;6;10]. In a suitably functional way.
Use List.scan:
let runningTotal = List.scan (+) 0 >> List.tail
[1; 2; 3; 4]
|> runningTotal
|> printfn "%A"
Seq.scan-based implementation:
let runningTotal seq' = (Seq.head seq', Seq.skip 1 seq') ||> Seq.scan (+)
{ 1..4 }
|> runningTotal
|> printfn "%A"
Another variation using Seq.scan (Seq.skip 1 gets rid of the leading zero):
> {1..4} |> Seq.scan (+) 0 |> Seq.skip 1;;
val it : seq<int> = seq [1; 3; 6; 10]
> Seq.scan (fun acc n -> acc + n) 0 [1;2;3;4];;
val it : seq<int> = seq [0; 1; 3; 6; ...]
With lists:
> [1;2;3;4] |> List.scan (fun acc n -> acc + n) 0 |> List.tail;;
val it : int list = [1; 3; 6; 10]
Edit: Another way with sequences:
let sum s = seq {
    let x = ref 0
    for i in s do
        x := !x + i
        yield !x
}
Yes, there's a mutable variable, but I find it more readable (if you want to get rid of the leading 0).
Figured it was worthwhile to share how to do this with Record Types in case that's also what you came here looking for.
Below is a fictitious example demonstrating the concept using runner laps around a track.
type Split = double
type Lap = { Num : int; Split : Split }
type RunnerLap = { Lap : Lap; TotalTime : double }
let lap1 = { Num = 1; Split = 1.23 }
let lap2 = { Num = 2; Split = 1.13 }
let lap3 = { Num = 3; Split = 1.03 }
let laps = [lap1;lap2;lap3]
let runnerLapsAccumulator =
    Seq.scan
        (fun rl l -> { rl with Lap = l; TotalTime = rl.TotalTime + l.Split }) // accumulator
        { Lap = { Num = 0; Split = 0.0 }; TotalTime = 0.0 } // initial state
let runnerLaps = laps |> runnerLapsAccumulator
printfn "%A" runnerLaps
Not sure this is the best way but it should do the trick
let input = [1; 2; 3; 4]
let runningTotal =
    (input, 0)
    |> Seq.unfold (fun (list, total) ->
        match list with
        | [] ->
            None
        | h::t ->
            let total = total + h
            // the comma binds more loosely than |>, so wrap the whole pair in Some
            Some (total, (t, total)))
    |> List.ofSeq
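Because Seq.scan is lazy, the scan-and-skip approach from the earlier answers also works on sequences with no end. A minimal sketch, with runningTotals as a hypothetical helper name:

```fsharp
// Hypothetical helper: running totals of any seq<int>, finite or not.
// Seq.scan emits the initial state (0) first, so Seq.skip 1 drops it.
let runningTotals (source : seq<int>) =
    source |> Seq.scan (+) 0 |> Seq.skip 1

let firstFour =
    Seq.initInfinite (fun i -> i + 1)   // 1, 2, 3, ...
    |> runningTotals
    |> Seq.take 4
    |> Seq.toList
// firstFour is [1; 3; 6; 10]
```

Only as many elements as Seq.take requests are ever computed, which is not possible with the List.scan versions.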

Check if an element is within a sequence

How do I check whether an element is contained within a sequence? I expected some Seq.contains, but I could not find it. Thanks
EDIT:
Or, for an easier task, how do I compute the difference between two sequences? For example, getting all the elements within a list that do not belong to another (or that do)?
Little bit simpler:
let contains x = Seq.exists ((=) x)
Seq.exists
let testseq = seq [ 1; 2; 3; 4 ]
let equalsTwo n = (n = 2)
let containsTwo = Seq.exists equalsTwo testseq
Set is your friend here:
let a = set [0;1;2;3]
let b = set [2;3;4;5]
let c = a - b
let d = b - a
let e = Set.intersect a b
let f = a + b
>
val c : Set<int> = seq [0; 1]
val d : Set<int> = seq [4; 5]
val e : Set<int> = seq [2; 3]
val f : Set<int> = seq [0; 1; 2; 3; ...]
Seq.exists again, but with slightly different syntax -
let testseq = seq [ 1; 2; 3; 4 ]
let testn = 2
testseq |> Seq.exists (fun x -> x = testn)
See MSDN F#: Seq.exists function: https://msdn.microsoft.com/en-us/library/ee353562.aspx
Lots of other good ones there too!
(Another question, another answer.)
This works, but I don't think that it's the most idiomatic way to do it - (you'll need to wait until the US wakes up to find out):
let s1 = seq [ 1; 2; 3; 4 ]
let s2 = seq [ 3; 4; 5; 6 ]
seq {
    for a in s1 do
        if not (Seq.exists (fun n -> n = a) s2) then
            yield a
}
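For readers on a newer toolchain: assuming FSharp.Core 4.0 or later, the library itself provides Seq.contains and Seq.except, which cover both parts of the question without a hand-written loop. A short sketch (the value names are illustrative only):

```fsharp
let s1 = seq [ 1; 2; 3; 4 ]
let s2 = seq [ 3; 4; 5; 6 ]

// Membership test.
let hasTwo = Seq.contains 2 s1

// Elements of s1 that do not occur in s2 (the excluded items come first).
let onlyInS1 = s1 |> Seq.except s2 |> Seq.toList

// Elements of s1 that also occur in s2.
let inBoth = s1 |> Seq.filter (fun x -> Seq.contains x s2) |> Seq.toList
```

Seq.except also removes duplicates from its result, so it behaves like a set difference. The filter-based intersection rescans s2 for every element, so for large inputs the Set operations shown earlier are likely the better fit.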
