MathNumerics.LinearAlgebra Matrix.mapRows dimensionality issues - f#

So I have verified that the starting version of what I'm trying to do works, but for some reason when putting it into the Matrix.map high order function it breaks down.
Here is the failing function:
let SumSquares (theta:Vector<float>) (y:Vector<float>) (trainingData:Matrix<float>) =
let m = trainingData.RowCount
let theta' = theta.ToRowMatrix()
trainingData
|> Matrix.mapRows(fun a r -> (theta' * r) - y.[a] )
Here are some sample tests
Set up:
let tData = matrix [[1.0; 2.0]
[1.0; 3.0]
[1.0; 3.0]
[1.0; 4.0]]
let yVals = vector [5.0; 6.0; 7.0; 11.0]
let theta = vector [1.0; 0.2]
Test raw functionality of basic operation (theta transpose * vector - actual)
let theta' = theta.ToRowMatrix()
(theta.ToRowMatrix() * tData.[0, 0 .. 1]) - yVals.[0]
Testing in actual function:
tData |> SumSquares theta yVals
Here is a copy/paste of actual error. It reads as though its having issues of me mapping a larger vector to a smaller vector.
Parameter name: target
at MathNet.Numerics.LinearAlgebra.Storage.VectorStorage1.CopyToRow(MatrixStorage1 target, Int32 rowIndex, ExistingData existingData)
at FSI_0061.SumSquares(Vector1 theta, Vector1 y, Matrix`1 trainingData) in C:\projects\deleteme\ASPNet5Test\ConsoleApplication1\ConsoleApplication1\MachineLearning.fsx:line 23
at .$FSI_0084.main#() in C:\projects\deleteme\ASPNet5Test\ConsoleApplication1\ConsoleApplication1\MachineLearning.fsx:line 39
Stopped due to error

I found an even better easier way to do this. I have to credit s952163 for starting me down a good path, but this approach is even more optimized:
let square (x:Vector<float>) = x * x
let subtract (x:Vector<float>) (y:Vector<float>) = y - x
let divideBy (x:float) (y:float) = y / x
let SumSquares (theta:Vector<float>) (y:Vector<float>) (trainingData:Matrix<float>) =
let m = trainingData.RowCount |> float
(trainingData * theta)
|> subtract y
|> square
|> divideBy m

Since you know the number of rows you can just map to that. Arguably this is not pretty:
let SumSquares (theta:Vector<float>) (y:Vector<float>) (trainingData:Matrix<float>) =
let m = trainingData.RowCount
let theta' = theta.ToRowMatrix()
[0..m-1] |> List.map (fun i -> (((theta' * trainingData.[i,0..1]) |> Seq.exactlyOne) - yVals.[i] ))
Edit:
My guess is that mapRows wants everything to be in the same shape, and your output vector is different. So if you want to stick to the Vector type, this will just enumerate the indexed rows:
tData.EnumerateRowsIndexed() |> Seq.map (fun (i,r) -> (theta' * r) - yVals.[i])
and you can also use Matrix.toRowSeqi if you prefer to pipe it through, and get back a Matrix:
tData
|> Matrix.toRowSeqi
|> Seq.map (fun (i,x) -> (theta' * x) - yVals.[i])
|> DenseMatrix.ofRowSeq

Related

How to create a function which can test the random Word cooccurrence of another function?

So my team and I have created a function, wordMarkovChain, which generates a random string with nWords words, whose word-pairs are distributed according to the cooccurrence histogram wCooc.
Now we wanted to create a function which will test the newly created wordMarkovChain, so I can confirm that it's working as it should.
The function, diffw2, which compares two cooccurrence histograms as the average sum of squared differences, needs to take in two parameters, c1 and c2, of type wordCooccurrences and return a double, where c1 and c2 are two cooccurrence histograms of M elements such that c1(i, j) is the number
of times word number i is found following word number j.
The math behind the function should look like this: 1/M^2 * Summation from i=0 to M-1 Summation from i=0 to M-1 (c1(i, j) - c2(i, j))^2.
Sorry, I can't post a picture:(
Our two types which we have created seems to be the problem. c1 and c2 can have a different length but the type wordHistogram inside the type wordCoocurence can also have a different length.
The question is, how can we create such a function?
We tried with for loops, but we think it needs to be a recursive function. We are quite new to the whole concept of programming and are looking for some guidance. Please bear in mind that we do not possess vast knowledge of F#, especially not their in build functions.
CODE
// Types
type wordHistogram = (string * int) list
type wordCooccurrences = (string * wordHistogram) list
let diffw2 (c1 : wordCooccurrences) (c2 : wordCooccurrences) : double =
let mutable res = 0
let z1, z2 = c1 |> List.unzip
let z3, z4 = c2 |> List.unzip
let m1 = c1 |> List.length
let m2 = c2 |> List.length
let m = m1 + m2
for i in 0 .. (m - 1) do
for j in 0 .. (m - 1) do
for k in 0 .. ((z2.[j] |> List.length) - 1) do
res <- res + (snd z2.[j].[k] - snd z4.[j].[k]) * (snd z2.[j].[k] - snd z4.[j].[k])
(1.0 / (float(m * m))) * float(res)
This might get you closer to what you need. This could instead be done using recursive functions, but I usually prefer to use built-in higher-order functions like fold or fold2 when they can do the job. See https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual/list.fold2%5B't1%2C't2%2C'state%5D-function-%5Bfsharp%5D
Each call of List.fold2 receives an "accumulator" with an initial value (or "state") of 0.0. This accumulator is passed to the lambda function in the parameter named "acc". As the lambda is applied to each element of the two input lists, it adds the intermediate result to that acc value, which is then returned as the result of the List.fold2 call.
type wordHistogram = (string * int) list
type wordCooccurrences = (string * wordHistogram) list
let diffw2 (c1 : wordCooccurrences) (c2 : wordCooccurrences) =
let m1 = c1 |> List.length
let m2 = c2 |> List.length
let m = m1 + m2
let res =
(0.0, c1, c2) |||>
List.fold2 (fun acc (_str1, hist1) (_str2, hist2) ->
let intermedRes =
(0.0, hist1, hist2) |||>
List.fold2 (fun acc (_str1, freq1) (_str2, freq2) ->
let diff = float (freq1 - freq2)
acc + diff * diff
)
acc + intermedRes
)
(1.0 / float(m * m)) * res
I think you are trying to do the following
let diffw2_ (c1 : wordCooccurrences) (c2 : wordCooccurrences) =
List.zip c1 c2
|> List.sumBy (fun ((_, w1), (_, w2)) ->
List.zip w1 w2
|> List.sumBy (fun ((_,i1), (_,i2)) ->
(float (i1 - i2)) ** 2.0
)
) |> (fun totalSumOfSquares -> totalSumOfSquares / (c1 |> List.length |> float))
This just returns the average square differences between corresponding elements of c1 and c2. It assumes that c1 and c2 have the same structure.
With F# (and functional programming in general), you typically avoid using mutable variables and for loops, in favor of pure functions.

Math Numerics Sum Matrix - FSharp

I'm working with the MathNumerics library and just attempting to sum all items in the matrix. It appears to not work. Here is what I've got...It appears to be wanting some type annotations somewhere with the sum. I tried the below annotations, but it seems to think that float is incompatible...?
let SumSquares (theta:Vector<float>) (y:float) (trainingData:Matrix<float>) : CostFunction =
let m = trainingData.RowCount
trainingData
|> Matrix.mapRows(fun a r -> vector[ square((theta * r) - y) ])
|> Matrix.sum<float,float>()

How to cumulate (scan) Deedle data frame values

I'm loading a sequence of records into a deedle data frame (from a database table). Is it possible to accumulate (for example sum cumulatively) the values, and get back a data frame? For example there is Series.scanValues but there is no Frame.scanValues. There is Frame.map, but it didn't do what I expected, it left all values as they were.
#if INTERACTIVE
#r #"Fsharp.Charting"
#load #"..\..\Deedle.fsx"
#endif
open FSharp.Charting
open FSharp.Charting.ChartTypes
open Deedle
type SeriesX = {
DataDate:DateTime
Series1:float
Series2:float
Series3:float
}
let rnd = new System.Random()
rnd.NextDouble() - 0.5
let data =
[for i in [100..-1..1] ->
{SeriesX.DataDate = DateTime.Now.AddDays(float -i)
SeriesX.Series1 = rnd.NextDouble() - 0.5
SeriesX.Series2 = rnd.NextDouble() - 0.5
SeriesX.Series3 = rnd.NextDouble() - 0.5
}
]
# now comes the deedle frame:
let df = data |> Frame.ofRecords
let df = df.IndexRows<DateTime>("DataDate")
df.["Series1"] |> Chart.Line
df.["Series1"].ScanValues((fun acc x -> acc + x),0.0) |> Chart.Line
let df' = df |> Frame.mapValues (Seq.scan (fun acc x -> acc + x) 0.0)
df'.["Series1"] |> Chart.Line
The last two lines just give me back the original values while I would like to have the accumulated values like in df.["Series1"].Scanvalues for Series1, Series2, and Series3.
For filtering and projection, series provides Where and Select methods
and corresponding Series.map and Series.filter functions (there is
also Series.mapValues and Series.mapKeys if you only want to transform
one aspect).
So you just apply your function to each Series:
let allSum =
df.Columns
|> Series.mapValues(Series.scanValues(fun acc v -> acc + (v :?> float)) 0.0)
|> Frame.ofColumns
and use Frame.ofColumns that to convert the result to the Frame.
Edit:
If you need to select only numerics columns, you can use the Frame.getNumericCols:
let allSum =
df
|> Frame.getNumericCols
|> Series.mapValues(Series.scanValues (+) 0.0)
|> Frame.ofColumns
without an explicit type cast code has become more beautiful :)
There is a Series.scanValues function. You can obtain a series from every column in your data frame like this: frame$column, which gets you a Series.
If you need all the columns at once to do the scan, you could first map each row into a single value (a tuple, for example) and the apply the Series.scanValues to that new column.

f# calc variance function

I am working through an F# function that calculates variance. I'm trying to step through each iteration to get the correct answer but think im getitng off track somewhere because I keep getting the wrong answer. Could someone please walk me through one iteration of this function to get me back on track
let variance values
let average = Seq.average values
let length = Seq.length values
values
|> Seq.map (fun x -> 1.0 / float length * (x - average) ** 2.0)
|> Seq.sum
call is variance [1.0..6.0]
To me the first value passed is 1.0 so it would be (1.0 / 6 * (1.0-3.5) ** 2.0) and therefore .166 * -2.5 ** 2.0
I'm also unsure what the ** means in formula I'm assuming multiply.
Correct answer should be 2.9166666667
To make it easier to understand, you can rewrite the code as follows:
let variance values =
let average = Seq.average values
let length = Seq.length values
let sum = values
|> Seq.map (fun x -> (x - average) ** 2.0)
|> Seq.sum in sum / float length
variance [1.0..6.0] |> printfn "%A"
Print: 2.916666667
Link: https://dotnetfiddle.net/09PHXn
By iteration:
let variancetest values =
let average = Seq.average values
let length = Seq.length values
values
|> Seq.iteri(fun i x ->
printfn "%i [%f]: %f ^ 2 = %A" i x (x - average) ((x - average) ** 2.0))
let sum = values
|> Seq.map (fun x -> (x - average) ** 2.0)
|> Seq.sum
let flength = float length
printfn "Sum = %f" sum
printfn "1/length = %f" (1.0 / flength)
printfn "Result / length = %f" (sum / flength)
variancetest [1.0..6.0]
Print:
0 [1.000000]: -2.500000 ^ 2 = 6.25
1 [2.000000]: -1.500000 ^ 2 = 2.25
2 [3.000000]: -0.500000 ^ 2 = 0.25
3 [4.000000]: 0.500000 ^ 2 = 0.25
4 [5.000000]: 1.500000 ^ 2 = 2.25
5 [6.000000]: 2.500000 ^ 2 = 6.25
Sum = 17.500000
1/length = 0.166667
Result / length = 2.916667
https://dotnetfiddle.net/02r3qG

How to generate an array of exponential weights?

I am trying to do an "unfold" - (I think), by starting with an initial value, applying some function to it repeatedly, and then getting a sequence as a result.
In this example, I'm trying to start with 1.0, multiply it by .80, and do it 4 times, such that I end up with an array = [| 1.0; 0.80; 0.64; 0.512 |]
VS 2010 says I'm using "i" in an invalid way, and that mutable values cannot be captured by closures - so this function does not compile. Can anyone possibly suggest a clean approach that actually works? Thank you.
let expSeries seed fade n =
//take see and repeatedly multiply it by the fade factor n times...
let mutable i = 0;
let mutable weight = seed;
[| while(i < n) do
yield weight;
weight <- weight * fade |]
let testWeights = expSeries 1.0 0.80 4
let exp_series seed fade n =
Array.init (n) (fun i -> seed * fade ** (float i))
I think this recursive version should work.
let expSeries seed fade n =
let rec buildSeq i weight = seq {
if i < n then
yield weight;
yield! buildSeq (i + 1) (weight * fade)
}
buildSeq 0 seed
|> Seq.toArray
Based on the anwer to this question, you can create an unfold, and take a number of values of it:
let weighed startvalue factor =
startvalue |> Seq.unfold (fun x -> Some (x, factor * x))
let fivevalues = weighed 1.0 .8 |> Seq.take 5
If you want to explicitly use an unfold, here's how:
let expSeries seed fade n =
Seq.unfold
(fun (weight,k) ->
if k > n then None
else Some(weight,(weight*fade, k+1)))
(seed,1)
|> Array.ofSeq
let arr = expSeries 1.0 0.80 4
Note that the reason your original code won't work is that mutable bindings can't be captured by closures, and sequence, list, and array expressions implicitly use closures.

Resources