No method matching pcacov - machine-learning

I am trying to apply PCA to reduce dimensionality and noise using Julia language but am getting an error message. Could you please help me to solve this issue.
Are there other alternatives in julia to the perform the same task?
Here's the error message:
julia> X = (train_input)' |> Array;
julia> typeof(X)
Matrix{Real} (alias for Array{Real, 2})
julia> using MultivariateStats, MLJMultivariateStatsInterface
julia> M = fit(PCA, X; maxoutdim = 3)
MethodError: no method matching pcacov(::Matrix{Float64}, ::Vector{Real}; maxoutdim=3, pratio=0.99)
Closest candidates are:
pcacov(::AbstractMatrix{T}, ::AbstractVector{T}; maxoutdim, pratio) where T<: Real at C:\Users\USER\.julia\packages\MultivariateStats\rCiqT\src\pca.jl:209

I can't reproduce your error. But this is how I get the job done via the MultivariateStats v0.10.0 package in the case of fitting a PCA model:
julia> using MultivariateStats
julia> X = rand(5, 100);
fit(PCA, X, maxoutdim=3)
PCA(indim = 5, outdim = 3, principalratio = 0.6599153346885055)
Pattern matrix (unstandardized loadings):
────────────────────────────────────
PC1 PC2 PC3
────────────────────────────────────
1 0.201331 -0.0213382 0.0748083
2 0.0394825 0.137933 0.213251
3 0.14079 0.213082 -0.119594
4 0.154639 -0.0585538 -0.0975059
5 0.15221 -0.145161 0.0554158
────────────────────────────────────
Importance of components:
─────────────────────────────────────────────────────────
PC1 PC2 PC3
─────────────────────────────────────────────────────────
SS Loadings (Eigenvalues) 0.108996 0.0893847 0.0779532
Variance explained 0.260295 0.21346 0.186161
Cumulative variance 0.260295 0.473755 0.659915
Proportion explained 0.394436 0.323466 0.282098
Cumulative proportion 0.394436 0.717902 1.0
─────────────────────────────────────────────────────────
julia> typeof(X)
Matrix{Float64} (alias for Array{Float64, 2})
julia> eltype(X)
Float64
As you can see, I used a Matrix with Float64 element types as the input. This is the difference between my input in comparison with yours, I guess. So this might be the problem in your case.
Keep in mind that rows represent the features and the columns represent the data samples!
Finally, since you asked for other alternatives, I introduce you to the WeightedPCA package. This package provides weighted principal component analysis (PCA) for data with samples of heterogeneous quality (heteroscedastic noise). Here is a quick example:
julia> using WeightedPCA
julia> X = rand(5, 100);
pc1, pc2, pc3 = wpca.(Ref(collect(eachrow(X))), [1, 2, 3], Ref(UniformWeights()));
In the above, I fitted an equally weighted PCA on the X data and I requested values on 1, 2, and 3 principal components. Using this package, you can even apply specific weights or optimal weights. This package can be installed by pkg> add https://github.com/dahong67/WeightedPCA.jl.
Furtherore, as Antonello said, one can utilize BetaML package to perform PCA. This package provides machine learning algorithms written in the Julia programming language. Let's use it to perform PCA:
julia> using BetaML
julia> X = rand(100, 5);
julia> mod = PCA(max_unexplained_var=0.3)
A PCA BetaMLModel (unfitted)
julia> reproj_X = fit!(mod,X)
100×4 Matrix{Float64}:
0.204151 -0.482558 -0.161929 0.222503
0.69425 -0.371519 -0.628404 0.462256
0.198191 -0.601537 -0.638573 0.463886
⋮
-0.00176858 0.557353 -0.4237 0.310565
0.533239 0.133691 -0.236009 -0.0793025
0.333652 -0.388115 -0.28662 0.481249
julia> info(mod)
Dict{String, Any} with 5 entries:
"explained_var_by_dim" => [0.277255, 0.484764, 0.669897, 0.846831, 1.0]
"fitted_records" => 100
"prop_explained_var" => 0.846831
"retained_dims" => 4
"xndims" => 5
In the above, the max_unexplained_var specifies the actual proportion of variance not explained in the reprojected dimensions or in other words, the maximum unexplained variance that I'm ready to accept.

The error message is telling you that somewhere in the PCA fit an internal function is called which requires an AbstractMatrix{T} and an AbstractVector{T} as an input, which means that the element type of both arguments T needs to be the same. In your case a Matrix{Float64} and a Vector{Real} is being passed. I assume that the Vector{Real} comes from your X input which as your first cell shows is a Matrix{Real}.
This generally indicates an issue in the construction of X, which shouldn't have an abstract element type like Real. Try float.(X) as an input to coerce all elements to Float64.

Related

Ambiguity in recurrent neural network training in Julia Flux

I'm using Julia's Flux library to learn about neural networks. According to the documentation for train! (where train! takes arguments (loss, params, data, opt)):
For each datapoint d in data, compute the gradient of loss with respect to params through backpropagation and call the optimizer opt.
(see source for train!: https://github.com/FluxML/Flux.jl/blob/master/src/optimise/train.jl)
For a conventional NN based on Dense -- let's say with a one-dimensional input and output, i.e. with one feature -- this is easy to understand. Each element in data is a pair of single numbers, an independent sample of 1-d input/output values. train! does forward- and backpropagation on each pair of 1-d samples one at a time. In the process, the loss function is evaluated on each sample. (Do I have this right?)
My question is: how does this extend to a recurrent NN? Take the case of an RNN with 1-d (i.e. one feature) input and output. It seems like there's some ambiguity in how to structure the input and output data, and the results change based on the structure. As one example:
x = [[1], [2], [3]]
y = [4, 5, 6]
data = zip(x, y)
m = RNN(1, 1)
opt = Descent()
loss(x, y) = sum((Flux.stack(m.(x), 1) .- y) .^ 2)
train!(loss, params(m), data, opt)
(loss function taken from: https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/recurrence.md)
In this example, when train! loops through each sample (for d in data), each value of d is a pair of single values from x and y, e.g. ([1], 4). loss is evaluated based on these single values. This is the same as in the Dense case.
On the other hand, consider:
x = [[[1], [2], [3]]]
y = [[4, 5, 6]]
m = RNN(1, 1)
opt = Descent()
loss(x, y) = sum((Flux.stack(m.(x), 1) .- y) .^ 2)
train!(loss, params(m), zip(x, y), opt)
Note that the only difference here is that x and y are nested in an extra pair of square brackets. As a result there's only one d in data, and it's a pair of sequences: ([[1], [2], [3]], [4, 5, 6]). loss can be evaluated on this version of d, and it returns a 1-d value, as required for training. But the value returned by loss is different than in any of the three results from the previous case, so the training process turns out differently.
The point is that both structures are valid in the sense that loss and train! handle them without error. Conceptually, I can make an argument for both structures being correct. But the results are different, and I assume that only one way is right. In other words, for training an RNN, should each d in data be a whole sequence, or a single element from a sequence?

Generating local interpretations using Shap Kernel Explainer

I am trying to interpret my model using shap kernel explainer. The dataset is of shape (176683, 42). The explainer (xgbexplainer) is successfully modelled and when I use it to generate shap_values, it throws Memory Error.
import shap
xgb_explainer = shap.KernelExplainer(trained_model.steps[-1][-1].predict,X_for_shap.values)
shap_val = xgb_explainer.shap_values(X_for_shap.loc[0], nsamples=1)
First I used nsamples as default = 2*X_for_shap.shape[2] + 2048, it returned
MemoryError: Unable to allocate array with shape (2132, 7420686) and data type float64
When I set it to nsamples = 1, it runs for indefinite time. Please help me out to understand where I am doing wrong here
This is the screenshot of the error message
One thing that I dont understand about the kernelexplainer is why we need to impute the missing features with some strategies ( mean , median k means etc ) ? Why not just ignore them and fit a linear learner and compare it with the the model without observing that feature ? P( y| {S} U feature_i ) - P( y | { S } ) ? What kind of added value SHAP approach provides with having the whole features but some of them unknown ?

arbitrarily weighted moving average (low- and high-pass filters)

Given input signal x (e.g. a voltage, sampled thousand times per second couple of minutes long), I'd like to calculate e.g.
/ this is not q
y[3] = -3*x[0] - x[1] + x[2] + 3*x[3]
y[4] = -3*x[1] - x[2] + x[3] + 3*x[4]
. . .
I'm aiming for variable window length and weight coefficients. How can I do it in q? I'm aware of mavg and signal processing in q and moving sum qidiom
In the DSP world it's called applying filter kernel by doing convolution. Weight coefficients define the kernel, which makes a high- or low-pass filter. The example above calculates the slope from last four points, placing the straight line via least squares method.
Something like this would work for parameterisable coefficients:
q)x:10+sums -1+1000?2f
q)f:{sum x*til[count x]xprev\:y}
q)f[3 1 -1 -3] x
0n 0n 0n -2.385585 1.423811 2.771659 2.065391 -0.951051 -1.323334 -0.8614857 ..
Specific cases can be made a bit faster (running 0 xprev is not the best thing)
q)g:{prev[deltas x]+3*x-3 xprev x}
q)g[x]~f[3 1 -1 -3]x
1b
q)\t:100000 f[3 1 1 -3] x
4612
q)\t:100000 g x
1791
There's a kx white paper of signal processing in q if this area interests you: https://code.kx.com/q/wp/signal-processing/
This may be a bit old but I thought I'd weigh in. There is a paper I wrote last year on signal processing that may be of some value. Working purely within KDB, dependent on the signal sizes you are using, you will see much better performance with a FFT based convolution between the kernel/window and the signal.
However, I've only written up a simple radix-2 FFT, although in my github repo I do have the untested work for a more flexible Bluestein algorithm which will allow for more variable signal length. https://github.com/callumjbiggs/q-signals/blob/master/signal.q
If you wish to go down the path of performing a full manual convolution by a moving sum, then the best method would be to break it up into blocks equal to the kernel/window size (which was based on some work Arthur W did many years ago)
q)vec:10000?100.0
q)weights:30?1.0
q)wsize:count weights
q)(weights$(((wsize-1)#0.0),vec)til[wsize]+) each til count v
32.5931 75.54583 100.4159 124.0514 105.3138 117.532 179.2236 200.5387 232.168.
If your input list not big then you could use the technique mentioned here:
https://code.kx.com/q/cookbook/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window
That uses 'scan' adverb. As that process creates multiple lists which might be inefficient for big lists.
Other solution using scan is:
q)f:{sum y*next\[z;x]} / x-input list, y-weights, z-window size-1
q)f[x;-3 -1 1 3;3]
This function also creates multiple lists so again might not be very efficient for big lists.
Other option is to use indices to fetch target items from the input list and perform the calculation. This will operate only on input list.
q) f:{[l;w;i]sum w*l i+til 4} / w- weight, l- input list, i-current index
q) f[x;-3 -1 1 3]#'til count x
This is a very basic function. You can add more variables to it as per your requirements.

How to apply different cost functions to different output channels of a convolutional network?

I have a convolutional neural network whose output is a 4-channel 2D image. I want to apply sigmoid activation function to the first two channels and then use BCECriterion to computer the loss of the produced images with the ground truth ones. I want to apply squared loss function to the last two channels and finally computer the gradients and do backprop. I would also like to multiply the cost of the squared loss for each of the two last channels by a desired scalar.
So the cost has the following form:
cost = crossEntropyCh[{1, 2}] + l1 * squaredLossCh_3 + l2 * squaredLossCh_4
The way I'm thinking about doing this is as follow:
criterion1 = nn.BCECriterion()
criterion2 = nn.MSECriterion()
error = criterion1:forward(model.output[{{}, {1, 2}}], groundTruth1) + l1 * criterion2:forward(model.output[{{}, {3}}], groundTruth2) + l2 * criterion2:forward(model.output[{{}, {4}}], groundTruth3)
However, I don't think this is the correct way of doing it since I will have to do 3 separate backprop steps, one for each of the cost terms. So I wonder, can anyone give me a better solution to do this in Torch?
SplitTable and ParallelCriterion might be helpful for your problem.
Your current output layer is followed by nn.SplitTable that splits your output channels and converts your output tensor into a table. You can also combine different functions by using ParallelCriterion so that each criterion is applied on the corresponding entry of output table.
For details, I suggest you read documentation of Torch about tables.
After comments, I added the following code segment solving the original question.
M = 100
C = 4
H = 64
W = 64
dataIn = torch.rand(M, C, H, W)
layerOfTables = nn.Sequential()
-- Because SplitTable discards the dimension it is applied on, we insert
-- an additional dimension.
layerOfTables:add(nn.Reshape(M,C,1,H,W))
-- We want to split over the second dimension (i.e. channels).
layerOfTables:add(nn.SplitTable(2, 5))
-- We use ConcatTable in order to create paths accessing to the data for
-- numereous number of criterions. Each branch from the ConcatTable will
-- have access to the data (i.e. the output table).
criterionPath = nn.ConcatTable()
-- Starting from offset 1, NarrowTable will select 2 elements. Since you
-- want to use this portion as a 2 dimensional channel, we need to combine
-- then by using JoinTable. Without JoinTable, the output will be again a
-- table with 2 elements.
criterionPath:add(nn.Sequential():add(nn.NarrowTable(1, 2)):add(nn.JoinTable(2)))
-- SelectTable is simplified version of NarrowTable, and it fetches the desired element.
criterionPath:add(nn.SelectTable(3))
criterionPath:add(nn.SelectTable(4))
layerOfTables:add(criterionPath)
-- Here goes the criterion container. You can use this as if it is a regular
-- criterion function (Please see the examples on documentation page).
criterionContainer = nn.ParallelCriterion()
criterionContainer:add(nn.BCECriterion())
criterionContainer:add(nn.MSECriterion())
criterionContainer:add(nn.MSECriterion())
Since I used almost every possible table operation, it looks a little bit nasty. However, this is the only way I could solve this problem. I hope that it helps you and others suffering from the same problem. This is how the result looks like:
dataOut = layerOfTables:forward(dataIn)
print(dataOut)
{
1 : DoubleTensor - size: 100x2x64x64
2 : DoubleTensor - size: 100x1x64x64
3 : DoubleTensor - size: 100x1x64x64
}

AR model lattice implementation

I'm trying to go through the procedure of Speech Synthesis via AR model, or LPC synthesis, IIR all-pole filter model, what ever you call it.
The main idea is to get the auto-correlation(AR) coefficient and estimate error, then use the AR coefficients to filter the estimated error, we can get the reconstructed signal.
**MATLAB CODE**
data = [1 2 1 3 5 1 2 5];
% auto correlation coefficients
a = lpc(data, 4);
% estimated signal
est = filter([0 -a(2:end)],1,data);
% estimated error
e = data - est;
% reconstructed signal
rec = filter(1,a,e);
You will see that rec == data exactly.
Now comes my question.
I'm trying to convert the model into Latices implementation. After looking up the Matlab reference, it turned out that I should use
tf2latc
to convert the transfer function into lattice implementation and
latcfilt
to use the lattice to filter the data.
Simply repeating the procedure above just doesn't work.
So I'm looking for help in the following aspects:
1) Example on using the tr2latc and latcfilt function to perform a complete procedure of building the filter.
2) Example on using a lattice implementation to perform a voice reconstruction.
Thx
Well, finally I got the answer.
From a transfer function, we can get the lattice implementation coefficients. Then filter it with latcfilt.
a = [1 3 1 4 4];
[k v] = tf2latc(1,a)
x = [1 2 1 3 4 1 5];
filter(1,a,x)
latcfilt(k,v,x)
Then you can see that the two filter gives the same result.

Resources