I am trying to interpret my model using the SHAP KernelExplainer. The dataset has shape (176683, 42). The explainer (xgb_explainer) is constructed successfully, but when I use it to generate shap_values it throws a MemoryError.
import shap
xgb_explainer = shap.KernelExplainer(trained_model.steps[-1][-1].predict,X_for_shap.values)
shap_val = xgb_explainer.shap_values(X_for_shap.loc[0], nsamples=1)
First I used the default nsamples = 2*X_for_shap.shape[1] + 2048, and it returned
MemoryError: Unable to allocate array with shape (2132, 7420686) and data type float64
When I set nsamples = 1, it runs indefinitely. Please help me understand what I am doing wrong here.
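For scale, the size of the failed allocation can be worked out from the numbers in the question. The sketch below is plain arithmetic, not shap code: it assumes the array has one row per sample and one column per (background row, feature) pair, which matches the shape in the error message. The 100-row background is a hypothetical choice for comparison; shap offers shap.sample and shap.kmeans for summarizing a background set in this way.

```python
# Back-of-the-envelope estimate of the array named in the MemoryError.
# Figures come from the question; the 100-row background is hypothetical.

BYTES_PER_FLOAT64 = 8


def kernel_shap_bytes(n_background, n_features, nsamples):
    """Bytes in a float64 array of shape (nsamples, n_background * n_features)."""
    return nsamples * n_background * n_features * BYTES_PER_FLOAT64


n_rows, n_features = 176683, 42
nsamples = 2 * n_features + 2048  # the default quoted in the question

full = kernel_shap_bytes(n_rows, n_features, nsamples)
small = kernel_shap_bytes(100, n_features, nsamples)

print("array shape:", (nsamples, n_rows * n_features))
print("full background:", round(full / 1e9, 1), "GB")
print("100 background rows:", round(small / 1e6, 1), "MB")
```

The background set passed to the explainer, rather than nsamples alone, drives the width of that array, so shrinking the background is the lever with the largest effect on memory.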
One thing that I don't understand about KernelExplainer is why we need to impute the missing features with some strategy (mean, median, k-means, etc.). Why not just ignore them, fit a linear learner, and compare it with the model fitted without observing that feature, i.e. P(y | S ∪ {feature_i}) - P(y | S)? What added value does the SHAP approach provide by keeping all the features but treating some of them as unknown?
I am trying to apply PCA to reduce dimensionality and noise in Julia, but I am getting an error message. Could you please help me solve this issue?
Are there other alternatives in Julia to perform the same task?
Here's the error message:
julia> X = (train_input)' |> Array;
julia> typeof(X)
Matrix{Real} (alias for Array{Real, 2})
julia> using MultivariateStats, MLJMultivariateStatsInterface
julia> M = fit(PCA, X; maxoutdim = 3)
MethodError: no method matching pcacov(::Matrix{Float64}, ::Vector{Real}; maxoutdim=3, pratio=0.99)
Closest candidates are:
pcacov(::AbstractMatrix{T}, ::AbstractVector{T}; maxoutdim, pratio) where T<: Real at C:\Users\USER\.julia\packages\MultivariateStats\rCiqT\src\pca.jl:209
I can't reproduce your error. But this is how I get the job done via the MultivariateStats v0.10.0 package in the case of fitting a PCA model:
julia> using MultivariateStats
julia> X = rand(5, 100);
fit(PCA, X, maxoutdim=3)
PCA(indim = 5, outdim = 3, principalratio = 0.6599153346885055)
Pattern matrix (unstandardized loadings):
────────────────────────────────────
PC1 PC2 PC3
────────────────────────────────────
1 0.201331 -0.0213382 0.0748083
2 0.0394825 0.137933 0.213251
3 0.14079 0.213082 -0.119594
4 0.154639 -0.0585538 -0.0975059
5 0.15221 -0.145161 0.0554158
────────────────────────────────────
Importance of components:
─────────────────────────────────────────────────────────
PC1 PC2 PC3
─────────────────────────────────────────────────────────
SS Loadings (Eigenvalues) 0.108996 0.0893847 0.0779532
Variance explained 0.260295 0.21346 0.186161
Cumulative variance 0.260295 0.473755 0.659915
Proportion explained 0.394436 0.323466 0.282098
Cumulative proportion 0.394436 0.717902 1.0
─────────────────────────────────────────────────────────
julia> typeof(X)
Matrix{Float64} (alias for Array{Float64, 2})
julia> eltype(X)
Float64
As you can see, I used a Matrix with Float64 elements as the input. I guess this is the difference between my input and yours, so it might be the problem in your case.
Keep in mind that rows represent the features and the columns represent the data samples!
Finally, since you asked for other alternatives, I introduce you to the WeightedPCA package. This package provides weighted principal component analysis (PCA) for data with samples of heterogeneous quality (heteroscedastic noise). Here is a quick example:
julia> using WeightedPCA
julia> X = rand(5, 100);
pc1, pc2, pc3 = wpca.(Ref(collect(eachrow(X))), [1, 2, 3], Ref(UniformWeights()));
In the above, I fitted an equally weighted PCA on the X data and I requested values on 1, 2, and 3 principal components. Using this package, you can even apply specific weights or optimal weights. This package can be installed by pkg> add https://github.com/dahong67/WeightedPCA.jl.
Furthermore, as Antonello said, one can use the BetaML package to perform PCA. This package provides machine learning algorithms written in the Julia programming language. Let's use it to perform PCA:
julia> using BetaML
julia> X = rand(100, 5);
julia> mod = PCA(max_unexplained_var=0.3)
A PCA BetaMLModel (unfitted)
julia> reproj_X = fit!(mod,X)
100×4 Matrix{Float64}:
0.204151 -0.482558 -0.161929 0.222503
0.69425 -0.371519 -0.628404 0.462256
0.198191 -0.601537 -0.638573 0.463886
⋮
-0.00176858 0.557353 -0.4237 0.310565
0.533239 0.133691 -0.236009 -0.0793025
0.333652 -0.388115 -0.28662 0.481249
julia> info(mod)
Dict{String, Any} with 5 entries:
"explained_var_by_dim" => [0.277255, 0.484764, 0.669897, 0.846831, 1.0]
"fitted_records" => 100
"prop_explained_var" => 0.846831
"retained_dims" => 4
"xndims" => 5
In the above, the max_unexplained_var specifies the actual proportion of variance not explained in the reprojected dimensions or in other words, the maximum unexplained variance that I'm ready to accept.
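The selection rule described above can be checked by hand against the "explained_var_by_dim" values printed by info(mod). The following is a minimal sketch of that rule in plain Python, not BetaML's implementation; the function name retained_dims is hypothetical.

```python
def retained_dims(cumulative_explained, max_unexplained_var):
    """Smallest number of dimensions whose unexplained variance is within the limit.

    cumulative_explained[k-1] is the proportion of variance explained by the
    first k dimensions (the "explained_var_by_dim" entry above).
    """
    for k, explained in enumerate(cumulative_explained, start=1):
        if 1.0 - explained <= max_unexplained_var:
            return k
    return len(cumulative_explained)


# Values copied from the info(mod) output above.
cumulative = [0.277255, 0.484764, 0.669897, 0.846831, 1.0]
dims = retained_dims(cumulative, 0.3)
print(dims, cumulative[dims - 1])
```

Three dimensions would leave about 33% of the variance unexplained, which exceeds the 0.3 limit, so the fourth dimension is kept; this agrees with "retained_dims" => 4 and "prop_explained_var" => 0.846831.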
The error message is telling you that somewhere in the PCA fit an internal function is called which requires an AbstractMatrix{T} and an AbstractVector{T} as an input, which means that the element type of both arguments T needs to be the same. In your case a Matrix{Float64} and a Vector{Real} is being passed. I assume that the Vector{Real} comes from your X input which as your first cell shows is a Matrix{Real}.
This generally indicates an issue in the construction of X, which shouldn't have an abstract element type like Real. Try float.(X) as an input to coerce all elements to Float64.
I am trying to create a list based on my neural network outputs and use it in Tensorflow as a loss function.
Assume that results is a list of size [1, batch_size] that is output by a neural network. I check whether each value of this list is in a specific range passed in as a placeholder called valid_range, and if it is, I add 1 to a list. If it is not, I add -1. The goal is to make all predictions of the network fall in the range, so the target is a tensor of all 1s, which I call correct_predictions.
values_list = []
for j in range(batch_size):
a = results[0, j] >= valid_range[0]
b = result[0, j] <= valid_range[1]
c = tf.logical_and(a, b)
if (c == 1):
values_list.append(1)
else:
values_list.append(-1.)
values_list_tensor = tf.convert_to_tensor(values_list)
correct_predictions = tf.ones([batch_size, ], tf.float32)
Now, I want to use this as a loss function in my network, so that I can force all the predictions to be in the specified range. I try to train like this:
loss = tf.reduce_mean(tf.squared_difference(values_list_tensor, correct_predictions))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, gradient_clip_threshold)
optimize = optimizer.apply_gradients(zip(gradients, variables))
This, however, has a problem and throws an error on the last optimize line, saying:
ValueError: No gradients provided for any variable: ['<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d4afd0>',
'<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d66050>'
...
I tried to debug this in Tensorboard, and I notice that the list I am creating does not appear in the graph, so basically the x part of the loss function is not part of the network itself. Is there some way to accurately create a list based on the predictions of a neural network and use it in the loss function in Tensorflow to train the network?
Please help, I have been stuck on this for a few days now.
Edit:
Following what was suggested in the comments, I decided to use a l2 loss function, multiplying it by the binary vector I had from before values_list_tensor. The binary vector now has values 1 and 0 instead of 1 and -1. This way when the prediction is in the range the loss is 0, else it is the normal l2 loss. As I am unable to see the values of the tensors, I am not sure if this is correct. However, I can view the final loss and it is always 0, so something is wrong here. I am unsure if the multiplication is being done correctly and if values_list_tensor is calculated accurately? Can someone help and tell me what could be wrong?
loss = tf.reduce_mean(tf.nn.l2_loss(tf.matmul(tf.transpose(tf.expand_dims(values_list_tensor, 1)), tf.expand_dims(result[0, :], 1))))
Thanks
To answer the question in the comment: one way to write a piecewise function is to use tf.cond. For example, here is a function that returns 0 in [-1, 1] and x everywhere else:
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32)
y = tf.cond(tf.logical_or(tf.greater(x, 1.0), tf.less(x, -1.0)), lambda : x, lambda : 0.0)
y.eval({x: 1.5}) # prints 1.5
y.eval({x: 0.5}) # prints 0.0
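The same piecewise idea can be turned into a penalty that still depends on the prediction, which is what the +1/-1 list in the question lacks: those labels are constants appended to a Python list, so no gradient can flow back to the network. Below is a minimal sketch in plain Python, assuming the desired behaviour is "zero inside the range, squared distance to the nearest bound outside it". The names range_penalty, mean_range_loss and numeric_slope are hypothetical; in TensorFlow the same expression could be built from tf.maximum and tf.square.

```python
def range_penalty(x, low, high):
    """Zero inside [low, high]; squared distance to the nearest bound outside."""
    below = max(0.0, low - x)   # positive only when x < low
    above = max(0.0, x - high)  # positive only when x > high
    return below ** 2 + above ** 2


def mean_range_loss(predictions, low, high):
    """Average penalty over a batch of scalar predictions."""
    return sum(range_penalty(p, low, high) for p in predictions) / len(predictions)


def numeric_slope(f, x, eps=1e-6):
    """Central finite difference, used here only to show the loss has a slope."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


low, high = -1.0, 1.0
inside = range_penalty(0.5, low, high)
outside = range_penalty(1.5, low, high)
slope_outside = numeric_slope(lambda v: range_penalty(v, low, high), 1.5)
slope_inside = numeric_slope(lambda v: range_penalty(v, low, high), 0.5)
print(inside, outside, slope_outside, slope_inside)
```

Outside the range the slope is non-zero, so an optimizer has a direction to move the prediction in; inside the range both the penalty and the slope are zero.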
TensorFlow provides the possibility for combining ValidationMonitors with several predefined estimators like tf.contrib.learn.DNNClassifier.
But I want to use a ValidationMonitor for my own estimator which I have created based on 1.
For my own estimator I initialize first a ValidationMonitor:
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(testX,testY,every_n_steps=50)
estimator = tf.contrib.learn.Estimator(model_fn=model,model_dir=direc,config=tf.contrib.learn.RunConfig(save_checkpoints_secs=1))
input_fn = tf.contrib.learn.io.numpy_input_fn({"x": x}, y, 4, num_epochs=1000)
Here I pass the monitor as shown in 2 for tf.contrib.learn.DNNClassifier:
estimator.fit(input_fn=input_fn, steps=1000,monitors=[validation_monitor])
This fails and the following error is printed:
ValueError: Features are incompatible with given information. Given features: Tensor("input:0", shape=(?, 1), dtype=float64), required signatures: {'x': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(None)]), is_sparse=False)}.
How can I use monitors for my own estimators?
Thanks.
The problem is solved by passing an input_fn containing testX and testY to ValidationMonitor instead of passing the tensors testX and testY directly.
For the record, your error was caused by the fact that ValidationMonitor expects x to be a dictionary like { 'feature_name_as_a_string' : feature_tensor }, which in your input_fn is done internally by the call to tf.contrib.learn.io.numpy_input_fn(...).
More information about how to build features dictionaries can be found in the Building Input Functions with tf.contrib.learn article of the documentation.
I have a convolutional neural network whose output is a 4-channel 2D image. I want to apply a sigmoid activation function to the first two channels and then use BCECriterion to compute the loss of the produced images against the ground-truth ones. I want to apply a squared loss function to the last two channels, and finally compute the gradients and do backprop. I would also like to multiply the cost of the squared loss for each of the last two channels by a desired scalar.
So the cost has the following form:
cost = crossEntropyCh[{1, 2}] + l1 * squaredLossCh_3 + l2 * squaredLossCh_4
The way I'm thinking about doing this is as follows:
criterion1 = nn.BCECriterion()
criterion2 = nn.MSECriterion()
error = criterion1:forward(model.output[{{}, {1, 2}}], groundTruth1) + l1 * criterion2:forward(model.output[{{}, {3}}], groundTruth2) + l2 * criterion2:forward(model.output[{{}, {4}}], groundTruth3)
However, I don't think this is the correct way of doing it since I will have to do 3 separate backprop steps, one for each of the cost terms. So I wonder, can anyone give me a better solution to do this in Torch?
SplitTable and ParallelCriterion might be helpful for your problem.
Follow your current output layer with nn.SplitTable, which splits your output channels and converts your output tensor into a table. You can then combine different criterions with ParallelCriterion so that each criterion is applied to the corresponding entry of the output table.
For details, I suggest you read the Torch documentation on tables.
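To make the arithmetic concrete, here is a plain-Python sketch of the weighted sum that a container like ParallelCriterion evaluates: one criterion per table entry, each multiplied by its weight. It is an illustration under assumptions, not Torch code: the helper names bce, mse and combined_cost are hypothetical, each "channel" is flattened to a short list, and the sigmoid is assumed to have been applied to the first entry already.

```python
import math


def bce(predictions, targets):
    """Mean binary cross-entropy; predictions are probabilities in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, t in zip(predictions, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(predictions)


def mse(predictions, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def combined_cost(outputs, targets, l1, l2):
    """cost = BCE(entry 1) + l1 * MSE(entry 2) + l2 * MSE(entry 3)."""
    terms = [(bce, 1.0), (mse, l1), (mse, l2)]
    return sum(w * crit(o, t) for (crit, w), o, t in zip(terms, outputs, targets))


# Toy data: entry 1 stands for channels 1-2, entries 2 and 3 for channels 3 and 4.
outputs = [[0.5, 0.5], [1.0, 2.0], [0.0, 0.0]]
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
cost = combined_cost(outputs, targets, l1=0.5, l2=2.0)
print(cost)
```

Because the total is a single scalar, one backward pass through the container is enough; there is no need for three separate backprop steps.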
After comments, I added the following code segment solving the original question.
M = 100
C = 4
H = 64
W = 64
dataIn = torch.rand(M, C, H, W)
layerOfTables = nn.Sequential()
-- Because SplitTable discards the dimension it is applied on, we insert
-- an additional dimension.
layerOfTables:add(nn.Reshape(M,C,1,H,W))
-- We want to split over the second dimension (i.e. channels).
layerOfTables:add(nn.SplitTable(2, 5))
-- We use ConcatTable to create one path per criterion. Each branch of the
-- ConcatTable has access to the same data (i.e. the output table).
criterionPath = nn.ConcatTable()
-- Starting from offset 1, NarrowTable will select 2 elements. Since you
-- want to use this portion as a 2-channel tensor, we need to combine
-- them by using JoinTable. Without JoinTable, the output would again be a
-- table with 2 elements.
criterionPath:add(nn.Sequential():add(nn.NarrowTable(1, 2)):add(nn.JoinTable(2)))
-- SelectTable is a simplified version of NarrowTable; it fetches the desired element.
criterionPath:add(nn.SelectTable(3))
criterionPath:add(nn.SelectTable(4))
layerOfTables:add(criterionPath)
-- Here goes the criterion container. You can use this as if it is a regular
-- criterion function (Please see the examples on documentation page).
criterionContainer = nn.ParallelCriterion()
criterionContainer:add(nn.BCECriterion())
criterionContainer:add(nn.MSECriterion())
criterionContainer:add(nn.MSECriterion())
Since I used almost every possible table operation, it looks a little bit nasty. However, this is the only way I could solve this problem. I hope that it helps you and others suffering from the same problem. This is what the result looks like:
dataOut = layerOfTables:forward(dataIn)
print(dataOut)
{
1 : DoubleTensor - size: 100x2x64x64
2 : DoubleTensor - size: 100x1x64x64
3 : DoubleTensor - size: 100x1x64x64
}
In the torch tutorial, I found the line:
mean[i] = trainData.data[{ {},i,{},{} }]:mean()
Is there anyone who can explain what the indexing { {},i,{},{} } is doing?
I could guess, but wanted to know the exact mechanism.
Thanks in advance.
This is actually a concise syntax for tensor narrowing / slicing, detailed here in the documentation.
Inside the [{ ... }], you can for each dimension of a tensor:
pass a number n to only keep the n-th component along this dimension,
pass a range {start,end} to keep all the components from start to end along this dimension,
pass {} to keep all the components along this dimension.
In this precise case, it's a narrowing from a u * v * w * x tensor to a u * 1 * w * x tensor by keeping only the i-th component along the 2nd dimension.
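For readers more used to Python, the effect of that line can be sketched with nested lists: keep every index along dimensions 1, 3 and 4, fix the index along dimension 2, and average what remains. This is only an analogy under assumptions: the helper channel_mean and the toy data are hypothetical, and Python indices start at 0 whereas Torch's start at 1.

```python
def channel_mean(data, i):
    """Mean of all elements whose 2nd-dimension index is i (0-based).

    data is a nested list indexed as data[sample][channel][row][col].
    """
    values = [v for sample in data for row in sample[i] for v in row]
    return sum(values) / len(values)


# Toy tensor: 2 samples x 3 channels x 2 rows x 2 columns.
data = [
    [[[1, 2], [3, 4]], [[10, 10], [10, 10]], [[0, 0], [0, 0]]],
    [[[5, 6], [7, 8]], [[20, 20], [20, 20]], [[0, 0], [0, 4]]],
]

# One mean per channel, as in the tutorial loop over i.
means = [channel_mean(data, i) for i in range(3)]
print(means)
```

Each entry of means pools over all samples, rows and columns of one channel, which is the per-channel statistic the tutorial line stores in mean[i].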