Using quantile in Flux (Julia) in a loss function

I am trying to use quantile in a loss function for train! (for some robustness, like least trimmed squares), but it mutates the array and Zygote throws the error "Mutating arrays is not supported", coming from sort!. Below is a simple example (the content does not make sense, of course):
using Flux, StatsBase
xdata = randn(2, 100)
ydata = randn(100)
model = Chain(Dense(2,10), Dense(10, 1))
function trimmedLoss(x, y; trimFrac=0.05f0)
    yhat = model(x)
    absRes = abs.(yhat .- y) |> vec
    trimVal = quantile(absRes, 1.f0 - trimFrac)
    s = sum(ifelse.(absRes .> trimVal, 0.f0, absRes)) / (length(absRes) * (1.f0 - trimFrac))
    #s = sum(absRes)/length(absRes) # using this and commenting out the two lines above works (no surprise)
end
println(trimmedLoss(xdata, ydata)) #works ok
Flux.train!(trimmedLoss, params(model), zip([xdata], [ydata]), ADAM())
println(trimmedLoss(xdata, ydata)) #changed loss?
This is all in Flux 0.10 with Julia 1.2
Thanks in advance for any hints or workaround!

Ideally, we'd define a custom adjoint for quantile so that this works out of the box. (Feel free to open an issue to remind us to do this.)
In the meantime there's a quick workaround. It's actually the sorting that causes trouble here, so if you do quantile(xs, p, sorted=true) it'll work. Obviously this requires xs to already be sorted for correct results, so you might need to use quantile(sort(xs), ...).
Depending on your Zygote version you might also need an adjoint for sort. That one's pretty easy:
julia> using Zygote: @adjoint
julia> @adjoint function sort(x)
           p = sortperm(x)
           x[p], x̄ -> (x̄[invperm(p)],)
       end
julia> gradient(x -> quantile(sort(x), 0.5, sorted=true), [1, 2, 3, 3])
([0.0, 0.5, 0.5, 0.0],)
We'll make that built-in in the next Zygote release, but for now if you add that to your script it'll get your code working.
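For example, putting the workaround back into the loss from the question might look something like this (a sketch; it assumes the model and data defined above and keeps the rest of the code unchanged):
function trimmedLoss(x, y; trimFrac=0.05f0)
    yhat = model(x)
    absRes = abs.(yhat .- y) |> vec
    # sort outside quantile and pass sorted=true so Zygote never hits sort!
    trimVal = quantile(sort(absRes), 1.0f0 - trimFrac, sorted=true)
    sum(ifelse.(absRes .> trimVal, 0.0f0, absRes)) / (length(absRes) * (1.0f0 - trimFrac))
end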

Related

No method matching pcacov

I am trying to apply PCA to reduce dimensionality and noise using the Julia language, but I am getting an error message. Could you please help me to solve this issue?
Are there other alternatives in Julia to perform the same task?
Here's the error message:
julia> X = (train_input)' |> Array;
julia> typeof(X)
Matrix{Real} (alias for Array{Real, 2})
julia> using MultivariateStats, MLJMultivariateStatsInterface
julia> M = fit(PCA, X; maxoutdim = 3)
MethodError: no method matching pcacov(::Matrix{Float64}, ::Vector{Real}; maxoutdim=3, pratio=0.99)
Closest candidates are:
pcacov(::AbstractMatrix{T}, ::AbstractVector{T}; maxoutdim, pratio) where T<: Real at C:\Users\USER\.julia\packages\MultivariateStats\rCiqT\src\pca.jl:209
I can't reproduce your error, but this is how I get the job done with the MultivariateStats v0.10.0 package when fitting a PCA model:
julia> using MultivariateStats
julia> X = rand(5, 100);
julia> fit(PCA, X, maxoutdim=3)
PCA(indim = 5, outdim = 3, principalratio = 0.6599153346885055)
Pattern matrix (unstandardized loadings):
────────────────────────────────────
PC1 PC2 PC3
────────────────────────────────────
1 0.201331 -0.0213382 0.0748083
2 0.0394825 0.137933 0.213251
3 0.14079 0.213082 -0.119594
4 0.154639 -0.0585538 -0.0975059
5 0.15221 -0.145161 0.0554158
────────────────────────────────────
Importance of components:
─────────────────────────────────────────────────────────
PC1 PC2 PC3
─────────────────────────────────────────────────────────
SS Loadings (Eigenvalues) 0.108996 0.0893847 0.0779532
Variance explained 0.260295 0.21346 0.186161
Cumulative variance 0.260295 0.473755 0.659915
Proportion explained 0.394436 0.323466 0.282098
Cumulative proportion 0.394436 0.717902 1.0
─────────────────────────────────────────────────────────
julia> typeof(X)
Matrix{Float64} (alias for Array{Float64, 2})
julia> eltype(X)
Float64
As you can see, I used a Matrix with Float64 element type as the input. That is the difference between my input and yours, I guess, so it might be the problem in your case.
Keep in mind that rows represent the features and the columns represent the data samples!
Finally, since you asked for other alternatives, I introduce you to the WeightedPCA package. This package provides weighted principal component analysis (PCA) for data with samples of heterogeneous quality (heteroscedastic noise). Here is a quick example:
julia> using WeightedPCA
julia> X = rand(5, 100);
julia> pc1, pc2, pc3 = wpca.(Ref(collect(eachrow(X))), [1, 2, 3], Ref(UniformWeights()));
In the above, I fitted an equally weighted PCA to the X data and requested principal components 1, 2, and 3. Using this package, you can even apply specific weights or optimal weights. The package can be installed with pkg> add https://github.com/dahong67/WeightedPCA.jl.
Furthermore, as Antonello said, one can utilize the BetaML package to perform PCA. This package provides machine learning algorithms written in the Julia programming language. Let's use it to perform PCA:
julia> using BetaML
julia> X = rand(100, 5);
julia> mod = PCA(max_unexplained_var=0.3)
A PCA BetaMLModel (unfitted)
julia> reproj_X = fit!(mod,X)
100×4 Matrix{Float64}:
0.204151 -0.482558 -0.161929 0.222503
0.69425 -0.371519 -0.628404 0.462256
0.198191 -0.601537 -0.638573 0.463886
⋮
-0.00176858 0.557353 -0.4237 0.310565
0.533239 0.133691 -0.236009 -0.0793025
0.333652 -0.388115 -0.28662 0.481249
julia> info(mod)
Dict{String, Any} with 5 entries:
"explained_var_by_dim" => [0.277255, 0.484764, 0.669897, 0.846831, 1.0]
"fitted_records" => 100
"prop_explained_var" => 0.846831
"retained_dims" => 4
"xndims" => 5
In the above, max_unexplained_var specifies the proportion of variance left unexplained in the reprojected dimensions, or in other words, the maximum unexplained variance I'm ready to accept.
The error message is telling you that somewhere in the PCA fit an internal function is called which requires an AbstractMatrix{T} and an AbstractVector{T} as input, which means that the element type T of both arguments needs to be the same. In your case a Matrix{Float64} and a Vector{Real} are being passed. I assume that the Vector{Real} comes from your X input, which, as your first cell shows, is a Matrix{Real}.
This generally indicates an issue in the construction of X, which shouldn't have an abstract element type like Real. Try float.(X) as an input to coerce all elements to Float64.
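For instance, a minimal REPL sketch of that coercion (X_bad here is just a made-up stand-in for your X):
julia> X_bad = Real[1.0 2.0; 3.0 4.0];  # abstract element type, like the X in the question
julia> X_ok = float.(X_bad);
julia> typeof(X_ok)
Matrix{Float64} (alias for Array{Float64, 2})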

Is there a way to get the final system of equations sent by cvxpy to the solver?

If I understand correctly, cvxpy converts our high-level problem description to the standard canonical form before it is sent to a solver.
By the standard form I mean the form that can be used for the descent algorithms, so, for instance, it would convert all the absolute values in the objective to be a difference of two positive numbers with some new constraints, etc.
I'm wondering if it's possible to see what the reduction looked like for a problem I specify in cvxpy?
For instance, let's say I have the following problem:
import numpy as np
import cvxpy as cp
x = cp.Variable(2)
L = np.asarray([[1,2],[2,3]])
P = L.T @ L
constraints = []
constraints.append(x >= [-10, -10])
constraints.append(x <= [10, 10])
obj = cp.Minimize(cp.quad_form(x, P) - [1, 2] * x)
prob = cp.Problem(obj, constraints)
prob.solve(), prob.solver_stats.solver_name
(-0.24999999999999453, 'OSQP')
So, I would like to see the actual arguments (P, q, A, l, u) being sent to the OSQP solver https://github.com/oxfordcontrol/osqp-python/blob/master/module/interface.py#L278
Any help is greatly appreciated!
From looking at the documentation, it seems you can do this using the command get_problem_data as follows:
data, chain, inverse_data = prob.get_problem_data(prob.solver_stats.solver_name)
I have not tried it, and the documentation says its output depends on the particular solver and the solver chain, but it may help you!

Julia ReverseDiff: how to take a gradient w.r.t. only a subset of inputs?

In my data flow, I'm querying a small subset of a database, using those results to construct about a dozen arrays, and then, given some parameter values, computing a likelihood value. Then repeating for another subset of the database. I want to compute the gradient of the likelihood function with respect to the parameters but not the data. But ReverseDiff computes the gradient with respect to all inputs. How can I get around this? Specifically, how can I construct a ReverseDiff.Tape object that differentiates with respect to only some of its inputs?
TL;DR: How to marry stochastic gradient descent and ReverseDiff? (I'm not wedded to using ReverseDiff. It just seemed like the right tool for the job.)
It seems like this has to be a common coding pattern. It's used all the time in my field. But I'm missing something. Julia's scoping rules seem to undermine the scoped/anonymous-function approach, and ReverseDiff holds on to the original data values in the generated tape instead of using the mutated values.
Some sample code of things that don't work:
using ReverseDiff
using Base.Test
mutable struct data
    X::Array{Float64, 2}
end
const D = data(zeros(Float64, 2, 2))
# baseline known data to compare against
function f1(params)
    X = float.([1 2; 3 4])
    f2(params, X)
end
# X is data, want derivative wrt to params only
function f2(params, X)
    sum(params[1]' * X[:, 1] - (params[1] .* params[2])' * X[:, 2].^2)
end
# store data of interest in D.X so that we can call just f2(params) and get our
# gradient
f2(params) = f2(params, D.X)
# use an inner function and swap out Z's data
function scope_test()
    function f2_only_params(params)
        f2(params, Z)
    end
    Z = float.([6 7; 1 3])
    f2_tape = ReverseDiff.GradientTape(f2_only_params, [1, 2])
    Z[:] = float.([1 2; 3 4])
    grad = ReverseDiff.gradient!(f2_tape, [3,4])
    return grad
end
function struct_test()
    D.X[:] = float.([6 7; 1 3])
    f2_tape = ReverseDiff.GradientTape(f2, [1., 2.])
    D.X[:] = float.([1 2; 3 4])
    grad = ReverseDiff.gradient!(f2_tape, [3., 4.])
    return grad
end
function struct_test2()
    D.X[:] = float.([1 2; 3 4])
    f2_tape = ReverseDiff.GradientTape(f2, [3., 4.])
    D.X[:] = float.([1 2; 3 4])
    grad = ReverseDiff.gradient!(f2_tape, [3., 4.])
    return grad
end
D.X[:] = float.([1 2; 3 4])
@test f1([3., 4.]) == f2([3., 4.], D.X)
@test f1([3., 4.]) == f2([3., 4.])
f1_tape = ReverseDiff.GradientTape(f1, [3,4])
f1_grad = ReverseDiff.gradient!(f1_tape, [3,4])
# fails! uses line 33 values
@test scope_test() == f1_grad
# fails, uses line 42 values
@test struct_test() == f1_grad
# succeeds, so, not completely random
@test struct_test2() == f1_grad
This is currently not possible, sadly, and there is a GitHub issue with two workarounds:
https://github.com/JuliaDiff/ReverseDiff.jl/issues/36
either do not use a prerecorded tape
or differentiate relative to all arguments and ignore the gradient for some of the input parameters.
I had the same issue, and I used the grad function of Knet instead. It supports differentiation relative to only one argument, but that argument can be quite flexible (e.g. an array of arrays, or a dict of arrays).
Thanks Alex, your answer got me 90% of the way there. AutoGrad (what Knet uses at the time of writing) does provide a very nice interface that I think is natural for most users. However, it turns out that using anonymous functions with ReverseDiff is faster than the approach taken by AutoGrad, for reasons I don't quite understand.
If you follow the chain of issues referenced in what you linked, this seems to be what the ReverseDiff/ForwardDiff folks want people doing:
ReverseDiff.gradient(p -> f(p, non_differentiated_data), params)
Certainly disappointing that we can't get a precompiled tape with this incredibly common usage scenario, and maybe future work will change things. But this seems to be where things stand now.
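For completeness, a minimal sketch of that pattern applied to the f2 and D.X defined in the question above (not a precompiled tape, just a plain gradient call):
# differentiate with respect to the parameters only; D.X is captured as constant data
grad = ReverseDiff.gradient(p -> f2(p, D.X), [3., 4.])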
Some references for those interested in further reading:
https://github.com/JuliaDiff/ForwardDiff.jl/issues/77
https://github.com/JuliaDiff/ForwardDiff.jl/issues/32
https://github.com/JuliaDiff/ForwardDiff.jl/pull/182

Create a List and Use it in Loss Function Tensorflow

I am trying to create a list based on my neural network outputs and use it in Tensorflow as a loss function.
Assume that results is a list of size [1, batch_size] that is output by a neural network. I check whether each value of this list is in a specific range passed in as a placeholder called valid_range; if it is, I add 1 to a list, and if it is not, I add -1. The goal is to make all predictions of the network fall in the range, so the correct predictions form a tensor of all 1s, which I call correct_predictions.
values_list = []
for j in range(batch_size):
    a = results[0, j] >= valid_range[0]
    b = results[0, j] <= valid_range[1]
    c = tf.logical_and(a, b)
    if (c == 1):
        values_list.append(1)
    else:
        values_list.append(-1.)
values_list_tensor = tf.convert_to_tensor(values_list)
correct_predictions = tf.ones([batch_size, ], tf.float32)
Now, I want to use this as a loss function in my network, so that I can force all the predictions to be in the specified range. I try to train like this:
loss = tf.reduce_mean(tf.squared_difference(values_list_tensor, correct_predictions))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, gradient_clip_threshold)
optimize = optimizer.apply_gradients(zip(gradients, variables))
This, however, has a problem and throws an error on the last optimize line, saying:
ValueError: No gradients provided for any variable: ['<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d4afd0>',
'<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d66050>'
...
I tried to debug this in Tensorboard, and I notice that the list I am creating does not appear in the graph, so basically the x part of the loss function is not part of the network itself. Is there some way to accurately create a list based on the predictions of a neural network and use it in the loss function in Tensorflow to train the network?
Please help, I have been stuck on this for a few days now.
Edit:
Following what was suggested in the comments, I decided to use an l2 loss function, multiplying it by the binary vector I had from before, values_list_tensor. The binary vector now has values 1 and 0 instead of 1 and -1. This way, when the prediction is in the range the loss is 0; otherwise it is the normal l2 loss. As I am unable to see the values of the tensors, I am not sure if this is correct. However, I can view the final loss and it is always 0, so something is wrong here. I am unsure whether the multiplication is being done correctly and whether values_list_tensor is calculated accurately. Can someone help and tell me what could be wrong?
loss = tf.reduce_mean(tf.nn.l2_loss(tf.matmul(tf.transpose(tf.expand_dims(values_list_tensor, 1)), tf.expand_dims(result[0, :], 1))))
Thanks
To answer the question in the comment: one way to write a piecewise function is using tf.cond. For example, here is a function that returns 0 in [-1, 1] and x everywhere else:
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32)
y = tf.cond(tf.logical_or(tf.greater(x, 1.0), tf.less(x, -1.0)), lambda : x, lambda : 0.0)
y.eval({x: 1.5}) # prints 1.5
y.eval({x: 0.5}) # prints 0.0

Changing Variable values in Tensorflow and evaluating the cost function

I need to be able to plot e.g. the cost function values as a function of some parameter (for example the bias b below). If e.g. my graph is something like (pseudocode)
y = g(W x + b),
cost = sum(y ** 2),
where W and b are tf.Variables, I'd like to change b from, say, 0 to 1 and plot the values of cost.
Please note that I do not want to call eval or session.run after each change of b because of the overhead! E.g. for 100 plot points that would take forever.
I know of the existence of tf.assign, but doing something like [assign, cost, assign, cost, ...] and evaluating that doesn't seem to work
I guess I could update the value of b inside the graph and call cost after each update, but I wouldn't really want to change the graph
So how could I do this in an efficient manner? Thank you in advance!
EDIT: actually this is probably impossible to do without calling eval/run between the iterations... oh well...
In TensorFlow, if you use variables you can only evaluate them after initialization, so you probably cannot evaluate them without a session.
But you can change the parameters in the following way:
import tensorflow as tf
my_var = tf.Variable(10)
with tf.Session() as sess:
    sess.run(my_var.initializer)
    print(sess.run(my_var.assign_sub(2))) #>> 8
    print(sess.run(my_var.assign_sub(2))) #>> 6
This sounds like a use case for feeding a different value at each step. Assuming b is a scalar variable, you could code your loop with something like the following:
import numpy as np
sess = tf.Session()
# Vary `b_val` from 0 to 1 in 100 steps.
for b_val in np.linspace(0, 1, 100):
    # Evaluate `cost` using `b = b_val`.
    cost_val = sess.run(cost, feed_dict={b: b_val})
    # Do something with `cost_val`....
