When I run this code:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.int32, shape=(None, 3))
with tf.Session() as sess:
    feed_dict = dict()
    feed_dict[x] = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
    input = sess.run([x], feed_dict=feed_dict)
I get this error:
Placeholder_2:0 is both fed and fetched.
I'm not sure what I'm doing wrong here. Why does this not work?
Are you sure this code covers what you are trying to achieve?
You are asking TensorFlow to read back exactly what you pass in, which is not a valid call. If you want to pass values through without doing anything with them (what for?), you need an identity operation:
x = tf.placeholder(tf.int32, shape=(None, 3))
y = tf.identity(x)
with tf.Session() as sess:
    feed_dict = dict()
    feed_dict[x] = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
    input = sess.run([y], feed_dict=feed_dict)
The problem is that feeding a tensor effectively overrides whatever its op would produce, so you cannot fetch that same tensor in the same call (the op no longer really produces anything). With the identity op added, you feed x (overriding it), do nothing with the value (identity), and fetch what identity produces, which is whatever you fed in as the output of x.
I found out what I was doing wrong.
x is a placeholder: it holds the data you feed in, and evaluating x by itself does not compute anything. I forgot that vital piece of information and tried to fetch the tensor x itself in sess.run().
Code similar to this would work if, say, there were another tensor y that depended on x and I ran that with sess.run([y], feed_dict=feed_dict).
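Below is a minimal sketch of the working pattern: feed x and fetch a tensor that depends on it. It assumes TensorFlow 2.x with the v1 compatibility API (`tf.compat.v1`), because `tf.placeholder` and `tf.Session` are not in the TF2 top-level namespace. The op `y = x * 2` is an arbitrary stand-in for "another tensor that depends on x".

```python
import numpy as np
import tensorflow.compat.v1 as tf

# Placeholders and sessions need graph mode when running under TF 2.x.
tf.disable_eager_execution()

x = tf.placeholder(tf.int32, shape=(None, 3))
# Any op that depends on x can be fetched while x is fed;
# doubling every element is just an arbitrary example.
y = x * 2

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
with tf.Session() as sess:
    # Feed x, fetch y: x itself is never fetched, so there is no
    # "both fed and fetched" error.
    result = sess.run(y, feed_dict={x: data})

print(result)
```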
I'm trying to randomly sample a mini-batch from a Flux DataLoader rather than iterate through it. I have already created a train_loader object, and all I need is to sample randomly from it. Can you help me do this? I tried to read the DataLoader source code at https://github.com/FluxML/Flux.jl/blob/master/src/data/dataloader.jl but I couldn't figure it out. The code below is what I'm trying to do:
train_loader = DataLoader((X, L), batchsize=batch_size, shuffle=true)
(x, l) = train_loader[rand(1:length(train_loader))]
I solved it with an ad hoc method.
I created an array that holds all the batches and randomly select a batch from that array.
batch_collection = []
for (x, l) in train_loader
    push!(batch_collection, (x,l))
end
random_batch = rand(batch_collection, 1)[1]
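A different approach avoids materializing every batch: draw `batch_size` random observation indices and slice the underlying arrays directly. It is sketched here in Python/NumPy rather than Julia. It assumes observations lie along the last axis, which is the convention Flux's DataLoader uses for Julia arrays. `random_batch` is a hypothetical helper, not part of Flux.

```python
import numpy as np

def random_batch(X, L, batch_size, rng):
    """Hypothetical helper: sample one mini-batch without building all batches.

    Assumes observations lie along the last axis of both X and L.
    """
    n_obs = X.shape[-1]
    # Sample batch_size distinct observation indices.
    idx = rng.choice(n_obs, size=batch_size, replace=False)
    return np.take(X, idx, axis=-1), np.take(L, idx, axis=-1)

rng = np.random.default_rng(0)
X = np.arange(20.0).reshape(2, 10)   # 2 features, 10 observations
L = np.arange(10)                    # one label per observation
x, l = random_batch(X, L, batch_size=4, rng=rng)
print(x.shape, l.shape)
```

Unlike the array-of-batches approach, this draws a fresh random subset on every call instead of picking one of the fixed batches produced by a single shuffle.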
I have a pretrained model that was saved with
torch.save(net, 'lenet5_mnist_model')
Now I am loading it back and trying to calculate the Fisher information matrix like this:
import torch
import torch.nn.functional as F
from copy import deepcopy
from torch.autograd import Variable

precision_matrices = {}
batch_size = 32
my_model = torch.load('lenet5_mnist_model')
my_model.eval() # I tried to comment this out, but still no luck
for n, p in deepcopy({n: p for n, p in my_model.named_parameters()}).items():
    p = torch.tensor(p, requires_grad = True)
    p.data.zero_()
    precision_matrices[n] = Variable(p.data)
for idx in range(int(images.shape[0]/batch_size)):
    x = images[idx*batch_size : (idx+1)*batch_size]
    my_model.zero_grad()
    x = Variable(x.cuda(), requires_grad = True)
    output = my_model(x).view(1,-1)
    label = output.max(1)[1].view(-1)
    loss = F.nll_loss(F.log_softmax(output, dim=1), label)
    loss = Variable(loss, requires_grad = True)
    loss.backward()
    for n, p in my_model.named_parameters():
        precision_matrices[n].data += p.grad.data**2
The code above crashes at the last line because p.grad is None. The error is:
AttributeError: 'NoneType' object has no attribute 'data'.
Could someone provide some guidance on what caused the NoneType grad for the parameters? How should I fix this?
Your loss does not backpropagate the gradients through the model, because you are creating a new loss tensor with the value of the actual loss, which is a leaf of the computational graph, meaning that there is no history to backpropagate through.
loss.backward() needs to be called on the output of loss = F.nll_loss(F.log_softmax(output, dim=1), label).
I'm assuming that you thought you need to create a tensor with requires_grad=True, to be able to calculate the gradients. That is not the case. Tensors created with requires_grad=True are the leaves of the computational graph (they start the graph) and every operation performed on any tensor that is part of the graph is tracked such that the gradients can flow through the intermediate results to the leaves. Only tensors that need to be optimised (i.e. learnable parameters) should set requires_grad=True manually (the model's parameters do that automatically), everything else regarding the gradients is inferred. Neither x nor the loss are learnable parameters.
This confusion presumably arose from the use of Variable. It was deprecated in PyTorch 0.4.0, released in April 2018, and all of its functionality has been merged into tensors. Please do not use Variable.
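Here is a small sketch of the failure mode described above. It assumes a toy `nn.Linear` model in place of LeNet-5. It uses `torch.tensor(loss.item(), requires_grad=True)` as a stand-in for re-wrapping the loss: this creates a new leaf that holds only the loss value, so backpropagating from it never reaches the model's parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(4, 3)          # hypothetical tiny model
x = torch.randn(2, 4)
output = model(x)
label = output.argmax(dim=1)
loss = F.nll_loss(F.log_softmax(output, dim=1), label)

# A brand-new leaf holding only the loss *value*: it has no history.
detached = torch.tensor(loss.item(), requires_grad=True)
detached.backward()
grad_after_new_leaf = model.weight.grad   # still None: nothing flowed back
print(grad_after_new_leaf)

# Calling backward on the original loss reaches the parameters.
loss.backward()
print(model.weight.grad is not None)
```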
x = images[idx*batch_size : (idx+1)*batch_size]
my_model.zero_grad()
x = x.cuda()
output = my_model(x).view(1,-1)
label = output.max(1)[1].view(-1)
loss = F.nll_loss(F.log_softmax(output, dim=1), label)
loss.backward()
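The sketch below builds a self-contained version of the corrected loop. It accumulates squared gradients per parameter, as the question does. A tiny `nn.Sequential` and random tensors stand in for the pretrained LeNet-5 and the MNIST images, and it runs on the CPU. One further assumption: it processes one image at a time instead of flattening a whole batch with `.view(1, -1)`, which would merge all outputs of the batch into a single row.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the pretrained LeNet-5 and the MNIST images.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
images = torch.randn(64, 1, 28, 28)
model.eval()

# One zero tensor per parameter, same shape; no requires_grad needed.
precision_matrices = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

for img in images:
    model.zero_grad()
    output = model(img.unsqueeze(0))      # shape [1, 10], still part of the graph
    label = output.argmax(dim=1)          # the model's own prediction as target
    loss = F.nll_loss(F.log_softmax(output, dim=1), label)
    loss.backward()                       # gradients now reach the parameters
    for n, p in model.named_parameters():
        precision_matrices[n] += p.grad.detach() ** 2

for n, m in precision_matrices.items():
    print(n, tuple(m.shape))
```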
I am trying to create a list based on my neural network's outputs and use it as a loss function in TensorFlow.
Assume that results is the [1, batch_size] output of a neural network. For each value in it, I check whether it lies in a specific range, passed in as a placeholder called valid_range; if it does, I append 1 to a list, and if it does not, I append -1. The goal is to make all predictions of the network fall in the range, so the correct predictions are a tensor of all 1s, which I call correct_predictions.
values_list = []
for j in range(batch_size):
    a = results[0, j] >= valid_range[0]
    b = results[0, j] <= valid_range[1]
    c = tf.logical_and(a, b)
    if (c == 1):
        values_list.append(1)
    else:
        values_list.append(-1.)
values_list_tensor = tf.convert_to_tensor(values_list)
correct_predictions = tf.ones([batch_size, ], tf.float32)
Now, I want to use this as a loss function in my network, so that I can force all the predictions to be in the specified range. I try to train like this:
loss = tf.reduce_mean(tf.squared_difference(values_list_tensor, correct_predictions))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, gradient_clip_threshold)
optimize = optimizer.apply_gradients(zip(gradients, variables))
This, however, has a problem and throws an error on the last optimize line, saying:
ValueError: No gradients provided for any variable: ['<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d4afd0>',
'<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d66050>'
...
I tried to debug this in TensorBoard and noticed that the list I create does not appear in the graph, so the part of the loss built from it is not connected to the network itself. Is there a way to create a list based on the predictions of a neural network and use it in a TensorFlow loss function to train the network?
Please help, I have been stuck on this for a few days now.
Edit:
Following what was suggested in the comments, I decided to use an L2 loss, multiplied by the binary vector values_list_tensor from before. The binary vector now has values 1 and 0 instead of 1 and -1. This way, when a prediction is in the range the loss is 0; otherwise it is the normal L2 loss. Since I cannot see the values of the tensors, I am not sure this is correct. However, I can see the final loss, and it is always 0, so something is wrong. Is the multiplication done correctly, and is values_list_tensor computed correctly? Can someone tell me what could be wrong?
loss = tf.reduce_mean(tf.nn.l2_loss(tf.matmul(tf.transpose(tf.expand_dims(values_list_tensor, 1)), tf.expand_dims(results[0, :], 1))))
To answer the question in the comments: one way to write a piecewise function is with tf.cond. For example, here is a function that returns 0 in [-1, 1] and x everywhere else:
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32)
y = tf.cond(tf.logical_or(tf.greater(x, 1.0), tf.less(x, -1.0)), lambda : x, lambda : 0.0)
y.eval({x: 1.5}) # prints 1.5
y.eval({x: 0.5}) # prints 0.0
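The same piecewise idea can be vectorized over the whole batch. Instead of asking whether each prediction is in range, penalize how far it lies outside [low, high]. An in/out-of-range indicator is piecewise constant, so even a graph-connected version would have zero gradient almost everywhere. A distance-based penalty does give a gradient for out-of-range predictions. This is sketched in NumPy rather than TensorFlow so the arithmetic is easy to check. The function names are hypothetical. In TensorFlow the penalty could be built from `tf.nn.relu` and `tf.reduce_mean` in the same way.

```python
import numpy as np

def range_penalty(pred, low, high):
    """Mean squared distance of each prediction from [low, high] (0 inside)."""
    below = np.maximum(low - pred, 0.0)   # > 0 only when pred < low
    above = np.maximum(pred - high, 0.0)  # > 0 only when pred > high
    return np.mean((below + above) ** 2)

def range_penalty_grad(pred, low, high):
    """Analytic gradient of range_penalty with respect to pred."""
    below = np.maximum(low - pred, 0.0)
    above = np.maximum(pred - high, 0.0)
    return 2.0 * (above - below) / pred.size

def indicator_loss(pred, low, high):
    """Loss built from in-range/out-of-range flags, as in the question."""
    inside = (pred >= low) & (pred <= high)
    flags = np.where(inside, 1.0, -1.0)
    return np.mean((flags - 1.0) ** 2)

pred = np.array([-2.0, 0.5, 3.0, 0.9])
low, high = -1.0, 1.0

print(range_penalty(pred, low, high))       # 1.25
print(range_penalty_grad(pred, low, high))  # nonzero only for -2.0 and 3.0
# A small nudge leaves the indicator loss unchanged: its gradient is zero.
print(indicator_loss(pred + 1e-4, low, high) == indicator_loss(pred, low, high))  # True
```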
I want to update the parameters of a model manually with PyTorch. I made a very simple standard sequential model, but whenever I try to train it, it does not train unless I create the model variables explicitly. With the sequential model the code looks as follows:
mdl_sgd = torch.nn.Sequential( torch.nn.Linear(D_sgd,1,bias=False) )
...
for i in range(nb_iter):
    # Forward pass: compute predicted Y using operations on Variables
    batch_xs, batch_ys = get_batch2(X,Y,M,dtype) # [M, D], [M, 1]
    ## FORWARD PASS
    y_pred = mdl_sgd.forward(X)
    ## LOSS
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    ## Manually zero the gradients after updating weights
    mdl_sgd.zero_grad()
    ## BACKWARD PASS
    loss.backward() # Use autograd to compute the backward pass. Now w will have gradients
    ## SGD update
    for W in mdl_sgd.parameters():
        #print(W.grad.data)
        W.data = W.data - eta*W.grad.data
When I train it, nothing seems to happen. I've tried many things to make this work, such as wrapping it in a class, setting requires_grad=True explicitly, or changing where I zero out the gradients, but nothing seems to work. What I really need is to write the update rule myself (not with torch.optim). I'm not sure whether that is why it doesn't work, but the following does work:
X = poly_kernel_matrix(x_true,Degree_mdl) # maps to the feature space of the model
X = Variable(torch.FloatTensor(X).type(dtype), requires_grad=False)
Y = Variable(torch.FloatTensor(Y).type(dtype), requires_grad=False)
w_init=torch.randn(D_sgd,1).type(dtype)
W = Variable( w_init, requires_grad=True)
...
for i in range(nb_iter):
    # Forward pass: compute predicted Y using operations on Variables
    batch_xs, batch_ys = get_batch2(X,Y,M,dtype) # [M, D], [M, 1]
    ## FORWARD PASS
    #y_pred = mdl_sgd.forward(X)
    y_pred = batch_xs.mm(W)
    ## LOSS
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    ## BACKWARD PASS
    loss.backward() # Use autograd to compute the backward pass. Now w will have gradients
    ## SGD update
    W.data = W.data - eta*W.grad.data
    ## Manually zero the gradients after updating weights
    #mdl_sgd.zero_grad()
    W.grad.data.zero_()
I know this because the plot of the fitted regression line looks sensible with this version, but not when I use torch.nn.Sequential.
I'm sure it's a really newbie question, but I don't understand why I can't update the parameters. Does anyone know why? I want to be able to update the parameters manually (however I want), and in this case I decided to use SGD to check whether I could update the parameters at all.
Note that I also tried subclassing torch.nn.Module and registering the parameters, but that didn't work either. This is the class I built:
class regression_NN(torch.nn.Module):
    def __init__(self,w_init):
        super(regression_NN, self).__init__()
        # mdl
        #self.W = Variable(w_init, requires_grad=True)
        #self.W = torch.nn.Parameter( Variable(w_init, requires_grad=True) )
        #self.W = torch.nn.Parameter( w_init )
        self.W = torch.nn.Parameter( w_init,requires_grad=True )
        #self.mod_list = torch.nn.ModuleList([self.W])

    def forward(self, x):
        y_pred = x.mm(self.W)
        return y_pred
All the code is here:
https://github.com/brando90/simple_regression
I'm relatively new to PyTorch, so I might have many bad practices; you can correct them if you want. I'm mostly concerned that my parameters are not updating even when I explicitly register them in a class that inherits from torch.nn.Module.
I also linked to this question from the official PyTorch forum: https://discuss.pytorch.org/t/how-does-one-make-sure-that-the-parameters-are-update-manually-in-pytorch-using-modules/6076
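The two loops differ in one visible way. The Sequential version calls `mdl_sgd.forward(X)` on the full `X` while the loss compares against `batch_ys`. The working version uses `batch_xs.mm(W)`.

Below is a minimal sketch of a manual-SGD loop on an `nn.Sequential` model. It runs the forward pass on the mini-batch and does the update in place under `torch.no_grad()` instead of reassigning `.data`. The toy data, sizes and learning rate are assumptions for illustration. The inline index sampling is a hypothetical stand-in for `get_batch2`.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data: targets generated from a known weight vector.
N, D, M = 100, 3, 20          # dataset size, feature count, batch size
eta, nb_iter = 0.1, 500       # learning rate, number of SGD steps
X = torch.randn(N, D)
w_true = torch.tensor([[1.0], [-2.0], [0.5]])
Y = X @ w_true

mdl_sgd = torch.nn.Sequential(torch.nn.Linear(D, 1, bias=False))

for i in range(nb_iter):
    # Sample a mini-batch (stand-in for get_batch2).
    idx = torch.randint(0, N, (M,))
    batch_xs, batch_ys = X[idx], Y[idx]

    y_pred = mdl_sgd(batch_xs)                # forward on the batch, not on all of X
    loss = ((y_pred - batch_ys) ** 2).mean()

    mdl_sgd.zero_grad()
    loss.backward()

    # Manual SGD update; no_grad keeps the update itself out of the graph.
    with torch.no_grad():
        for W in mdl_sgd.parameters():
            W -= eta * W.grad

learned = mdl_sgd[0].weight.detach().T       # shape [D, 1], comparable to w_true
print(learned)
```

The same update loop works unchanged for the `regression_NN` class, because `torch.nn.Parameter` attributes are returned by `.parameters()`.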
Given a linear model like the following, I would like to get the gradient vector with respect to W and b.
# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
# Construct a linear model
pred = tf.add(tf.mul(X, W), b)
# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
However, cost is a function cost(x, y, w, b), and I only want the gradients with respect to w and b. If I try something like this:
grads = tf.gradients(cost, tf.all_variables())
My placeholders will also be included (X and Y).
Even if I do get a gradient with respect to [x, y, w, b], how do I know which element of the gradient belongs to which parameter, since it is just a list with no names indicating which parameter each derivative was taken with respect to?
In this question I'm using parts of this code and I build on this question.
Quoting the docs for tf.gradients:
Constructs symbolic partial derivatives of sum of ys w.r.t. x in xs.
So, this should work:
dc_dw, dc_db = tf.gradients(cost, [W, b])
Here, tf.gradients() returns the gradient of cost with respect to each tensor in the second argument, as a list in the same order.
See the tf.gradients documentation for more information.
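Because the returned list follows the order of the list you pass in, you can zip the two together to get gradients keyed by name. The sketch below assumes TensorFlow 2.x with the `tf.compat.v1` API. The toy data, the initial values and the name-keyed dict are illustrative choices. It also uses `tf.multiply` in place of `tf.mul`, which was renamed in TensorFlow 1.0.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Toy data standing in for the tutorial's training set (an assumption).
train_X = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
train_Y = np.array([3.0, 5.0, 7.0, 9.0], dtype=np.float32)
n_samples = train_X.shape[0]

X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
W = tf.Variable(0.5, name="weight")
b = tf.Variable(0.0, name="bias")

pred = tf.add(tf.multiply(X, W), b)
cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)

params = [W, b]
grads = tf.gradients(cost, params)            # same order as params
named = dict(zip([p.op.name for p in params], grads))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Session.run accepts a dict of fetches and returns a dict with the same keys.
    values = sess.run(named, feed_dict={X: train_X, Y: train_Y})

print(values)  # dcost/dW = -13.75, dcost/db = -4.75 for this toy data
```

The placeholders X and Y never appear in the result, because only the tensors listed in `params` are differentiated against.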