Tensorflow - Using tf.losses.hinge_loss causing Shapes Incompatible error - machine-learning

My current code using sparse_softmax_cross_entropy works fine.
loss_normal = (
    tf.reduce_mean(
        tf.losses.sparse_softmax_cross_entropy(labels=labels,
                                               logits=logits,
                                               weights=class_weights))
)
However, when I try to use the hinge_loss:
loss_normal = (
    tf.reduce_mean(
        tf.losses.hinge_loss(labels=labels,
                             logits=logits,
                             weights=class_weights))
)
It reported an error saying:
ValueError: Shapes (1024, 2) and (1024,) are incompatible
The error seems to originate from this function in the losses_impl.py file:
with ops.name_scope(scope, "hinge_loss", (logits, labels)) as scope:
...
logits.get_shape().assert_is_compatible_with(labels.get_shape())
...
I modified my code as below to extract just one column of the logits tensor:
loss_normal = (
    tf.reduce_mean(
        tf.losses.hinge_loss(labels=labels,
                             logits=logits[:, 1:],
                             weights=class_weights))
)
But it still reports a similar error:
ValueError: Shapes (1024, 1) and (1024,) are incompatible.
Can someone please help point out why my code works fine with sparse_softmax_cross_entropy loss but not hinge_loss?

The tensor labels has shape [1024] and the tensor logits has shape [1024, 2]. This works fine for tf.nn.sparse_softmax_cross_entropy_with_logits:
labels: Tensor of shape [d_0, d_1, ..., d_{r-1}] (where r is rank of
labels and result) and dtype int32 or int64. Each entry in labels must
be an index in [0, num_classes). Other values will raise an exception
when this op is run on CPU, and return NaN for corresponding loss and
gradient rows on GPU.
logits: Unscaled log probabilities of shape
[d_0, d_1, ..., d_{r-1}, num_classes] and dtype float32 or float64.
But the tf.losses.hinge_loss requirements are different:
labels: The ground truth output tensor. Its shape should match the
shape of logits. The values of the tensor are expected to be 0.0 or
1.0.
logits: The logits, a float tensor.
You can resolve this in two ways:
Reshape the labels to [1024, 1] and use just one column of logits, like you did - logits[:,1:]:
labels = tf.reshape(labels, [-1, 1])
hinge_loss = (
    tf.reduce_mean(
        tf.losses.hinge_loss(labels=labels,
                             logits=logits[:, 1:],
                             weights=class_weights))
)
I think you'll also need to reshape the class_weights the same way.
Use all of the learned logits features via tf.reduce_sum, which will produce a flat (1024,) tensor:
logits = tf.reduce_sum(logits, axis=1)
hinge_loss = (
    tf.reduce_mean(
        tf.losses.hinge_loss(labels=labels,
                             logits=logits,
                             weights=class_weights))
)
This way you don't need to reshape labels or class_weights.
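Note also that tf.losses.hinge_loss expects float labels whose values are 0.0 or 1.0, whereas sparse_softmax_cross_entropy takes integer class indices, so a cast is usually needed on top of the reshape. A minimal sketch combining both steps (assuming binary integer labels of shape [1024] and per-example class_weights of shape [1024]; the *_f names are only illustrative):
labels_f = tf.cast(tf.reshape(labels, [-1, 1]), tf.float32)    # [1024, 1], values 0.0/1.0
class_weights_f = tf.reshape(class_weights, [-1, 1])           # match the labels shape
hinge_loss = tf.reduce_mean(
    tf.losses.hinge_loss(labels=labels_f,
                         logits=logits[:, 1:],                 # [1024, 1]
                         weights=class_weights_f))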

Related

How to set target in cross entropy loss for pytorch multi-class problem

Problem Statement: I have an image, and each pixel of the image can belong to only one of 'Band5', 'Band6', or 'Band7' (see below for details). Hence, I have a PyTorch multi-class problem, but I am unable to understand how to set the target, which needs to be in the form [batch, w, h].
My dataloader returns two values:
x = chips.loc[:, :, :, self.input_bands]
y = chips.loc[:, :, :, self.output_bands]
x = x.transpose('chip','channel','x','y')
y_ohe = y.transpose('chip','channel','x','y')
Also, I have defined:
input_bands = ['Band1','Band2', 'Band3', 'Band3', 'Band4'] # input classes
output_bands = ['Band5','Band6', 'Band7'] #target classes
model = ModelName(num_classes = 3, depth=default_depth, in_channels=5, merge_mode='concat').to(device)
loss_new = nn.CrossEntropyLoss()
In my training function:
#get values from dataloader
X = normalize_zero_to_one(X) #input
y = normalize_zero_to_one(y) #target
images = Variable(torch.from_numpy(X)).to(device) # [batch, channel, H, W]
masks = Variable(torch.from_numpy(y)).to(device)
optim.zero_grad()
outputs = model(images)
loss = loss_new(outputs, masks) # (preds, target)
loss.backward()
optim.step() # Update weights
I know the target (here masks) should be [batch_size, w, h]. However, it is currently [batch_size, channels, w, h].
I read a lot of posts, including 1 and 2, and they say the target should only contain the target class indices. I don't understand how I can combine the indices of three classes and still set the target as [batch_size, w, h].
Right now, I get the error:
RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4
To the best of my understanding, I don't need to do any one-hot encoding. Similar errors and explanations I found on the internet are here:
Reference 1
Reference 2
Reference 3
Reference 4
Any help will be appreciated! Thank you.
If I understand correctly, your current "target" is [batch_size, channels, w, h] with channels==3 as you have three possible targets.
What do the values in your target represent? You basically have a 3-vector target for each pixel - are these the expected class probabilities? Are they "one-hot vectors" indicating the correct "band"?
If so, you can get the target indices by simply taking the argmax along the target channel dimension:
proper_target = torch.argmax(masks, dim=1) # make sure keepdim=False
loss = loss_new(outputs, proper_target)
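For reference, nn.CrossEntropyLoss with spatial data expects predictions of shape [batch, num_classes, H, W] and an integer target of shape [batch, H, W]. A small self-contained shape check (made-up sizes, not the asker's model):
import torch
import torch.nn as nn

batch, num_classes, h, w = 4, 3, 8, 8
outputs = torch.randn(batch, num_classes, h, w)     # raw per-class scores from the model
masks_ohe = torch.zeros(batch, num_classes, h, w)   # one-hot target, one channel per band
masks_ohe[:, 0] = 1.0                               # pretend every pixel belongs to class 0

proper_target = torch.argmax(masks_ohe, dim=1)      # -> [batch, h, w], dtype int64
loss = nn.CrossEntropyLoss()(outputs, proper_target)
print(proper_target.shape, loss.item())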

How do you implement softmax with Tensorflow.JS

Using Tensorflow.JS, I am trying to get a machine learning model running with a last dense layer using a softmax activation function. When I try to run it, I receive:
Error when checking target: expected dense_Dense2 to have shape [,1], but got array with shape [3,2].
If I comment out the fit function and run it, I get back a 1x2 array (as expected, since I have 2 units on the last layer and I am only entering one test).
Additionally, when I alter the y variables to the array [[1,2,3]], it trains, but I don't think this is correct since the ys do not match the shape of the last layer (2).
Any advice or help would be appreciated to fill in my gap of knowledge.
var tf = require('@tensorflow/tfjs');
let xs = tf.tensor([
    [1, 2],
    [2, 1],
    [0, 0]
]);
let ys = tf.tensor([
    [1, 2],
    [2, 1],
    [1, 1]
]);
async function createModel () {
    const model = tf.sequential();
    model.add(tf.layers.dense({inputShape: [2], units: 32, activation: "relu"}));
    model.add(tf.layers.dense({units: 2}));
    model.compile({loss: "sparseCategoricalCrossentropy", optimizer: "adam"});
    //await model.fit(xs, ys, {epochs: 1000});
    model.predict(tf.tensor([[1, 2]])).print();
}
createModel();
Here is the softmax activation on the last layer:
const model = tf.sequential();
model.add(tf.layers.dense({inputShape: [2], units: 32, activation: "relu"}));
model.add(tf.layers.dense({units: 2, activation:'softmax'}));
model.compile({loss: "sparseCategoricalCrossentropy",optimizer:"adam"});
model.predict(tf.tensor([[1,2]])).print();
For the error:
Error when checking target: expected dense_Dense2 to have shape [,1], but got array with shape [3,2].
You can consider the answer given here
Your error is related to the loss function sparseCategoricalCrossentropy, which expects the labels as a 1d tensor. If you change this loss function to categoricalCrossentropy, it will work. Both losses do the same thing; it is the way the labels are encoded that differs. But as they stand in the question, the labels are encoded correctly for neither categoricalCrossentropy nor sparseCategoricalCrossentropy.
When using sparseCategoricalCrossentropy, the labels are a 1d tensor whose values are the category indices.
When using categoricalCrossentropy, the labels are a 2d tensor, i.e. one-hot encoded.
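The same encoding distinction exists in Keras on the Python side, where it may be easier to see the shapes side by side; a minimal sketch (an illustration only, with made-up probabilities, not part of the original answer):
import numpy as np
import tensorflow as tf

# 3 samples, 2 classes; rows are predicted class probabilities
probs = np.array([[0.7, 0.3],
                  [0.2, 0.8],
                  [0.4, 0.6]], dtype="float32")

sparse_labels = np.array([0, 1, 1])                      # 1d tensor of class indices
onehot_labels = np.array([[1., 0.], [0., 1.], [0., 1.]],
                         dtype="float32")                # 2d one-hot tensor

# Both calls compute the same per-sample losses; only the label encoding differs.
print(tf.keras.losses.sparse_categorical_crossentropy(sparse_labels, probs))
print(tf.keras.losses.categorical_crossentropy(onehot_labels, probs))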

How does binary cross entropy loss work on autoencoders?

I wrote a vanilla autoencoder using only Dense layers.
Below is my code:
iLayer = Input((784,))
layer1 = Dense(128, activation='relu')(iLayer)
layer2 = Dense(64, activation='relu')(layer1)
layer3 = Dense(28, activation='relu')(layer2)
layer4 = Dense(64, activation='relu')(layer3)
layer5 = Dense(128, activation='relu')(layer4)
layer6 = Dense(784, activation='softmax')(layer5)
model = Model(iLayer, layer6)
model.compile(loss='binary_crossentropy', optimizer='adam')
(trainX, trainY), (testX, testY) = mnist.load_data()
print ("shape of the trainX", trainX.shape)
trainX = trainX.reshape(trainX.shape[0], trainX.shape[1]* trainX.shape[2])
print ("shape of the trainX", trainX.shape)
model.fit (trainX, trainX, epochs=5, batch_size=100)
Questions:
1) Softmax provides a probability distribution. Understood. This means I would have a vector of 784 values, each a probability between 0 and 1 (for example [0.02, 0.03, ... up to 784 items]), and summing all 784 elements gives 1.
2) I don't understand how binary cross-entropy works with these values. Binary cross-entropy is for two output values, right?
In the context of autoencoders the input and output of the model are the same. So, if the input values are in the range [0,1], then it is acceptable to use sigmoid as the activation function of the last layer. Otherwise, you need to use an appropriate activation function for the last layer (e.g. linear, which is the default one).
As for the loss function, it comes back to the values of the input data again. If the input data contain only zeros and ones (and no values in between), then binary_crossentropy is acceptable as the loss function. Otherwise, you need to use other loss functions such as 'mse' (i.e. mean squared error) or 'mae' (i.e. mean absolute error). Note that in the case of input values in the range [0,1] you can still use binary_crossentropy, as it usually is used (e.g. the Keras autoencoder tutorial and this paper). However, don't expect the loss value to become zero, since binary_crossentropy does not return zero when both prediction and label are not either zero or one (no matter whether they are equal or not). Here is a video from Hugo Larochelle where he explains the loss functions used in autoencoders (the part about using binary_crossentropy with inputs in the range [0,1] starts at 5:30).
Concretely, in your example, you are using the MNIST dataset. So by default the values of MNIST are integers in the range [0, 255]. Usually you need to normalize them first:
trainX = trainX.astype('float32')
trainX /= 255.
Now the values would be in range [0,1]. So sigmoid can be used as the activation function and either of binary_crossentropy or mse as the loss function.
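Putting this together for the model in the question, a minimal sketch of the recommended change (sigmoid output instead of softmax; this just applies the answer's suggestion to the original code):
from keras.layers import Input, Dense
from keras.models import Model

iLayer = Input((784,))
x = Dense(128, activation='relu')(iLayer)
x = Dense(64, activation='relu')(x)
x = Dense(28, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(128, activation='relu')(x)
out = Dense(784, activation='sigmoid')(x)   # per-pixel sigmoid instead of softmax

model = Model(iLayer, out)
model.compile(loss='binary_crossentropy', optimizer='adam')  # 'mse' would also work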
Why can binary_crossentropy be used even when the true label values (i.e. ground truth) are in the range [0,1]?
Note that we are trying to minimize the loss function in training. So if the loss function we have used reaches its minimum value (which may not be necessarily equal to zero) when the prediction is equal to the true label, then it is an acceptable choice. Let's verify this is the case for binary cross-entropy, which is defined as follows:
bce_loss = -y*log(p) - (1-y)*log(1-p)
where y is the true label and p is the predicted value. Let's consider y as fixed and see what value of p minimizes this function: we need to take the derivative with respect to p (I have assumed the log is the natural logarithm function for simplicity of calculations):
bce_loss_derivative = -y*(1/p) - (1-y)*(-1/(1-p)) = 0 =>
-y/p + (1-y)/(1-p) = 0 =>
-y*(1-p) + (1-y)*p = 0 =>
-y + y*p + p - y*p = 0 =>
p - y = 0 => y = p
As you can see, binary cross-entropy has its minimum value when p = y, i.e. when the prediction equals the true label, and this is exactly what we are looking for.
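A quick numerical check of the claim that the minimum need not be zero (a standalone sketch, not part of the original answer):
import numpy as np

def bce(y, p):
    # binary cross-entropy for a single prediction, natural log
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

print(bce(1.0, 1.0 - 1e-12))   # ~0.0: the loss can reach 0 when the label is exactly 0 or 1
print(bce(0.3, 0.3))           # ~0.611: the minimum over p for a soft label, but not 0
print(bce(0.3, 0.5))           # ~0.693: larger than the value at p = y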

Create a List and Use it in Loss Function Tensorflow

I am trying to create a list based on my neural network outputs and use it in Tensorflow as a loss function.
Assume that results is a tensor of shape [1, batch_size] output by a neural network. I check whether each value in the first row of this tensor is in a specific range passed in as a placeholder called valid_range; if it is, I append 1 to a list, and if it is not, I append -1. The goal is to make all of the network's predictions fall in the range, so the correct prediction is a tensor of all 1s, which I call correct_predictions.
values_list = []
for j in range(batch_size):
    a = results[0, j] >= valid_range[0]
    b = results[0, j] <= valid_range[1]
    c = tf.logical_and(a, b)
    if (c == 1):
        values_list.append(1)
    else:
        values_list.append(-1.)
values_list_tensor = tf.convert_to_tensor(values_list)
correct_predictions = tf.ones([batch_size, ], tf.float32)
Now, I want to use this as a loss function in my network, so that I can force all the predictions to be in the specified range. I try to train like this:
loss = tf.reduce_mean(tf.squared_difference(values_list_tensor, correct_predictions))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, gradient_clip_threshold)
optimize = optimizer.apply_gradients(zip(gradients, variables))
This, however, has a problem and throws an error on the last optimize line, saying:
ValueError: No gradients provided for any variable: ['<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d4afd0>',
'<tensorflow.python.training.optimizer._RefVariableProcessor object at 0x7f0245d66050>'
...
I tried to debug this in Tensorboard, and I notice that the list I am creating does not appear in the graph, so basically the x part of the loss function is not part of the network itself. Is there some way to accurately create a list based on the predictions of a neural network and use it in the loss function in Tensorflow to train the network?
Please help, I have been stuck on this for a few days now.
Edit:
Following what was suggested in the comments, I decided to use an l2 loss function, multiplying it by the binary vector I had from before, values_list_tensor. The binary vector now has values 1 and 0 instead of 1 and -1. This way, when the prediction is in the range the loss is 0, and otherwise it is the normal l2 loss. As I am unable to see the values of the tensors, I am not sure if this is correct. However, I can view the final loss and it is always 0, so something is wrong here. I am unsure whether the multiplication is being done correctly and whether values_list_tensor is calculated accurately. Can someone help and tell me what could be wrong?
loss = tf.reduce_mean(tf.nn.l2_loss(tf.matmul(tf.transpose(tf.expand_dims(values_list_tensor, 1)), tf.expand_dims(result[0, :], 1))))
Thanks
To answer the question in the comment: one way to write a piecewise function is with tf.cond. For example, here is a function that returns 0 in [-1, 1] and x everywhere else:
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32)
y = tf.cond(tf.logical_or(tf.greater(x, 1.0), tf.less(x, -1.0)), lambda : x, lambda : 0.0)
y.eval({x: 1.5}) # prints 1.5
y.eval({x: 0.5}) # prints 0.0
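If the goal is a loss that actually trains the network to stay inside the range, a graph-level, differentiable alternative to the Python loop is to penalize the distance to the valid range directly. The Python-level if in the question never becomes part of the graph (c == 1 compares a Tensor object to an int, so values_list ends up holding plain Python constants, which is why no gradients flow). This is only a sketch of one possible formulation, not part of the original answer; it assumes results and valid_range are the tensors from the question:
below = tf.nn.relu(valid_range[0] - results[0, :])   # how far each prediction falls below the range
above = tf.nn.relu(results[0, :] - valid_range[1])   # how far it exceeds the range
range_loss = tf.reduce_mean(below + above)           # 0 when every prediction is inside the range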

How to repeat an unknown dimension in Keras for both backends

For example (I can do this with Theano without a problem):
# log_var has shape --> (num, )
# Mean has shape --> (?, num)
std_var = T.repeat(T.exp(log_var)[None, :], Mean.shape[0], axis=0)
With TensorFlow I can do this:
std_var = tf.tile(tf.reshape(tf.exp(log_var), [1, -1]), (tf.shape(Mean)[0], 1))
But I don't know how to do the same for Keras, may be like this:
std_var = K.repeat(K.reshape(K.exp(log_var), [1, -1]), Mean.get_shape()[0])
or
std_var = K.repeat_elements(K.exp(log_var), Mean.get_shape()[0], axis=0)
... because Mean has unknown dimension at axis 0.
I need this for a custom layer output:
return K.concatenate([Mean, Std], axis=1)
Keras has an abstraction layer, keras.backend, which you seem to have found already (you refer to it as K). This layer provides all the functions you will need for both Theano and TensorFlow.
Say your TensorFlow code works, which is
std_var = tf.tile(tf.reshape(tf.exp(log_var), [1, -1]), (tf.shape(Mean)[0], 1))
then you can translate it to the abstract version by writing it like this:
std_var = K.tile(K.reshape(K.exp(log_var), (1, -1)), (K.shape(Mean)[0], 1))
Both Theano and TensorFlow support the unknown axis syntax (-1 for unknown axes) so this is not a problem.
On a side note, I am not sure whether your TF code is correct, though. You reshape to (1, -1), meaning that the dimension of axis 0 will be 1. I think what you actually want to do is this:
std_var = K.tile(K.reshape(K.exp(log_var), (-1, num)), (K.shape(Mean)[0], 1))
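For context, a minimal sketch of how this could be used to build the custom layer's output (an illustration only; the helper name is made up, and variable names follow the question):
from keras import backend as K

def concat_mean_std(Mean, log_var):
    # log_var: (num,), Mean: (?, num); returns (?, 2 * num)
    std = K.reshape(K.exp(log_var), (1, -1))      # (1, num)
    Std = K.tile(std, (K.shape(Mean)[0], 1))      # repeat the row once per sample in Mean
    return K.concatenate([Mean, Std], axis=1)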
