How to keep input and output shape consistent after applying conv2d and convtranspose2d to image data?

How to keep input and output shape consistent after applying conv2d and convtranspose2d to image data? - machine-learning

I'm using Pytorch to experiment image segmentation task. I found input and output shape are often inconsistent after applying Conv2d() and Convtranspose2d() to my image data of shape [1,1,height,width]). How to fix it the issue for arbitrary height and width?
Best regards
import torch
data = torch.rand(1,1,16,26)
a = torch.nn.Conv2d(1,1,kernel_size=3, stride=2)
b = a(data)
print(b.shape)
c = torch.nn.ConvTranspose2d(1,1,kernel_size=3, stride=2)
d = c(b)
print(d.shape) # torch.Size([1, 1, 15, 25])

TLDR; Given the same parameters nn.ConvTranspose2d is not the invert operation of nn.Conv2d in terms of dimension shape conservation.
From an input with spatial dimension x_in, nn.Conv2d will output a tensor with respective spatial dimension x_out:
x_out = [(x_in + 2p - d*(k-1) - 1)/s + 1]
Where [.] is the whole part function, p the padding, d the dilation, k the kernel size, and s the stride.
In your case: k=3, s=2, while other parameters default to p=0 and d=1. In other words x_out = [(x_in - 3)/2 + 1]. So given x_in=16, you get x_out = [7.5] = 7.
On the other hand, we have for nn.ConvTranspose2d:
x_out = (x_in-1)*s - 2p + d*(k-1) + op + 1
Where [.] is the whole part function, p the padding, d the dilation, k the kernel size, s the stride, and op the output padding.
In your case: k=3, s=2, while other parameters default to p=0, d=1, and op=0. You get x_out = (x_in-1)*2 + 3. So given x_in=7, you get x_out = 15.
However, if you apply an output padding on your transpose convolution, you will get the desired shape:
>>> conv = nn.Conv2d(1,1, kernel_size=3, stride=2)
>>> convT = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
>>> convT(conv(data)).shape
torch.Size([1, 1, 16, 26])

Related

Loss for Multi-label Classification

I am working on a multi-label classification problem. My gt labels are of shape 14 x 10 x 128, where 14 is the batch_size, 10 is the sequence_length, and 128 is the vector with values 1 if the item in sequence belongs to the object and 0 otherwise.
My output is also of same shape: 14 x 10 x 128. Since, my input sequence was of varying length I had to pad it to make it of fixed length 10. I'm trying to find the loss of the model as follows:
total_loss = 0.0
unpadded_seq_lengths = [3, 4, 5, 7, 9, 3, 2, 8, 5, 3, 5, 7, 7, ...] # true lengths of sequences
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
for data in training_dataloader:
optimizer.zero_grad()
# shape of input 14 x 10 x 128
output = model(data)
batch_loss = 0.0
for batch_idx, sequence in enumerate(output):
# sequence shape is 10 x 128
true_seq_len = unpadded_seq_lengths[batch_idx]
# only keep unpadded gt and predicted labels since we don't want loss to be influenced by padded values
predicted_labels = sequence[:true_seq_len, :] # for example, 3 x 128
gt_labels = gt_labels_padded[batch_idx, :true_seq_len, :] # same shape as above, gt_labels_padded has shape 14 x 10 x 128
# loop through unpadded predicted and gt labels and calculate loss
for item_idx, predicted_labels_seq_item in enumerate(predicted_labels):
# predicted_labels_seq_item and gt_labels_seq_item are 1D vectors of length 128
gt_labels_seq_item = gt_labels[item_idx]
current_loss = criterion(predicted_labels_seq_item, gt_labels_seq_item)
total_loss += current_loss
batch_loss += current_loss
batch_loss.backward()
optimizer.step()
Can anybody please check to see if I'm calculating loss correctly. Thanks
Update:
Is this the correct approach for calculating accuracy metrics?
# batch size: 14
# seq length: 10
for epoch in range(10):
TP = FP = TN = FN = 0.
for x, y, mask in tr_dl:
# mask shape: (10,)
out = model(x) # out shape: (14, 10, 128)
y_pred = (torch.sigmoid(out) >= 0.5).float().type(torch.int64) # consider all predictions above 0.5 as 1, rest 0
y_pred = y_pred[mask] # y_pred shape: (14, 10, 10, 128)
y_labels = y[mask] # y_labels shape: (14, 10, 10, 128)
# do I flatten y_pred and y_labels?
y_pred = y_pred.flatten()
y_labels = y_labels.flatten()
for idx, prediction in enumerate(y_pred):
if prediction == 1 and y_labels[idx] == 1:
# calculate IOU (overlap of prediction and gt bounding box)
iou = 0.78 # assume we get this iou value for objects at idx
if iou >= 0.5:
TP += 1
else:
FP += 1
elif prediction == 1 and y_labels[idx] == 0:
FP += 1
elif prediction == 0 and y_labels[idx] == 1:
FN += 1
else:
TN += 1
EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)

It is usually recommended to stick with batch-wise operations and avoid going into single-element processing steps while in the main training loop. One way to handle this case is to make your dataset return padded inputs and labels with additionally a mask that will come useful for loss computation. In other words, to compute the loss term with sequences of varying sizes, we will use a mask instead of doing individual slices.
Dataset
The way to proceed is to make sure you build the mask in the dataset and not in the inference loop. Here I am showing a minimal implementation that you should be able to transfer to your dataset without much hassle:
class Dataset(data.Dataset):
def __init__(self):
super().__init__()
def __len__(self):
return 100
def __getitem__(self, index):
i = random.randint(5, SEQ_LEN) # for demo puporse, generate x with random length
x = torch.rand(i, EMB_SIZE)
y = torch.randint(0, N_CLASSES, (i, EMB_SIZE))
# pad data to fit in batch
pad = torch.zeros(SEQ_LEN-len(x), EMB_SIZE)
x_padded = torch.cat((pad, x))
y_padded = torch.cat((pad, y))
# construct tensor to mask loss
mask = torch.cat((torch.zeros(SEQ_LEN-len(x)), torch.ones(len(x))))
return x_padded, y_padded, mask
Essentially in the __getitem__, we not only pad the input x and target y with zero values, we also construct a simple mask containing the positions of the padded values in the currently processed element.
Notice how:
x_padded, shaped (SEQ_LEN, EMB_SIZE)
y_padded, shaped (SEQ_LEN, N_CLASSES)
mask, shaped (SEQ_LEN,)
are all three tensors which are shape invariant across the dataset, yet mask contains the padding information necessary for us to compute the loss function appropriately.
Inference
The loss you've used nn.BCEWithLogitsLoss, is the correct one since it's a multi-dimensional loss used for binary classification. In other words, you can use it here in this multi-label classification task, considering each one of the 128 logits as an individual binary prediction. Do not use nn.CrossEntropyLoss) as suggested elsewhere, since the softmax will push a single logit (i.e. class), which is the behaviour required for single-label classification tasks.
Therefore, in the training loop, we simply have to apply the mask to our loss.
for x, y, mask in dl:
y_pred = model(x)
loss = mask*bce(y_pred, y)
# backpropagation, loss postprocessing, logs, etc.

This is what you need for the first part of the question, there are already loss functions implemented in tensorflow: https://medium.com/#aadityaura_26777/the-loss-function-for-multi-label-and-multi-class-f68f95cae525. Yours is just tf.nn.weighted_cross_entropy_with_logits, but you need to set the weight.
The second part of the question is not straightforward, because there's conditioning on the IOU, generally, when you do machine learning, you should heavily depend on matrix multiplication, in your case, you probably need to pre-calculate the IOU -> 1 or 0 as a vector, then multiply with the y_pred , element-wise, this will give you the modified y_pred . After that, you can use any accuracy available function to calculate the final result.

if you can use the CROSSENTROPYLOSS instead of BCEWithLogitsLoss there is something called ignore_index. you can use it to exclude your padded sequences. the difference between the 2 losses is the activation function used (softmax vs sigmoid). but I think you can still use the CROSSENTROPYLOSSfor binary classification as well.

Optimizing filter sizes of CNN with Optuna

I have created a CNN for classification of three classes based on input images of size 39 x 39. I'm optimizing the parameters of the network using Optuna. For Optuna I'm defining the following parameters to optimize:
num_blocks = trial.suggest_int('num_blocks', 1, 4)
num_filters = [int(trial.suggest_categorical("num_filters", [32, 64, 128, 256]))]
kernel_size = trial.suggest_int('kernel_size', 2, 7)
num_dense_nodes = trial.suggest_categorical('num_dense_nodes', [64, 128, 256, 512, 1024])
dense_nodes_divisor = trial.suggest_categorical('dense_nodes_divisor', [1, 2, 4, 8])
batch_size = trial.suggest_categorical('batch_size', [16, 32, 64, 128])
drop_out = trial.suggest_discrete_uniform('drop_out', 0.05, 0.5, 0.05)
lr = trial.suggest_loguniform('lr', 1e-6, 1e-1)
dict_params = {'num_blocks': num_blocks,
'num_filters': num_filters,
'kernel_size': kernel_size,
'num_dense_nodes': num_dense_nodes,
'dense_nodes_divisor': dense_nodes_divisor,
'batch_size': batch_size,
'drop_out': drop_out,
'lr': lr}
My network looks as follows:
input_tensor = Input(shape=(39,39,3))
# 1st cnn block
x = Conv2D(filters=dict_params['num_filters'],
kernel_size=dict_params['kernel_size'],
strides=1, padding='same')(input_tensor)
x = BatchNormalization()(x, training=training)
x = Activation('relu')(x)
x = MaxPooling2D(padding='same')(x)
x = Dropout(dict_params['drop_out'])(x)
# additional cnn blocks
for i in range(1, dict_params['num_blocks']):
x = Conv2D(filters=dict_params['num_filters']*(2**i), kernel_size=dict_params['kernel_size'], strides=1, padding='same')(x)
x = BatchNormalization()(x, training=training)
x = Activation('relu')(x)
x = MaxPooling2D(padding='same')(x)
x = Dropout(dict_params['drop_out'])(x)
# mlp
x = Flatten()(x)
x = Dense(dict_params['num_dense_nodes'], activation='relu')(x)
x = Dropout(dict_params['drop_out'])(x)
x = Dense(dict_params['num_dense_nodes'] // dict_params['dense_nodes_divisor'], activation='relu')(x)
output_tensor = Dense(self.number_of_classes, activation='softmax')(x)
# instantiate and compile model
cnn_model = Model(inputs=input_tensor, outputs=output_tensor)
opt = Adam(lr=dict_params['lr'])
loss = 'categorical_crossentropy'
cnn_model.compile(loss=loss, optimizer=opt, metrics=['accuracy', tf.keras.metrics.AUC()])
I'm optimizing (minimizing) the validation loss with Optuna. There is a maximum of 4 blocks in the network and the number of filters is doubled for each block. That means e.g. 64 in the first block, 128 in the second block, 256 in the third and so on. There are two problems. First, when we start with e.g. 256 filters and a total of 4 blocks, in the last block there will be 2048 filters which is too much.
Is it possible to make the num_filters parameter dependent on the num_blocks parameter? That means if there are more blocks, the starting filter size should be smaller. So, for example, if num_blocks is chosen to be 4, num_filters should only be sampled from 32, 64 and 128.
Second, I think it is common to double the filter size but there are also networks with constant filter sizes or two convolutions (with same number of filters) before a max pooling layer (similar to VGG) and so on. Is it possible to adapt the Optuna optimization to cover all these variations?

keras Bidirectional layer using 4 dimension data

I'am designing keras model for classification based on article data.
I have data with 4 dimension as follows
[batch, article_num, word_num, word embedding size]
and i want to feed each (word_num, word embedding) data into keras Bidirectional layer
in order to get result with 3 dimension as follows.
[batch, article_num, bidirectional layer output size]
when i tried to feed 4 dimension data for testing like this
inp = Input(shape=(article_num, word_num, ))
# dims = [batch, article_num, word_num]
x = Reshape((article_num * word_num, ), input_shape = (article_num, word_num))(inp)
# dims = [batch, article_num * word_num]
x = Embedding(word_num, word_embedding_size, input_length = article_num * word_num)(x)
# dims = [batch, article_num * word_num, word_embedding_size]
x = Reshape((article_num , word_num, word_embedding_size),
input_shape = (article_num * word_num, word_embedding_size))(x)
# dims = [batch, article_num, word_num, word_embedding_size]
x = Bidirectional(CuDNNLSTM(50, return_sequences = True),
input_shape=(article_num , word_num, word_embedding_size))(x)
and i got the error
ValueError: Input 0 is incompatible with layer bidirectional_12: expected ndim=3, found ndim=4
how can i achieve this?

If you don't want it to touch the article_num dimension, you can try using a TimeDistributed wrapper. But I'm not certain that it will be compatible with bidirectional and other stuff.
inp = Input(shape=(article_num, word_num))
x = TimeDistributed(Embedding(word_num, word_embedding_size)(x))
#option 1
#x1 shape : (batch, article_num, word_num, 50)
x1 = TimeDistributed(Bidirectional(CuDNNLSTM(50, return_sequences = True)))(x)
#option 2
#x2 shape : (batch, article_num, 50)
x2 = TimeDistributed(Bidirectional(CuDNNLSTM(50)))(x)
Hints:
Don't use input_shape everywhere, you only need it at the Input tensor.
You probably don't need any of the reshapes if you also use a TimeDistributed in the embedding.
If you don't want word_num in the final dimension, use return_sequences=False.

Tensorflow dynamic_rnn TypeError: 'Tensor' object is not iterable

I'm trying to get a basic LSTM working in TensorFlow. I'm receiving the following error:
TypeError: 'Tensor' object is not iterable.
The offending line is:
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)`
I'm using version 1.0.1 on windows 7. My inputs and label have the following shapes
x_shape = (50, 40, 18), y_shape = (50, 40)
Where:
batch size = 50
sequence length = 40
input vector length at each step = 18
I'm building my graph as follows
def build_graph(learn_rate, seq_len, state_size=32, batch_size=5):
# use a fixed sequence length
seqlen = tf.constant(seq_len, shape=[batch_size],dtype=tf.int32)
# Placeholders
x = tf.placeholder(tf.float32, [batch_size, None, 18])
y = tf.placeholder(tf.float32, [batch_size, None])
keep_prob = tf.constant(1.0)
# RNN
cell = tf.contrib.rnn.LSTMCell(state_size)
init_state = tf.get_variable('init_state', [1, state_size],
initializer=tf.constant_initializer(0.0))
init_state = tf.tile(init_state, [batch_size, 1])
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen,
initial_state=init_state,)
# Add dropout, as the model otherwise quickly overfits
rnn_outputs = tf.nn.dropout(rnn_outputs, keep_prob)
# Prediction layer
with tf.variable_scope('prediction'):
W = tf.get_variable('W', [state_size, num_classes])
b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
preds = tf.tanh(tf.matmul(rnn_outputs, W) + b)
# MSE
loss = tf.square(tf.subtract(y, preds))
# loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits, y))
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Can anyone tell me what I am missing?

Sequence length should be iterable e.g. a list or tensor, not a scalar. In your case specifically, you need to replace sequence length = 40 with a list of the lengths of each input. For instance, if your first sequence has 10 steps, the second 13 and the third 18, you would pass in [10, 13, 18]. This lets TensorFlow's dynamic RNN know how many steps to unroll for (I believe it uses a while loop internally).

How to interpret predictions in TensorFlow, they seem to have the wrong shape

I have a TensorFlow model with some of these characteristics:
state_size = 800,
num_classes = 14313,
batch_size = 10,
num_steps = 16, # width of the tensor
num_layers = 3
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in
tf.split(x_one_hot, num_steps, 1)] # still a list of tensors (batch_size, num_classes)
...
logits = tf.matmul(rnn_outputs, W) + b
predictions = tf.nn.softmax(logits)
Now I want to feed it a np.array (shape = batch_size x num_steps, so 10 x 16) and I get a predictions tensor back.
Weirdly, its shape is 160 x 14313. The latter is the number of classes. But where does 160 come from? I don't understand that. I would like to have a probability for each of my classes, for each of the elements of the batch (which is 10). How did the num_steps become involved, how to I read from this pred. tensor which is the expected element after those 16 numbers?

In this case the 160 comes from the shape as you suspected.
which means that for each batch of 10, has 16 timesteps, this is technically flattened when you do your shape variable.
At this point you have logits of shape 160 * classes. so you can do predictions[i] for each batch which then will have the probability of each class being the desired class.
which is why to get the chosen class you would do something like tf.argmax(predictions, 1) to get a tensor with the classification
this will have a shape of 160 in your case, so it will be the predicted
class for each one of the batches.
In order to get the probabilities, you could use the logits:
def prob(logit):
return 1/(1 + np.exp(-logit)

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to keep input and output shape consistent after applying conv2d and convtranspose2d to image data? - machine-learning

Related

Loss for Multi-label Classification

Optimizing filter sizes of CNN with Optuna

keras Bidirectional layer using 4 dimension data

Tensorflow dynamic_rnn TypeError: 'Tensor' object is not iterable

How to interpret predictions in TensorFlow, they seem to have the wrong shape

Categories

Resources