I'm using PyTorch to experiment with an image segmentation task, and I found that the input and output shapes are often inconsistent after applying Conv2d() and ConvTranspose2d() to my image data of shape [1, 1, height, width]. How can I fix this issue for arbitrary height and width?
Best regards
import torch
data = torch.rand(1,1,16,26)
a = torch.nn.Conv2d(1,1,kernel_size=3, stride=2)
b = a(data)
print(b.shape) # torch.Size([1, 1, 7, 12])
c = torch.nn.ConvTranspose2d(1,1,kernel_size=3, stride=2)
d = c(b)
print(d.shape) # torch.Size([1, 1, 15, 25])
TL;DR: given the same parameters, nn.ConvTranspose2d is not the inverse operation of nn.Conv2d in terms of spatial shape.
From an input with spatial dimension x_in, nn.Conv2d will output a tensor with respective spatial dimension x_out:
x_out = [(x_in + 2p - d*(k-1) - 1)/s + 1]
Where [.] is the floor function (whole part), p the padding, d the dilation, k the kernel size, and s the stride.
In your case: k=3, s=2, while other parameters default to p=0 and d=1. In other words x_out = [(x_in - 3)/2 + 1]. So given x_in=16, you get x_out = [7.5] = 7.
On the other hand, we have for nn.ConvTranspose2d:
x_out = (x_in-1)*s - 2p + d*(k-1) + op + 1
Where p is the padding, d the dilation, k the kernel size, s the stride, and op the output padding.
In your case: k=3, s=2, while other parameters default to p=0, d=1, and op=0. You get x_out = (x_in-1)*2 + 3. So given x_in=7, you get x_out = 15.
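For reference, here is a small helper (my own addition, not part of the original answer) that evaluates both formulas so you can check the sizes for any input:
import math

def conv2d_out(x_in, k, s, p=0, d=1):
    # nn.Conv2d output size: floor((x_in + 2p - d*(k-1) - 1)/s + 1)
    return math.floor((x_in + 2*p - d*(k - 1) - 1) / s + 1)

def conv_transpose2d_out(x_in, k, s, p=0, d=1, op=0):
    # nn.ConvTranspose2d output size: (x_in - 1)*s - 2p + d*(k-1) + op + 1
    return (x_in - 1)*s - 2*p + d*(k - 1) + op + 1

print(conv2d_out(16, k=3, s=2))            # 7
print(conv_transpose2d_out(7, k=3, s=2))   # 15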
However, if you apply an output padding on your transpose convolution, you will get the desired shape:
>>> conv = nn.Conv2d(1,1, kernel_size=3, stride=2)
>>> convT = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
>>> convT(conv(data)).shape
torch.Size([1, 1, 16, 26])
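For arbitrary heights and widths, you can compute the required output_padding per dimension from the two formulas above. A minimal sketch (the helper name is mine, and note that output_padding must stay smaller than the stride):
def required_output_padding(x_orig, k, s, p=0, d=1):
    x_down = (x_orig + 2*p - d*(k - 1) - 1) // s + 1   # size after Conv2d
    x_up = (x_down - 1)*s - 2*p + d*(k - 1) + 1        # size ConvTranspose2d gives with output_padding=0
    return x_orig - x_up

op_h = required_output_padding(16, k=3, s=2)  # 1
op_w = required_output_padding(26, k=3, s=2)  # 1
convT = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=(op_h, op_w))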
I am working on a multi-label classification problem. My gt labels are of shape 14 x 10 x 128, where 14 is the batch_size, 10 is the sequence_length, and 128 is the label vector, with value 1 if the item in the sequence belongs to the object and 0 otherwise.
My output is also of the same shape: 14 x 10 x 128. Since my input sequences are of varying length, I had to pad them to a fixed length of 10. I'm trying to compute the loss of the model as follows:
total_loss = 0.0
unpadded_seq_lengths = [3, 4, 5, 7, 9, 3, 2, 8, 5, 3, 5, 7, 7, ...] # true lengths of sequences

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for data in training_dataloader:
    optimizer.zero_grad()

    # shape of input 14 x 10 x 128
    output = model(data)

    batch_loss = 0.0
    for batch_idx, sequence in enumerate(output):
        # sequence shape is 10 x 128
        true_seq_len = unpadded_seq_lengths[batch_idx]

        # only keep unpadded gt and predicted labels since we don't want loss to be influenced by padded values
        predicted_labels = sequence[:true_seq_len, :] # for example, 3 x 128
        gt_labels = gt_labels_padded[batch_idx, :true_seq_len, :] # same shape as above, gt_labels_padded has shape 14 x 10 x 128

        # loop through unpadded predicted and gt labels and calculate loss
        for item_idx, predicted_labels_seq_item in enumerate(predicted_labels):
            # predicted_labels_seq_item and gt_labels_seq_item are 1D vectors of length 128
            gt_labels_seq_item = gt_labels[item_idx]
            current_loss = criterion(predicted_labels_seq_item, gt_labels_seq_item)

            total_loss += current_loss
            batch_loss += current_loss

    batch_loss.backward()
    optimizer.step()
Can anybody please check whether I'm calculating the loss correctly? Thanks
Update:
Is this the correct approach for calculating accuracy metrics?
# batch size: 14
# seq length: 10
for epoch in range(10):
    TP = FP = TN = FN = 0.
    for x, y, mask in tr_dl:
        # mask shape: (10,)
        out = model(x)  # out shape: (14, 10, 128)
        y_pred = (torch.sigmoid(out) >= 0.5).float().type(torch.int64)  # consider all predictions above 0.5 as 1, rest 0
        y_pred = y_pred[mask]  # y_pred shape: (14, 10, 10, 128)
        y_labels = y[mask]  # y_labels shape: (14, 10, 10, 128)

        # do I flatten y_pred and y_labels?
        y_pred = y_pred.flatten()
        y_labels = y_labels.flatten()

        for idx, prediction in enumerate(y_pred):
            if prediction == 1 and y_labels[idx] == 1:
                # calculate IOU (overlap of prediction and gt bounding box)
                iou = 0.78  # assume we get this iou value for objects at idx
                if iou >= 0.5:
                    TP += 1
                else:
                    FP += 1
            elif prediction == 1 and y_labels[idx] == 0:
                FP += 1
            elif prediction == 0 and y_labels[idx] == 1:
                FN += 1
            else:
                TN += 1

    EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
It is usually recommended to stick with batch-wise operations and avoid single-element processing inside the main training loop. One way to handle this case is to make your dataset return the padded inputs and labels together with a mask that will come in useful for the loss computation. In other words, to compute the loss term with sequences of varying sizes, we use a mask instead of doing individual slices.
Dataset
The way to proceed is to make sure you build the mask in the dataset and not in the inference loop. Here I am showing a minimal implementation that you should be able to transfer to your dataset without much hassle:
import random

import torch
from torch import nn
from torch.utils import data

SEQ_LEN, EMB_SIZE, N_CLASSES = 10, 128, 128  # example values matching the question's 10 x 128 setup

class Dataset(data.Dataset):
    def __init__(self):
        super().__init__()

    def __len__(self):
        return 100

    def __getitem__(self, index):
        i = random.randint(5, SEQ_LEN)  # for demo purposes, generate x with a random length
        x = torch.rand(i, EMB_SIZE)
        y = torch.randint(0, 2, (i, N_CLASSES)).float()  # 0/1 multi-hot targets for BCE
        # pad data to fit in batch
        x_padded = torch.cat((torch.zeros(SEQ_LEN - len(x), EMB_SIZE), x))
        y_padded = torch.cat((torch.zeros(SEQ_LEN - len(y), N_CLASSES), y))
        # construct tensor to mask loss
        mask = torch.cat((torch.zeros(SEQ_LEN - len(x)), torch.ones(len(x))))
        return x_padded, y_padded, mask
Essentially, in __getitem__ we not only pad the input x and target y with zero values, we also construct a simple mask marking, for the currently processed element, which positions are real (1) and which are padded (0).
Notice how:
x_padded, shaped (SEQ_LEN, EMB_SIZE)
y_padded, shaped (SEQ_LEN, N_CLASSES)
mask, shaped (SEQ_LEN,)
are all three tensors which are shape invariant across the dataset, yet mask contains the padding information necessary for us to compute the loss function appropriately.
Inference
The loss you've used, nn.BCEWithLogitsLoss, is the correct one since it's a multi-dimensional loss used for binary classification. In other words, you can use it in this multi-label classification task, considering each one of the 128 logits as an individual binary prediction. Do not use nn.CrossEntropyLoss as suggested elsewhere, since its softmax pushes towards a single logit (i.e. a single class), which is the behaviour required for single-label classification tasks, not this one.
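As a quick, standalone illustration of that point (not part of the training code itself):
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(10, 128)                       # raw outputs for one sequence of length 10
targets = torch.randint(0, 2, (10, 128)).float()    # 0/1 multi-hot labels
loss = criterion(logits, targets)                   # each of the 128 logits is treated as a binary prediction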
Therefore, in the training loop, we simply have to apply the mask to our loss.
dl = data.DataLoader(Dataset(), batch_size=14)
bce = nn.BCEWithLogitsLoss(reduction='none')  # keep per-element losses so the mask can zero out padded positions

for x, y, mask in dl:
    y_pred = model(x)
    # mask is (batch, SEQ_LEN): broadcast it over the class dimension, then average over the unpadded entries
    loss = (mask.unsqueeze(-1) * bce(y_pred, y)).sum() / (mask.sum() * y.shape[-1])
    # backpropagation, loss postprocessing, logs, etc.
For the first part of the question, there are already loss functions implemented in TensorFlow: https://medium.com/#aadityaura_26777/the-loss-function-for-multi-label-and-multi-class-f68f95cae525. Yours is just tf.nn.weighted_cross_entropy_with_logits, but you need to set the weight.
The second part of the question is not straightforward because of the conditioning on the IOU. Generally, when you do machine learning, you should rely heavily on matrix operations; in your case, you probably need to pre-calculate the IOU as a 1/0 vector, then multiply it element-wise with y_pred, which gives you the modified y_pred. After that, you can use any available accuracy function to calculate the final result.
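As a rough sketch of that vectorised idea (names like iou_keep are mine; it is assumed to be a pre-computed 0/1 tensor of the same shape as y_pred, equal to 1 where the IOU is at least 0.5):
y_pred_mod = y_pred * iou_keep                      # drop positive predictions whose IOU is too low
TP = ((y_pred_mod == 1) & (y_labels == 1)).sum()
FP = ((y_pred_mod == 1) & (y_labels == 0)).sum()
FN = ((y_pred_mod == 0) & (y_labels == 1)).sum()
TN = ((y_pred_mod == 0) & (y_labels == 0)).sum()
accuracy = (TP + TN).float() / (TP + TN + FP + FN)
Note that with this purely element-wise formulation, a positive prediction with a low IOU is no longer tallied as FP (as it was in the loop above), which is the trade-off of the modified-y_pred approach.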
If you can use nn.CrossEntropyLoss instead of nn.BCEWithLogitsLoss, there is an ignore_index argument you can use to exclude your padded sequence positions. The difference between the two losses is the activation function used (softmax vs. sigmoid), but I think you can still use nn.CrossEntropyLoss for binary classification as well.
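For illustration, a minimal sketch of how ignore_index works (this assumes class-index targets rather than the multi-hot targets from the question):
import torch
from torch import nn

criterion = nn.CrossEntropyLoss(ignore_index=-100)               # -100 is the default ignore value
logits = torch.randn(10, 5)                                      # 10 sequence positions, 5 classes
targets = torch.tensor([0, 3, 1, 4, 2, 2, 0, -100, -100, -100])  # padded positions set to the ignore value
loss = criterion(logits, targets)                                # ignored positions contribute nothing to the loss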
I have created a CNN for classification of three classes based on input images of size 39 x 39. I'm optimizing the parameters of the network using Optuna. For Optuna I'm defining the following parameters to optimize:
num_blocks = trial.suggest_int('num_blocks', 1, 4)
num_filters = [int(trial.suggest_categorical("num_filters", [32, 64, 128, 256]))]
kernel_size = trial.suggest_int('kernel_size', 2, 7)
num_dense_nodes = trial.suggest_categorical('num_dense_nodes', [64, 128, 256, 512, 1024])
dense_nodes_divisor = trial.suggest_categorical('dense_nodes_divisor', [1, 2, 4, 8])
batch_size = trial.suggest_categorical('batch_size', [16, 32, 64, 128])
drop_out = trial.suggest_discrete_uniform('drop_out', 0.05, 0.5, 0.05)
lr = trial.suggest_loguniform('lr', 1e-6, 1e-1)
dict_params = {'num_blocks': num_blocks,
               'num_filters': num_filters,
               'kernel_size': kernel_size,
               'num_dense_nodes': num_dense_nodes,
               'dense_nodes_divisor': dense_nodes_divisor,
               'batch_size': batch_size,
               'drop_out': drop_out,
               'lr': lr}
My network looks as follows:
input_tensor = Input(shape=(39, 39, 3))

# 1st cnn block
x = Conv2D(filters=dict_params['num_filters'],
           kernel_size=dict_params['kernel_size'],
           strides=1, padding='same')(input_tensor)
x = BatchNormalization()(x, training=training)
x = Activation('relu')(x)
x = MaxPooling2D(padding='same')(x)
x = Dropout(dict_params['drop_out'])(x)

# additional cnn blocks
for i in range(1, dict_params['num_blocks']):
    x = Conv2D(filters=dict_params['num_filters']*(2**i), kernel_size=dict_params['kernel_size'], strides=1, padding='same')(x)
    x = BatchNormalization()(x, training=training)
    x = Activation('relu')(x)
    x = MaxPooling2D(padding='same')(x)
    x = Dropout(dict_params['drop_out'])(x)

# mlp
x = Flatten()(x)
x = Dense(dict_params['num_dense_nodes'], activation='relu')(x)
x = Dropout(dict_params['drop_out'])(x)
x = Dense(dict_params['num_dense_nodes'] // dict_params['dense_nodes_divisor'], activation='relu')(x)
output_tensor = Dense(self.number_of_classes, activation='softmax')(x)

# instantiate and compile model
cnn_model = Model(inputs=input_tensor, outputs=output_tensor)
opt = Adam(lr=dict_params['lr'])
loss = 'categorical_crossentropy'
cnn_model.compile(loss=loss, optimizer=opt, metrics=['accuracy', tf.keras.metrics.AUC()])
I'm optimizing (minimizing) the validation loss with Optuna. There is a maximum of 4 blocks in the network and the number of filters is doubled for each block. That means e.g. 64 in the first block, 128 in the second block, 256 in the third and so on. There are two problems. First, when we start with e.g. 256 filters and a total of 4 blocks, in the last block there will be 2048 filters which is too much.
Is it possible to make the num_filters parameter dependent on the num_blocks parameter? That means if there are more blocks, the starting filter size should be smaller. So, for example, if num_blocks is chosen to be 4, num_filters should only be sampled from 32, 64 and 128.
Second, I think it is common to double the filter size but there are also networks with constant filter sizes or two convolutions (with same number of filters) before a max pooling layer (similar to VGG) and so on. Is it possible to adapt the Optuna optimization to cover all these variations?
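Regarding the first point, a minimal sketch of how the dependency could be expressed (this is my own suggestion, and the cap of 1024 filters in the last block is only illustrative):
def objective(trial):
    num_blocks = trial.suggest_int('num_blocks', 1, 4)
    # keep the last block at or below 1024 filters, e.g. only 32/64/128 when num_blocks == 4
    choices = [f for f in [32, 64, 128, 256] if f * 2**(num_blocks - 1) <= 1024]
    # Optuna expects a fixed choice set per parameter name, so encode the dependency in the name
    num_filters = trial.suggest_categorical('num_filters_%d_blocks' % num_blocks, choices)
    ...
For the second point, the same idea applies: one could add another categorical parameter (e.g. a block-style choice among doubling filters, constant filters, or two convolutions per block) and branch on it when building the model, at the cost of a larger search space.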
I have an issue with constructing the input data for a BiRNN network.
I'm creating a license plate detection system like the one described here: https://arxiv.org/pdf/1601.05610v1.pdf
I have got to the "4.2.3 Sequence Labelling" part, where I need to train the BiRNN with a dataset of shape (total_count_of_images, None, 256); None because it's the length of the image, and it is different for every picture in the dataset.
Let's say I have 3000 images. Then the shape would look like:
train.shape : (3000,) but really it is (3000, None, 256) !?
So I got example code from
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/bidirectional_rnn.ipynb
So I'm struggling even with starting to train my RNN. I don't understand how I need to construct the input data/model, input placeholders, variables etc. to achieve any training process.
As far as I know everything should work. My code :
reset_graph()
'''
Dataset : (10000, 784)
Labels : (10000, 10)
To classify images using a bidirectional reccurent neural network, we consider
every image row as a sequence of pixels. Because MNIST image shape is 28*28px,
we will then handle 28 sequences of 28 steps for every sample.
'''
# Parameters
learning_rate = 0.001
training_iters = 100 # 100000
display_step = 10
batch_size = 40
# Network Parameters
n_input = 256 # data input size / 256D
n_steps = 256 # timesteps
n_hidden = 200 # hidden layer num of features
n_classes = 36 # MNIST total classes (0-9 digits and a-z letters)
# tf Graph input
x = tf.placeholder("float", [batch_size, None , n_input], name='input_placeholder')
y = tf.placeholder("float", [batch_size, None, n_classes], name='labels_placeholder')
# Define weights
weights = {
    # Hidden layer weights => 2*n_hidden because of forward + backward cells
    'out': tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
def BiRNN(x, weights, biases):
    print('Input x', x.get_shape().as_list())
    print('weights[\'out\']', weights['out'].get_shape().as_list())
    print('biases[\'out\']', biases['out'].get_shape().as_list())

    # Prepare data shape to match `bidirectional_rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Permuting batch_size and n_steps
    #x = tf.transpose(x, [1, 0, 2])
    #print('Transposed x',x.get_shape().as_list())

    # Reshape to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_steps])
    print('Reshaped x', x.get_shape().as_list())

    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_input, x)
    print(len(x), 'of [ ', x[0], ' ] kinds')

    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
    # Backward direction cell
    lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)

    # Get lstm cell output
    outputs, _, _ = rnn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
    print(len(outputs), 'of [ ', outputs[0], ' ] kinds')

    # Linear activation, using rnn inner loop last output
    ret = tf.matmul(outputs[-1], weights['out']) + biases['out']
    print('ret', ret.get_shape().as_list())
    return ret
pred = BiRNN(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
OUTPUT :
Input x [40, None, 256]
weights['out'] [400, 36]
biases['out'] [36]
Reshaped x [None, 256]
256 of [ Tensor("split:0", shape=(?, 256), dtype=float32) ] kinds
256 of [ Tensor("concat:0", shape=(?, 400), dtype=float32) ] kinds
ret [None, 36]
Everything just right there.
Problems start at session part :
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    batch_data = batch_gen(batch_size)
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = next(batch_data)
        print(batch_x.shape)
        print(batch_y.shape)
        #m[:,0, None, None].shape
        # Run optimization op (backprop)
        print('Optimizer')
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

        if step % display_step == 0:
            print('Display')
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step * batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
There I got following error :
(40,)
(40,)
Optimizer
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-96-a53814db8181> in <module>()
14 #Run optimization op (backprop)
15 print('Optimizer')
---> 16 sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
17
18 if step % display_step == 0:
/home/nauris/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
715 try:
716 result = self._run(None, fetches, feed_dict, options_ptr,
--> 717 run_metadata_ptr)
718 if run_metadata:
719 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/home/nauris/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
886 ' to a larger type (e.g. int64).')
887
--> 888 np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
889
890 if not subfeed_t.get_shape().is_compatible_with(np_val.shape):
/home/nauris/anaconda3/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
480
481 """
--> 482 return array(a, dtype, copy=False, order=order)
483
484 def asanyarray(a, dtype=None, order=None):
ValueError: setting an array element with a sequence.
Any help would be highly appreciated. Thanks everyone in advance.
Realized that the error occurs because you cannot feed a numpy ndarray with inconsistent dimensions such as (3000, None, 256), as in my case. Haven't found any solution yet.
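One common workaround (my own note, not something from the original post) is to pad every sequence in a batch up to that batch's maximum length, so the value fed to the placeholder becomes a regular ndarray. A rough numpy sketch:
import numpy as np

def pad_batch(sequences, feat_dim=256):
    # sequences: list of arrays shaped (length_i, feat_dim) with varying length_i
    max_len = max(len(s) for s in sequences)
    batch = np.zeros((len(sequences), max_len, feat_dim), dtype=np.float32)
    lengths = np.zeros(len(sequences), dtype=np.int32)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
        lengths[i] = len(s)
    return batch, lengths  # feed `batch`; `lengths` can drive the RNN's sequence_length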
I am training a convolutional neural network using TensorFlow to classify images of buildings into 5 classes.
Training dataset:
Class 1 - 3000 images
Class 2 - 3000 images
Class 3 - 3000 images
Class 4 - 3000 images
Class 5 - 3000 images
I started out with a very simple architecture:
Input image - 256 x 256 x 3
Convolutional layer 1 - 128 x 128 x 16 (3x3 filters, 16 filters, stride=2)
Convolutional layer 2 - 64 x 64 x 32 (3x3 filters, 32 filters, stride=2)
Convolutional layer 3 - 32 x 32 x 64 (3x3 filters, 64 filters, stride=2)
Max-pooling layer - 16 x 16 x 64 (2x2 pooling)
Fully-connected layer 1 - 1 x 1024
Fully-connected layer 2 - 1 x 64
Output - 1 x 5
Other details of my network:
Cost-function: tf.nn.softmax_cross_entropy_with_logits
Optimizer: Adam optimizer (Learning rate=0.01, Epsilon=0.1)
Mini-batch size: 5
My cost-function has a high starting value of around 10^10 and then drops rapidly to a value of about 1.6 (after a few hundred iterations) and saturates at that value (no matter how long I train the network for). The cost-function value on the test set is the same. This value is equivalent to predicting approximately equal probability for each class and it makes the same predictions for all images. My predictions look something like this:
[0.191877 0.203651 0.194455 0.200043 0.203081]
A high error on both the training and test sets indicates high bias, i.e. underfitting. I increased the complexity of my network by adding layers and increasing the number of filters, and my latest network is this (the number of layers and filter sizes are similar to AlexNet):
Input image - 256 x 256 x 3
Convolutional layer 1 - 64 x 64 x 64 (11x11 filters, 64 filters, stride=4)
Convolutional layer 2 - 32 x 32 x 128 (5x5 filters, 128 filters, stride=2)
Convolutional layer 3 - 16 x 16 x 256 (3x3 filters, 256 filters, stride=2)
Convolutional layer 4 - 8 x 8 x 512 (3x3 filters, 512 filters, stride=2)
Convolutional layer 5 - 8 x 8 x 256 (3x3 filters, 256 filters, stride=1)
Fully-connected layer 1 - 1 x 4096
Fully-connected layer 2 - 1 x 4096
Fully-connected layer 3 - 1 x 4096
Dropout layer (0.5 probability)
Output - 1 x 5
However, my cost-function is still saturating at approximately 1.6 and making the same predictions.
My questions are:
What other solutions should I try to fix a high-bias network? I have tried (and am still trying) different learning rates and weight initialisations, but to no avail.
Is it because my training set is too small? Wouldn't a small training set lead to a high variance network? It would overfit to the training images and have low training error, but high test error.
Is it possible that there are no distinguishable features in these images? However, considering the fact that other CNNs can distinguish between breeds of dogs, this does not seem likely.
As a sanity check, I am training my network on a very small dataset (50 images) and I am expecting it to overfit. However, it doesn't look like it is going to; it looks like the same problem is going to occur.
Code:
import tensorflow as tf
sess = tf.Session()
BATCH_SIZE = 50
MAX_CAPACITY = 300
TRAINING_STEPS = 3001
# To get the list of image filenames and labels from the text file
def read_labeled_image_list(list_filename):
    f = open(list_filename, 'r')
    filenames = []
    labels = []
    for line in f:
        filename, label = line[:-1].split(' ')
        filenames.append(filename)
        labels.append(int(label))
    return filenames, labels
# To get images and labels in batches
def add_to_batch(image, label):
    image_batch, label_batch = tf.train.batch([image, label], batch_size=BATCH_SIZE, num_threads=1, capacity=MAX_CAPACITY)
    return image_batch, tf.reshape(label_batch, [BATCH_SIZE])
# To decode a single image and its label
def read_image_with_label(input_queue):
    """ Image """
    # Read
    file_contents = tf.read_file(input_queue[0])
    example = tf.image.decode_png(file_contents)

    # Reshape
    my_image = tf.cast(example, tf.float32)
    my_image = tf.reshape(my_image, [256, 256, 3])

    # Normalisation
    my_image = my_image/255
    my_mean = tf.reduce_mean(my_image)

    # Centralisation
    my_image = my_image - my_mean

    """ Label """
    label = input_queue[1] - 1

    return add_to_batch(my_image, label)
# Network
def inference(x):
    """ Layer 1: Convolutional """
    # Initialise variables
    W_conv1 = tf.Variable(tf.truncated_normal([11,11,3,64],stddev=0.0001),name='W_conv1')
    b_conv1 = tf.Variable(tf.constant(0.1,shape=[64]),name='b_conv1')

    # Convolutional layer
    h_conv1 = tf.nn.relu(tf.nn.conv2d(x,W_conv1,strides=[1,4,4,1],padding='SAME') + b_conv1)

    """ Layer 2: Convolutional """
    # Initialise variables
    W_conv2 = tf.Variable(tf.truncated_normal([5,5,64,128],stddev=0.0001),name='W_conv2')
    b_conv2 = tf.Variable(tf.constant(0.1,shape=[128]),name='b_conv2')

    # Convolutional layer
    h_conv2 = tf.nn.relu(tf.nn.conv2d(h_conv1,W_conv2,strides=[1,2,2,1],padding='SAME') + b_conv2)

    """ Layer 3: Convolutional """
    # Initialise variables
    W_conv3 = tf.Variable(tf.truncated_normal([3,3,128,256],stddev=0.0001),name='W_conv3')
    b_conv3 = tf.Variable(tf.constant(0.1,shape=[256]),name='b_conv3')

    # Convolutional layer
    h_conv3 = tf.nn.relu(tf.nn.conv2d(h_conv2,W_conv3,strides=[1,2,2,1],padding='SAME') + b_conv3)

    """ Layer 4: Convolutional """
    # Initialise variables
    W_conv4 = tf.Variable(tf.truncated_normal([3,3,256,512],stddev=0.0001),name='W_conv4')
    b_conv4 = tf.Variable(tf.constant(0.1,shape=[512]),name='b_conv4')

    # Convolutional layer
    h_conv4 = tf.nn.relu(tf.nn.conv2d(h_conv3,W_conv4,strides=[1,2,2,1],padding='SAME') + b_conv4)

    """ Layer 5: Convolutional """
    # Initialise variables
    W_conv5 = tf.Variable(tf.truncated_normal([3,3,512,256],stddev=0.0001),name='W_conv5')
    b_conv5 = tf.Variable(tf.constant(0.1,shape=[256]),name='b_conv5')

    # Convolutional layer
    h_conv5 = tf.nn.relu(tf.nn.conv2d(h_conv4,W_conv5,strides=[1,1,1,1],padding='SAME') + b_conv5)

    """ Layer X: Pooling
    # Pooling layer
    h_pool1 = tf.nn.max_pool(h_conv3,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')"""

    """ Layer 6: Fully-connected """
    # Initialise variables
    W_fc1 = tf.Variable(tf.truncated_normal([8*8*256,4096],stddev=0.0001),name='W_fc1')
    b_fc1 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc1')

    # Multiplication layer
    h_conv5_reshaped = tf.reshape(h_conv5,[-1,8*8*256])
    h_fc1 = tf.nn.relu(tf.matmul(h_conv5_reshaped, W_fc1) + b_fc1)

    """ Layer 7: Fully-connected """
    # Initialise variables
    W_fc2 = tf.Variable(tf.truncated_normal([4096,4096],stddev=0.0001),name='W_fc2')
    b_fc2 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc2')

    # Multiplication layer
    h_fc2 = tf.nn.relu(tf.matmul(h_fc1, W_fc2) + b_fc2)

    """ Layer 8: Fully-connected """
    # Initialise variables
    W_fc3 = tf.Variable(tf.truncated_normal([4096,4096],stddev=0.0001),name='W_fc3')
    b_fc3 = tf.Variable(tf.constant(0.1,shape=[4096]),name='b_fc3')

    # Multiplication layer
    h_fc3 = tf.nn.relu(tf.matmul(h_fc2, W_fc3) + b_fc3)

    """ Layer 9: Dropout layer """
    # Keep/drop nodes with 50% chance
    h_dropout = tf.nn.dropout(h_fc3,0.5)

    """ Readout layer: Softmax """
    # Initialise variables
    W_softmax = tf.Variable(tf.truncated_normal([4096,5],stddev=0.0001),name='W_softmax')
    b_softmax = tf.Variable(tf.constant(0.1,shape=[5]),name='b_softmax')

    # Multiplication layer
    y_conv = tf.nn.relu(tf.matmul(h_dropout,W_softmax) + b_softmax)

    """ Summaries """
    tf.histogram_summary('W_conv1',W_conv1)
    tf.histogram_summary('W_conv2',W_conv2)
    tf.histogram_summary('W_conv3',W_conv3)
    tf.histogram_summary('W_conv4',W_conv4)
    tf.histogram_summary('W_conv5',W_conv5)
    tf.histogram_summary('W_fc1',W_fc1)
    tf.histogram_summary('W_fc2',W_fc2)
    tf.histogram_summary('W_fc3',W_fc3)
    tf.histogram_summary('W_softmax',W_softmax)
    tf.histogram_summary('b_conv1',b_conv1)
    tf.histogram_summary('b_conv2',b_conv2)
    tf.histogram_summary('b_conv3',b_conv3)
    tf.histogram_summary('b_conv4',b_conv4)
    tf.histogram_summary('b_conv5',b_conv5)
    tf.histogram_summary('b_fc1',b_fc1)
    tf.histogram_summary('b_fc2',b_fc2)
    tf.histogram_summary('b_fc3',b_fc3)
    tf.histogram_summary('b_softmax',b_softmax)

    return y_conv
# Training
def cost_function(y_label, y_conv):
    # Reshape y_label to one-hot vectors
    sparse_labels = tf.reshape(y_label, [BATCH_SIZE, 1])
    indices = tf.reshape(tf.range(BATCH_SIZE), [BATCH_SIZE, 1])
    concated = tf.concat(1, [indices, sparse_labels])
    dense_labels = tf.sparse_to_dense(concated, [BATCH_SIZE, 5], 1.0, 0.0)

    # Cross-entropy
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, dense_labels))

    # Accuracy
    y_prob = tf.nn.softmax(y_conv)
    correct_prediction = tf.equal(tf.argmax(dense_labels, 1), tf.argmax(y_prob, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    # Add to summary
    tf.scalar_summary('loss', cost)
    tf.scalar_summary('accuracy', accuracy)

    return cost, accuracy
def main():
    # To get list of filenames and labels
    filename = '/labels/filenames_with_labels_server.txt'
    image_list, label_list = read_labeled_image_list(filename)
    images = tf.convert_to_tensor(image_list, dtype=tf.string)
    labels = tf.convert_to_tensor(label_list, dtype=tf.int32)

    # To create the queue
    input_queue = tf.train.slice_input_producer([images, labels], shuffle=True, capacity=MAX_CAPACITY)

    # To train network
    image, label = read_image_with_label(input_queue)
    y_conv = inference(image)
    loss, acc = cost_function(label, y_conv)
    train_step = tf.train.AdamOptimizer(learning_rate=0.001, epsilon=0.1).minimize(loss)

    # To write and merge summaries
    writer = tf.train.SummaryWriter('/SummaryLogs/log', sess.graph)
    merged = tf.merge_all_summaries()

    # To save variables
    saver = tf.train.Saver()

    """ Run session """
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)

    print('Running...')
    for step in range(1, TRAINING_STEPS):
        loss_val, acc_val, _, summary_str = sess.run([loss, acc, train_step, merged])
        writer.add_summary(summary_str, step)
        print("Step %d, Loss %g, Accuracy %g" % (step, loss_val, acc_val))

        if step == 1:
            save_path = saver.save(sess, '/SavedVariables/model', global_step=step)
            print("Initial model saved: %s" % save_path)

    save_path = saver.save(sess, '/SavedVariables/model-final')
    print("Final model saved: %s" % save_path)

    """ Close session """
    print('Finished')
    sess.close()

if __name__ == '__main__':
    main()
EDIT:
After making some changes, I managed to get the network to overfit to a small training set of 50 images.
Changes:
Initialization of weights using Xavier initialization
Initialization of bias to zero
No normalisation of images i.e. no division by 255
Centred the images by subtracting the mean pixel value (calculated over the whole training set). In this case, the mean was 114.
Encouraged by this, I proceeded to train my network on the whole training set, only to encounter the SAME issue again. These are the outputs:
Step 1, Loss 1.37815, Accuracy 0.4
y_conv (before softmax):
[[ 0.30913264 0. 1.20176554 0. 0. ]
[ 0. 0. 1.23200822 0. 0. ]
[ 0. 0. 0. 0. 0. ]
[ 0. 0. 1.65852785 0.01910716 0. ]
[ 0. 0. 0.94612855 0. 0.10457891]]
y_prob (after softmax):
[[ 0.1771856 0.130069 0.43260741 0.130069 0.130069 ]
[ 0.13462381 0.13462381 0.46150482 0.13462381 0.13462381]
[ 0.2 0.2 0.2 0.2 0.2 ]
[ 0.1078648 0.1078648 0.56646001 0.1099456 0.1078648 ]
[ 0.14956713 0.14956713 0.38524282 0.14956713 0.16605586]]
Very quickly it becomes:
Step 39, Loss 1.60944, Accuracy 0.2
y_conv (before softmax):
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
y_prob (after softmax):
[[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]
[ 0.2 0.2 0.2 0.2 0.2]]
Clearly a y_conv of all zeros is not a good sign. Looking at the histograms, the weight variables do not change after initialization; only the bias variables change.
This is not so much a "complete" answer but rather a "things you can try if you are facing a similar problem" answer.
I managed to get my network to start to learn something with the following changes:
Xavier initialization of weights
Zero initialization of bias
No normalization of images to [0,1]
Subtracting the mean pixel value (calculated over the whole training set) from the images
No ReLU in the final layer that calculates y_conv
After 3000 iterations of training with a batch size of 50 images (approximately 10 epochs), the network starts to learn.
On the testing set it does not perform so well, because my training set is very small and my network was over-fitting; this was expected, so I am not surprised. At least now I know that I have to focus on getting a larger training set, add more regularization, or simplify my network.
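For reference, a rough sketch of what some of those changes could look like in the graph code above (my own illustration; the Xavier-style standard deviation is written out by hand rather than relying on any particular initializer API):
import math

# Xavier-style weight initialization: scale the stddev by fan-in and fan-out
fan_in, fan_out = 8*8*256, 4096
W_fc1 = tf.Variable(tf.truncated_normal([fan_in, fan_out],
                                        stddev=math.sqrt(2.0 / (fan_in + fan_out))), name='W_fc1')
# Zero initialization of bias
b_fc1 = tf.Variable(tf.zeros([fan_out]), name='b_fc1')

# Centre the image with the training-set mean pixel value (114 here), no division by 255
my_image = tf.cast(example, tf.float32) - 114.0

# Readout layer without ReLU: return raw logits and let softmax_cross_entropy_with_logits do the rest
y_conv = tf.matmul(h_dropout, W_softmax) + b_softmax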