Best output activation function for binary mask classification - machine-learning

I have a CNN which inputs a satellite image and should output a mask where it finds cars. I have manually labelled images and created masks for each image where each pixel is 1 if there is part of a car in that pixel, 0 otherwise.
I am trying to work out the best output layer activation function and loss function, and I'm fishing for opinions. I know there is a wealth of information out there but I find myself getting confused about whether my problem is regression or classification.
Could someone please offer their opinion? I am currently using the following output and loss in keras:
from keras.layers import Conv2D
from keras.models import Model
from keras.optimizers import Adam
conv10 = Conv2D(1, 1, activation='sigmoid')(conv9)  # one probability per pixel
model = Model(inputs=[inputs], outputs=[conv10])
model.compile(optimizer=Adam(lr=1e-4), loss='binary_crossentropy', metrics=['accuracy'])
Is this a good idea? Thanks!

This seems like a good idea from my point of view: you want to output a probability P(px is part of a car | image) for each pixel px in the image. That makes it a per-pixel binary classification problem, for which the binary_crossentropy loss function (paired with a sigmoid activation in the output layer) is appropriate.
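To make that concrete, here is a minimal NumPy sketch (the 2x2 mask and probabilities are made up, not from the post) of what the per-pixel binary cross-entropy computes:

import numpy as np

# Hypothetical 2x2 ground-truth mask and predicted per-pixel probabilities.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.2], [0.1, 0.7]])

eps = 1e-7  # avoid log(0)
bce = -(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))
print(bce.mean())  # Keras averages this per-pixel loss over all pixels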

Related

What does the 4D array returned by net.forward() in OpenCV's DNN module mean? I have little knowledge of deep learning

I need to use face detection to finish my homework. After searching the Internet, I found that using a pre-trained deep learning face detector model with OpenCV's DNN module is easy and works well. I learned it from here: https://www.pyimagesearch.com/2018/02/26/face-detection-with-opencv-and-deep-learning/ , but I am really confused about the 4D array returned by net.forward():
import cv2

net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000_fp16.caffemodel")

def detect_img(net, image):
    blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0), False, False)
    net.setInput(blob)
    detections = net.forward()  # Here is the 4D array.
    print(detections.shape)
    return show_detections(image, detections)
I know almost nothing about deep learning. I guessed a few things by reading "deploy.prototxt", which I believe is a configuration file for the pre-trained model, but I still feel really confused. Is there a way I can understand the meaning of the 4D array quickly? Could I roughly understand how the pre-trained model works, with little knowledge of deep learning, within a week?
The 3rd dimension lets you iterate over the predictions, and the 4th dimension holds the actual results for each prediction:
class_label = int(inference_results[0, 0, i, 1])  # class label of the i-th box
conf = inference_results[0, 0, i, 2]  # confidence of the i-th box prediction
top_left_x, top_left_y, bottom_right_x, bottom_right_y = inference_results[0, 0, i, 3:7]  # normalized bounding-box coordinates for the resized input image
The 2nd dimension is used when the predictions are made in more than one stage; for example, in YOLO the predictions are made at 3 different layers. You can iterate over those stages using the 2nd dimension, e.g. inference_results[:, i, :, :].
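As an illustration, the (1, 1, N, 7) array of this SSD face detector is typically consumed like the sketch below (show_detections is only named in the question, so this version of it is an assumption):

import cv2

def show_detections(image, detections, conf_threshold=0.5):
    h, w = image.shape[:2]
    for i in range(detections.shape[2]):     # 3rd dimension: one entry per detection
        confidence = detections[0, 0, i, 2]  # 4th dimension, index 2: confidence
        if confidence < conf_threshold:
            continue
        # indices 3:7 hold normalized corner coordinates; scale to the original image
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return image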

Can a dense layer on many inputs be represented as a single matrix multiplication?

Denote a[2, 3] to be a matrix of dimension 2x3. Say there are 10 elements in each input and the network is a two-element classifier (cat or dog, for example). Say there is just one dense layer. For now I am ignoring the bias vector. I know this is an over-simplified neural net, but it is just for this example. Each output in a dense layer of a neural net can be calculated as
output = matmul(input, weights)
Where weights is a weight matrix 10x2, input is an input vector 1x10, and output is an output vector 1x2.
My question is this: Can an entire series of inputs be computed at the same time with a single matrix multiplication? It seems like you could compute
output = matmul(input, weights)
Where there are 100 inputs total, and input is 100x10, weights is 10x2, and output is 100x2.
In back propagation, you could do something similar:
input_err = matmul(output_err, transpose(weights))  # (100x2)·(2x10) -> 100x10
weights_err = matmul(transpose(input), output_err)  # (10x100)·(100x2) -> 10x2
weights -= learning_rate * weights_err
Where weights is the same, output_err is 100x2, and input is 100x10.
However, I tried to implement a neural network in this way from scratch and I am currently unsuccessful. I am wondering if I have some other error or if my approach is fundamentally wrong.
Thanks!
If anyone else is wondering, I found the answer to my question. This does not in fact work as a drop-in replacement for per-input training, for a few reasons. Essentially, computing all inputs this way is equivalent to running the network with a batch size equal to the number of inputs: the weights are updated once for the whole set rather than between inputs. The matrix arithmetic is valid, but each input no longer influences the training step by step. With a reasonable batch size, however, you can use exactly these 2-D matrix multiplications, with an input of shape batch_size by input_size, to speed up training.
In addition, when predicting on many inputs (at test time, for example), no weights are updated, so a single matrix multiplication with an input of shape num_inputs by input_size can compute all outputs in parallel.
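For what it's worth, here is a minimal NumPy sketch of one batched forward and backward pass for a single bias-free dense layer (shapes taken from the post; the random data and squared-error gradient are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
batch_size, input_size, output_size = 100, 10, 2
learning_rate = 0.01

inputs = rng.normal(size=(batch_size, input_size))    # 100x10
weights = rng.normal(size=(input_size, output_size))  # 10x2
targets = rng.normal(size=(batch_size, output_size))  # stand-in labels, 100x2

output = inputs @ weights               # 100x2: the whole batch in one matmul
output_err = output - targets           # gradient of a squared-error loss, 100x2
input_err = output_err @ weights.T      # 100x10, passed to the previous layer
weights_err = inputs.T @ output_err     # 10x2, summed over the batch
weights -= learning_rate * weights_err  # one update for the whole batch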

Autoencoders: Find the important neurons

I have implemented an autoencoder in Keras that takes 112*112*3 inputs and compresses them to a 100-neuron encoded state. I want to find which of these 100 neurons learn the important features. So far I have calculated the eigenvalues (e) and eigenvectors (v) using the steps below, and I found that roughly the first 30 values of e are greater than 0. Does that mean the first 30 modes are the important ones? Is there any other method that could find the important neurons?
Thanks in advance.
x_enc = enc_model.predict(x_train, batch_size=BATCH_SIZE) # shape (3156,100)
x_mean = np.mean(x_enc, axis=0) # shape (100,)
x_stds = np.std(x_enc, axis=0) # shape (100,)
x_cov = np.cov((x_enc - x_mean).T) # shape (100,100)
e, v = np.linalg.eig(x_cov) # shape (100,) and (100,100) respectively
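As a follow-up sketch continuing the snippet above (the sorting step and the 95% threshold are my assumptions, not from the question), the usual way to judge how many modes matter is the cumulative explained variance rather than just counting nonzero eigenvalues:

idx = np.argsort(e)[::-1]   # largest eigenvalue first
e_sorted = np.real(e[idx])  # np.linalg.eig can return a complex dtype
explained = np.cumsum(e_sorted) / np.sum(e_sorted)
n_modes = int(np.searchsorted(explained, 0.95)) + 1
print(n_modes, "modes explain 95% of the variance")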
I don't know whether the approach you are using will actually give you useful results, since how the network learns and what exactly it learns aren't known. I suggest you use a different kind of autoencoder, one that automatically learns disentangled representations of the data in a latent space; that way you can be sure that all the parameters you find are actually contributing to the representation of your data. Check this article.

Cross Entropy Loss for Semantic Segmentation Keras

I'm pretty sure this is a silly question but I can't find it anywhere else so I'm going to ask it here.
I'm doing semantic image segmentation using a CNN (U-Net) in Keras with 7 labels, so my label for each image is (7, n_rows, n_cols) using the Theano backend. Across the 7 channels, each pixel is one-hot encoded. In this case, is the correct loss function categorical cross-entropy? It seems that way to me, but the network seems to learn better with binary cross-entropy loss. Can someone shed some light on why that would be and what the principled objective is?
Binary cross-entropy loss should be used with a sigmoid activation in the last layer, and it severely penalizes opposite predictions. It does not take into account that the output is one-hot encoded and that the sum of the predictions should be 1. But since mis-predictions are severely penalized, the model somewhat learns to classify properly anyway.
The way to enforce the one-hot prior is to use a softmax activation with categorical cross-entropy. This is what you should use.
The problem is applying the softmax in your case, as Keras doesn't support softmax on each pixel.
The easiest way around this is to permute the dimensions to (n_rows, n_cols, 7) using a Permute layer and then reshape to (n_rows*n_cols, 7) using a Reshape layer. Then you can add the softmax activation layer and use categorical cross-entropy loss; the labels should be reshaped accordingly, as sketched below.
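A sketch of that Permute + Reshape route might look like this (model, n_rows, and n_cols are assumed from the question, and the layers before this point are not shown):

from keras.layers import Permute, Reshape, Activation

model.add(Permute((2, 3, 1)))             # (7, n_rows, n_cols) -> (n_rows, n_cols, 7)
model.add(Reshape((n_rows * n_cols, 7)))  # flatten the spatial dimensions
model.add(Activation('softmax'))          # softmax over the 7 classes per pixel
model.compile(optimizer='adam', loss='categorical_crossentropy')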
The other way is to implement a depth-softmax:
from keras import backend as K

def depth_softmax(matrix):
    sigmoid = lambda x: 1 / (1 + K.exp(-x))
    sigmoided_matrix = sigmoid(matrix)
    softmax_matrix = sigmoided_matrix / K.sum(sigmoided_matrix, axis=0)
    return softmax_matrix
and use it as a Lambda layer:
model.add(Deconvolution2D(7, 1, 1, border_mode='same', output_shape=(7, n_rows, n_cols)))
model.add(Permute((2, 3, 1)))  # Permute expects a tuple of dims
model.add(BatchNormalization())
model.add(Lambda(depth_softmax))
If the TensorFlow image_dim_ordering is used, you can do away with the Permute layer.
For more reference check here.
I tested the solution of #indraforyou and think the proposed method has some mistakes. As the comment section does not allow proper code segments, here is what I think the fixed version would be:
def depth_softmax(matrix):
    from keras import backend as K
    exp_matrix = K.exp(matrix)
    softmax_matrix = exp_matrix / K.expand_dims(K.sum(exp_matrix, axis=-1), axis=-1)
    return softmax_matrix
This method will expect the ordering of the matrix to be (height, width, channels).
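A quick sanity check (my own sketch, not part of the answer) that this fixed version produces a per-pixel distribution over the channels:

import numpy as np
from keras import backend as K

x = K.constant(np.random.randn(4, 4, 7))  # (height, width, channels)
s = K.eval(depth_softmax(x))
print(s.shape)                            # (4, 4, 7)
print(np.allclose(s.sum(axis=-1), 1.0))   # True: each pixel sums to 1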

Confusion between Binary_crossentropy and Categorical_crossentropy

I am doing binary classification using a deep neural network. Whenever I use binary_crossentropy, my model does not give good accuracy (it stays close to random prediction). But if I use categorical cross-entropy with an output layer of size 2, I get good accuracy, close to 0.90, in only one epoch. Can anyone please explain what is happening here?
I also had this problem while trying to use binary_crossentropy with a softmax activation in the output layer. As far as I know, softmax gives the probability of each class, so if your output layer has 2 nodes, you get something like p(x1) and p(x2) with p(x1) + p(x2) = 1. Therefore, if you have only one output node, its softmax output will always be 1.0 (100%); that's why you get close-to-random predictions (honestly, they will be close to the category distribution of your evaluation set).
Try changing the output activation to sigmoid, as illustrated below.
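Here is a minimal sketch (the hidden layer and input shape are made up) of the two setups that should behave equivalently for binary classification:

from keras.models import Sequential
from keras.layers import Dense

# Option 1: one sigmoid unit + binary_crossentropy; labels have shape (n, 1)
model_bin = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                        Dense(1, activation='sigmoid')])
model_bin.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Option 2: two softmax units + categorical_crossentropy; labels are one-hot, shape (n, 2)
model_cat = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                        Dense(2, activation='softmax')])
model_cat.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])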
