How to prevent a lazy Convolutional Neural Network? - machine-learning

I end up with a ‘lazy CNN’ after training it with Keras: whatever the input is, the output is constant. What do you think the problem is?
I am trying to reproduce the experiment from NVIDIA’s paper End to End Learning for Self-Driving Cars. Of course, I do not have a real car, only Udacity’s simulator. The simulator generates images of the view in front of the car.
A CNN receives the image and outputs the steering angle that keeps the car on the track. The goal is simply to keep the simulated car driving safely on the track. It is not very difficult.
The strange thing is that I sometimes end up with a lazy CNN after training it with Keras, one that outputs a constant steering angle. The simulated car goes off the track, but the output of the CNN does not change. This happens especially when the network gets deeper, e.g. with the CNN from the paper.
If I use a CNN like this, I can get a useful model after training.
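from keras.models import Sequential
from keras.layers import Lambda, Cropping2D, Conv2D, Activation, Flatten, Dense
model = Sequential()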
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(36, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(48, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(50))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.add(Dense(1))
But if I use a deeper CNN, I am more likely to end up with a lazy CNN.
Specifically, if I use a CNN like NVIDIA’s, I get a lazy CNN after almost every training run.
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(36, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(48, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, strides=(1, 1)))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, strides=(1, 1)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(1164))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(50))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.add(Dense(1))
I use ‘relu’ for the convolution layers and ‘sigmoid’ for the fully connected layers. I tried changing the activation function, but it had no effect.
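To see how much data reaches the dense layers, the feature-map sizes of the deeper model above can be worked out by hand. The sketch below only does that arithmetic, assuming the Keras default padding='valid' for Conv2D; the helper names conv_out and trace_shapes are made up for illustration.

```python
def conv_out(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (size - kernel) // stride + 1


def trace_shapes(height, width, conv_layers):
    """Return the (height, width, filters) shape after each conv layer."""
    shapes = []
    for filters, kernel, stride in conv_layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        shapes.append((height, width, filters))
    return shapes


# 160x320 input; Cropping2D removes 70 rows from the top and 25 from the bottom.
cropped_h, cropped_w = 160 - 70 - 25, 320

# (filters, kernel size, stride) for the five conv layers of the deeper model.
deep_convs = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]

shapes = trace_shapes(cropped_h, cropped_w, deep_convs)
last_h, last_w, last_c = shapes[-1]
flat_units = last_h * last_w * last_c  # number of values entering the first Dense layer

print(shapes)
print(flat_units)
```

Under these assumptions the last feature map is only one row high, so the first dense layer sees a fairly small input compared with its 1164 units.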
Here is my analysis. I do not think there is a bug in my program, because I can successfully drive the car with the same code and a simpler CNN. I think the cause is either the simulator or the structure of the neural network. In a real self-driving car, the training signal (the steering angle) contains noise, because a driver never holds the wheel perfectly still on a real road. In the simulator, however, the training signal is very clean: almost 60% of the steering angles are exactly zero. The optimizer can easily reduce the loss by pushing the output of the CNN close to zero; it seems the optimizer is lazy too. But then the CNN also outputs zeros when we really need it to steer. So I added small noise to these zero steering angles. The chance of getting a lazy CNN is now smaller, but it has not disappeared.
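The noise step mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the function name jitter_zero_angles, the noise scale sigma=0.01 and the toy list of angles are hypothetical and not taken from the repository.

```python
import random


def zero_fraction(angles):
    """Fraction of steering angles that are exactly zero."""
    return sum(1 for a in angles if a == 0.0) / len(angles)


def jitter_zero_angles(angles, sigma=0.01, seed=0):
    """Replace exactly-zero angles with small Gaussian noise; keep the rest."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [rng.gauss(0.0, sigma) if a == 0.0 else a for a in angles]


# Toy labels: 6 of 10 angles are zero, similar to the ~60% in the question.
angles = [0.0] * 6 + [-0.2, 0.15, 0.3, -0.05]
noisy = jitter_zero_angles(angles)

print(zero_fraction(angles))
print(zero_fraction(noisy))
```

The non-zero labels are left untouched, so only the flat part of the training signal changes.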
What do you think of my analysis? Are there other strategies I can use? I wonder whether similar problems have already been solved in the long history of CNN research.
Resources:
The related files have been uploaded to GitHub. You can repeat the entire experiment with these files.

I can't run your model, because neither the question nor the GitHub repo contains the data. That's why I am only 90% sure of my answer.
But I think the main problem of your network is the sigmoid activation function after the dense layers. I assume it trains well when there are just two of them, but four is too many.
Unfortunately, NVIDIA's End to End Learning for Self-Driving Cars paper doesn't specify the activation explicitly, but these days the default activation is no longer sigmoid (as it once was), but relu. See this discussion if you're interested in why that is so. So the solution I'm proposing is to try this model:
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(36, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(48, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation="relu"))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation="relu"))
model.add(Flatten())
model.add(Dense(1164, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(10, activation="relu"))
model.add(Dense(1))
It mimics NVIDIA's network architecture and is far less prone to vanishing gradients.
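A rough way to see why four sigmoid layers hurt: the derivative of the sigmoid is never larger than 0.25, so the activation part of the backpropagated gradient shrinks at every layer. The sketch below multiplies only these activation derivatives and deliberately ignores the weight matrices, so it is an illustration rather than a full gradient computation; the function names are made up for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of relu: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def activation_grad_product(grad_fn, pre_activations):
    """Product of activation derivatives along a chain of layers."""
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product


# Four dense layers, as in the deeper model.
best_case = activation_grad_product(sigmoid_grad, [0.0] * 4)  # 0.25 ** 4
saturated = activation_grad_product(sigmoid_grad, [4.0] * 4)  # far smaller still
relu_case = activation_grad_product(relu_grad, [4.0] * 4)     # stays at 1.0

print(best_case, saturated, relu_case)
```

Even in the best case the sigmoid chain scales the gradient by 0.25 ** 4, which is below 0.4%, while active relu units pass it through unchanged.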

Related

How to get a good binary classification deep neural model where negative data is more on dataset?

I want to do binary image classification using the CIFAR-10 dataset. I modified CIFAR-10 so that class 0 becomes class True (1) and all other classes become class False (0). Now there are only two classes in my dataset: True (1) and False (0).
When I train the following Keras model (TensorFlow as backend), I get almost 99% accuracy.
But in testing I find that every False is predicted as False and every True is also predicted as False, and I still get 99% accuracy.
I do not want all True samples to be predicted as False.
I was expecting all True samples to be predicted as True.
How can I resolve this problem?
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
output=model.fit(x_train, y_train, batch_size=32, epochs=10)
You have a few options here:
Get more data with the True label. However, in most scenarios this is not easily possible.
Use only a small part of the data that is labeled False. Maybe it is enough to train your model?
Use weights for the loss function during training. In Keras you can do this with the class_weight option of fit. In your example, the class True should get a higher weight than the class False.
As mentioned in the comments, this is a huge problem in the ML field. These are just a few very simple things you could try.
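For the class_weight option, one common heuristic is to weight each class inversely to its frequency. The sketch below computes such a dictionary in plain Python; the helper name balanced_class_weights and the 100/900 label split are assumptions for illustration, and the fit call is shown only as a comment.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 = True (rare), 0 = False (common).
y_train = [1] * 100 + [0] * 900
class_weight = balanced_class_weights(y_train)
print(class_weight)

# The dictionary would then be passed to Keras, e.g.:
# model.fit(x_train, y_train, batch_size=32, epochs=10, class_weight=class_weight)
```

With this weighting a misclassified True sample costs several times more than a misclassified False sample, which makes the all-False solution less attractive to the optimizer.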

How to use categorical_hinge loss in keras in order to train with an SVM in the last layer?

I want to train a CNN that uses an SVM to classify at the last layer. I understand that categorical_hinge is the best loss function for that. I have 6 classes to classify.
My model is as shown below:
model = Sequential()
model.add(Conv2D(50, 3, 3, activation = 'relu', input_shape = train_data.shape[1:]))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, 3, 3, activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, 3, 3, activation = 'relu'))
model.add(Flatten())
model.add(Dense(400, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))
Is there a problem with the network , data processing , or the loss function?
The model does not learn anything after a point as shown in the image
What should I do?
Your model has a single output neuron; there is no way this will work with 6 classes. The output of your model should have 6 neurons. The output should also have no activation function, so that it produces the raw scores (logits) that the categorical hinge can use.
Note that the categorical hinge was added recently (2-3 weeks ago), so it's quite new and probably not many people have tested it.
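To make the point about 6 raw output scores concrete, here is a plain-Python sketch of the categorical hinge for a single sample, written the way I understand the Keras definition: max(0, 1 + best wrong score - true score). The function and the example scores are illustrative and not part of the original answer.

```python
def categorical_hinge(y_true, y_pred):
    """Categorical hinge for one sample: one-hot y_true, raw scores y_pred."""
    pos = sum(t * p for t, p in zip(y_true, y_pred))           # score of the true class
    neg = max((1.0 - t) * p for t, p in zip(y_true, y_pred))   # best score among the others
    return max(0.0, neg - pos + 1.0)


# Six classes, the true class is index 2.
y_true = [0, 0, 1, 0, 0, 0]
confident_scores = [-2.0, -1.0, 3.0, -0.5, -1.5, -2.5]  # true class wins by a wide margin
wrong_scores = [2.0, -1.0, 0.5, -0.5, -1.5, -2.5]       # class 0 beats the true class

loss_ok = categorical_hinge(y_true, confident_scores)
loss_bad = categorical_hinge(y_true, wrong_scores)
print(loss_ok, loss_bad)
```

The loss needs one score per class and compares them directly, which is why a single sigmoid output cannot work here.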
Use hinge loss and a linear activation in the last layer.
from keras.regularizers import l2
# kernel_regularizer is the Keras 2 name; Keras 1 called it W_regularizer
model.add(Dense(nb_classes, kernel_regularizer=l2(0.01)))
model.add(Activation('linear'))
model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])
For more information, see https://github.com/keras-team/keras/issues/6090

Image classifier with Keras not converging

Hi all. I am trying to build an image classifier with Keras (TensorFlow as backend). The objective is to separate memes from other images.
I am using convolutional layers followed by fully connected layers, with max pooling and dropout.
The code is as follows:
model = Sequential()
model.add(Conv2D(64, (3,3), activation='relu', input_shape=conv_input_shape))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
The input is a matrix of shape (n, 100, 100, 3). n RGB images with resolution 100 x 100, and output labels are [1, 0] for meme and [0, 1] otherwise.
However, when I train the model, the loss won't ever decrease from the first iteration.
Is there anything off in the code?
I suspect that memes are actually not that different from other images in many ways, except that some of them have captions along with a few other features.
What are some better architectures to solve a problem like this?

Intuition behind Stacking Multiple Conv2D Layers before Dropout in CNN

Background:
Tagging TensorFlow since Keras runs on top of it and this is more a general deep learning question.
I have been working on the Kaggle Digit Recognizer problem and used Keras to train CNN models for the task. This model below has the original CNN structure I used for this competition and it performed okay.
def build_model1():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=[28, 28, 1]))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Then I read some other notebooks on Kaggle and borrowed another CNN structure (copied below), which works much better than the one above in that it achieved better accuracy, lower error rate, and took many more epochs before overfitting the training data.
def build_model2():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (5, 5), padding="same", activation="relu", input_shape=(28, 28, 1)))
    model.add(layers.Conv2D(32, (5, 5), padding="same", activation="relu"))
    model.add(layers.MaxPool2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Question:
Is there any intuition or explanation behind the better performance of the second CNN structure? What is it that makes stacking 2 Conv2D layers better than just using 1 Conv2D layer before max pooling and dropout? Or is there something else that contributes to the result of the second model?
Thank y'all for your time and help.
The main difference between these two approaches is that the latter (2 conv) has more flexibility in expressing non-linear transformations without losing information. Maxpool removes information from the signal and dropout forces a distributed representation, so both effectively make it harder to propagate information. If, for a given problem, a highly non-linear transformation has to be applied to the raw data, stacking multiple convs (with relu) will make it easier to learn, that's it. Also note that you are comparing a model with 3 max poolings to a model with only 2, so the second one will potentially lose less information. Another thing is that it has a much bigger fully connected part at the end, while the first one is tiny (64 neurons with 0.5 dropout means that on average only about 32 neurons are active during training, which is a tiny layer!). To sum up:
These architectures differ in many aspects, not just in stacking conv layers.
Stacking convnets usually leads to less information being lost in processing; see for example "all convolutional" architectures.
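The size differences mentioned above can be checked with a little arithmetic. The sketch below assumes 'same' padding for the convolutions (so only pooling changes the spatial size) and the default 2x2 pooling with stride 2; the helper names are made up for illustration.

```python
def pooled_size(size, n_pools, pool=2):
    """Side length after n_pools rounds of non-overlapping max pooling."""
    for _ in range(n_pools):
        size //= pool
    return size


def stacked_receptive_field(kernels):
    """Receptive field of stride-1 convolutions applied one after another."""
    return 1 + sum(k - 1 for k in kernels)


# build_model1: three pooling layers; build_model2: two. Both end with 64 channels.
side1 = pooled_size(28, 3)
side2 = pooled_size(28, 2)
flat1 = side1 * side1 * 64   # values entering Dense(64) in the first model
flat2 = side2 * side2 * 64   # values entering Dense(256) in the second model

# One 3x3 conv before pooling vs. two stacked 5x5 convs before pooling.
rf_single = stacked_receptive_field([3])
rf_stack = stacked_receptive_field([5, 5])

print(side1, flat1, side2, flat2, rf_single, rf_stack)
```

So before the dense layers the second model keeps a 7x7 map instead of a 3x3 one, and each unit in its first block already sees a 9x9 patch of the input before any pooling happens.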

How to load only specific weights on Keras

I have a trained model whose weights I've exported, and I want to load part of them into another model.
My model is built in Keras using TensorFlow as backend.
Right now I'm doing as follows:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape, trainable=False))
model.add(Activation('relu', trainable=False))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), trainable=False))
model.add(Activation('relu', trainable=False))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), trainable=True))
model.add(Activation('relu', trainable=True))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.load_weights("image_500.h5")
model.pop()
model.pop()
model.pop()
model.pop()
model.pop()
model.pop()
model.add(Conv2D(1, (6, 6),strides=(1, 1), trainable=True))
model.add(Activation('relu', trainable=True))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
I'm sure it's a terrible way to do it, although it works.
How do I load just the first 9 layers?
If your first 9 layers are consistently named between your original trained model and the new model, then you can use model.load_weights() with by_name=True. This will update weights only in the layers of your new model that have an identically named layer found in the original trained model.
The name of the layer can be specified with the name keyword, for example:
model.add(Dense(8, activation='relu',name='dens_1'))
This call:
weights_list = model.get_weights()
will return a list of all weight tensors in the model, as Numpy arrays.
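Note that this list is flat: it holds one array per weight tensor (for example a kernel and a bias for each Conv2D layer), not one entry per layer, so it cannot be indexed by layer number. To copy the first 9 layers, it is safer to go layer by layer (new_model here stands for the second model):
for new_layer, old_layer in zip(new_model.layers[:9], model.layers[:9]):
    new_layer.set_weights(old_layer.get_weights())
where model.layers is a flattened list of the layers comprising the model. In this case, you reload the weights of the first 9 layers.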
More information is available here:
https://keras.io/layers/about-keras-layers/
https://keras.io/models/about-keras-models/

Resources