I want to fine-tune a gender detector trained in Caffe using my own dataset. The model was trained on around half a million face images by fine-tuning VGG16 pre-trained on ImageNet. I want to use it as my base model. I downloaded the gender.caffemodel file from this link:
IMDB-Wiki
I've used the tool provided at the link below to convert this model to an h5 file for use in Keras:
https://github.com/pierluigiferrari/caffe_weight_converter
This tool only converts the weights.
I want to use Keras to train my model, so I define the VGG-16 architecture like this:
from keras.models import Sequential
from keras.layers import (ZeroPadding2D, Convolution2D, MaxPooling2D,
                          Flatten, Dense, Dropout)

tmp_model = Sequential()
tmp_model.add(ZeroPadding2D((1,1),input_shape=(224, 224, 3)))
tmp_model.add(Convolution2D(64, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(64, 3, 3, activation='relu'))
tmp_model.add(MaxPooling2D((2,2), strides=(2,2)))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(128, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(128, 3, 3, activation='relu'))
tmp_model.add(MaxPooling2D((2,2), strides=(2,2)))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(256, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(256, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(256, 3, 3, activation='relu'))
tmp_model.add(MaxPooling2D((2,2), strides=(2,2)))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(512, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(512, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(512, 3, 3, activation='relu'))
tmp_model.add(MaxPooling2D((2,2), strides=(2,2)))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(512, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(512, 3, 3, activation='relu'))
tmp_model.add(ZeroPadding2D((1,1)))
tmp_model.add(Convolution2D(512, 3, 3, activation='relu'))
tmp_model.add(MaxPooling2D((2,2), strides=(2,2)))
tmp_model.add(Flatten())
tmp_model.add(Dense(4096, activation='relu'))
tmp_model.add(Dropout(0.5))
tmp_model.add(Dense(4096, activation='relu'))
tmp_model.add(Dropout(0.5))
tmp_model.add(Dense(2, activation='softmax'))
tmp_model.load_weights('/home/gender.h5')
This code loads the weights successfully. Now I want to use this model's weights to fine-tune another network for a different classification task, with a different number of classes.
Since the number of classes differs from tmp_model, I copy the weights from tmp_model to my new model's layers, except for the last (softmax) layer. The code for the new model is exactly the same as tmp_model, except for the last layer.
So what I did was copy the weights from tmp_model to the new model, layer by layer:
for i, weights in enumerate(weights_list[0:31]):
    model.layers[i].set_weights(weights)
And the problem occurs here. When I run my code, it gives me this error:
ValueError: You called `set_weights(weights)` on layer "zero_padding2d_14" with a weight list of length 3, but the layer was expecting 0 weights. Provided weights: [[[[ 0.27716994 0.05686508 0.0098957 ... -0.055...
As I said, tmp_model and model have exactly the same architecture, except for the last layer, which is why I copy the weights for all layers except the last one. What am I doing wrong?
In my opinion, weights_list only includes layers that contain weights; it normally has 16 entries because VGG16 has 16 weight-bearing layers. But model.layers ranges from 0 to N, where N is larger than 16, since model.layers also includes layers that contain no weights, such as ReLU, zero-padding, and max-pooling layers.
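Following that diagnosis, the copy should pair up the layers of the two models and move each layer's own weight list, rather than indexing a flat weights_list into model.layers. A minimal sketch with stub layers (plain numpy objects mimicking, but not using, the real Keras API):

```python
import numpy as np

# Stand-ins for Keras layers (a sketch, not the real API): in Keras,
# get_weights()/set_weights() exchange a *list* of arrays per layer --
# [kernel, bias] for a conv/dense layer, [] for parameter-free layers
# such as ZeroPadding2D and MaxPooling2D.
class StubLayer:
    def __init__(self, arrays):
        self._arrays = [np.array(a) for a in arrays]

    def get_weights(self):
        return [a.copy() for a in self._arrays]

    def set_weights(self, arrays):
        if len(arrays) != len(self._arrays):
            raise ValueError("layer expected %d weights, got %d"
                             % (len(self._arrays), len(arrays)))
        self._arrays = [np.array(a) for a in arrays]

# Source model: one (tiny) conv block plus a 2-class classifier head.
src = [StubLayer([]),                                     # ZeroPadding2D
       StubLayer([np.ones((3, 3, 3, 64)), np.ones(64)]),  # Conv2D
       StubLayer([]),                                     # MaxPooling2D
       StubLayer([np.ones((64, 2)), np.ones(2)])]         # softmax head
# New model: same body, but a 5-class head that must keep its own weights.
dst = [StubLayer([]),
       StubLayer([np.zeros((3, 3, 3, 64)), np.zeros(64)]),
       StubLayer([]),
       StubLayer([np.zeros((64, 5)), np.zeros(5)])]

# Copy per-layer weight lists, skipping the last (classifier) layer.
for s, d in zip(src[:-1], dst[:-1]):
    d.set_weights(s.get_weights())
```

With real Keras models the same loop would be `for s, d in zip(tmp_model.layers[:-1], model.layers[:-1]): d.set_weights(s.get_weights())`, which never hands a bare weight tensor to a parameter-free layer.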
I want to do binary image classification using the CIFAR-10 dataset. I modified CIFAR-10 so that class 0 becomes class True (1) and all other classes become class False (0), so there are now only two classes in my dataset: True (1) and False (0).
While training with the following Keras model (TensorFlow as backend), I get almost 99% accuracy.
But at test time I find that all the False examples are predicted as False, and all the True examples are also predicted as False, which still yields 99% accuracy.
But I do not want all True examples to be predicted as False; I was expecting them to be predicted as True.
How can I resolve this problem?
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
output=model.fit(x_train, y_train, batch_size=32, epochs=10)
You have a few options here:
Get more data with the True label. However, in most scenarios this is not easily possible.
Use only a small amount of the data that is labeled False. Maybe it is enough to train your model?
Use weights for the loss function during training. In Keras you can do this using the class_weight option of fit. The class True should have a higher weight than the class False in your example.
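The last option might look like this in practice (a sketch with hypothetical label counts; the inverse-frequency weighting scheme is one common choice, not the only one):

```python
import numpy as np

# Hypothetical imbalanced labels: 0 = False (majority), 1 = True (minority).
y_train = np.array([0] * 9000 + [1] * 1000)

# Weight each class inversely to its frequency, so the rare "True"
# examples contribute proportionally more to the loss.
counts = np.bincount(y_train)
class_weight = {c: len(y_train) / (len(counts) * n)
                for c, n in enumerate(counts)}

# The dict is then passed straight to fit(), e.g.:
# model.fit(x_train, y_train, batch_size=32, epochs=10,
#           class_weight=class_weight)
```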
As mentioned in the comments this is a huge problem in the ML field. These are just a few very simple things you could try.
I tried implementing AlexNet as explained in this video. Pardon me if I have implemented it incorrectly; this is the code for my implementation in Keras.
Edit: the CIFAR-10 ImageDataGenerator:
cifar_generator = ImageDataGenerator()
cifar_data = cifar_generator.flow_from_directory('datasets/cifar-10/train',
batch_size=32,
target_size=input_size,
class_mode='categorical')
The Model described in Keras:
model = Sequential()
model.add(Convolution2D(filters=96, kernel_size=(11, 11), input_shape=(227, 227, 3), strides=4, activation='relu'))
model.add(MaxPool2D(pool_size=(3, 3), strides=2))
model.add(Convolution2D(filters=256, kernel_size=(5, 5), strides=1, padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(3, 3), strides=2))
model.add(Convolution2D(filters=384, kernel_size=(3, 3), strides=1, padding='same', activation='relu'))
model.add(Convolution2D(filters=384, kernel_size=(3, 3), strides=1, padding='same', activation='relu'))
model.add(Convolution2D(filters=256, kernel_size=(3, 3), strides=1, padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(3, 3), strides=2))
model.add(Flatten())
model.add(Dense(units=4096))
model.add(Dense(units=4096))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
I have used an ImageDataGenerator to train this network on the CIFAR-10 dataset. However, I am only able to get an accuracy of about 0.20, and I cannot figure out what I am doing wrong.
For starters, you need to extend the relu activation to your two intermediate dense layers, too; as they are now:
model.add(Dense(units=4096))
model.add(Dense(units=4096))
i.e. with linear activation (default), it can be shown that they are equivalent to a simple linear unit each (Andrew Ng devotes a whole lecture in his first course on the DL specialization explaining this). Change them to:
model.add(Dense(units=4096, activation='relu'))
model.add(Dense(units=4096, activation='relu'))
Check the SO thread Why must a nonlinear activation function be used in a backpropagation neural network?, as well as the AlexNet implementations here and here to confirm this.
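The equivalence claimed above is easy to verify numerically (a plain numpy sketch, independent of Keras):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))    # a batch of 5 inputs
W1 = rng.standard_normal((8, 6))   # first "Dense" layer, linear activation
W2 = rng.standard_normal((6, 4))   # second "Dense" layer, linear activation

# Two stacked linear layers...
two_layers = (x @ W1) @ W2
# ...are exactly one linear layer with the product weight matrix,
# so without a nonlinearity the extra layer adds no expressive power.
one_layer = x @ (W1 @ W2)

assert np.allclose(two_layers, one_layer)
```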
Hi all. I am trying to build an image classifier with Keras (TensorFlow as backend). The objective is to separate memes from other images.
I am using the structure convolutional layers + fully connected layers with max pooling and dropouts.
The code is as following:
model = Sequential()
model.add(Conv2D(64, (3,3), activation='relu', input_shape=conv_input_shape))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
The input is a matrix of shape (n, 100, 100, 3): n RGB images with resolution 100 x 100. The output labels are [1, 0] for meme and [0, 1] otherwise.
However, when I train the model, the loss never decreases from the first iteration.
Is there anything off in the code?
I am thinking that memes are actually not that different from other images in many ways, except that some of them have some sort of caption along with some other features.
What are some better architectures to solve a problem like this?
How do I prevent a ‘lazy’ convolutional neural network? I end up with a ‘lazy CNN’ after training it with Keras: whatever the input is, the output is constant. What do you think the problem is?
I am trying to repeat the experiment from NVIDIA’s End to End Learning for Self-Driving Cars paper. Of course, I do not have a real car, but I do have Udacity’s simulator. The simulator generates images of the view in front of a car.
A CNN receives the image and outputs the steering angle needed to keep the car on the track. The rule of the game is to keep the simulated car driving on the track safely. It is not very difficult.
The strange thing is that I sometimes end up with a lazy CNN after training with Keras, one that outputs constant steering angles. The simulated car goes off the track, but the output of the CNN does not change. This happens especially as the network gets deeper, e.g. with the CNN in the paper.
If I use a CNN like this, I can get a useful model after training.
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(36, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(48, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(50))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.add(Dense(1))
But if I use a deeper CNN, I have a greater chance of getting a lazy CNN.
Specifically, if I use a CNN like NVIDIA’s, I get a lazy CNN after almost every training run.
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(36, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(48, 5, strides=(2, 2)))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, strides=(1, 1)))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, strides=(1, 1)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(1164))
model.add(Activation('sigmoid'))
model.add(Dense(100))
model.add(Activation('sigmoid'))
model.add(Dense(50))
model.add(Activation('sigmoid'))
model.add(Dense(10))
model.add(Activation('sigmoid'))
model.add(Dense(1))
I use ‘relu’ for the convolution layers, and the activation function for the fully connected layers is ‘sigmoid’. I tried changing the activation functions, but it had no effect.
Here is my analysis. I do not believe there is a bug in my program, because I can successfully drive the car with the same code and a simpler CNN. I think the reason is the simulator or the structure of the neural network. In a real self-driving car the training signal, i.e. the steering angle, contains noise; the driver never holds the wheel perfectly still on a real road. But in the simulator the training signal is very clean: almost 60% of the steering angles are zero. The optimizer can easily do its job by driving the output of the CNN close to zero; it seems the optimizer is lazy too. However, when we really want the CNN to output something, it still gives zeros. So I added small noise to these zero steering angles. The chance that I get a lazy CNN is smaller, but it has not disappeared.
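The noise-injection step described above might be sketched like this (plain numpy, with hypothetical angles; the noise scale of 0.01 is an assumption, not a value from the experiment):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical steering angles: most samples are exactly zero.
angles = np.array([0.0, 0.0, 0.12, 0.0, -0.3, 0.0])

# Add small Gaussian noise only to the zero labels, so the optimizer
# cannot minimize the loss simply by always predicting zero.
zero_mask = angles == 0.0
noisy = angles.copy()
noisy[zero_mask] += rng.normal(0.0, 0.01, zero_mask.sum())
```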
What do you think about my analysis? Is there other strategy that I can use? I am wondering whether similar problems have been solved in the long history of CNN research.
Resources:
The related files have been uploaded to GitHub. You can repeat the entire experiment with these files.
I can't run your model, because neither the question nor the GitHub repo contains the data. That's why I am only 90% sure of my answer.
But I think the main problem with your network is the sigmoid activation function after the dense layers. I assume it will train well when there are just two of them, but four is too much.
Unfortunately, NVIDIA's End to End Learning for Self-Driving Cars paper doesn't specify it explicitly, but these days the default activation is no longer sigmoid (as it once was) but relu. See this discussion if you're interested in why that is so. So the solution I'm proposing is to try this model:
model = Sequential()
model.add(Lambda(lambda x: x/255.0 - 0.5, input_shape = (160,320,3)))
model.add(Cropping2D(cropping=((70,25),(0,0))))
model.add(Conv2D(24, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(36, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(48, (5, 5), strides=(2, 2), activation="relu"))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation="relu"))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation="relu"))
model.add(Flatten())
model.add(Dense(1164, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(10, activation="relu"))
model.add(Dense(1))
It mimics NVIDIA's network architecture and does not suffer from vanishing gradients.
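The vanishing-gradient point can be quantified (a plain numpy sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The sigmoid derivative peaks at 0.25 (at x = 0), so a gradient
# flowing back through four sigmoid dense layers is scaled by at
# most 0.25**4 before it reaches the convolutional layers.
x = np.linspace(-5.0, 5.0, 1001)
d_sigmoid = sigmoid(x) * (1.0 - sigmoid(x))

peak = d_sigmoid.max()      # 0.25, attained at x = 0
four_layers = peak ** 4     # at most 0.00390625 of the gradient survives

# ReLU's derivative is exactly 1 on its active half, so deep stacks
# of relu layers do not shrink the gradient this way.
```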
I have a trained model whose weights I've exported, and I want to partially load them into another model.
My model is built in Keras using TensorFlow as backend.
Right now I'm doing as follows:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape, trainable=False))
model.add(Activation('relu', trainable=False))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), trainable=False))
model.add(Activation('relu', trainable=False))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), trainable=True))
model.add(Activation('relu', trainable=True))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.load_weights("image_500.h5")
model.pop()
model.pop()
model.pop()
model.pop()
model.pop()
model.pop()
model.add(Conv2D(1, (6, 6),strides=(1, 1), trainable=True))
model.add(Activation('relu', trainable=True))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
I'm sure it's a terrible way to do it, although it works.
How do I load just the first 9 layers?
If your first 9 layers are consistently named between your original trained model and the new model, then you can use model.load_weights() with by_name=True. This will update weights only in the layers of your new model that have an identically named layer found in the original trained model.
The name of the layer can be specified with the name keyword, for example:
model.add(Dense(8, activation='relu',name='dens_1'))
This call:
weights_list = model.get_weights()
will return a list of all weight tensors in the model, as Numpy arrays.
All you have to do next is iterate over this list and apply:
for i, weights in enumerate(weights_list[0:9]):
    model.layers[i].set_weights(weights)
where model.layers is a flattened list of the layers comprising the model. In this case, you reload the weights of the first 9 layers.
More information is available here:
https://keras.io/layers/about-keras-layers/
https://keras.io/models/about-keras-models/