How to use mid-level fine-tuning in Keras?

My task is to adapt a pre-trained network from Keras for classification of aerial images (we have a database of 30 categories of aerial images, each containing 200-400 images).
Now, what I don't really understand is this next part.
We must use mid-level fine tuning using a smaller image database, which contains 21 aerial categories.
How can I achieve this?
Should I first fine-tune a VGG16 network on the smaller database, save that model, and then train it further on the larger database?

I'm guessing that they want you to fine-tune a trained model by freezing its first X layers and only updating the weights of the last few layers (maybe just the last one; I'm not sure exactly what "mid-level fine-tuning" means).
You need to take your trained model and replace its last layer, the one with 30 outputs, with a new layer of 21 outputs. Then freeze all the other layers (everything except the new one) and train the model on the new dataset.
In Keras you just need to set trainable=False on every layer you want frozen.
See also: How can I "freeze" Keras layers?

Related

Is there a difference in the weights of hidden layers if I train a model with an output layer of 10 neurons 10 times or only one time with 100 neurons

Essentially, I don't have enough RAM to train the model I want from scratch on all 2000 classes at once. Because of that, I was wondering if I could use an output layer of 200 neurons, save the weights after training the model on those 200 classes, then load those same weights and train the model again on another 200 different classes, and so on until the model has been trained on all 2000 classes.
Note that this dataset is being used to pre-train the model so that I can then retrain it with another, much smaller, dataset. So essentially I want to pre-train the model with this big dataset, then switch the output layer and retrain the last layers of the model with a much smaller one.
Does this way of training achieve the same weights on hidden layers as training the model one time with the 2000 classes?
No, your weights will be different. This would work only if you were training a linear model, not a neural network.
I also find it rather suspicious that the problem appears when the number of outputs grows from 200 to 2000. That is a 10x increase in the memory use of the final layer, but this should not be a huge number to begin with; maybe your last (penultimate) hidden layer is too large? Even if that layer also had 2000 units, the final weight matrix would be 2000x2000, which is barely 4,000,000 floats, i.e. about 16 megabytes.
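To make that estimate concrete (assuming 32-bit floats and ignoring biases):

# memory taken by a dense layer mapping 2000 units to 2000 outputs
params = 2000 * 2000          # 4,000,000 weights
size_mb = params * 4 / 1e6    # 4 bytes per float32
print(size_mb)                # -> 16.0 MB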

Multilabel Classification of concatenated images

I'm working on a side project based on multilabel classification. We consider images of 64x64 pixels, each made up of 4 thumbnails of 32x32 pixels that were randomly combined. The thumbnails are taken from the CIFAR-10 database, giving 40k training images and 20k test images.
The initial multi-class classification goal thus becomes a multilabel classification problem.
The problem is that, despite everything I've tried, the exact-match accuracy of the model doesn't exceed 1%, even though the loss decreases.
Here is what I tried:
balancing the dataset (same proportion of images with respect to the classes inside each composite image);
data augmentation, up to 200k images in the training set;
transfer learning with dozens of models, with and without fine-tuning, changing the last layer;
recasting the multilabel problem as a multi-class problem; I ended up with 385 classes containing all the combinations (I think) of thumbnails;
a 2D convolution with a stride of 32 and a kernel size of 32x32 (see the sketch below);
a Vision Transformer;
dozens of optimizers with different learning rates, using a learning-rate scheduler.
I'm pretty sure the boundaries between the thumbnails are a problem for the convolution kernels, because the thumbnails are decorrelated from each other at their edges.
I'm out of ideas, which is why I'm asking this question.
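For reference, here is a minimal sketch of the thumbnail-aligned convolution idea from the list above, set up as a multilabel model (10 independent sigmoid outputs, one per CIFAR-10 class; the filter count and optimizer are illustrative assumptions, not what the author necessarily used):

from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model

inp = Input(shape=(64, 64, 3))
# 32x32 kernel with stride 32: each application covers exactly one thumbnail
x = Conv2D(64, kernel_size=32, strides=32, activation='relu')(inp)  # -> (2, 2, 64)
x = Flatten()(x)
# multilabel head: one independent sigmoid per CIFAR-10 class
out = Dense(10, activation='sigmoid')(x)

model = Model(inp, out)
# binary cross-entropy scores each of the 10 labels independently
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['binary_accuracy'])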

Fine-tune a model with larger input size

I was wondering whether it makes sense to fine-tune a model with a larger input size. Ideally, the properties I would like to have are:
Fine-tuning: reusing the weights from pre-training.
Larger input size: not down-sampling images before feeding them into the model. Maybe use a larger stride size?
Specifically, I'm trying to fine-tune InceptionV3 in Keras with my specific label set. I want a larger input size since I hope the model can implicitly learn some important characteristics. With InceptionV3's default size (299x299) this doesn't sound possible to me.
But that sounds like I would have to change the specific model I'm reusing (say, by modifying specific layers in the model architecture), and then wouldn't reusing the pre-trained weights no longer make sense?
If you want to fine-tune a classification model, usually you would remove a few of the top layers, which act as the classifier, and add your own layers. It's the same with fine-tuning the Inception_V3 model: you can remove the top layers and add your own classifier with the desired number of units (i.e. the number of classes in your dataset). For example:
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense
from keras.models import Model

# let's say our images are of size (1000, 1000, 3); this works because,
# with include_top=False, the network is fully convolutional and is not
# tied to the default 299x299 input
inc_v3 = InceptionV3(include_top=False, input_shape=(1000, 1000, 3), pooling='avg')

# add your desired layers to the top: we only add one layer just for
# illustration, but you can add as many layers as you want
# (num_classes is the number of classes in your dataset)
out = Dense(num_classes, activation='softmax')(inc_v3.output)

# construct the new model
model = Model(inc_v3.input, out)
However, note that for fine-tuning you first need to freeze all the base layers (i.e. the layers of the Inception_V3 model). Further, instead of adding a pooling layer at the top (i.e. pooling='avg'), you can also use other alternatives such as a Flatten layer.
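For completeness, freezing the base before compiling might look like this (the optimizer choice is just an illustration):

# freeze the pre-trained InceptionV3 base so only the new head is trained
for layer in inc_v3.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])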
Further, I recommend you to read the relevant official Keras tutorial: Building powerful image classification models using very little data (the second and third sections are mostly relevant to this).

How should I optimize neural network for image classification using pretrained models

Thank you for viewing my question. I'm trying to do image classification based on some pre-trained models; the images should be classified into 40 classes. I want to use the pre-trained VGG and Xception models to convert each image into two 1000-dimensional vectors and stack them into a 1x2000-dimensional vector as the input of my network, which has a 40-dimensional output. The network has 2 hidden layers, one with 1024 neurons and the other with 512 neurons.
Structure:
image -> VGG (1x1000) and Xception (1x1000) -> concatenated 1x2000 input -> 1024 neurons -> 512 neurons -> 40-dimensional output -> softmax
However, using this structure I can only achieve about 30% accuracy. So my question is: how could I optimize the structure of my network to achieve higher accuracy? I'm new to deep learning, so I'm not quite sure my current design is 'correct'. I'm really looking forward to your advice.
I'm not entirely sure I understand your network architecture, but some pieces don't look right to me.
There are two major transfer learning scenarios:
ConvNet as fixed feature extractor. Take a pretrained network (either VGG or Xception will do; you don't need both), remove the last fully-connected layer (this layer's outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. For example, in an AlexNet this would compute a 4096-D vector for every image, containing the activations of the hidden layer immediately before the classifier. Once you extract the 4096-D codes for all images, train a linear classifier (e.g. a linear SVM or softmax classifier) for the new dataset. A sketch follows the tips below.
Tip #1: take only one pretrained network.
Tip #2: no need for multiple hidden layers for your own classifier.
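A minimal sketch of the fixed-feature-extractor scenario, using Xception as the single base network (the arrays images and labels are assumed to be your preprocessed inputs and class labels; the classifier choice is illustrative):

from keras.applications.xception import Xception
from sklearn.linear_model import LogisticRegression

# single pretrained base used as a frozen feature extractor;
# with include_top=False and pooling='avg' it outputs 2048-D vectors
base = Xception(include_top=False, pooling='avg', input_shape=(299, 299, 3))

# images: preprocessed array of shape (N, 299, 299, 3); labels: shape (N,)
features = base.predict(images)   # shape (N, 2048)

# train a simple linear (softmax) classifier on the extracted codes
clf = LogisticRegression(max_iter=1000).fit(features, labels)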
Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it's possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful for many tasks, while later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. A sketch follows the tips below.
Tip #3: keep the early pretrained layers fixed.
Tip #4: use a small learning rate for fine-tuning because you don't want to distort other pretrained layers too quickly and too much.
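And a sketch of the fine-tuning scenario with a small learning rate (the cutoff index 100 and the learning rate are illustrative assumptions, not tuned values):

from keras.applications.xception import Xception
from keras.layers import Dense
from keras.models import Model
from keras.optimizers import SGD

base = Xception(include_top=False, pooling='avg', input_shape=(299, 299, 3))
out = Dense(40, activation='softmax')(base.output)
model = Model(base.input, out)

# keep the early, generic layers fixed; fine-tune only the top portion
for layer in model.layers[:100]:
    layer.trainable = False
for layer in model.layers[100:]:
    layer.trainable = True

# small learning rate so the pretrained weights are not distorted too quickly
model.compile(optimizer=SGD(lr=1e-4, momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])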
This architecture much more closely resembles the ones I have seen solving the same problem, and it has a better chance of reaching high accuracy.
There are a couple of steps you may try when the model is not fitting well:
Increase training time and decrease the learning rate; it may be stopping at a very bad local optimum.
Add additional layers that can extract features specific to the large number of classes.
Create multiple two-class deep networks, one per class (with a 'yes'/'no' output). This lets each network specialize in its class, rather than training one single network to learn all 40 classes.
Increase the number of training samples.

use trained keras cnn to generate feature maps

I trained a very vanilla CNN using Keras/Theano that does a pretty good job of detecting whether a small (32x32) portion of an image contains a (relatively simple) object of type A or B (or neither). The output is an array of three numbers: [prob(neither class), prob(A), prob(B)]. Now I want to take a big image (512x680, methinks), sweep across it, and run the trained model on each 32x32 sub-image to generate a feature map of size 480x648, each point consisting of a 3-vector of the aforementioned probabilities. Basically, I want to use my whole trained CNN as a (nonlinear) filter with three-dimensional output. At the moment, I am cutting each 32x32 patch out of the image one at a time and running the model on it, then dropping the resulting 3-vectors into a big 3x480x648 array. However, this approach is very slow. Is there a faster/better way to do this?
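For reference, a sketch of the sliding-window procedure described above, with one full row of windows batched per predict() call (image and model are the array and trained CNN assumed from the question; this only vectorizes the loop, it does not change the computation):

import numpy as np

# image: array of shape (512, 680, 3); model: the trained 32x32 CNN
win = 32
H, W = image.shape[:2]
out = np.zeros((3, H - win + 1, W - win + 1))

# one predict() call per row of windows instead of one per window
for i in range(H - win + 1):
    batch = np.stack([image[i:i + win, j:j + win]
                      for j in range(W - win + 1)])
    out[:, i, :] = model.predict(batch).T   # (W - win + 1, 3) -> transposed

Batching like this usually gives a large speedup, because the per-call overhead of predict dominates when windows are fed in one at a time.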
