When should you use pretrained weights when training deep learning models? - machine-learning

I am interested in training a range of image and object detection models, and I am wondering what the general rule is for when to use the pretrained weights of a network like VGG16.
For example, it seems obvious that fine-tuning pretrained VGG16 ImageNet weights is helpful if you are targeting a subset of its classes, e.g. cats and dogs.
However, it seems less clear to me whether using these pretrained weights is a good idea if you are training an image classifier with 300 classes, only some of which are subsets of the classes in the pretrained model.
What is the intuition around this?

Lower layers learn features that are not necessarily specific to your application or dataset: corners, edges, simple shapes, etc. So it does not matter whether your data is strictly a subset of the categories that the original network can predict.
Depending on how much training data you have, and how similar it is to the data the pretrained network was trained on, you can decide to freeze the lower layers and train only the higher ones, or simply train a classifier on top of your pretrained network.
Check here for a more detailed answer

Related

What to do if neural network always performs poorly even after addressing overfitting?

I have a medical image dataset of ~10K 256x256 images with which I am training a deep neural classifier for disease classification. I have been working with popular CNNs like InceptionV3 and ResNets.
These models have achieved validation set accuracies in the 50-60% range, and I noticed that they were overfitting. To improve the performance, I tried common strategies like dropout in the dense layers, smaller learning rates, and L2 regularization. After these modifications showed no reduction in overfitting, I moved to smaller and simpler architectures with just 2-3 convolution layers + 1 FC classification layer, which I thought would mitigate the issue. However, with the simpler models, the learning curves still showed signs of overfitting. In particular, when training for 100 epochs, the models would have similar train and validation losses for the first 20-30 epochs, but the losses would diverge after that.
I'm not sure what other strategies I can experiment with at this point and I'm worried that trying more experiments aimlessly is inefficient. Should I just accept that the models cannot generalize to this task well?
Additionally, FYI, the dataset is imbalanced. I have dealt with this using data augmentation and a weighted cross-entropy loss as well, but neither made a real difference.
Try modern classification architectures such as vision transformers or EfficientNets, which often reach higher accuracy than older CNNs on standard benchmarks. To compare modern architectures, see Papers with Code.
Augmentation and regularization are must-haves in the training process, whether your data is balanced or imbalanced.
You can try oversampling or undersampling your data to get better results.
Try learning-rate warmup and a learning-rate schedule; these often improve the convergence of the model.
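As a rough illustration of the warmup and schedule suggestion, here is a minimal sketch of a linear warmup followed by cosine decay. The function name `lr_at_step` and all default values (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are assumptions made for this example; cosine decay is just one common choice of schedule, not something the answer prescribes.

```python
import math


def lr_at_step(step, base_lr=1e-3, warmup_steps=500, total_steps=10_000, min_lr=0.0):
    """Learning rate for a given optimizer step: linear warmup, then cosine decay.

    All default values are illustrative assumptions, not recommendations.
    """
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that is complete, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine curve from base_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

You would call this once per step and pass the result to your optimizer; most frameworks also ship built-in schedulers that do the same job.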

Why does the input size vary between pretrained models in Keras?

Pretrained models such as VGG16, Inception v3, MobileNet, ResNet152 and so on expect different input sizes.
Please share some background on this.
Why does the input size differ from one model to another?
vgg16 224*224
resnet 224*224
inception v3 299*299
mobilenet 224*224
All of these models are implementations of particular scientific papers, each of which used its own input size. Some models use the published weights, so if you want to use these weights to reproduce the published results, you have to use the same input size.
Note that this applies only if you use the weights pretrained on the ImageNet dataset. If you want to train these models from scratch (random initialization), you can specify a different input_shape without any issue, as long as you respect some constraints imposed by the depth of the model.
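One such depth constraint can be sketched with plain arithmetic. VGG16, for example, has five 2x2 max-pooling stages with stride 2, so the spatial size is halved five times; an input that is too small leaves no feature map at the end. The helper below is a hypothetical illustration (the name `final_feature_map_size` is made up) and assumes the convolutions preserve spatial size, so that only the pooling stages shrink it.

```python
def final_feature_map_size(input_size, num_halvings=5):
    """Spatial size left after `num_halvings` stride-2 downsampling stages.

    Assumes every other layer preserves the spatial size ('same' padding),
    which is the case for the VGG16 convolutional stack.
    """
    size = input_size
    for _ in range(num_halvings):
        size //= 2  # each stride-2 pooling halves the size, rounding down
    return size


for side in (224, 299, 32, 31):
    print(side, "->", final_feature_map_size(side))
```

With five halvings, an input side below 32 pixels collapses to a 0x0 feature map, which is why very small inputs are rejected for a network of this depth.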

Image similarity detection with TensorFlow

Recently I started to play with TensorFlow. While trying to learn the popular algorithms, I ran into a situation where I need to find the similarity between images.
I supply image A to the system, userx supplies an image B, and the system should return image A to userx if image B is similar (in color and class).
Now I have a few questions:
Do we consider this scenario to be supervised learning? I am asking because I don't see it as a classification problem (confused!).
What algorithms should I use to train, etc.?
Re-training will have to be done quite often. How should I tackle this problem so that I don't train from scratch every time (fine-tuning?)?
Do we consider this scenario to be supervised learning?
It is supervised learning when you have labels to optimize your model. So for most neural networks, it is supervised.
However, you might also look at the complete task. I guess you don't have any ground truth for image pairs and the "desired" similarity value your model should output?
One way to solve this problem, which sounds inherently unsupervised, is to take a CNN (convolutional neural network) trained (in a supervised way) on the 1000 classes of ImageNet. To get the similarity of two images, you could then simply take the Euclidean distance between their output probability distributions. This will not lead to excellent results, but it is probably a good starting point.
What algorithms should I use to train, etc.?
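The distance step can be sketched in a few lines. This example assumes you already have the softmax output of a pretrained classifier for each image; the three-class vectors and the image names below are made up for illustration (a real ImageNet model would give 1000 values per image).

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length probability vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def most_similar(query, gallery):
    """Name of the gallery image whose vector is closest to `query`."""
    return min(gallery, key=lambda name: euclidean_distance(query, gallery[name]))


# Hypothetical classifier outputs over three classes.
gallery = {
    "image_a": [0.90, 0.05, 0.05],
    "image_c": [0.10, 0.80, 0.10],
}
image_b = [0.85, 0.10, 0.05]
print(most_similar(image_b, gallery))
```

A smaller distance means the classifier sees the two images as more alike in terms of class, which says nothing about color unless you add a separate color feature.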
First, you should define what "similar" means for you. Are two images similar when they contain the same object (classes)? Are they similar if the general color of the image is the same?
For example, how similar are the following 3 pairs of images?
Have a look at FaceNet and search for "Content based image retrieval" (CBIR):
Wikipedia
Google Scholar
This can be treated as supervised learning. You can classify the images into categories; if two images fall into the same category (or into closely related categories), you can think of them as similar.
You can use deep convolutional neural networks trained on ImageNet, such as the Inception model. The Inception model outputs a probability vector over 1000 classes (a vector whose values sum to 1). You can calculate the distance between the vectors of two images to get their similarity.
On the same page of the inception model, you will also find the instructions to retrain a model: https://github.com/tensorflow/models/tree/master/inception#how-to-fine-tune-a-pre-trained-model-on-a-new-task

type of recognition of convolution neural network

I was trying to create a convolutional neural network to recognize animals, vehicles, buildings, trees and plants from a large dataset containing a combination of these objects.
During training I became unsure about how the network should be trained. Should I train the network with all animals grouped under a single label, or train on each kind of animal separately?
That is, one group for lions, one for tigers, one for elephants, etc., and at test time I can code it to output "animal" if any one of its subcategories is predicted.
I ask because I have read that the dataset needs a consistent pattern for efficient detection, and that such a pattern is more likely when training on subcategories of objects than on the whole broad dataset.
I have attached a figure showing the sample dataset(only logically correct). I want to know whether there should be separate data-set or single data-set.
Whether to train on separate datasets or a single dataset depends on a variety of factors. If you only want the convolutional neural network to classify the images in your test dataset as animals, without subdividing them further, then training on a single dataset is enough. However, if you plan to sub-classify the images into tigers and lions, then the training needs to be done on separate, individually labeled datasets of tigers and lions.
The type of the dataset that you use for training will highly depend on your requirements of classification on the test dataset.
Moreover, you have to make sure that you normalize the images before you use them for training.
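The approach described in the question (train on subcategories, report the parent category at test time) can be sketched by summing the predicted probabilities of each subcategory into its parent group. The label names, the `FINE_TO_COARSE` mapping and the probabilities below are assumptions for illustration only.

```python
# Hypothetical mapping from fine-grained labels to parent categories.
FINE_TO_COARSE = {
    "lion": "animal",
    "tiger": "animal",
    "elephant": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}


def coarse_probabilities(fine_probs, mapping=FINE_TO_COARSE):
    """Sum the probability of every fine label into its parent category."""
    coarse = {}
    for label, prob in fine_probs.items():
        group = mapping[label]
        coarse[group] = coarse.get(group, 0.0) + prob
    return coarse


def predict_coarse(fine_probs, mapping=FINE_TO_COARSE):
    """Parent category with the highest summed probability."""
    coarse = coarse_probabilities(fine_probs, mapping)
    return max(coarse, key=coarse.get)


# Made-up network output: "car" is the single most likely fine label,
# yet the animal subcategories together carry more probability.
fine_probs = {"lion": 0.3, "tiger": 0.3, "elephant": 0.0, "car": 0.4, "truck": 0.0}
print(predict_coarse(fine_probs))
```

Summing rather than taking the single best fine label matters when the network is unsure which animal it sees but fairly sure it is an animal.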

Fine Tuning of GoogLeNet Model

I trained a GoogLeNet model from scratch, but it didn't give me promising results.
As an alternative, I would like to fine-tune the GoogLeNet model on my dataset. Does anyone know what steps I should follow?
Assuming you are trying to do image classification, these should be the steps for fine-tuning a model:
1. Classification layer
The original classification layer "loss3/classifier" outputs predictions for 1000 classes (its num_output is set to 1000). You'll need to replace it with a new layer with the appropriate num_output. Replacing the classification layer:
Change layer's name (so that when you read the original weights from caffemodel file there will be no conflict with the weights of this layer).
Change num_output to the right number of output classes you are trying to predict.
Note that you need to change ALL classification layers. Usually there is only one, but GoogLeNet happens to have three: "loss1/classifier", "loss2/classifier" and "loss3/classifier".
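The renaming and num_output change are normally done by hand in train_val.prototxt. Purely as an illustration of what the edit amounts to, here is a small text-level sketch in Python. The function name `retarget_classifier` and the `_ft` suffix are hypothetical, and the regex assumes the layer's name line comes before its num_output, which is how such files are usually written; it is not a Caffe API.

```python
import re


def retarget_classifier(prototxt, layer_name, num_classes, suffix="_ft"):
    """Rename one classification layer and set its num_output in prototxt text."""
    new_name = layer_name + suffix
    # Rename every quoted occurrence (name, top, bottom) so that the layers
    # consuming this layer's output stay connected to it.
    text = prototxt.replace(f'"{layer_name}"', f'"{new_name}"')
    # Replace the first num_output value that follows the layer's name line.
    pattern = re.compile(
        r'(name:\s*"' + re.escape(new_name) + r'".*?num_output:\s*)\d+',
        re.DOTALL,
    )
    text, count = pattern.subn(lambda m: m.group(1) + str(num_classes), text, count=1)
    if count != 1:
        raise ValueError(f"no layer named {layer_name!r} with a num_output found")
    return text
```

For GoogLeNet you would apply this three times, once per classifier ("loss1/classifier", "loss2/classifier" and "loss3/classifier"), and then check the resulting file by eye.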
2. Data
You need to make a new training dataset with the new labels you want to fine tune to. See, for example, this post on how to make an lmdb dataset.
3. How extensive a finetuning you want?
When fine-tuning a model, you can train ALL of the model's weights, or choose to fix some weights (usually the filters of the lower layers, closest to the input) and train only the weights of the top-most layers. This choice is up to you and it usually depends on the amount of training data available (the more examples you have, the more weights you can afford to fine-tune).
Each layer (that holds trainable parameters) has param { lr_mult: XX }. This coefficient determines how susceptible these weights are to SGD updates. Setting param { lr_mult: 0 } means you FIX the weights of this layer and they will not be changed during the training process.
Edit your train_val.prototxt accordingly.
4. Run caffe
Run caffe train, but supply it with the caffemodel weights as the initial weights:
~$ $CAFFE_ROOT/build/tools/caffe train -solver /path/to/solver.prototxt -weights /path/to/orig_googlenet_weights.caffemodel
Fine-tuning is a very useful trick for achieving promising accuracy compared with earlier hand-crafted features. @Shai already posted a good tutorial for fine-tuning GoogLeNet using Caffe, so I just want to give some recommendations and tricks for fine-tuning in general.
Most of the time, we face a classification task in which the new dataset (e.g. the Oxford 102 flower dataset or Cat&Dog) falls into one of the following four common situations (from CS231n):
New dataset is small and similar to original dataset.
New dataset is small but is different to original dataset (Most common cases)
New dataset is large and similar to original dataset.
New dataset is large but is different to original dataset.
In practice, most of the time we do not have enough data to train the network from scratch, but we may have enough to fine-tune a pretrained model. Whichever of the cases above applies, the only thing we must care about is this: do we have enough data to train the CNN?
If yes, we can train the CNN from scratch. However, in practice it is still beneficial to initialize the weights from a pretrained model.
If no, we need to check whether the data is very different from the original dataset. If it is very similar, we can just fine-tune the fully connected layers, or train an SVM on the extracted features. However, if it is very different from the original dataset, we may need to fine-tune the convolutional layers as well to improve generalization.
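The decision rule above can be written down as a tiny helper. This is only a restatement of the answer's reasoning in code; the function name `choose_strategy` and the returned strings are made up, and deciding what counts as "enough" data or "similar" data is still up to you.

```python
def choose_strategy(enough_data, similar_to_original):
    """Map the two questions from the answer to a fine-tuning strategy."""
    if enough_data:
        # Training from scratch is possible, but pretrained initialization
        # is still beneficial in practice.
        return "train all layers, initialized from pretrained weights"
    if similar_to_original:
        # Small, similar dataset: keep the convolutional features as they are.
        return "train only the classifier on top of frozen convolutional layers"
    # Small, different dataset: the convolutional features need adapting too.
    return "fine-tune the convolutional layers as well as the classifier"


print(choose_strategy(enough_data=False, similar_to_original=True))
```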

Resources