Fine-tuning and transfer learning by the example of YOLO

I have a general question regarding fine-tuning and transfer learning, which came up when I tried to figure out how best to get YOLO to detect my custom object (hands).
I apologize for the long text, which possibly contains lots of false information. I would be glad if someone had the patience to read it and help me clear up my confusion.
After lots of googling, I learned that many people regard fine-tuning to be a sub-class of transfer learning, while others believe that they are two different approaches to training a model. At the same time, people differentiate between re-training only the last classifier layer of a model on a custom dataset vs. also re-training other layers of the model (and possibly adding an entirely new classifier instead of retraining?). Both approaches use pre-trained models.
My final confusion lies here: I followed these instructions: https://github.com/thtrieu/darkflow to train tiny YOLO via darkflow, using the command:
# Initialize yolo-new from yolo-tiny, then train the net on 100% GPU:
flow --model cfg/yolo-new.cfg --load bin/tiny-yolo.weights --train --gpu 1.0
But what happens here? I suppose I only retrain the classifier, because the instructions say to change the number of classes in the last layer in the configuration file. But then again, it is also required to change the number of filters in the second-to-last layer, a convolutional layer.
Lastly, the instructions provide an example of an alternative training:
# Completely initialize yolo-new and train it with ADAM optimizer
flow --model cfg/yolo-new.cfg --train --trainer adam
I don't understand at all how this relates to the different ways of transfer learning.

If you are using AlexeyAB's darknet repo (not darkflow), he suggests doing fine-tuning instead of transfer learning by setting this parameter in the cfg file: stopbackward=1.
Then run ./darknet partial yourConfigFile.cfg yourWeightsFile.weights outPutName.LastLayer# LastLayer#, such as:
./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81
It will create yolov3.conv.81 and will freeze the lower layers; then you can train using the weights file yolov3.conv.81 instead of the original darknet53.conv.74.
References: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection, https://groups.google.com/forum/#!topic/darknet/mKkQrjuLPDU

I have not worked on YOLO, but looking at your problem I think I can help. Fine-tuning, re-training, and post-tuning are all somewhat ambiguous terms that are often used interchangeably. It's all about how much you want to change the pre-trained weights.
Since you are loading the weights in the first case with --load, the pre-trained weights are being loaded here: you might be adjusting them a bit with a low learning rate, or maybe not changing them at all. In the second case, however, you are not loading any weights, so you are probably training from scratch. When you make small (fine) changes, call it fine-tuning; post-tuning would be tuning again after the initial training, maybe not as fine as fine-tuning; and retraining would be training the whole network, or a part of it, again.
There are also separate ways in which you can optionally freeze some layers.
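To make that distinction concrete outside of darkflow, here is a minimal Keras sketch (Keras and MobileNetV2 are my choices for illustration, not part of darkflow) of freezing everything (transfer learning) vs. unfreezing only the top layers (fine-tuning):
from tensorflow import keras

# Load a pre-trained backbone without its classifier head.
base = keras.applications.MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # transfer learning: freeze all pre-trained layers

# New classifier head for the custom task (e.g. "hand" vs. "no hand").
model = keras.Sequential([base, keras.layers.Dense(2, activation="softmax")])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")

# Fine-tuning instead: unfreeze the top of the backbone at a low learning rate.
base.trainable = True
for layer in base.layers[:-20]:  # keep all but the last ~20 layers frozen
    layer.trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")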

Related

How to add a regression head after the fully connected layer in convolutional network using Tensorflow?

I am new to deep learning and TensorFlow and have to learn this topic due to a project I am currently working on. I am using a convolutional network to detect and find the location of a single object in an image. I am using the method introduced in the Stanford CS231n class. The lecturer mentioned connecting a regression head after the fully connected layer in the network to find the location of the object. I know there is DNNRegressor in TensorFlow. Should I use this as the regression head?
Earlier, I modified TensorFlow's tutorial on using a ConvNet to recognize handwritten digits for my case. I am not too sure how I can add the regression head to that program so that it can also find a bounding box for the object.
I just had the chance to touch machine learning and deep learning this week; apologies if I am asking a really silly question, but I really need to find a solution to my problem. Thank you very much.
First of all, in order to train a neural network for an object localization task, you have to have a data set with localized objects. This answers your question of whether you can work with the MNIST data set or not: MNIST contains just a class label for each image, so you need to get another data set. Justin also talks about popular data sets at around 37:34.
The way object localization works is by learning to output 4 values per image instead of a class distribution. This four-valued vector is compared to the ground-truth four-valued vector, and the loss function is usually the L1 or L2 norm of their difference. So in code, the regression head is an ordinary fully-connected regression layer, and its loss can be implemented in TensorFlow with a simple tf.reduce_mean call.
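To make the loss part concrete, here is a minimal sketch (the box coordinates are made-up toy values):
import tensorflow as tf

# Toy ground-truth and predicted boxes, each (x, y, w, h), batch of two.
y_true = tf.constant([[10., 20., 50., 80.], [30., 40., 60., 90.]])
y_pred = tf.constant([[12., 18., 48., 85.], [28., 44., 55., 95.]])

l2_loss = tf.reduce_mean(tf.square(y_pred - y_true))  # L2-style loss
l1_loss = tf.reduce_mean(tf.abs(y_pred - y_true))     # L1-style loss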
A small yet complete example that performs object localization can be found here. I also recommend taking a look at this question.
I was looking into this problem as well, and I found the following part in the documentation:
Dense (fully connected) layers, which perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers. In a dense layer, every node in the layer is connected to every node in the preceding layer.
Based on this quote, it seemed you can only do classification, not regression.
EDIT: After some research, I found a way to use a fully-connected layer for regression in TensorFlow.
import tensorflow as tf
import tensorflow.contrib.slim as slim

# Build your convolutional network and call its output `net`;
# `y` is your ground-truth placeholder. As the last step, add a single
# linear output unit (use 4 units for a bounding box):
y_prime = slim.fully_connected(net, 1, activation_fn=None)
loss = tf.reduce_mean(tf.square(y_prime - y))  # L2 loss
lr = tf.placeholder(tf.float32)
opt = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
You can add more fully-connected layers with more nodes before the last step.
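For instance, a sketch with two extra hidden layers (the placeholder input and the sizes 256 and 64 are arbitrary choices):
import tensorflow as tf
import tensorflow.contrib.slim as slim

# Placeholder standing in for the flattened output of your conv layers.
net = tf.placeholder(tf.float32, [None, 1024])

net = slim.fully_connected(net, 256, activation_fn=tf.nn.relu)  # hidden layer
net = slim.fully_connected(net, 64, activation_fn=tf.nn.relu)   # hidden layer
y_prime = slim.fully_connected(net, 1, activation_fn=None)      # regression output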

How to load reference .caffemodel for training

I'm using AlexNet to train on my own dataset.
The example code in caffe comes with
bvlc_reference_caffenet.caffemodel
solver.prototxt
train_val.prototxt
deploy.prototxt
When I train with the following command:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
I'd like to start with the weights given in bvlc_reference_caffenet.caffemodel.
My questions are
How do I do that?
Is it a good idea to start from those weights? Would this converge faster? Would this be bad if my data are vastly different from the ImageNet dataset?
1.
In order to use existing .caffemodel weights for fine-tuning, you need to use the --weights command-line argument:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --weights=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
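If you prefer the Python interface, a rough equivalent (a sketch, assuming pycaffe is built and on your PYTHONPATH) would be:
import caffe

# Python equivalent of `caffe train --solver ... --weights ...`:
solver = caffe.SGDSolver('models/bvlc_reference_caffenet/solver.prototxt')
solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
solver.solve()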
2.
In most cases, fine-tuning a net is recommended practice, even when the input images are quite different from "imagenet" photos.
However, you should note that when the original weights you are about to use were trained, some (very reasonable) assumptions were made. You should decide whether these assumptions still hold for your task.
For instance, most nets were trained with simple data augmentation using an image and its horizontal flip. If your task is to distinguish between an image and its flipped version, you will find such a net very difficult to fine-tune.

Image classification, narrow domain with custom labels

Let's suppose I would like to classify motorbikes by model.
There are a couple of hundred motorbike models I'm interested in.
I have tens, sometimes hundreds, of pictures of each motorbike model.
Can you please point me to a practical example that demonstrates how to train a model on your own data and then use it to classify images? It needs to be a deep learning model, not simple logistic regression.
I'm not sure about this, but it seems like I can't use a pre-trained neural net, because it has been trained on a wide range of objects like cats, humans, cars, etc. It may not be good at distinguishing the motorbike nuances I'm interested in.
I found a couple of such examples (TensorFlow has one), but sadly, all of them use a pre-trained model. None of them shows how to train it on your own dataset.
In cases like yours you use either transfer learning or fine-tuning. If you have more than a thousand images of motorbikes I would use fine-tuning; if you have fewer, transfer learning.
Fine-tuning means using a pre-trained model with a new classifier part; the new classifier part, and maybe the last 1-2 layers of the trained model, are then trained on your dataset.
Transfer learning means using a pre-trained model and letting it output features for an input image. You then train a new classifier on those features, maybe an SVM or a logistic regression.
An example for this can be seen here: https://github.com/cpra/dlvc2016/blob/master/lectures/lecture10.pdf, slide 33.
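Here is a minimal sketch of the transfer-learning route (MobileNetV2 and scikit-learn are my choices for illustration; the random arrays stand in for your motorbike photos and labels):
import numpy as np
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from sklearn.linear_model import LogisticRegression

# Stand-ins for your motorbike photos and their model labels.
X_train = np.random.rand(8, 224, 224, 3) * 255
y_train = np.random.randint(0, 3, size=8)

# Frozen pre-trained CNN as a feature extractor (no classifier head).
extractor = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
features = extractor.predict(preprocess_input(X_train))

# New classifier trained on the extracted features.
clf = LogisticRegression(max_iter=1000).fit(features, y_train)
print(clf.predict(features[:2]))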
The paper Quick, Draw! Doodle Recognition from a Kaggle challenge may be similar enough to what you are doing. The code is on GitHub. You may need some data augmentation if you only have a few hundred images for each category.
What you want is pretty easy. Follow the darknet YOLO implementation:
Instruction: https://pjreddie.com/darknet/yolo/
Code https://github.com/pjreddie/darknet
Training YOLO on COCO
You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the COCO dataset.
Get The COCO Data
To train YOLO you will need all of the COCO data and labels. The script scripts/get_coco_dataset.sh will do this for you. Figure out where you want to put the COCO data and download it, for example:
cp scripts/get_coco_dataset.sh data
cd data
bash get_coco_dataset.sh
Add your own data and make sure it is formatted the same way as the provided samples.
Now you should have all the data and the labels generated for Darknet.
Then call the training script with the pre-trained weights (for example, ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74, per the linked instructions).
Keep in mind that training only on your motorcycle images may not give good estimates; the results can come out biased. I read that somewhere before.
The rest is all inside the link. Good luck

Reusing Inception V3 Conv Neural Net (tensorflow) with 0% accuracy

EDIT1: My code is the same as here: https://github.com/tensorflow/models/blob/master/inception/inception. The only difference is that I pack my files into TFRecords and feed them batch-wise. Also, the ratio of Class 0 : Class 1 is 70:30.
I'm currently working on a project in which I'm making use of the Inception-V3 CNN model to train a classifier. Currently, I am working on a binary classifier (predict either 1 or 0), but my model only predicts class 0 for everything. While troubleshooting I've found that the probability of prediction is 100% for class 0 all the time. I have verified everything from the input queuing system to the eval and testing; everything seems to be working well too.
Strangely, the loss value reduces in a perfect semi-parabolic fashion, which makes me think that the loss has converged to a local minimum. Upon testing, the script only churns out class 0 (with 100% probability) each time. Another thing I've noticed is that the activations across various conv layers are always constant, which could imply that the neurons are just not firing at all.
My questions are:
1. Is my model working? The loss seems to converge, but the activations across various layers seem stagnant.
2. I am using the training code available from the models section of the tensorflow (https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py)
I am reusing the train, eval and supporting code to train my model with a custom input pipeline created by me (which is also working). Can someone help guide me in the right direction on this?
Thanks.
I know I am a little late in answering this question 😅
First of all, your model didn't learn anything at all. All it did (cleverly 😂) was predict class 0 for all cases, so that it achieves a baseline accuracy of 70% without any effort (probably the model was lazy 😪😋, just kidding). This is a very well-known problem in machine learning, called the class imbalance problem. Refer to http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/.
Apart from the techniques mentioned there, the one technique that works wonders is using class weights: basically, telling the network to be biased towards the weaker class. In your case the class weights would be class0:class1 = 3:7. This is a hyperparameter too, but it is a good point to start from.
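For instance, in Keras (a generic sketch, not the Inception training code from the question; the toy data stands in for real features and labels):
import numpy as np
from tensorflow import keras

# Toy imbalanced dataset: roughly 70% class 0, 30% class 1.
X = np.random.rand(1000, 20)
y = (np.random.rand(1000) < 0.3).astype(int)

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Weight classes inversely to their frequency: a 70:30 split -> 3:7 weights.
model.fit(X, y, epochs=3, class_weight={0: 3.0, 1: 7.0})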
Moreover, you didn't give any info about your dataset size, or whether you are fine-tuning or training from scratch; without that it's hard to speculate. By default I would suggest fine-tuning.
Also, by loss do you mean the training loss or the validation loss? Training loss carries almost no information about the performance of the model, and in my opinion both the training and validation losses offer very little from which to derive meaningful insights about the model's performance. Use other metrics like the confusion matrix, F1 score, recall, precision, etc.
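With scikit-learn, for example (a sketch), an always-predict-class-0 model gets 70% accuracy but an F1 score of zero:
from sklearn.metrics import confusion_matrix, f1_score

# 70:30 imbalance, and a model that always predicts class 0 (as in the question).
y_true = [0] * 7 + [1] * 3
y_pred = [0] * 10

print(confusion_matrix(y_true, y_pred))           # [[7 0] [3 0]]
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0 despite 70% accuracy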
Finally, there is absolutely no single answer to your question. The only way is the hard way: you will learn along with the model 😉. I consider training a NN, especially a CNN, an art, in which intuition plays a very crucial role, because most of the time the least expected changes give the best results. Anyway, that's the fun part of training a NN.
Happy training 💪
P.S.: Try using visualization tools like Grad-CAM to check whether the model is looking at the correct part of the image for classification. This is very important!

Using Convolution Neural network as Binary classifiers

Given any image, I want my classifier to tell whether it is a sunflower or not. How can I go about creating the second class? Keeping the set of all possible images minus {Sunflower} in the second class is overkill. Is there any research in this direction? Currently my classifier uses a neural network in the final layer. I have based it upon the following tutorial:
https://github.com/torch/tutorials/tree/master/2_supervised
I am taking 254x254 images as the input.
Would an SVM help in the final layer? I am also open to using any other classifier/features that might help me with this.
The standard approach in ML is that:
1) Build model
2) Try to train on some data with positive/negative examples (start with 50/50 pos/neg in the training set)
3) Validate it on a test set (again, try 50/50 pos/neg examples in the test set)
If the results are not fine:
a) Try a different model
b) Get more data
For case b), when deciding which additional data you need, the rule of thumb that works nicely for me is:
1) If the classifier gives lots of false positives (says it is a sunflower when it is actually not a sunflower at all) - get more negative examples
2) If the classifier gives lots of false negatives (says it is not a sunflower when it actually is one) - get more positive examples
Generally, start with some reasonable amount of data, check the results, and if the results on the train set or test set are bad, get more data. Stop getting more data when you reach optimal results.
Another thing you need to consider: if your results with the current data and current classifier are not good, you need to understand whether the problem is high bias (bad results on both the train set and the test set) or high variance (nice results on the train set but bad results on the test set). If you have a high-bias problem, more data or a more powerful classifier will definitely help. If you have a high-variance problem, a more powerful classifier is not needed, and you need to think about generalization: introduce regularization, or maybe remove a couple of layers from your ANN. Another possible way of fighting high variance is getting much, MUCH more data.
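As an illustration of the regularization option (a Keras sketch; the layer sizes and the input dimension of 128 are arbitrary, and the question's setup is Torch, so this only shows the idea):
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# A small net with L2 weight decay and dropout, two common ways to fight
# high variance (overfitting).
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(128,),
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")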
So to sum up, you need to use an iterative approach and try to increase the amount of data step by step until you get good results. There is no magic-wand classifier, and there is no simple answer to how much data you should use.
It is a good idea to use the CNN as a feature extractor: peel off the original fully-connected layer that was used for classification and add a new classifier. This is also known as transfer learning, a technique that has been widely used in the deep learning research community. For your problem, using a one-class SVM as the added classifier is a good choice.
Specifically,
a good CNN feature extractor can be trained on a large dataset, e.g. ImageNet,
the one-class SVM can then be trained using your 'sunflower' dataset.
The essential part of solving your problem is the implementation of the one-class SVM, which is also known as anomaly detection or novelty detection. You may refer to http://scikit-learn.org/stable/modules/outlier_detection.html for some insights into the method.
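A minimal scikit-learn sketch of the one-class SVM step (the random 512-dimensional vectors stand in for real CNN features):
import numpy as np
from sklearn.svm import OneClassSVM

# Stand-in CNN features for your sunflower images (e.g. from a pre-trained net).
sunflower_features = np.random.rand(200, 512)

# Train on the positive class only; nu bounds the fraction of training outliers.
ocsvm = OneClassSVM(kernel="rbf", gamma="auto", nu=0.1).fit(sunflower_features)

# predict() returns +1 for "looks like a sunflower", -1 for novelty.
print(ocsvm.predict(np.random.rand(5, 512)))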
