loading even layers of pre-trained BERT for classification - machine-learning

I'm using the transformers library of HuggingFace. As far as I know changing the number of hidden layers in the config file leads to loading the first x layers of the pre-trained BERT. I want to load the even layers (or the last x layers) of the pre-trained BERT and then fine-tune them for a classification task.
An example for classification tasks can be found here : run_glue.py
Thanks in advance

Related

Pre-Trained model to extract the feature of the images tensorflow?

Could someone please provide details of model available to extract the feature of images model for tensorflow or Keras? I have been looking for pre-trained models that will extract the features of the image. And then I will create a vector of the images then apply the nearest neighbor to find out similar images.
Any ordinary pre-trained classification model like vgg or resNet will extract different features of the image on each layer. While the earlier layers will respond to more basic and simple features like edges, the deeper layers will respond to more specific features. If you want to have specific features extracted from images, you have to label some data and train your model with that dataset.
For that, you can use the first couple of layers from a pre-trained model as an encoder.
But I would guess a CNN only solution will get you better results. Here is a nice read about the subject: https://arxiv.org/ftp/arxiv/papers/1709/1709.08761.pdf
Keras actually includes some applications with pre-trained weights, including vgg16: https://github.com/fchollet/keras/blob/master/keras/applications/vgg16.py
There you can find the link to the weights for this vgg16 model (pre-trained on imageNet):
https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5

When should you use pretrained weights when training deep learning models?

I am interested in training a range of image and object detection models and I am wondering what the general rule of when to use pretrained weights of a network like VGG16 is.
For example, it seems obvious that fine-tuning pre-trained VGG16 imagenet model weights is helpful you are looking for a subset ie. Cats and Dogs.
However it seems less clear to me whether using these pretrained weights is a good idea if you are training an image classifier with 300 classes with only some of them being subsets of the classes in the pretrained model.
What is the intuition around this?
Lower layers learn features that are not necessarily specific to your application/dataset: corners, edges , simple shapes, etc. So it does not matter if your data is strictly a subset of the categories that the original network can predict.
Depending on how much data you have available for training, and how similar the data is to the one used in the pretrained network, you can decide to freeze the lower layers and learn only the higher ones, or simply train a classifier on top of your pretrained network.
Check here for a more detailed answer

how to load reference .caffemodel for training

I'm using alexnet to train my own dataset.
The example code in caffe comes with
bvlc_reference_caffenet.caffemodel
solver.prototxt
train_val.prototxt
deploy.prototxt
When I train with the following command:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
I'd like to start with weights given in bvlc_reference.caffenet.caffemodel.
My questions are
How do I do that?
Is it a good idea to start from the those weights? Would this converge faster? Would this be bad if my data are vastly different from the Imagenet dataset?
1.
In order to use existing .caffemodel weights for fine-tuning, you need to use --weights command line argument:
./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --weights=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
2.
In most cases fine-tuning a net is quite a recommended practice, even when the input images are quite different than "imagenet" photos.
However, you should note that when training for the original weights you are about to use, some (very reasonable) assumptions were made. You should decide whether these assumptions are still true for your task.
For instance, most nets were trained with simple data augmentation using an image and its horizontal flip. However, if your task is to distinguish between images that are flipped you will find it very difficult to fine tune.

Fine Tuning of GoogLeNet Model

I trained GoogLeNet model from scratch. But it didn't give me the promising results.
As an alternative, I would like to do fine tuning of GoogLeNet model on my dataset. Does anyone know what are the steps should I follow?
Assuming you are trying to do image classification. These should be the steps for finetuning a model:
1. Classification layer
The original classification layer "loss3/classifier" outputs predictions for 1000 classes (it's mum_output is set to 1000). You'll need to replace it with a new layer with appropriate num_output. Replacing the classification layer:
Change layer's name (so that when you read the original weights from caffemodel file there will be no conflict with the weights of this layer).
Change num_output to the right number of output classes you are trying to predict.
Note that you need to change ALL classification layers. Usually there is only one, but GoogLeNet happens to have three: "loss1/classifier", "loss2/classifier" and "loss3/classifier".
2. Data
You need to make a new training dataset with the new labels you want to fine tune to. See, for example, this post on how to make an lmdb dataset.
3. How extensive a finetuning you want?
When finetuning a model, you can train ALL model's weights or choose to fix some weights (usually filters of the lower/deeper layers) and train only the weights of the top-most layers. This choice is up to you and it ususally depends on the amount of training data available (the more examples you have the more weights you can afford to finetune).
Each layer (that holds trainable parameters) has param { lr_mult: XX }. This coefficient determines how susceptible these weights to SGD updates. Setting param { lr_mult: 0 } means you FIX the weights of this layer and they will not be changed during the training process.
Edit your train_val.prototxt accordingly.
4. Run caffe
Run caffe train but supply it with caffemodel weights as an initial weights:
~$ $CAFFE_ROOT/build/tools/caffe train -solver /path/to/solver.ptototxt -weights /path/to/orig_googlenet_weights.caffemodel
Fine-tuning is a very useful trick to achieve a promising accuracy compared to past manual feature. #Shai already posted a good tutorial for fine-tuning the Googlenet using Caffe, so I just want to give some recommends and tricks for fine-tuning for general cases.
In most of time, we face a task classification problem that new dataset (e.g. Oxford 102 flower dataset or Cat&Dog) has following four common situations CS231n:
New dataset is small and similar to original dataset.
New dataset is small but is different to original dataset (Most common cases)
New dataset is large and similar to original dataset.
New dataset is large but is different to original dataset.
In practice, most of time we do not have enough data to train the network from scratch, but may be enough for pre-trained model. Whatever which cases I mentions above only thing we must care about is that do we have enough data to train the CNN?
If yes, we can train the CNN from scratch. However, in practice it is still beneficial to initialize the weight from pre-trained model.
If no, we need to check whether data is very different from original datasets? If it is very similar, we can just fine-tune the fully connected neural network or fine-tune with SVM. However, If it is very different from original dataset, we may need to fine-tune the convolutional neural network to improve the generalization.

How to train a caffemodel on our own dataset?

I tried using pre-trained bvlc_reference_caffenet.caffemodel for object recognition from images. I got good results for images containing only a single object. For images with multiple objects, I removed the argmax() term from prediction which gives the class label with the maximum probability.
Still, the accuracy is very less for the labels which I am getting. So, I am thinking of training the same caffemodel on my own dataset (containing images with multiple objects). How should I proceed? Is there any way to retrain a pre-trained caffemodel with the different dataset?
What you are after is called "finetuning": taking a deep net trained for task A, reusing its weights and re-train it to accomplish task B.
You can start with this tutorial, but you will find much more information simply by googling "finetune caffe model".
You may also be interested in this post regarding training caffe with mutiple categories per input image.

Resources