How to extract weights from trained tensorflow object detection api model - object-detection-api

I am using the TensorFlow Object Detection API to train a couple of models (SSD and Faster R-CNN) on a custom dataset. Everything works well, but I want to know how to extract the convolutional and classification weights so that I can load them into a corresponding external model (for instance, a Keras model with matching convolutional and fully connected layers). I've read about the meta-architectures (SSDMetaArch and FasterRCNNMetaArch) and about restoring checkpoints, but I am still not sure how to do this for my purpose.
I am asking because I want to use something like CAM or Grad-CAM to visually check what the model learns for each class in my dataset.
Thank you

Related

Is there a way to finetune yolo_v4 with transfer learning toolkit v3.0?

I am quite new to nvidia-tlt. So far I have trained, pruned and retrained the model with the KITTI dataset, and I can repeat these steps on any dataset in the required KITTI format. What I want to do is take a model previously trained on KITTI and fine-tune it on my own data. The config file has the options pretrained_model_path, resume_model_path and pruned_model_path, so there is no dedicated fine-tune option. If I try to use pretrained_model_path, it throws a shape exception:
Invalid argument: Incompatible shapes: [6,29484,3] vs. [6,29484,12]
That error is expected.
Technically, the pretrained model that you download from NGC comes without the final layer, which encodes the total number of classes and their respective bounding boxes.
Once you train that model on a dataset, the trained model is saved with that top layer included. If you then try to fine-tune the same model with a different number of classes, you will get an invalid-shapes error like the one above.
In that case you need to train the model on the new dataset from the beginning.
If you want to fine-tune on a different dataset that has the same classes, you can use the previously trained model.

MobileNet Pre-Trained Model - Classification

I am currently working with a pre-trained MobileNet model that classifies images into 1000 categories. For my iOS application, I only need it to recognize one type of object in the scene. How can I train the model so that it classifies only the one object I need, but does it extremely well?
I am new to machine learning and unfamiliar with transfer learning techniques. Would this type of training reduce the model size and make it more efficient at recognizing the one object I need? If yes, which resources explain how to keep training this pre-trained model for my objective?
Briefly, you want to turn your 1000-way classifier into a binary classifier.
The answer below assumes you have access to the original data, and that you know how to train the original model (that is, you have access to the training script). Here goes:
Assuming you're only interested in a single category C, you want to first map all instances (x, C) of the data to (x, 1) and all other instances (x, not_C) to (x, 0), then train a model on the resulting data (or, continue training the pre-trained model, if the training script also accepts a starting point for the model).
The model would then lose the ability to discern between non-C classes, and hopefully become better at discriminating C vs non-C instances.
Note: A less hacky approach would be to actually restrict the model to output only 0 or 1 and change the objective to a binary softmax. However, that would require some manipulation of the model's architecture, which you can do without.
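The relabeling step described above can be sketched as follows. This is only a minimal illustration: the file names, the class names and the helper to_binary_labels are hypothetical, and the actual training loop is left to whatever training script you already have.

```python
def to_binary_labels(samples, target_class):
    """Map (x, C) to (x, 1) and every other (x, label) to (x, 0)."""
    return [(x, 1 if label == target_class else 0) for x, label in samples]


# Hypothetical toy data: file names stand in for the actual images.
samples = [
    ("img_001.jpg", "dog"),
    ("img_002.jpg", "cat"),
    ("img_003.jpg", "dog"),
    ("img_004.jpg", "car"),
]

# "dog" plays the role of the single category C of interest.
binary_samples = to_binary_labels(samples, target_class="dog")
print(binary_samples)
```

The relabeled pairs can then be fed to the existing training script in place of the original 1000-way labels.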

Pre-Trained model to extract the feature of the images tensorflow?

Could someone please point me to models available for TensorFlow or Keras that extract features from images? I have been looking for pre-trained models for this. I will then build a feature vector for each image and apply nearest-neighbor search to find similar images.
Any ordinary pre-trained classification model such as VGG or ResNet extracts different features of the image at each layer. The earlier layers respond to basic, simple features like edges, while the deeper layers respond to more specific features. If you want specific features extracted from your images, you have to label some data and train your model on that dataset.
For that, you can use the first couple of layers of a pre-trained model as an encoder.
But I would guess a CNN-only solution will get you better results. Here is a nice read about the subject: https://arxiv.org/ftp/arxiv/papers/1709/1709.08761.pdf
Keras actually includes some applications with pre-trained weights, including VGG16: https://github.com/fchollet/keras/blob/master/keras/applications/vgg16.py
There you can find the link to the weights for this VGG16 model (pre-trained on ImageNet):
https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
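Once each image has a feature vector, the nearest-neighbor step from the question can be sketched as below. This is a minimal illustration and makes several assumptions: the short vectors are made-up stand-ins for the output of a pre-trained network, cosine similarity is one possible choice of metric, and the helper names are hypothetical. It also assumes that no feature vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, features, top_k=2):
    """Return the names of the top_k stored vectors closest to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in features.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:top_k]]


# Hypothetical feature vectors; in practice these would come from the network.
features = {
    "a.jpg": [1.0, 0.0, 0.0],
    "b.jpg": [0.9, 0.1, 0.0],
    "c.jpg": [0.0, 1.0, 0.0],
    "d.jpg": [0.0, 0.0, 1.0],
}

query = [1.0, 0.02, 0.0]
neighbors = most_similar(query, features, top_k=2)
print(neighbors)
```

For large image collections a brute-force scan like this becomes slow, so an indexed nearest-neighbor structure would likely be a better fit.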

Do I still need to load word2vec model at model testing?

This may sound like a naive question, but I am quite new to this. Let's say I use the Google pre-trained word2vec model (https://github.com/dav/word2vec) to train a classification model. I save my classification model. Now I load the classification model back into memory to test new instances. Do I need to load the Google word2vec model again? Or is it only used for training my model?
It depends on how your corpuses and test examples are structured and pre-processed.
You are probably using the pre-trained word-vectors to turn text into numerical features. At first, text examples are vectorized to train the classifier. Later, other (test/production) text examples will be vectorized in the same way and presented to the classifier to get its judgements.
So you will need to use the same text-to-vectors process for test/production text examples as was used during training. Perhaps you've done that in a separate earlier bulk step, in which case you already have the features in the vector form the classifier uses. But often your classifier pipeline will itself take raw text, and vectorize it – in which case it will need the same pre-trained (word)->(vector) mappings available at test time as were available during training.
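A minimal sketch of why the mapping is still needed at test time is shown below. It assumes the text is vectorized by averaging word vectors, which is only one common choice. The tiny word_vectors dictionary is a made-up stand-in for the real pre-trained model, and vectorize is a hypothetical helper.

```python
def vectorize(text, word_vectors, dim):
    """Average the vectors of known words; unknown words are skipped."""
    total = [0.0] * dim
    count = 0
    for word in text.lower().split():
        vec = word_vectors.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        # No known words: fall back to the zero vector.
        return total
    return [t / count for t in total]


# Hypothetical 2-dimensional stand-in for the pre-trained (word)->(vector) mapping.
word_vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0]}

# Training time: raw text is turned into features for the classifier.
train_features = vectorize("good movie", word_vectors, dim=2)

# Test time: new raw text must go through the same mapping,
# so word_vectors has to be loaded again.
test_features = vectorize("bad movie", word_vectors, dim=2)
print(train_features, test_features)
```

If the test features were instead precomputed and stored in an earlier bulk step, the classifier could be applied to them directly without reloading the word vectors.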

Image classification, narrow domain with custom labels

Let's suppose I would like to classify motorbikes by model.
There are a couple of hundred motorbike models I'm interested in.
I have tens, sometimes hundreds, of pictures of each motorbike model.
Can you please point me to a practical example that demonstrates how to train a model on your own data and then use it to classify images? It needs to be a deep learning model, not simple logistic regression.
I'm not sure about it, but it seems I can't use a pre-trained neural net, because it has been trained on a wide range of objects like cats, humans, cars, etc. Such a model may not be good at distinguishing the motorbike nuances I'm interested in.
I found a couple of such examples (TensorFlow has one), but sadly all of them used a pre-trained model. None of them showed how to train on your own dataset.
In cases like yours you use either transfer learning or fine-tuning. If you have more than a thousand images of motorbikes I would use fine-tuning, and if you have fewer, transfer learning.
Fine-tuning means taking a pre-trained model and replacing its classifier part. The new classifier part, and maybe the last 1-2 layers of the pre-trained model, are then trained on your dataset.
Transfer learning means taking a pre-trained model and letting it output features for an input image. You then train a new classifier on those features, for example an SVM or a logistic regression.
An example for this can be seen here: https://github.com/cpra/dlvc2016/blob/master/lectures/lecture10.pdf. slide 33.
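The transfer-learning variant described above (frozen feature extractor, new classifier on top) can be sketched as follows. This is only a rough illustration under stated assumptions: the 2-dimensional features are made-up stand-ins for the output of a pre-trained network, the problem is reduced to two classes for brevity, and the plain gradient-descent logistic regression is a hypothetical helper rather than a library routine.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit a binary logistic regression with per-sample gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


# Hypothetical features produced by a frozen pre-trained model.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]  # e.g. two motorbike models

weights, bias = train_logistic_regression(features, labels)
predictions = [predict(x, weights, bias) for x in features]
print(predictions)
```

Only the small classifier is trained here; the network that produced the features stays untouched, which is what distinguishes this setup from fine-tuning.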
The paper "Quick, Draw! Doodle Recognition", from a Kaggle challenge, may be similar enough to what you are doing. The code is on GitHub. You may need some data augmentation if you only have a few hundred images for each category.
What you want is pretty easy. Follow the darknet YOLO implementation.
Instructions: https://pjreddie.com/darknet/yolo/
Code: https://github.com/pjreddie/darknet
Training YOLO on COCO
You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the COCO dataset.
Get The COCO Data
To train YOLO you will need all of the COCO data and labels. The script scripts/get_coco_dataset.sh will do this for you. Figure out where you want to put the COCO data and download it, for example:
cp scripts/get_coco_dataset.sh data
cd data
bash get_coco_dataset.sh
Add your own data there and make sure it is in the same format as the test samples.
Now you should have all the data and the labels generated for Darknet.
Then call the training script with the pre-trained weights.
Keep in mind that training only on your motorcycle images may not give good estimates. The results can come out biased; I read that somewhere before.
The rest is all inside the link. Good luck
