Pre-trained model to extract image features in TensorFlow? - image-processing

Could someone please point me to models available for TensorFlow or Keras that extract image features? I am looking for a pre-trained model that extracts the features of an image. I will then build a feature vector for each image and apply nearest-neighbor search to find similar images.

Any ordinary pre-trained classification model such as VGG or ResNet extracts different features of the image at each layer. The earlier layers respond to basic, simple features like edges, while the deeper layers respond to more specific features. If you want specific features extracted from your images, you have to label some data and train your model on that dataset.
For that, you can use the first couple of layers from a pre-trained model as an encoder.
But I would guess a CNN-only solution will get you better results. Here is a nice read about the subject: https://arxiv.org/ftp/arxiv/papers/1709/1709.08761.pdf
Keras actually includes some applications with pre-trained weights, including VGG16: https://github.com/fchollet/keras/blob/master/keras/applications/vgg16.py
There you can find the link to the weights for this VGG16 model (pre-trained on ImageNet):
https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
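A minimal sketch of the pipeline from the question (extract a vector per image, then search for nearest neighbors), assuming TensorFlow 2.x with tf.keras and NumPy are installed. The function names build_extractor, extract_features and nearest_neighbors are made up for illustration, and global average pooling over VGG16's last convolutional block is just one reasonable choice of feature vector, not the only one.

```python
import numpy as np


def build_extractor():
    """Return VGG16 without its classifier head; it outputs one 512-d vector per image."""
    # Imported lazily so the NumPy-only helper below also works without TensorFlow.
    from tensorflow.keras.applications import VGG16
    return VGG16(weights="imagenet", include_top=False, pooling="avg")


def extract_features(model, images):
    """images: array of shape (n, 224, 224, 3) holding RGB values in 0..255."""
    from tensorflow.keras.applications.vgg16 import preprocess_input
    batch = np.array(images, dtype="float32")  # copy, so the caller's data is untouched
    return model.predict(preprocess_input(batch))


def nearest_neighbors(features, query, k=5):
    """Indices of the k rows of `features` closest to `query` (Euclidean distance)."""
    distances = np.linalg.norm(features - query, axis=1)
    return np.argsort(distances)[:k]
```

Typical use would be model = build_extractor(), then feats = extract_features(model, images), then nearest_neighbors(feats, feats[0], k=5) to list the images most similar to the first one. For a large collection a dedicated nearest-neighbor index would probably be faster than this brute-force search.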

Related

How to extract weights from trained tensorflow object detection api model

I am using the TensorFlow Object Detection API to train a couple of models (with SSD and Faster R-CNN) on a custom dataset. Everything works well, but I want to know how to extract the convolutional and classification model weights, in order to load them into a corresponding external (for instance Keras) convolutional and fully connected model. I've read about the meta architectures (SSDMetaArch and FasterRCNNMetaArch) and about restoring checkpoints, but I am not sure yet how to do it for my purpose.
I ask because I want to use something like CAM or Grad-CAM to visually check what the model learns for every class in my dataset.
Thank you

Why does the input size vary between pretrained models in Keras?

Pretrained models such as VGG16, Inception v3, MobileNet, ResNet152 and so on each expect a different input size.
Please share some background on this.
Why does the input size differ from one model to another?
VGG16 224*224
ResNet 224*224
Inception v3 299*299
MobileNet 224*224
All of these models are implementations of particular scientific papers, each of which used its own input size. Some models ship with the published weights, so if you want to use these weights to reproduce the papers' results, you have to use the same input size.
But note that this applies only if you use the weights pretrained on the ImageNet dataset. If you want to train these models from scratch (random initialization), you can specify a different input_shape without any issue, as long as you respect some constraints due to the depth of the model.
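To make the link between input size and pretrained weights concrete, here is a small worked calculation for VGG16. It assumes the standard architecture (five 2x2 max-pooling stages with stride 2 and 512 channels in the last block); the function name vgg16_flatten_size is hypothetical.

```python
def vgg16_flatten_size(side, n_pools=5, channels=512):
    """Number of values entering the first dense layer of VGG16 for a square input."""
    for _ in range(n_pools):
        side //= 2  # each 2x2 max-pool with stride 2 halves the spatial size, rounding down
    return side * side * channels


print(vgg16_flatten_size(224))  # 7 * 7 * 512 = 25088, the size the ImageNet dense weights expect
print(vgg16_flatten_size(299))  # 9 * 9 * 512 = 41472, so those dense weights no longer fit
```

The convolutional weights do not depend on the image size, only the dense layers on top do. That is also why the depth sets a lower bound: an input that is too small shrinks to nothing before it reaches the last pooling stage.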

When should you use pretrained weights when training deep learning models?

I am interested in training a range of image classification and object detection models, and I am wondering what the general rule is for when to use the pretrained weights of a network like VGG16.
For example, it seems obvious that fine-tuning pre-trained VGG16 ImageNet weights is helpful if you are classifying a subset of its classes, e.g. cats and dogs.
However, it seems less clear to me whether using these pretrained weights is a good idea if you are training an image classifier with 300 classes, only some of which are subsets of the classes in the pretrained model.
What is the intuition around this?
Lower layers learn features that are not necessarily specific to your application or dataset: corners, edges, simple shapes, etc. So it does not matter whether your data is strictly a subset of the categories that the original network can predict.
Depending on how much data you have available for training, and how similar the data is to the one used in the pretrained network, you can decide to freeze the lower layers and learn only the higher ones, or simply train a classifier on top of your pretrained network.
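Freezing the lower layers can be sketched as below. This assumes a Keras-style model, i.e. an object with a layers list whose entries have a trainable flag; the helper name freeze_lower_layers and the choice of how many layers to leave trainable are illustrative only.

```python
def freeze_lower_layers(model, n_trainable):
    """Freeze every layer except the last `n_trainable` ones of a Keras-style model."""
    cutoff = len(model.layers) - n_trainable
    for index, layer in enumerate(model.layers):
        # Layers before the cutoff keep their pretrained weights during training.
        layer.trainable = index >= cutoff
    return model
```

With tf.keras this could be applied to something like VGG16(weights="imagenet", include_top=False) before adding a new classifier on top. Setting n_trainable=0 corresponds to training only the classifier; in Keras the model has to be compiled after changing the trainable flags for the change to take effect.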
Check here for a more detailed answer

Image classification, narrow domain with custom labels

Let's suppose I would like to classify motorbikes by model.
There are a couple of hundred motorbike models I'm interested in.
I have tens, sometimes hundreds, of pictures of each motorbike model.
Can you please point me to a practical example that demonstrates how to train a model on your own data and then use it to classify images? It needs to be a deep learning model, not simple logistic regression.
I'm not sure about it, but it seems like I can't use a pre-trained neural net because it has been trained on a wide range of objects like cats, humans, cars, etc. It may not be very good at distinguishing the motorbike nuances I'm interested in.
I found a couple of such examples (TensorFlow has one), but sadly all of them used a pre-trained model. None of them showed how to train on your own dataset.
In cases like yours you use either transfer learning or fine-tuning. If you have more than a thousand images of motorbikes I would use fine-tuning, and if you have fewer, transfer learning.
Fine-tuning means taking a pre-trained model and replacing its classifier part. The new classifier part, and maybe the last 1-2 layers of the pre-trained model, are then trained on your dataset.
Transfer learning means taking a pre-trained model and letting it output features for an input image. You then train a new classifier on those features, maybe an SVM or a logistic regression.
An example of this can be seen on slide 33 here: https://github.com/cpra/dlvc2016/blob/master/lectures/lecture10.pdf
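The "features plus a simple classifier" idea can be sketched as follows. To keep the example self-contained it uses random vectors as a stand-in for features from a pre-trained CNN and a tiny two-class logistic regression written in NumPy; all names are made up for illustration. For a few hundred motorbike classes you would more likely feed real CNN features into a library classifier such as scikit-learn's LogisticRegression or SVC.

```python
import numpy as np


def train_logistic_regression(features, labels, lr=0.1, epochs=500):
    """Fit a binary logistic regression on fixed feature vectors by gradient descent."""
    x = np.hstack([features, np.ones((len(features), 1))])  # append a bias column
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-x @ w))            # predicted probability of class 1
        w -= lr * x.T @ (p - labels) / len(labels)  # gradient step on the mean log loss
    return w


def predict(w, features):
    """Return 0/1 class predictions for each feature vector."""
    x = np.hstack([features, np.ones((len(features), 1))])
    return (x @ w > 0).astype(int)


# Synthetic stand-in for CNN features: two well-separated clusters of 8-d vectors.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-2, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)

w = train_logistic_regression(feats, labels)
accuracy = (predict(w, feats) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The pre-trained network is never updated here; only the small classifier is trained, which is why this approach can work with few images per class.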
The paper "Quick, Draw! Doodle Recognition", from a Kaggle challenge, may be similar enough to what you are doing. The code is on GitHub. You may need some data augmentation if you only have a few hundred images for each category.
What you want is pretty easy. Follow the darknet YOLO implementation.
Instruction: https://pjreddie.com/darknet/yolo/
Code https://github.com/pjreddie/darknet
Training YOLO on COCO
You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the COCO dataset.
Get The COCO Data
To train YOLO you will need all of the COCO data and labels. The script scripts/get_coco_dataset.sh will do this for you. Figure out where you want to put the COCO data and download it, for example:
cp scripts/get_coco_dataset.sh data
cd data
bash get_coco_dataset.sh
Add your own data there and make sure it is in the same format as the test samples.
Now you should have all the data and the labels generated for Darknet.
Then call the training script with the pre-trained weights.
Keep in mind that training only on your motorcycles may not give good estimates. The results could come out biased; I read that somewhere before.
The rest is all in the link. Good luck.

Image similarity detection with TensorFlow

I recently started to play with TensorFlow, and while trying to learn the popular algorithms I ran into a situation where I need to find the similarity between images.
I supply image A to the system, a user X supplies an image B, and the system should return image A to user X if image B is similar (in color and class).
Now I have a few questions:
Do we consider this scenario to be supervised learning? I am asking because I don't see it as a classification problem (confused!).
What algorithms should I use to train, etc.?
Re-training should be done quite often. How should I tackle this problem so that I don't train from scratch every time (fine-tuning?)
Do we consider this scenario to be supervised learning?
It is supervised learning when you have labels to optimize your model. So for most neural networks, it is supervised.
However, you might also look at the complete task. I guess you don't have any ground truth for image pairs and the "desired" similarity value your model should output?
One way to solve this problem, which sounds inherently unsupervised, is to take a CNN (convolutional neural network) trained (in a supervised way) on the 1000 classes of ImageNet. To get the similarity of two images, you could then simply take the Euclidean distance between their output probability distributions. This will not lead to excellent results, but it is probably a good start.
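A minimal sketch of that distance computation, using made-up 3-class scores as a stand-in for the 1000-class output of an ImageNet classifier. The helper names and the numbers are assumptions for illustration only.

```python
import numpy as np


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    z = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return z / z.sum()


def probability_distance(p, q):
    """Euclidean distance between two class-probability vectors."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))


# Hypothetical network outputs for three images.
cat_a = softmax(np.array([4.0, 1.0, 0.0]))
cat_b = softmax(np.array([3.5, 1.2, 0.1]))
truck = softmax(np.array([0.0, 1.0, 4.0]))

# The two images predicted as the same class end up closer to each other.
print(probability_distance(cat_a, cat_b) < probability_distance(cat_a, truck))
```

Note that this only compares predicted classes. Two images of the same class with very different colors would still count as similar, so it covers the "class" part of the question but not the "color" part.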
What algorithms should I use to train, etc.?
First, you should define what "similar" means for you. Are two images similar when they contain the same object (classes)? Are they similar if the general color of the image is the same?
For example, how similar are the following 3 pairs of images?
Have a look at FaceNet and search for "Content based image retrieval" (CBIR):
Wikipedia
Google Scholar
This can be treated as supervised learning. You can classify the images into categories; if two images fall into the same category (or into close categories), you can think of them as similar.
You can use a deep convolutional neural network trained on ImageNet, such as the Inception model. The Inception model outputs a probability distribution over 1000 classes (a vector whose values sum to 1). You can calculate the distance between the vectors of two images to get their similarity.
On the same page of the inception model, you will also find the instructions to retrain a model: https://github.com/tensorflow/models/tree/master/inception#how-to-fine-tune-a-pre-trained-model-on-a-new-task

Resources