I am confused about the difference between SSD and MobileNet. As far as I know, both of them are neural networks. SSD provides localization while MobileNet provides classification, so the combination of SSD and MobileNet can produce object detection. The image is taken from the SSD paper. The default classification network of SSD is VGG-16, so for SSD MobileNet, VGG-16 is replaced with MobileNet. Are my statements correct?
Where can I get more information about SSD MobileNet, especially the one available in the TensorFlow model zoo?
SSD (Single Shot Detector) is a neural network architecture designed for detection purposes, which means it performs localization (bounding boxes) and classification at once.
MobileNet (https://arxiv.org/abs/1704.04861) is an efficient architecture introduced by Google that uses depthwise and pointwise convolutions. It can be used for classification on its own, or as a feature extractor for other tasks (e.g. detection).
In the SSD paper they use a VGG network as the feature extractor for detection: feature maps are taken from several different layers (resolutions) and fed to their corresponding classification and localization layers (the classification head and the regression head).
So one can decide to use a different kind of feature extractor, as in MobileNet-SSD, which means you use the SSD architecture while your feature extractor is the MobileNet architecture.
By reading the SSD paper and the MobileNet paper, you will be able to understand the models that exist in the model zoo.
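To see why depthwise and pointwise convolutions make MobileNet efficient, a quick parameter count helps (a rough sketch; bias terms are ignored and the function names are just illustrative):

```python
# Parameter-count comparison between a standard convolution and the
# depthwise-separable convolution used in MobileNet (biases ignored).

def standard_conv_params(k, c_in, c_out):
    # one k x k x c_in filter per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise: one k x k filter per input channel,
    # pointwise: one 1 x 1 x c_in filter per output channel
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728
sep = depthwise_separable_params(3, 64, 128)  # 8768
print(std, sep, round(std / sep, 1))          # roughly an 8x reduction
```

This parameter (and FLOP) reduction is what makes MobileNet attractive as a feature extractor on mobile and embedded devices.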
There are two types of deep neural networks here: the base network and the detection network. MobileNet, VGG-Net, and LeNet are base networks.
The base network provides high-level features for classification or detection. If you use a fully connected layer at the end of these networks, you have a classifier. But you can remove the fully connected layer and replace it with a detection network, like SSD, Faster R-CNN, and so on. In general, SSD uses the last convolutional layers of the base network for the detection task. MobileNet, just like other base networks, uses convolutions to produce high-level features.
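The split described above can be sketched with NumPy stand-ins (toy shapes and random weights, not a real model): the base network produces a feature map, and what you bolt on top decides whether you get classification or detection.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_base_network(image):
    # Stand-in for the conv layers of MobileNet/VGG:
    # maps an H x W x 3 image to a smaller feature map.
    return rng.standard_normal((7, 7, 256))

def classification_head(feature_map, num_classes=10):
    # The "fully connected layer at the end": pool, then a linear layer.
    pooled = feature_map.mean(axis=(0, 1))             # (256,)
    w = rng.standard_normal((256, num_classes))
    return pooled @ w                                  # class scores

def detection_head(feature_map, num_anchors=4, num_classes=10):
    # SSD-style head: per feature-map cell, predict class scores and
    # 4 box offsets for each anchor (here as a 1x1 conv, i.e. a matmul).
    h, w_, c = feature_map.shape
    out_channels = num_anchors * (num_classes + 4)
    w_conv = rng.standard_normal((c, out_channels))
    return feature_map @ w_conv                        # (7, 7, 56)

features = toy_base_network(np.zeros((224, 224, 3)))
print(classification_head(features).shape)  # (10,)
print(detection_head(features).shape)       # (7, 7, 56)
```

Swapping VGG for MobileNet only changes `toy_base_network`; the SSD-style head stays the same, which is exactly the MobileNet-SSD idea.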
Can I use a CNN architecture for binary classification on these types of images (posted below)?
Currently I have 3 conv + 2 FC layers but am not getting good results. I have a sufficient amount of data as well. I tried transfer learning with Inception V3, but it overfits regardless of which layers I freeze.
Is there any other way of classifying such images, given that the features to be extracted are limited here?
Semantic segmentation converts images into a kind of pixel-wise color map, but that is a totally different paradigm.
Deep learning is famous for classifying images into different categories. However, I am interested in using some other machine learning model capable of classifying images. I have about 2000 images in PNG format. Does anybody know of a machine learning model, other than deep learning models, that can be applied in Python to classify images?
You can take a look at SVMs (scikit-learn). I advise you to extract features from the images first, with SIFT or SURF for example.
EDIT: SIFT and SURF use the principle of convolution, but plenty of other feature descriptors exist.
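The pipeline would be: descriptor extraction, then an SVM on the resulting vectors. A minimal sketch with scikit-learn, using random vectors as stand-ins for per-image descriptors (real SIFT/SURF output would need to be aggregated into one fixed-length vector per image, e.g. via a bag-of-visual-words):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Stand-ins for 128-dimensional descriptor vectors, one per image,
# drawn so that the two classes are separable.
X_class0 = rng.normal(loc=0.0, scale=1.0, size=(50, 128))
X_class1 = rng.normal(loc=2.0, scale=1.0, size=(50, 128))
X = np.vstack([X_class0, X_class1])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on this toy data
```

With ~2000 images and good descriptors, an RBF-kernel SVM like this is a reasonable non-deep-learning baseline.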
How can I understand what features the Google Inception V3 model uses to classify a set of images, i.e. which features or pixels of the images are most significant for classifying them?
For instance, if the classifier were to distinguish between a cheetah and a leopard, it would probably do so by looking at their spots. How can I determine what aspects of my images the classifier values most?
Your question is not easily answerable. Neural nets in general compose hierarchical features: in the initial layers the network may learn to detect edges and blobs, while in the deeper layers it learns more abstract features. So in an n-class classification problem, where n may be a large number, it is notoriously difficult to interpret what exactly the network learns and uses to classify images. That said, work has obviously been done on this; I will refer you to https://distill.pub/2017/feature-visualization/, which should help you a bit.
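Besides the feature-visualization work linked above, one simple model-agnostic technique is occlusion sensitivity: slide a gray patch over the image and record how much the classifier's score drops, so regions the model relies on light up. A minimal sketch with a toy score function (not a real classifier):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=8):
    """Slide a gray patch over the image and record the score drop
    at each position; large drops mark regions the model relies on."""
    base = score_fn(image)
    h, w = image.shape[:2]
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.5   # gray out one patch
            heat[i // patch, j // patch] = base - score_fn(occluded)
    return heat

# Toy "model" whose score depends only on the top-left corner:
score = lambda img: img[:8, :8].mean()
img = np.ones((32, 32))
heat = occlusion_map(img, score)
print(heat)  # only the top-left cell shows a large score drop
```

For a real Inception V3, `score_fn` would run the network and return the probability of the class of interest; the heatmap then shows which pixels (spots, in the cheetah/leopard example) drive that probability.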
Is it possible to feed image features, say SIFT features, to a convolutional neural network model in TensorFlow? I am trying a TensorFlow implementation of this project, in which a grayscale image is colored. Would image features be a better choice than feeding the images as-is to the model?
PS: I am a novice at machine learning and am not familiar with creating neural network models.
You can feed a TensorFlow neural net almost anything.
If you have extra features for each pixel, then instead of using one channel (intensity) you would use multiple channels.
If you have extra features that describe the whole image, you can make them a separate input and merge the features at some upper layer.
As for which gives better performance, you should try both approaches.
The general intuition is that extra features help if you don't have many samples; their effect diminishes if you have many samples and the network can learn the features by itself.
One more point: if you are a novice, I strongly recommend using a higher-level framework like keras.io (a layer over TensorFlow) instead of raw TensorFlow.
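The two wiring options above can be sketched with NumPy, using placeholder arrays (the shapes and feature names are illustrative, not from any specific project):

```python
import numpy as np

h, w = 224, 224
intensity = np.zeros((h, w))   # the original grayscale image
sift_map = np.zeros((h, w))    # hypothetical per-pixel feature map

# Option 1 -- per-pixel features: stack them as extra input channels,
# so the network sees a 2-channel image instead of 1-channel intensity.
multi_channel = np.stack([intensity, sift_map], axis=-1)
print(multi_channel.shape)     # (224, 224, 2)

# Option 2 -- whole-image features: keep them as a separate vector and
# concatenate with the flattened conv output at an upper (dense) layer.
conv_output = np.zeros(512)    # stand-in for flattened conv features
global_features = np.zeros(32) # e.g. a color histogram of the image
merged = np.concatenate([conv_output, global_features])
print(merged.shape)            # (544,)
```

In Keras, option 2 corresponds to a multi-input model with a concatenate layer; option 1 is just a change to the input shape.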
Does it make sense to perform feature extraction on images using, e.g., OpenCV, and then use Caffe for classification of those features?
I am asking this as opposed to the traditional way of passing the images directly to Caffe, and letting Caffe do the extraction and classification procedures.
Yes, it does make sense, but it may not be the first thing you want to try:
If you have already extracted hand-crafted features that are suitable for your domain, there is a good chance you'll get satisfactory results by using an easier-to-use machine learning tool (e.g. libsvm).
Caffe can be used in many different ways with your features. If they are low-level features (e.g. Histogram of Oriented Gradients), then several convolutional layers may be able to extract the appropriate mid-level features for your problem. You may also use Caffe as an alternative non-linear classifier (instead of an SVM). You have the freedom to try (too) many things, but my advice is to first try a machine learning method with a smaller meta-parameter space, especially if you're new to neural nets and Caffe.
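To make the "smaller meta-parameter space" suggestion concrete, here is a minimal sketch of that simpler baseline: hand-crafted features into a scaled linear SVM (random vectors stand in for the real HOG-style features; scikit-learn is used here as a stand-in for libsvm, which it wraps):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins for hand-crafted feature vectors (e.g. HOG), one per image.
X = np.vstack([rng.normal(0.0, 1.0, (40, 36)),
               rng.normal(1.5, 1.0, (40, 36))])
y = np.array([0] * 40 + [1] * 40)

# Only a couple of knobs to tune (C, and whether/how to scale),
# versus the many architecture and solver choices in Caffe.
model = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
model.fit(X, y)
print(model.score(X, y))  # training accuracy on this toy data
```

If this baseline already performs well on your extracted features, you may not need a deep net at all; if not, the same features can still be fed into Caffe as described above.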
Caffe is a tool for training and evaluating deep neural networks. It is quite a versatile tool allowing for both deep convolutional nets as well as other architectures.
Of course it can be used to process pre-computed image features.