Inception V3 Image Classification - machine-learning

How can I understand what features the Google Inception V3 model uses to classify a set of images, i.e. which features or pixels of the images are most significant for classifying them?
For instance, if the classifier were to distinguish between a cheetah and a leopard, it would probably do so by judging based on their spots. How can I determine which aspects of my images the classifier values most?

Your question is not easily answerable. Neural nets in general learn hierarchical features: in the initial layers the network may learn to detect edges and blobs, and in the deeper layers it learns more abstract features. So in an n-class classification problem, where n might be a large number, it is notoriously difficult to interpret what exactly the network learns and uses to classify images. Having said that, work has obviously been done on this; I will refer you to https://distill.pub/2017/feature-visualization/, which should help you a bit.
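As a practical starting point, a gradient-based saliency map shows which input pixels the prediction is most sensitive to. Below is a minimal sketch of that idea, assuming TensorFlow 2 and the pretrained InceptionV3 bundled with Keras; the image path is a placeholder.

```python
import numpy as np
import tensorflow as tf

# Minimal gradient-saliency sketch, assuming TensorFlow 2 and the bundled
# pretrained InceptionV3; "cheetah.jpg" is a placeholder path.
model = tf.keras.applications.InceptionV3(weights="imagenet")

img = tf.keras.preprocessing.image.load_img("cheetah.jpg", target_size=(299, 299))
x = tf.keras.applications.inception_v3.preprocess_input(
    tf.keras.preprocessing.image.img_to_array(img))
x = tf.convert_to_tensor(x[np.newaxis, ...])

with tf.GradientTape() as tape:
    tape.watch(x)                        # x is a plain tensor, so watch it explicitly
    preds = model(x)
    top_score = tf.reduce_max(preds[0])  # score of the predicted class

# Gradient of the top class score w.r.t. the input pixels: large absolute
# values mark the pixels the prediction is most sensitive to.
grads = tape.gradient(top_score, x)
saliency = tf.reduce_max(tf.abs(grads), axis=-1)[0].numpy()  # (299, 299) map to plot
```

Plotting `saliency` over the input image gives a rough picture of which regions (e.g. the spots) drive the decision; the Distill article above covers richer techniques such as feature visualization by optimization.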

Related

Feeding image features to tensorflow for training

Is it possible to feed image features, say SIFT features, to a convolutional neural network model in Tensorflow? I am trying a tensorflow implementation of this project in which a grayscale image is coloured. Will image features be a better choice than feeding the images as is to the model?
PS. I am a novice to machine learning and am not familiar with creating neural network models.
You can feed a TensorFlow neural net almost anything.
If you have extra features for each pixel, then instead of using one channel (intensity) you would use multiple channels.
If you have extra features which describe the whole image, you can make a separate input and merge the features at some upper layer (see the sketch after this answer).
As for which performs better, you should try both approaches.
The general intuition is that extra features help if you don't have many samples; their effect diminishes if you have many samples and the network can learn the features by itself.
One more point: if you are a novice, I strongly recommend using a higher-level framework like keras.io (which is a layer over TensorFlow) instead of raw TensorFlow.
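A minimal sketch of that separate-input idea, assuming Keras; the image size and the 10-dimensional whole-image feature vector are made up for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the "separate input, merge at an upper layer" idea, assuming Keras.
# Shapes are made up: 64x64 grayscale images plus a 10-dim whole-image feature vector.
image_in = keras.Input(shape=(64, 64, 1), name="image")
feats_in = keras.Input(shape=(10,), name="extra_features")

x = layers.Conv2D(32, 3, activation="relu")(image_in)   # convolutional branch on raw pixels
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

merged = layers.concatenate([x, feats_in])               # merge hand-crafted features here
out = layers.Dense(2, activation="softmax")(merged)

model = keras.Model(inputs=[image_in, feats_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Per-pixel features, by contrast, would simply be stacked as extra channels on the image input itself.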

Using Caffe to classify "hand-crafted" image features

Does it make any sense to perform feature extraction on images using, e.g., OpenCV, then use Caffe for classification of those features?
I am asking this as opposed to the traditional way of passing the images directly to Caffe, and letting Caffe do the extraction and classification procedures.
Yes, it does make sense, but it may not be the first thing you want to try:
If you have already extracted hand-crafted features that are suitable for your domain, there is a good chance you'll get satisfactory results by using an easier-to-use machine learning tool (e.g. libsvm).
Caffe can be used in many different ways with your features. If they are low-level features (e.g. Histogram of Gradients), then several convolutional layers may be able to extract the appropriate mid-level features for your problem. You may also use Caffe as an alternative non-linear classifier (instead of an SVM). You have the freedom to try (too) many things, but my advice is to first try a machine learning method with a smaller meta-parameter space, especially if you're new to neural nets and Caffe.
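As a concrete example of that advice, one such smaller-meta-parameter-space baseline could be an SVM on your hand-crafted features. A minimal sketch, assuming OpenCV HOG features and scikit-learn, with `images` and `labels` standing in for your data:

```python
import cv2
import numpy as np
from sklearn.svm import SVC

# Sketch of the "simpler tool first" advice: OpenCV HOG features fed to an SVM.
# Assumes `images` is a list of 64x128 grayscale uint8 arrays and `labels` their classes.
hog = cv2.HOGDescriptor()  # default parameters expect a 64x128 window

X = np.array([hog.compute(img).ravel() for img in images])
y = np.array(labels)

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print(clf.predict(X[:1]))  # predicted class of the first image
```

If this baseline already performs well, moving to Caffe may not be worth the extra tuning effort.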
Caffe is a tool for training and evaluating deep neural networks. It is quite a versatile tool allowing for both deep convolutional nets as well as other architectures.
Of course it can be used to process pre-computed image features.

How to do machine learning when the inputs are of different sizes?

In standard cookbook machine learning, we operate on a rectangular matrix; that is, all of our data points have the same number of features. How do we cope with situations in which our data points have different numbers of features? For example, if we want to do visual classification but all of our pictures are of different dimensions, or if we want to do sentiment analysis but all of our sentences have different numbers of words, or if we want to do stellar classification but all of the stars have been observed a different number of times, etc.
I think the normal way would be to extract features of regular size from these irregularly sized data. But I attended a talk on deep learning recently where the speaker emphasized that instead of hand-crafting features from data, deep learners are able to learn the appropriate features themselves. But how do we use e.g. a neural network if the input layer is not of a fixed size?
Since you are asking about deep learning, I assume you are more interested in end-to-end systems rather than feature design. Neural networks that can handle variable-size inputs are:
1) Convolutional neural networks with pooling layers. They are usually used in an image recognition context, but recently have been applied to modeling sentences as well (I think they should also be good at classifying stars).
2) Recurrent neural networks (good for sequential data like time series and sequence labeling tasks, and also for machine translation).
3) Tree-based autoencoders (also called recursive autoencoders) for data arranged in tree-like structures (can be applied to sentence parse trees)
Lots of papers describing example applications can readily be found by googling.
For uncommon tasks you can select one of these based on the structure of your data, or you can design some variants and combinations of these systems.
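A minimal sketch of option (1), assuming Keras: a fully convolutional network with unspecified input height and width, where global pooling produces a fixed-size vector for the classifier.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of option (1), assuming Keras: height and width are left as None, and
# global pooling collapses whatever spatial size comes in, so differently sized
# images can be handled (one size per batch, or batch size 1).
inputs = keras.Input(shape=(None, None, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)             # (batch, 64) regardless of image size
outputs = layers.Dense(5, activation="softmax")(x)
model = keras.Model(inputs, outputs)
```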
You can usually make the number of features the same for all instances quite easily:
if we want to do visual classification but all of our pictures are of different dimensions
Resize them all to a certain dimension / number of pixels.
if we want to do sentiment analysis but all of our sentences have different amounts of words
Keep a dictionary of k words drawn from your text data (e.g. the k most frequent). Each instance will consist of a boolean vector of size k where the i-th entry is true if word i from the dictionary appears in that instance (this is not the best representation, but many are based on it). See the bag-of-words model and the sketch at the end of this answer.
if we want to do stellar classification but all of the stars have been observed a different number of times
Take the features that have been observed for all the stars.
But I attended a talk on deep learning recently where the speaker emphasized that instead of hand-crafting features from data deep learners are able to learn the appropriate features themselves.
I think the speaker probably referred to higher level features. For example, you shouldn't manually extract the feature "contains a nose" if you want to detect faces in an image. You should feed it the raw pixels, and the deep learner will learn the "contains a nose" feature somewhere in the deeper layers.
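As promised above, a minimal bag-of-words sketch, assuming scikit-learn (the example sentences are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Bag-of-words sketch, assuming scikit-learn; the sentences are made up.
# Every sentence becomes a fixed-size boolean vector, whatever its length.
sentences = ["the movie was great", "the plot was terrible", "great acting, terrible plot"]
vectorizer = CountVectorizer(binary=True, max_features=1000)  # dictionary capped at k=1000 words
X = vectorizer.fit_transform(sentences)                       # shape: (3, vocabulary size)
print(vectorizer.get_feature_names_out())
print(X.toarray())
```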

which classifier would be better for scanned images?

I am presently working with 500 scanned images. I have done feature extraction on them and I'm using the obtained offset values for classifying the images. I tried using a k-nearest-neighbour classifier and want to know if I am proceeding in the right way. My main objective is to classify the images.
Any help would be appreciated... Thank you
There are many ways to classify images. Most of the time I do it in 3 steps:
Preprocessing (denoising, deskewing, etc.)
Extracting features (colors, shapes, textures, SIFT, ...)
Doing classification (kNN, SVM, Neural Network, ...)
Depending on the content of your images (natural images, biological images, document images, ...), you have to choose the features, and then depending on the features you have to choose your classifier.
Be aware that supervised classification needs labeled examples in order to learn.
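A rough sketch of the classification step, assuming scikit-learn, with `features` standing in for your extracted offset values and `labels` for the image classes:

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Rough sketch of step 3, assuming scikit-learn, with `features` as your
# (500, n_features) array of extracted values and `labels` as the image classes.
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("held-out accuracy:", knn.score(X_test, y_test))
```

The held-out accuracy gives you a first indication of whether kNN on your current features is good enough, or whether you need better features or a different classifier.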
KNN is one of the solutions.
What's the goal of your classification project? For example: do you need very high accuracy or recall? Do you need a supervised or unsupervised solution?
Once you have set your goal, you can do some related feature engineering and choose the right algorithm.

OpenCV: Training a soft cascade classifier

I've built an algorithm for pedestrian detection using openCV tools. To perform classification I use a boosted classifier trained with the CvBoost class.
The problem with this implementation is that I need to feed my classifier the whole set of features I used for training. This makes the algorithm extremely slow, so much so that each image takes around 20 seconds to be fully analysed.
I need a different detection structure, and OpenCV has this Soft Cascade class that seems like exactly what I need. Its basic principle is that there is no need to examine all the features of a test sample, since a detector can reject most negative samples using a small number of features. The problem is that I have no idea how to train one given a fully labeled set of negative and positive examples.
I can find no information about this online, so I am looking for any tips you can give me on how to use this soft cascade for classification.
Best regards
