Feeding image features to tensorflow for training - machine-learning

Is it possible to feed image features, say SIFT features, to a convolutional neural network model in Tensorflow? I am trying a tensorflow implementation of this project in which a grayscale image is coloured. Will image features be a better choice than feeding the images as is to the model?
PS. I am a novice to machine learning and is not familiar with creating neural n/w models

You can feed tensorflow neural net almost anything.
If you have extra features for each pixel, then instead of using one channel (intensity) you would use multiple channels.
If you have extra features, which are about whole image, you can make separate input a merge features at some upper layer.
As for the better performance, you should try both approaches.
General intuition is that, extra features help if you don't have many samples and their effect is diminishing if you have many samples and network can learn features by itself.
Also one more point: If you are novice, I strongly recommend using higher level framework like keras.io (which is layer over tensorflow) instead of tensorflow.

Related

Relation between CNN and gabor filter

I am learning to use gabor filters to extract orientation and scale related features from images. On the other hand, Convolution Neural Network can also extract features including orientation and scale, is there any evidence that filters in CNN performs a similar function as gabor filters? Or pros and cons of both of them.
In my personal experience, in a traditional deep learning architecture (such as AlexNet) , when the layers near the beginning are visualized, they resemble gabor filters a lot.
Take this visualization of the first two layers of a pretrained AlexNet (taken from Andrej Karpathy's cs231n.github.io ). Some of the learnt filters look exactly like the Gabor Filters. So yes, there is evidence that CNN works (partly) the same way as Gabor Filters.
One possible explanation is that since the layers towards the beginning of a deep CNN are used to extract low level features (such as changes in texture), they perform the same functions as Gabor Filters. Features such as those detecting changes in frequency are so fundamental that they are present irrespective of the type of dataset the model is trained on. (Part of the reason why transfer learning is possible) .
But if you have more more data, you could possibly make a deep CNN learn much more high-level features than Gabor Filters, which might be more useful for the task you're extracting these features for (such as classification). I hope this provides some clarification.

Affinity Propagation for Image Clustering

The link here describes a method for image classification using affinity propagation. I'm confused as to how they got the feature vectors, i.e, the data structure of the images, e.g, arrays?
Additionally, how would I accomplish this given that I can't use Places365 as it's custom data (audio spectrograms)?
Finally, how would I plot the images as they've done in the diagram?
The images are passed through a neural network. The activations of neural network layer for an image is the feature vector. See https://keras.io/applications/ for examples.
Spectrograms can be treated like images.
Sometimes even when domain is very different, the neural network features can extract useful information that can help you with clustering/classification tasks.

Inception V3 Image Classification

How can I understand what features is the Google Inception V3 model using to classify a set of images, what features or pixels of the images are more significant for classifying them?
For instance, if the classifier were to distinguish between a Cheetah and a Leopard, it would probably do so by judging based on their spots. How can I determine what aspects of my images the classifier values most?
Your question is not easily answerable, Neural nets in general compose of hierarchical features where in the initial layers the neural net may learn to detect edges and blobs and in the deeper layers it learn more abstract features, so in a n class classification problems, where n might be a large number it is notoriously difficult to interpret what exactly the network learns and uses to classify images. Having said that Obviously work has been done,But i will refer you to https://distill.pub/2017/feature-visualization/, this should help you a bit

Using Caffe to classify "hand-crafted" image features

Does it make any sense to perform feature extraction on images using, e.g., OpenCV, then use Caffe for classification of those features?
I am asking this as opposed to the traditional way of passing the images directly to Caffe, and letting Caffe do the extraction and classification procedures.
Yes, it does make sense, but it may not be the first thing you want to try:
If you have already extracted hand-crafted features that are suitable for your domain, there is a good chance you'll get satisfactory results by using an easier-to-use machine learning tool (e.g. libsvm).
Caffe can be used in many different ways with your features. If they are low-level features (e.g. Histogram of Gradients), then several convolutional layers may be able to extract the appropriate mid-level features for your problem. You may also use caffe as an alternative non-linear classifier (instead of SVM). You have the freedom to try (too) many things, but my advice is to first try a machine learning method with a smaller meta-parameter space, especially if you're new to neural nets and caffe.
Caffe is a tool for training and evaluating deep neural networks. It is quite a versatile tool allowing for both deep convolutional nets as well as other architectures.
Of course it can be used to process pre-computed image features.

Text detection on images

I am a very new student on machine learning. I just wanted to ask what are possible ways to improve a method (Naive Bayes for example) to get better results classifying images into text or non-text images, instead of just inputing a x number of images and telling the system which have text and which do not?
Thanks in advance
The state of the art in such problems are deep neural networks with several convolutional layers. See this article for an example of image classification using deep convolutional nets. Your problem (just determining if an image has text or not) is much easier than the general image classification problem the authors consider, so you'd probably get away with using a much simpler network architecture.
Nowadays you don't need to implement these things yourself, there are efficient and GPU-accelerated implementations freely available, for instance Caffe, Torch7, keras...

Resources