I am working on a project that classifies images based only on the shape (a binary image) obtained after background subtraction. I want to extract shape context descriptors from the two classes and train an SVM classifier.
How can I extract shape context descriptors? Please tell me if there is an implementation, or a guide to implementing one, for extracting shape context descriptors to train an SVM.
These links might help you find code for shape context: (1) and (2).
This tutorial is quite clear on how to use OpenCV's implementation of SVM for classification.
Take a look here http://answers.opencv.org/question/1668/shape-context-implementation-in-opencv/
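If you end up implementing the descriptor yourself (OpenCV's shape module exposes a ShapeContextDistanceExtractor, but it computes a matching distance between two shapes rather than per-image feature vectors), the core is just a log-polar histogram per sampled contour point. Here is a minimal NumPy sketch, assuming OpenCV 4 and a binary input image; the bin counts (5 radial x 12 angular) and the 100-point sampling are illustrative choices, not a reference implementation:

```python
import cv2
import numpy as np

def shape_context(binary_img, n_points=100, n_r=5, n_theta=12):
    # Sample points evenly from the largest contour of the binary shape.
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).squeeze(1).astype(float)
    idx = np.linspace(0, len(pts) - 1, n_points).astype(int)
    pts = pts[idx]

    # Pairwise distances and angles between sampled points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    ang = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Log-spaced radial bins, normalized by the mean distance for scale
    # invariance; uniform angular bins.
    dist = dist / dist[dist > 0].mean()
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    desc = np.zeros((n_points, n_r, n_theta))
    for i in range(n_points):
        r_bin = np.digitize(dist[i], r_edges) - 1
        t_bin = (ang[i] / (2 * np.pi) * n_theta).astype(int) % n_theta
        valid = (r_bin >= 0) & (r_bin < n_r)
        valid[i] = False  # skip the point relative to itself
        np.add.at(desc[i], (r_bin[valid], t_bin[valid]), 1)
    return desc.reshape(n_points, -1)  # one histogram per sampled point
```

Note that for a fixed-length SVM input you would still need to pool the per-point histograms, e.g. by summing or averaging them over the shape; the original shape context method of Belongie et al. instead matches point sets directly, which does not map onto a standard SVM as-is.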
Hope this helps,
Alex
Digit classification can be done based on shape using HOG features. The link below contains Matlab code for this:
http://in.mathworks.com/help/vision/examples/digit-classification-using-hog-features.html
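If you would rather stay in Python, here is a hedged sketch of the same idea using scikit-image's HOG and a linear SVM from scikit-learn; `images`, `labels`, and `test_images` are placeholders for your own data, and the cell/block sizes are illustrative:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(images):
    # One HOG vector per image; all images must share the same size.
    return np.array([hog(img, orientations=9,
                         pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

# images: list of equally sized grayscale/binary arrays; labels: class ids.
# X = hog_features(images)
# clf = LinearSVC().fit(X, labels)
# predictions = clf.predict(hog_features(test_images))
```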
Related
I am new to deep learning. I was trying to understand the basics of image classification and followed some tutorials on MNIST dataset classification. I saw that various standard models use 224x224 as the image size, and I got stuck on the question of which image sizes can be used for classification. Is it possible to use images as small as 4x4 for a classification task in deep learning, using an ANN or other techniques? Or is there a lower limit on image dimensions that must be strictly followed? Please guide me on this. Thanks in advance.
I am quite new to the deep learning game. I was wondering: why do we flatten the last layer of the encoder in a VAE and then feed the flattened output to a linear layer, which then approximates the location and scale parameters of the prior? Can't we just split the output of a convolutional layer and get the location and scale directly from there, or does the spatial information captured by a convolution mess up the scale and location?
Thanks a lot!
Why do we flatten the last layer of the encoder in a VAE?
There isn't really a good reason other than convenience for printing or reporting. If the encoder output just before flattening has shape [BatchSize, 2, 2, 32], flattening it to [BatchSize, 128] simply makes it handy to list all 128 encoded values per sample. When the decoder then reshapes it back to [BatchSize, 2, 2, 32], all the spatial information is put back where it was. No spatial information is lost.
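A minimal NumPy sketch of that round trip, using the shapes from the example above, shows that nothing is lost:

```python
import numpy as np

batch = np.random.rand(4, 2, 2, 32)      # encoder output, [BatchSize, 2, 2, 32]
flat = batch.reshape(4, -1)              # [4, 128], handy for reporting
restored = flat.reshape(4, 2, 2, 32)     # the decoder puts the structure back
assert np.array_equal(batch, restored)   # nothing was lost
```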
Of course, one may decide to use the encoder of a trained VAE as an image feature extractor. This is actually very useful when we have a lot of unlabeled images to train a VAE with, but only a few labeled images. After training the VAE on the large unlabeled image set, the encoder effectively becomes a feature extractor. We can then feed the extracted features into a dense layer whose purpose is to learn the labels. Having the encoder output a flattened vector is very useful in this situation.
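A hedged Keras sketch of that setup: freeze the trained encoder and train only a small dense head on the few labeled images. The stand-in encoder and all layer sizes here are placeholders, not a real VAE:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Stand-in encoder for illustration; in practice this is the trained
# VAE encoder, ending in a flattened feature vector.
encoder = models.Sequential([
    layers.Conv2D(32, 3, strides=2, activation="relu", input_shape=(8, 8, 1)),
    layers.Flatten(),
])
encoder.trainable = False                    # keep the learned features fixed

clf = models.Sequential([
    encoder,
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # dense head learns the labels
])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# clf.fit(labeled_x, labeled_y, epochs=10)   # the few labeled images
```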
I have a deep learning model (transfer-learning based, in Keras) for a regression problem on medical images. Is there any benefit, or logical reason, to applying image enhancements such as strengthening the edges or histogram equalization before feeding the inputs to the CNN?
Yes, it is possible to train a model more accurately using pre-processing of the kind you describe.
When training a CNN, image augmentation is almost always used in the pre-processing phase.
Here is a list of commonly used augmentations (a minimal Keras sketch follows below):
color noise
transform
rotate
whitening
affine
crop
flip
etc...
You can refer to here
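As mentioned above, here is a minimal Keras sketch covering a few of the listed augmentations (rotation, affine-style shifts, zoom, flips). The parameter values are illustrative, and `x_train`/`y_train` are placeholders for your own data:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,        # rotate
    width_shift_range=0.1,    # affine-style translation
    height_shift_range=0.1,
    zoom_range=0.1,           # crop-like zoom
    horizontal_flip=True,     # flip
)
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=20)
```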
I am working on a convolutional neural network using satellite images, and I want to tackle a multi-scale problem. Can you please suggest how I can build a multi-scale dataset? Since the CNN input size is fixed (e.g. 100x100), how can images of different scales be used to train the system for the multi-scale problem?
There is a similar question about YOLO9000: Multi-Scale Training?
Since the network contains only convolutional and pooling layers, the number of weight parameters is the same no matter what the input scale is. Thus, multi-scale images can be used to train a single CNN model.
The method differs by task. For example, in a classification task we can add a global pooling layer after the last convolutional layer (see the sketch below); in a detection task, the output size is not fixed when we input multi-scale images.
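A hedged Keras sketch of the classification case: with only convolutions followed by global average pooling, the same weights accept inputs of any spatial size (note the None dimensions in the input shape). Layer sizes are illustrative:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(None, None, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),         # collapses any spatial size
    layers.Dense(10, activation="softmax"),
])
# The same model runs on 100x100 and 200x200 images alike:
print(model.predict(np.random.rand(1, 100, 100, 3)).shape)  # (1, 10)
print(model.predict(np.random.rand(1, 200, 200, 3)).shape)  # (1, 10)
```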
I am interested in the possibility of training a TensorFlow model to modify images, but I'm not quite sure where to get started. Almost all of the examples/tutorials dealing with images are for image classification, but I think I am looking for something a little different.
Image classification training data typically includes the images plus a corresponding set of classification labels, but I am thinking of a case of an image plus a "to-be" version of the image as the "label". Is this possible? Is it really just a classification problem in disguise?
Any help on where to get started would be appreciated. Also, the solution does not have to use TensorFlow, so any suggestions on alternate machine learning libraries would also be appreciated.
For example, let's say we want to train TensorFlow to draw circles around objects in a picture.
Example inbound image: [picture of a handwritten "3"] (source: pbrd.co)
Label/expected output: [the same picture with a circle drawn around the "3"] (source: pbrd.co)
How could I accomplish that?
I can second that; it's really hard to find information about image modification with TensorFlow :( But have a look here: https://affinelayer.com/pix2pix/
From my understanding, you do use a GAN, but instead of feeding the generator random data during training, you feed it a sample input.
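A minimal sketch of that idea in Keras: the generator takes a sample image as input instead of a noise vector. The layer sizes are illustrative, this is not the actual pix2pix architecture, and the adversarial training loop is omitted:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_generator(h=256, w=256, c=3):
    inp = layers.Input(shape=(h, w, c))  # a sample image, not random noise
    x = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                               activation="relu")(x)
    out = layers.Conv2D(c, 1, activation="tanh")(x)  # generated "to-be" image
    return models.Model(inp, out)

generator = build_generator()
# During GAN training the discriminator would see (input, generated) vs
# (input, target) pairs; that loop is omitted here.
```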
Two popular ways (the ones that I know about) to make models generate/edit images are:
Deep Convolutional Generative Adversarial Networks
Back-propagation through a pre-trained image classification model (in a manner similar to Deep Dream), where you feed the desired label back from the final layer and apply gradient descent to the image only. This is explained in more detail in the following course: CS231n (this lecture).
But I don't think either fits the circle-around-"3" example that you gave. I think object detection and instance segmentation would be more helpful: detect the object you are looking for, extract its boundary via segmentation, and post-process it to draw the circle you want (or any other shape).
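As a sketch of that post-processing step, assuming you already have a binary mask from a detector or segmenter (the mask here is a placeholder), OpenCV can find the smallest enclosing circle and draw it:

```python
import cv2
import numpy as np

def circle_object(image, mask):
    # mask: uint8 binary mask of the detected object.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    (x, y), r = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
    out = image.copy()
    cv2.circle(out, (int(x), int(y)), int(r) + 5, (0, 0, 255), 2)  # red circle
    return out
```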
Reference for the images: Intro to Deep Learning for Computer Vision