I have been working on a project to identify diseases from a leaf image. I have done some searching and worked out a few things, but some confusion remains.
I believe the following should be the flow (suggestions welcome; a rough code sketch of the training side follows the steps):
Crop the diseased areas (manually) from leaves to build the vocabulary.
Use SIFT to get keypoints and descriptors.
Create a Bag of Words vocabulary by clustering the descriptors (k-means).
Train an SVM on the descriptors obtained above.
To evaluate/classify: take an input image of the entire leaf and crop it to extract the diseased area using a Haar cascade.
Use SIFT to get keypoints and descriptors, then use the SVM to predict.
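Something like this is what I have in mind for the vocabulary-building part (an untested sketch assuming OpenCV; training_patches is a placeholder for my cropped disease regions paired with integer class labels):

import cv2
import numpy as np

# training_patches is a placeholder: a list of (grayscale_patch, integer_label) pairs.
sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on older OpenCV builds
all_descriptors = []
for patch, _ in training_patches:
    _, desc = sift.detectAndCompute(patch, None)
    all_descriptors.append(desc)

# Build the Bag of Words vocabulary by k-means clustering of all descriptors.
K = 100
data = np.vstack(all_descriptors).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, _, vocabulary = cv2.kmeans(data, K, None, criteria, 3, cv2.KMEANS_RANDOM_CENTERS)

# Each patch would then be encoded as a K-bin histogram of its nearest vocabulary
# words, and those histograms plus the integer labels are what the SVM trains on.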
My questions are:
Is the above workflow reasonable, or am I missing something?
I am confused about how the SVM learns the name of an object or disease. For example, where does the SVM get the name of the object it learned or detected?
How does the SVM output the name of the object it identified?
Related
Recently I started to play with TensorFlow. While trying to learn the popular algorithms, I am in a situation where I need to find the similarity between images.
Image A is supplied to the system by me, and userx supplies an image B; the system should return image A to userx if image B is similar (in color and class).
Now I have a few questions:
Do we consider this scenario to be supervised learning? I am asking because I don't see it as a classification problem (confused!).
What algorithms should I use to train, etc.?
Re-training should be done quite often; how should I tackle this problem so I don't train from scratch every time (fine-tuning?)?
Do we consider this scenario to be supervised learning?
It is supervised learning when you have labels to optimize your model. So for most neural networks, it is supervised.
However, you might also look at the complete task. I guess you don't have any ground truth, i.e. image pairs together with the "desired" similarity value your model should output?
One way to solve this problem, which sounds inherently unsupervised, is to take a CNN (convolutional neural network) trained (in a supervised way) on the 1000 classes of ImageNet. To get the similarity of two images, you could then simply take the Euclidean distance between their output probability distributions. This will not lead to excellent results, but it is probably a good starting point.
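As a rough illustration (a sketch only; I'm assuming TensorFlow's bundled Keras application models here, but any ImageNet-trained CNN would do):

import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")  # stand-in for any ImageNet-trained CNN

def class_probs(path):
    img = tf.keras.preprocessing.image.load_img(path, target_size=(224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        tf.keras.preprocessing.image.img_to_array(img)[np.newaxis])
    return model.predict(x)[0]  # 1000-way probability distribution

# Smaller Euclidean distance between the two distributions = more similar images.
distance = np.linalg.norm(class_probs("image_a.jpg") - class_probs("image_b.jpg"))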
What algorithms should I use to train, etc.?
First, you should define what "similar" means for you. Are two images similar when they contain the same object (classes)? Are they similar if the general color of the image is the same?
Have a look at FaceNet, and search for "Content-based image retrieval" (CBIR) on Wikipedia and Google Scholar.
This can be supervised learning. You can classify the images into categories; if two images are in the same category (or close within a category), you can think of them as similar.
You can use a deep convolutional neural network trained on ImageNet, such as the Inception model. The Inception model outputs a probability map over 1000 classes (a vector whose values sum to 1). You can calculate the distance between the vectors of two images to get their similarity.
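For the retrieval step, a sketch (database_probs and query_probs are placeholders for probability vectors computed that way):

import numpy as np

# database_probs: (N, 1000) array of class-probability vectors for the stored images.
# query_probs: the 1000-dimensional vector for the user's image.
distances = np.linalg.norm(database_probs - query_probs, axis=1)
best_match = int(np.argmin(distances))  # index of the most similar stored image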
On the same page as the Inception model, you will also find instructions for retraining the model on a new task: https://github.com/tensorflow/models/tree/master/inception#how-to-fine-tune-a-pre-trained-model-on-a-new-task
I have implemented the SIFT algorithm in OpenCV for feature detection and matching using the following steps (a rough code sketch follows the list):
Background Removal using Otsu's thresholding
Feature Detection using SIFT feature detector
Descriptor Extraction using SIFT feature extractor
Matching feature vectors using BFMatcher (L2 norm) and using the ratio test to filter good matches
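In code, the detection and matching steps look roughly like this (a sketch; img1 and img2 stand for the already background-removed grayscale images):

import cv2

sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on older OpenCV builds
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test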
My objective is to classify images into different categories such as shoes, shirts, etc., based on their similarity. For example, two different heels should be more similar to each other than a heel and a sports shoe, or a heel and a t-shirt.
However, this algorithm works well only when my template image is present in the search image (at any scale and orientation). If I compare two different heels, they don't match well and the matches are also random (the heel of one image matches the flat surface of the other image). There are also many false positives when I compare a heel with a sports shoe, a heel with a t-shirt, or a heel with the picture of a baby!
I would like to look at a heel, identify it as a heel, and return how similar the heel is to different images in my database, giving maximum similarity for other heels, followed by other shoes. It should not report any similarity to irrelevant objects such as shirts, phones, or pens.
I understand that the SIFT algorithm produces a descriptor vector for each keypoint based on the gradient values of the pixels around the keypoint, and images are matched purely using this attribute. Hence it is highly possible that a keypoint located near the heel of one shoe is matched to a keypoint at the surface of the other shoe. Therefore, what I gather is that this algorithm can be used only to detect exact matches and not to detect similarity between images.
Could you please tell me whether this algorithm can be used for my objective, whether I am doing something wrong, or suggest another approach that I should use?
For classification of similar objects, I certainly would go for cascade classifiers.
Basically, a cascade classifier is a machine learning method where you train a classifier to detect an object in different images. For it to work well, you need to train your classifier with a lot of positive images (where your object is present) and negative images (where it is not). The method was invented by Viola and Jones in 2001.
There is a ready-made implementation in OpenCV for face detection; you will find a bit more explanation in the OpenCV documentation (sorry, I can't post the link, I'm limited to one link for the moment).
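For illustration, using an already trained cascade looks roughly like this in OpenCV (a sketch; "cascade.xml" stands for whatever trained cascade file you end up with):

import cv2

cascade = cv2.CascadeClassifier("cascade.xml")  # your trained cascade
gray = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2GRAY)
detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in detections:
    print("object found at", (x, y), "size", (w, h))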
Now, for the caveats:
First, you need a lot of positive and negative images. The more images you have, the better the algorithm will perform. Beware of over-learning: if your training dataset for heels contains, for instance, too many images of a given model, it is possible that other models will not be detected properly.
Training the cascade classifier can be long and difficult. The end result will depend on how well you choose the parameters for training the classifier. Some info on this can be found on this webpage: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
Hello everybody. I am doing a project that consists of detecting objects using a Kinect, with SVM and ANN machine learning. If possible, could you give me the names of libraries for SVM and ANN that come with a graphical tool? I only want to train the ANN with that library, save it as .xml, and then load the .xml with OpenCV.
An SVM is a classifier used to classify samples based on their feature vectors. So, your task is to convert the images into feature vectors that the SVM can use for training and testing.
OK, to create feature vectors from your images there are several possibilities, and I am going to mention some very common techniques:
A very easy method is to create a normalized hue histogram of each image (sketched in code below). Let's say you create a hue histogram with 5 bins. Based on the colors in your image there will be some value in each of these 5 bins; say the values look like { 0.32, 0.56, 0, 0, 0.12 }. This is now one input vector with 5 dimensions (i.e. the number of bins). You do the same procedure for all training samples, and then for the test image too.
Extract keypoints from your input samples (e.g. using SIFT or SURF) and then compute their descriptors using SIFT/SURF. You can then use these descriptors as the input to your SVM for training.
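A minimal sketch of the first option (hue-histogram features fed to OpenCV's SVM); train_paths and train_labels are placeholders for your own data:

import cv2
import numpy as np

def hue_histogram(path, bins=5):
    hsv = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180])  # hue channel only
    return (hist / hist.sum()).flatten()  # e.g. [0.32, 0.56, 0, 0, 0.12]

samples = np.array([hue_histogram(p) for p in train_paths], dtype=np.float32)
labels = np.array(train_labels, dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_RBF)
svm.train(samples, cv2.ml.ROW_SAMPLE, labels)

_, prediction = svm.predict(np.float32([hue_histogram("test.jpg")]))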
I have some doubts about bag-of-words based image classification. I will first tell you what I have done:
I have extracted features from the training images of two different categories using the SURF method.
I have then clustered the features for the two categories.
Now I want to classify my test image, i.e. decide which of the two categories it belongs to. For this I am using an SVM classifier, but here is my doubt: how do we input the test image? Do we have to repeat steps 1 and 2 on the test image and then use the result as the test set, or is there another method?
It would also be great to know how effective the BoW approach is.
Could someone kindly provide me with some clarification?
The classifier needs the representation for the test data to have the same meaning as the training data. So, when you're evaluating a test image, you extract the features and then make the histogram of which words from your original vocabulary they're closest to.
That is:
Extract features from your entire training set.
Cluster those features into a vocabulary V; you get K distinct cluster centers.
Encode each training image as a histogram of the number of times each vocabulary element shows up in the image. Each image is then represented by a length-K vector.
Train the classifier.
When given a test image, extract the features. Now represent the test image as a histogram of the number of times each cluster center from V was closest to a feature in the test image. This is a length K vector again.
It's also often helpful to discount the histograms by taking the square root of the entries. This approximates a more realistic model for image features.
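A sketch of the test-time encoding (assuming the k-means cluster centers from training are kept in vocabulary, a K x 128 array for SIFT descriptors):

import cv2
import numpy as np
from scipy.cluster.vq import vq  # assigns each descriptor to its nearest cluster center

def bow_histogram(image, vocabulary):
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(image, None)
    words, _ = vq(np.float64(descriptors), np.float64(vocabulary))  # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    hist /= hist.sum()
    return np.sqrt(hist)  # optional square-root discounting, as mentioned above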
I have a large set of plant images labeled with the botanical name. What would be the best algorithm to use to train on this dataset in order to classify an unlabeled photo? The photos are processed so that 100% of the pixels contain the plant (e.g. either closeups of the leaves or bark), so there are no other objects/empty space/background that the algorithm would have to filter out.
I've already tried generating SIFT features for all the photos and feeding these (feature,label) pairs to a LibLinear SVM, but the accuracy was a miserable 6%.
I also tried feeding this same data to a few Weka classifiers. The accuracy was a little better (25% with Logistic, 18% with IBk), but Weka is not designed for scalability (it loads everything into memory). Since the SIFT feature dataset is several million rows, I could only test Weka with a random 3% slice, so it's probably not representative.
Normally, you would not train on the SIFT features directly. Cluster them (using k-means) and then train on the histogram of cluster membership identifiers (i.e., a k-dimensional vector, which counts, at position i, how many features were assigned to the i-th cluster).
This way, you obtain a single output per image (and a single, k-dimensional, feature vector).
Here's the quasi-code (using mahotas and milk in Python):
import numpy as np
from mahotas.surf import surf
from milk.unsupervised.kmeans import kmeans, assign_centroids
import milk

# First load your data:
images = ...   # list of images as numpy arrays
labels = ...   # one label per image

# Compute local SURF descriptors for every image.
local_features = [surf(im, 6, 4, 2) for im in images]
allfeatures = np.concatenate(local_features)

# Cluster all descriptors into k=100 visual words.
_, centroids = kmeans(allfeatures, k=100)

# Represent each image as a histogram of visual-word counts.
histograms = []
for ls in local_features:
    hist = assign_centroids(ls, centroids, histogram=True)
    histograms.append(hist)

cmatrix, _ = milk.nfoldcrossvalidation(histograms, labels)
print("Accuracy:", (100 * cmatrix.trace()) / cmatrix.sum())
This is a fairly hard problem.
You can give the BoW model a try.
Basically, you extract SIFT features on all the images, then use k-means to cluster the features into visual words. After that, use the BoW vectors to train your classifiers.
See the Wikipedia article on the bag-of-words model and the papers referenced there for more details.
You probably need better alignment, and probably not more features. There is no way you can get acceptable performance unless you have correspondences. You need to know what points in one leaf correspond to points on another leaf. This is one of the "holy grail" problems in computer vision.
People have used shape context for this problem. You should probably look at the paper describing the basic system behind Leafsnap.
You can implement the BoW model following the tutorial "Bag-of-Features Descriptor on SIFT Features with OpenCV". It is a very good tutorial for implementing the BoW model in OpenCV.