Hello everybody. I am working on a project that consists of detecting objects using a Kinect together with SVM and ANN machine learning. If possible, I would like the names of libraries for SVM and ANN that come with a graphical tool, because I only want to train the ANN with that library, save it as .xml, and then load the .xml with OpenCV.
SVM is a classifier used to classify samples based upon their feature vectors. So, your task is to convert the images into feature vectors which can be used by SVM for its training and testing.
OK, there are several ways to create feature vectors from your images, and I am going to mention some very common techniques:
A very easy method is to create a normalized hue histogram for each of your images. Let's say you create a hue histogram with 5 bins. Depending on the colors in your image, these 5 bins will hold some values, for example { 0.32 0.56 0 0 0.12 }. This is now one input vector with 5 dimensions (i.e. the number of bins). You have to follow the same procedure for all training samples, and later for the test image too.
Extract some features from your input samples (e.g. by using SIFT or SURF) and then create their descriptors using SIFT/SURF. You can then use these descriptors as the input to your SVM for training.
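As a rough illustration of the hue-histogram approach described above, here is a minimal sketch in plain Python. The function name hue_histogram and the toy pixel list are made up for this example; in a real project you would read the pixels from your images (for instance with OpenCV) rather than type them in.

```python
import colorsys

def hue_histogram(pixels, bins=5):
    """Return a normalized hue histogram for an iterable of (R, G, B) pixels in 0..255."""
    counts = [0] * bins
    total = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1]; the returned hue is also in [0, 1].
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        idx = min(int(h * bins), bins - 1)  # clamp so the last bin stays in range
        counts[idx] += 1
        total += 1
    if total == 0:
        return [0.0] * bins
    # Dividing by the pixel count makes the bins sum to 1 regardless of image size.
    return [c / total for c in counts]

# Toy "image" (hypothetical data): two red pixels, one green, one blue.
toy_pixels = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
feature_vector = hue_histogram(toy_pixels)
print(feature_vector)
```

Each image then contributes one such 5-dimensional vector to the SVM training set. Note that gray pixels have no meaningful hue and end up in the first bin in this sketch.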
I am new to computer vision and machine learning. I searched a lot but did not find a clear answer to my questions.
First: I want to know what is the difference between all of these detection methods.
1)HOG.detect()
2)HOG.detectMultiScale()
3)HOG.setSvmDetector()
Second: I read that HOG.setSvmDetector() is used for detecting only one object, since SVM is a binary classifier. I was wondering whether we can train a multi-class SVM (one-vs-all) and, for each single-class SVM, apply a new instance of HOG.setSvmDetector()?
For example, if I construct 2 SVMs, which means I now have a multi-class SVM with 2 classes, can I do something like this:
HOGDescriptor hog1 = new HOGDescriptor()
HOGDescriptor hog2 = new HOGDescriptor()
hog1.setSvmDetector( CLASS ONE )
hog2.setSvmDetector( CLASS TWO )
HOG.detect
It detects objects in a single image at a single scale (the original image size).
HOG.detectMultiScale
It detects objects in the image at its original size, then downsamples the image by a certain factor, e.g. 1.2. It then detects objects in the downsampled image and downsamples the image further. This process is repeated until the image is smaller than the detection window. Finally, it combines all the detections found over all the image sizes.
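To make the downsampling loop concrete, here is a small sketch that only lists the image sizes such a pyramid would visit. It is not OpenCV's implementation (which has its own rounding and a cap on the number of levels); the function name pyramid_sizes and the 640x480 input are assumptions for the example.

```python
def pyramid_sizes(width, height, win=(64, 128), scale=1.2):
    """List the image sizes visited while shrinking by `scale` per level,
    stopping once the image no longer fits the detection window."""
    sizes = []
    factor = 1.0
    while True:
        w = int(round(width / factor))
        h = int(round(height / factor))
        if w < win[0] or h < win[1]:
            break  # the detection window no longer fits inside the image
        sizes.append((w, h))
        factor *= scale
    return sizes

# Hypothetical 640x480 frame with a 64x128 detection window.
levels = pyramid_sizes(640, 480)
for w, h in levels:
    print(w, h)
```

A smaller scale factor (such as 1.05) gives many more levels and therefore finer scale coverage at a higher computational cost.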
HOG.setSvmDetector()
It is used to set the trained classifier. OpenCV provides pretrained classifiers for different datasets, such as getDefaultPeopleDetector() (trained on the INRIA pedestrian dataset) and getDaimlerPeopleDetector() (trained on the Daimler pedestrian dataset).
You can also train your own classifier and use it in setSvmDetector(). Note that setSvmDetector() takes the coefficients of a single linear SVM, so a multi-class (one-vs-all) setup needs one detector per class.
I am new to machine learning and OpenCV. I have taken a set of 10 images for each emotion (neutral and happy) from the Cohn-Kanade face database. I then extracted the facial features from each image, put them in my trainingData matrix, and assigned the label for the respective emotion (for example, 0 for neutral and 1 for happy).
I have used the RBF kernel with gamma = 0.1 and C = 1. Once trained, I pass the facial features extracted from live frames of a smartphone camera for prediction. The prediction always returns 1.
If I increase the number of training samples for the neutral expression (for example, 15 neutral images and 10 happy images), the prediction always returns 0, and if there is an equal number of images for each expression in the training samples, the SVM prediction always returns 1.
Why is the SVM behaving this way? How can I check whether I am using the right values for gamma and C? Also, does the SVM depend on the resolution of the training and testing images?
Could you post your SVM code so we can understand what you are doing? Secondly, in my experience with SVMs, you need to normalize the training data and the labels. You should also make sure you are using a supported classifier type, as not all of them are supported. Follow this link for some tutorials: http://docs.opencv.org/3.0-beta/modules/ml/doc/support_vector_machines.html
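One common way to normalize features is min-max scaling. The sketch below is only an assumption about what normalization could look like here (the answer does not say which method it means), and the helper names and numbers are invented. The important detail is that the scaling parameters are computed on the training data and then reused unchanged for the live camera features.

```python
def fit_min_max(train_rows):
    """Compute per-feature minimum and maximum from the training rows."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Scale each feature to [0, 1] using previously fitted minima and maxima."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # guard against constant features
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled

# Hypothetical 2-dimensional facial feature vectors.
train = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
mins, maxs = fit_min_max(train)
train_scaled = apply_min_max(train, mins, maxs)

# A feature vector from a live frame is scaled with the SAME mins/maxs.
live_scaled = apply_min_max([[25.0, 250.0]], mins, maxs)
print(train_scaled, live_scaled)
```

If the live features are scaled differently from the training features (or not at all), an RBF SVM can easily end up predicting a single class for every input.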
To answer your other questions: unfortunately, you have to find the best combination of gamma and C yourself, which is one of the drawbacks of SVM. https://www.quora.com/What-are-C-and-gamma-with-regards-to-a-support-vector-machine
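Searching for gamma and C is usually done with a grid search over exponentially spaced values. The sketch below shows only the search loop; toy_score is a placeholder I made up so the example runs on its own, and in practice it would be replaced by a function that trains the SVM with the given (C, gamma) and returns its cross-validated accuracy. The grid ranges are a commonly used starting point, not values taken from this thread.

```python
import itertools
import math

def grid_search(score_fn, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_score, best_C, best_gamma)."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best

# Exponentially spaced candidate values.
c_grid = [2.0 ** k for k in range(-5, 16, 2)]
gamma_grid = [2.0 ** k for k in range(-15, 4, 2)]

def toy_score(c, gamma):
    # Placeholder only: pretends accuracy peaks at C = 2**3, gamma = 2**-5.
    # Replace with real cross-validated accuracy of your SVM.
    return -abs(math.log2(c) - 3) - abs(math.log2(gamma) + 5)

best_score, best_c, best_gamma = grid_search(toy_score, c_grid, gamma_grid)
print(best_c, best_gamma)
```

With only 10 to 15 images per class, the cross-validation estimate will be noisy, so it is worth collecting more samples before trusting any particular pair.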
Yes, the SVM does depend on the resolution as your features/feature vectors would change depending on the resolution and hence the inputs and the labels.
P.S. This should ideally be a comment, but unfortunately I don't have enough points to post one.
I'm implementing color quantization based on the k-means clustering method on some RGB images. Then I will determine the performance of the algorithm. I found some information about training and testing. As I understand it, I should divide the image samples into training and testing sets.
But I am confused about the terms training and testing. What do they mean? And how do I implement them with a rank value?
Training and testing are two common concepts in machine learning. They are most easily explained in the framework of supervised learning, where you have a training dataset for which you know both the input data and the additional attributes that you want to predict. Training consists of learning a relation between data and attributes from a fraction of the training dataset, and testing consists of checking the predictions of this relation on another part of the dataset (since you know the real attributes there, you can compare them with the output of the relation). A good introductory tutorial using these concepts can be found at http://scikit-learn.org/stable/tutorial/basic/tutorial.html
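As a minimal sketch of what "a fraction of the dataset for training, another part for testing" means in code, here is a plain-Python split. The function name, the 25% test fraction and the toy data are assumptions for illustration; libraries such as scikit-learn provide their own equivalent.

```python
import random

def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and split samples/labels into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(labels, train_idx),
            pick(samples, test_idx), pick(labels, test_idx))

# Hypothetical labeled dataset: 8 samples with a known attribute each.
samples = [[i, i * 2] for i in range(8)]
labels = [i % 2 for i in range(8)]
x_train, y_train, x_test, y_test = train_test_split(samples, labels)
print(len(x_train), len(x_test))
```

The model is fitted on x_train/y_train only, and its predictions on x_test are then compared with y_test.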
However, clustering is a class of unsupervised learning, that is, you just have some input data (here, the RGB values of pixels, if I understand well), without any corresponding target values. Therefore, you can run a k-means clustering algorithm in order to find classes of pixels with similar colors, without the need to train and test the algorithm.
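To show that k-means colour quantization needs no labels and no train/test split, here is a small self-contained sketch in plain Python. It is a toy version under my own assumptions (deterministic initialization from evenly spaced pixels, a fixed number of iterations, an invented function name); for real images you would use one of the library implementations.

```python
def quantize_colors(pixels, k, iterations=10):
    """Cluster RGB pixels with k-means and replace each pixel by its cluster colour."""
    # Deterministic initialization: take k evenly spaced pixels as starting centroids.
    centroids = [tuple(float(v) for v in pixels[i * len(pixels) // k]) for i in range(k)]
    assignment = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel goes to the nearest centroid (squared distance).
        for n, p in enumerate(pixels):
            assignment[n] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
        # Update step: each centroid becomes the mean of its assigned pixels.
        for j in range(k):
            members = [pixels[n] for n in range(len(pixels)) if assignment[n] == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    palette = [tuple(int(round(v)) for v in c) for c in centroids]
    return [palette[j] for j in assignment], palette

# Hypothetical pixels: two reddish and two bluish values.
pixels = [(250, 0, 0), (254, 0, 0), (0, 0, 250), (0, 0, 254)]
quantized, palette = quantize_colors(pixels, k=2)
print(palette)
```

To judge the quality of the quantization you can compare the quantized image with the original (for example, by the mean squared colour error) instead of using a held-out test set.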
In image processing, training and testing is for example used for classifying pixels in order to segment different objects. A common example is to use a random forest classifier: the user selects pixels belonging to the different objects of interest (eg background and object), the classifier is trained on this set of pixels, and then the remainder of the pixels are attributed to one of the classes by the classifier. ilastik (http://ilastik.org/) is an example of software that performs interactive image classification and segmentation.
I don't know which programming language you're using, but k-means is already implemented in various libraries. For Python, both SciPy (http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.vq.kmeans2.html#scipy.cluster.vq.kmeans2) and scikit-learn (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) have an implementation of k-means. Also note that, depending on your application, you may be interested in clustering pixels together using not only pixel values but also the spatial proximity of pixels. See for example the scikit-image gallery example http://scikit-image.org/docs/dev/auto_examples/plot_rag_mean_color.html
I am new to HOG. I am using OpenCV 2.4.4 and Visual Studio 2010, and I am running the sample peopledetect.cpp from the package. It compiles and runs, but I want to understand the source code in detail. In peopledetect.cpp, are the HOG descriptors already constructed and trained for people detection, with the 3780-dimensional vectors fed into an SVM classifier? When I debug peopledetect.cpp, all I can find is that HOGDescriptor creates the HOG descriptor and detector. I basically don't understand what the HOGDescriptor API does. As far as I can see, peopledetect.cpp doesn't go through the steps of HOG processing; it loads the already trained vectors into the SVM classifier to detect people/no people. Am I wrong? There is no documentation about this.
Can anyone please explain this briefly?
The implementation of the people detection algorithm in OpenCV is based on HOG descriptors as features and an SVM as the classifier.
1. A training database (positive samples showing a person, negative samples showing no person) is used to learn the SVM parameters (it computes and stores the support vectors). Cross-validation is also performed (I assume) to optimize the soft-margin parameter C and the kernel parameters (it could be a linear kernel).
2. To detect people in the test video data, peopledetect.cpp loads the pre-learnt SVM, computes the HOG descriptors at different positions and scales, then merges the windows with high detection scores (the outputs of the binary SVM classifier).
Here is a good paper (inria) to start with.
To give a clearer answer: peopledetect.cpp does go through all the HOG steps.
After digging deeper this became clearer to me. If you debug peopledetect.cpp, it goes through these steps.
Initially the image is processed at several scales; scale0 (1.05) is the coefficient by which the detection window is increased. For each scale of the image, features are extracted from the window and the classifier is run on it; as described above, it follows the scale-space pyramid method. This is a big and very expensive computation, so the OpenCV team has parallelised it over the scales.
I was baffled at first as to why I was not able to step through it in the debugger. The call parallel_for_(Range(0, (int)levelScale.size()),HOGInvoker()) creates several threads, each of which works on one scale, depending on how many threads are created.
Because of this I was not able to debug. What I did was freeze all the other threads and debug only the main thread. For the different scales of the image, the HOG processing steps are
In peopledetect.cpp the HOG computation and the classifier window are effectively combined: in a single window (64x128) both feature extraction and classification take place. This is done for each scale of the image. A number of pedestrian windows at different scales are then often associated with the same region, and these are grouped using the groupRectangles() function.
Training an SVM consists of finding the parameters of the maximum margin between positive and negative samples.
If the same feature extraction is done for 1000+ negative and positive samples, there must be millions of features, right?
Yes. These coefficients are extracted from the training databases. You don't have them. The SVM stores only the support vectors, which are sufficient to characterise the margin. See the dual form of the linear SVM, for example.
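For a linear SVM the dual form means that the support vectors can be collapsed into one weight vector, w = sum of alpha_i * y_i * x_i, which is why a detector for a 3780-dimensional descriptor can be stored as a single coefficient vector rather than as millions of training features. Here is a tiny numeric sketch; the function names and the 2-dimensional numbers are invented, and the sign convention for the bias differs between libraries.

```python
def collapse_support_vectors(alphas, labels, support_vectors):
    """Combine support vectors into one weight vector: w = sum(alpha_i * y_i * x_i)."""
    dim = len(support_vectors[0])
    w = [0.0] * dim
    for alpha, y, sv in zip(alphas, labels, support_vectors):
        for d in range(dim):
            w[d] += alpha * y * sv[d]
    return w

def linear_score(w, bias, x):
    """Decision value of a linear SVM: positive means 'person' in this sketch."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias

# Hypothetical trained model with two support vectors in 2 dimensions.
alphas = [0.5, 0.5]
labels = [1, -1]
support_vectors = [[1.0, 0.0], [0.0, 1.0]]

w = collapse_support_vectors(alphas, labels, support_vectors)
score = linear_score(w, 0.1, [2.0, 0.0])
print(w, score)
```

At detection time only w and the bias are needed: one dot product per window, regardless of how many training samples were used.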
a number of pedestrian windows of different scales are often associated with the region
True. A merging function is applied. Different methods (such as groupRectangles(..)) are available (see here) and take as arguments the parameters given to detectMultiScale(..).
What I understood from different papers is that HOG feature extraction is done on several positive and negative images, and the extracted features are fed to a linear SVM to train it. So peopledetect.cpp uses this trained linear SVM, which means the training feature extraction is not done by peopledetect.cpp, i.e. HOGDescriptor::getDefaultPeopleDetector() consists of the coefficients of the classifier trained for people detection. The features extracted from one HOG detection window (64x128) give a total length of 3780 (4 cells x 9 bins x 7 x 15 blocks = 3780). These features are then used to train a linear SVM classifier. If the same feature extraction is done for 1000+ negative and positive samples, there must be millions of features, right? How do we get these coefficients?
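The 3780 figure can be checked with a few lines of arithmetic. The sketch below assumes a 16x16 block, an 8x8 block stride and 8x8 cells, which matches the 4 cells x 9 bins x 7 x 15 blocks breakdown above; the function name is made up for the example.

```python
def hog_descriptor_length(win=(64, 128), block=(16, 16), stride=(8, 8),
                          cell=(8, 8), bins=9):
    """Number of values in the HOG descriptor of one detection window."""
    # How many block positions fit horizontally and vertically.
    blocks_x = (win[0] - block[0]) // stride[0] + 1
    blocks_y = (win[1] - block[1]) // stride[1] + 1
    # Each block holds a grid of cells, each cell a histogram of `bins` values.
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return blocks_x * blocks_y * cells_per_block * bins

length = hog_descriptor_length()
print(length)
```

So every 64x128 window, whether it comes from a training image or from a position in the test image, is described by one vector of this length.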
However, HOG descriptors are known to contain redundant information because of the different detection window sizes being used. So when the SVM classifier classifies a region as "pedestrian", a number of pedestrian windows of different scales are often associated with that region. What peopledetect.cpp mainly does is hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2); the detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances.
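To illustrate how several overlapping windows around one pedestrian can be reduced to a single detection, here is a sketch of greedy non-maximum suppression based on intersection-over-union. This is a generic textbook variant, not what OpenCV's groupRectangles() does internally, and the boxes, scores and threshold are invented for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box of each group of strongly overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with an already kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

# Hypothetical detections: two windows on the same person, one on another person.
boxes = [(100, 50, 64, 128), (104, 54, 64, 128), (300, 60, 64, 128)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

The second window overlaps the first almost completely and is dropped, while the third one is far away and survives as a separate detection.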
I have a large set of plant images labeled with the botanical name. What would be the best algorithm to train on this dataset in order to classify an unlabeled photo? The photos are processed so that 100% of the pixels contain the plant (e.g. close-ups of the leaves or bark), so there are no other objects, empty space or background that the algorithm would have to filter out.
I've already tried generating SIFT features for all the photos and feeding these (feature,label) pairs to a LibLinear SVM, but the accuracy was a miserable 6%.
I also tried feeding this same data to a few Weka classifiers. The accuracy was a little better (25% with Logistic, 18% with IBk), but Weka is not designed for scalability (it loads everything into memory). Since the SIFT feature dataset is several million rows, I could only test Weka with a random 3% slice, so it's probably not representative.
EDIT: Some sample images:
Normally, you would not train on the SIFT features directly. Cluster them (using k-means) and then train on the histogram of cluster membership identifiers (i.e., a k-dimensional vector, which counts, at position i, how many features were assigned to the i-th cluster).
This way, you obtain a single output per image (and a single, k-dimensional, feature vector).
Here's the quasi-code (using mahotas and milk in Python):
import numpy as np
from mahotas.surf import surf
from milk.unsupervised.kmeans import kmeans, assign_centroids
import milk

# First load your data:
images = ...
labels = ...

# Compute local SURF descriptors for every image
local_features = [surf(im, 6, 4, 2) for im in images]
# Cluster all descriptors from all images into k=100 groups
allfeatures = np.concatenate(local_features)
_, centroids = kmeans(allfeatures, k=100)
# Build one histogram of cluster memberships per image
histograms = []
for ls in local_features:
    hist = assign_centroids(ls, centroids, histogram=True)
    histograms.append(hist)

# Cross-validate a classifier on the per-image histograms
cmatrix, _ = milk.nfoldcrossvalidation(histograms, labels)
print("Accuracy: %s" % ((100 * cmatrix.trace()) / cmatrix.sum()))
This is a fairly hard problem.
You can give the BoW (bag-of-words) model a try.
Basically, you extract SIFT features from all the images, then use k-means to cluster the features into visual words. After that, use the BoW vectors to train your classifiers.
See the Wikipedia article above and the references papers in that for more details.
You probably need better alignment, and probably not more features. There is no way you can get acceptable performance unless you have correspondences. You need to know what points in one leaf correspond to points on another leaf. This is one of the "holy grail" problems in computer vision.
People have used shape context for this problem. You should probably look at this link. This paper describes the basic system behind leafsnap.
You can implement the BoW model according to this Bag-of-Features Descriptor on SIFT Features with OpenCV. It is a very good tutorial to implement the BoW model in OpenCV.