Does SVM need to do learning each time when detecting people? - opencv

I'm using SVM and HOG in OpenCV to implement people detection.
Say using my own dataset: 3000 positive samples and 6000 negative samples.
My question is Does SVM need to do learning each time when detecting people?
If so, the learning time and predicting time could be so time-consuming. Is there any way to implement real-time people detection?
Thank you in advance.
Thank you for your answers. I have obtained the xml result after training(3000 positive and 6000 negative), so I can just use this result to write an other standalone program just use svm.load() and svm.predict()? That's great. Besides, I found that the predicting time for 1000 detection window size image(128x64) is also quite time-consuming(about 10 seconds), so how does it handle a normal surveillance camera capture(320x240 or higher) using 1 or 2 pixels scanning stepsize in real time?
I implemented HOG according to the original paper, 8x8 pixels per cell, 2x2 cells per block(50% overlap), so 3780 dimensions vector for one detection window(128x64). Is the time problem caused by the huge feature vector? Should I reduce the dimensions for each window?

This is a very specific question to a general topic.
Short answer: no, you don't need to learning every time you want to use a SVM. It is a two step process. The first step, learning (in your case by providing your learning algorithm with many many labeled (containing, not containing) pictures containing people or not containing people), results in a model which is used in the second step: testing (in your case detecting people).

no, you don't have to re-train an svm each and every time.
you do the training once, then svm.save() the trained model to a xml/yml file.
later you just svm.load() that instead of the (re-)training, and do your predictions

Related

an algorithm for clustering visually separable clusters

I have visualized a dataset in 2D after employing PCA. 1 dimension is time and the Y dimension is First PCA component. As figure shows, there is relatively good separation between points (A, B). But unfortunately clustering methods (DBSCAN, SMO, KMEANS, Hierarchical) are not able to cluster these points in 2 clusters. As you see in section A there is a relative continuity and this continuous process is finished and Section B starts and there is rather big gap in comparison to past data between A and B.
I will be so grateful if you can introduce me any method and algorithm (or devising any metric from data considering its distribution) to be able to do separation between A and B without visualization. Thank you so much.
This is plot of 2 PCA components for the above plot(the first one). The other one is also the plot of components of other dataset which I get bad result,too.
This is a time series, and apparently you are looking for change points or want to segment this time series.
Do not treat this data set as a two dimensional x-y data set, and don't use clustering here; rather choose an algorithm that is actually designed for time series.
As a starter, plot series[x] - series[x-1], i.e. the first derivative. You may need to remove seasonality to improve results. No clustering algorithm will do this, they do not have a notion of seasonality or time.
If PCA gives you a good separation, you can just try to cluster after projecting your data through your PCA eigenvectors. If you don't want to use PCA, then you will need anyway an alternative data projection method, because failing clustering methods imply that your data is not separable in the original dimensions. You can take a look at non linear clustering methods such as the kernel based ones or spectral clustering for example. Or to define your own non-euclidian metric, which is in fact just another data projection method.
But using PCA clearly seems to be the best fit in your case (Occam razor : use the simplest model that fits your data).
I don't know that you'll have an easy time devising an algorithm to handle this case, which is dangerously (by present capabilities) close to "read my mind" clustering. You have a significant alley where you've marked the division. You have one nearly as good around (1700, +1/3), and an isolate near (1850, 0.45). These will make it hard to convince a general-use algorithm to make exactly one division at the spot you want, although that one is (I think) still the most computationally obvious.
Spectral clustering works well at finding gaps; I'd try that first. You might have to ask it for 3 or 4 clusters to separate the one you want in general. You could also try playing with SVM (good at finding alleys in data), but doing that in an unsupervised context is the tricky part.
No, KMeans is not going to work; it isn't sensitive to density or connectivity.

How to speed up svm.predict?

I'm writing a sliding window to extract features and feed it into CvSVM's predict function.
However, what I've stumbled upon is that the svm.predict function is relatively slow.
Basically the window slides thru the image with fixed stride length, on number of image scales.
The speed traversing the image plus extracting features for each
window takes around 1000 ms (1 sec).
Inclusion of weak classifiers trained by adaboost resulted in around
1200 ms (1.2 secs)
However when I pass the features (which has been marked as positive
by the weak classifiers) to svm.predict function, the overall speed
slowed down to around 16000 ms ( 16 secs )
Trying to collect all 'positive' features first, before passing to
svm.predict utilizing TBB's threads resulted in 19000 ms ( 19 secs ), probably due to the overhead needed to create the threads, etc.
My OpenCV build was compiled to include both TBB (threading) and OpenCL (GPU) functions.
Has anyone managed to speed up OpenCV's SVM.predict function ?
I've been stuck in this issue for quite sometime, since it's frustrating to run this detection algorithm thru my test data for statistics and threshold adjustment.
Thanks a lot for reading thru this !
(Answer posted to formalize my comments, above:)
The prediction algorithm for an SVM takes O(nSV * f) time, where nSV is the number of support vectors and f is the number of features. The number of support vectors can be reduced by training with stronger regularization, i.e. by increasing the hyperparameter C (possibly at a cost in predictive accuracy).
I'm not sure what features you are extracting but from the size of your feature (3780) I would say you are extracting HOG. There is a very robust, optimized, and fast way of HOG "prediction" in cv::HOGDescriptor class. All you need to do is to
extract your HOGs for training
put them in the svmLight format
use svmLight linear kernel to train a model
calculate the 3780 + 1 dimensional vector necessary for prediction
feed the vector to setSvmDetector() method of cv::HOGDescriptor object
use detect() or detectMultiScale() methods for detection
The following document has very good information about how to achieve what you are trying to do: http://opencv.willowgarage.com/wiki/trainHOG although I must warn you that there is a small problem in the original program, but it teaches you how to approach this problem properly.
As Fred Foo has already mentioned, you have to reduce the number of support vectors. From my experience, 5-10% of the training base is enough to have a good level of prediction.
The other means to make it work faster:
reduce the size of the feature. 3780 is way too much. I'm not sure what this size of feature can describe in your case but from my experience, for example, a description of an image like the automobile logo can effectively be packed into size 150-200:
PCA can be used to reduce the size of the feature as well as reduce its "noise". There are examples of how it can be used with SVM;
if not helping - try other principles of image description, for example, LBP and/or LBP histograms
LDA (alone or with SVM) can also be used.
Try linear SVM first. It is much faster and your feature size 3780 (3780 dimensions) is more than enough of "space" to have good separation in higher dimensions if your sets are linearly separatable in principle. If not good enough - try RBF kernel with some pretty standard setup like C = 1 and gamma = 0.1. And only after that - POLY - the slowest one.

Can one conduct training an SVM with detected false positives iteratively?

I'm working on a machine learning problem in image processing. I want to get the location of an object in an image by using Histogram of Oriented Gradients (HOG) and a support vector machine (SVM). I've read a couple of articles and tutorials about training the SVM. The setup is pretty standard. I have labeled positive training images and now need to generate a set of negative training samples.
In literature, the approach to generate negative training samples by randomly choosing a position is found very often. I've also seen some approaches where in a successive step to choosing random negative samples, the false-positives of a detection are used as negative training samples once again.
However, I'm wondering if one could not use this approach generally from the start. So one would generate only one false training sample randomly, run the detection and put false-positives in the negative training set again. This seems quite an obvious strategy to me, but I wonder if I'm missing something.
The theory behind this method is laid out in Object Detection with Discriminatively Trained Part Based Models by P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan in their PAMI paper. In essence, your starting negative set does not matter, you will always converge to the same classifier if you iteratively add hard samples (with an SVM margin > -1). Starting with a single negative would simply make this convergence slower.
To me it sounds like you want to train the SVM classifier online/incrementally, i.e. updating the classifier with new samples. Such methods are generally only used if new data comes available over time. In your case it seems that you can generate a whole set of negative training samples, so there would be no need to train it incrementally. I'm inclined to say that training the classifier in one run will be better than doing this incrementally (as hinted at by larsmans).
(Again, I'm not an image processing specialist, so take this with a grain of salt.)
I'm wondering if one could not use this approach generally from the start.
You'd need some way to detect the false positives from a classification run. To do so, you need a ground truth, that is, you need a human in the loop. In effect, you'd be doing active learning. If that's what you want to do, you could just as well start with a bunch of hand-labeled negative examples.
Alternatively, you could set this up as a PU learning problem. I have no idea whether that works well with images, but for text classification, it sometimes works.

What's the difference between ANN, SVM and KNN classifiers?

I am doing remote sensing image classification. I am using the object-oriented method: first I segmented the image to different regions, then I extract the features from regions such as color, shape and texture. The number of all features in a region may be 30 and commonly there are 2000 regions in all, and I will choose 5 classes with 15 samples for every class.
In summary:
Sample data 1530
Test data 197530
How do I choose the proper classifier? If there are 3 classifiers (ANN, SVM, and KNN), which should I choose for better classification?
KNN is the most basic machine learning algorithm to paramtise and implement, but as alluded to by #etov, would likely be outperformed by SVM due to the small training data sizes. ANNs have been observed to be limited by insufficient training data also. However, KNN makes the least number of assumptions regarding your data, other than that accurate training data should form relatively discrete clusters. ANN and SVM are notoriously difficult to paramtise, especially if you wish to repeat the process using multiple datasets and rely upon certain assumptions, such as that your data is linearly separable (SVM).
I would also recommend the Random Forests algorithm as this is easy to implement and is relatively insensitive to training data size, but I would advise against using very small training data sizes.
The scikit-learn module contains these algorithms and is able to cope with large training data sizes, so you could increase the number of training data samples. the best way to know for sure would be to investigate them yourself, as suggested by #etov
If your "sample data" is the train set, it seems very small. I'd first suggest using more than 15 examples per class.
As said in the comments, it's best to match the algorithm to the problem, so you can simply test to see which algorithm works better. But to start with, I'd suggest SVM: it works better than KNN with small train sets, and generally easier to train then ANN, as there are less choices to make.
Have a look at below mind map
KNN: KNN performs well when sample size < 100K records, for non textual data. If accuracy is not high, immediately move to SVC ( Support Vector Classifier of SVM)
SVM: When sample size > 100K records, go for SVM with SGDClassifier.
ANN: ANN has evolved overtime and they are powerful. You can use both ANN and SVM in combination to classify images
More details are available #semanticscholar.org

SVM Classification - minimum number of input sets for each class

I'm trying to build an app to detect images which are advertisements from the webpages. Once I detect those I`ll not be allowing those to be displayed on the client side.
From the help that I got on this Stackoverflow question, I thought SVM is the best approach to my aim.
So, I have coded SVM and an SMO myself. The dataset which I have got from UCI data repository has 3280 instances ( Link to Dataset ) where around 400 of them are from class representing Advertisement images and rest of them representing non-advertisement images.
Right now I'm taking the first 2800 input sets and training the SVM. But after looking at the accuracy rate I realised that most of those 2800 input sets are from non-advertisement image class. So I`m getting very good accuracy for that class.
So what can I do here? About how many input set shall I give to SVM to train and how many of them for each class?
Thanks. Cheers. ( Basically made a new question because the context was different from my previous question. Optimization of Neural Network input data )
Thanks for the reply.
I want to check whether I`m deriving the C values for ad and non-ad class correctly or not.
Please give me feedback on this.
Or you u can see the doc version here.
You can see graph of y1 eqaul to y2 here
and y1 not equal to y2 here
There are two ways of going about this. One would be to balance the training data so it includes an equal number of advertisement and non-advertisement images. This could be done by either oversampling the 400 advertisement images or undersampling the thousands of non-advertisement images. Since training time can increase dramatically with the number of data points used, you should probably first try undersampling the non-advertisement images and create a training set with the 400 ad images and 400 randomly selected non-advertisements.
The other solution would be to use a weighted SVM so that margin errors for the ad images are weighted more heavily than those for non-ads, for the package libSVM this is done with the -wi flag. From your description of the data, you could try weighing the ad images about 7 times more heavily than the non-ads.
The required size of your training set depends on the sparseness of the feature space. As far as I can see, you are not discussing what image features you have chose to use. Before you can train, you need to to convert each image into a vector of numbers (features) that describe the image, hopefully capturing the aspects that you care about.
Oh, and unless you are reimplementing SVM for sport, I'd recomment just using libsvm,

Resources