classifier works fine but detection fails - opencv

I am working on a project that detect people and identify whether he is wearing a pair of protection goggles. Now I am using the tradition HOG features to detect human body based on the Dalal's algorithms. My application gave me a confusion matrix like this after I test my data (%80 data used for training and 20% for test) :
confusion matrix.
The result seems to be good but when I use my detector to detect human, It gave me a result like this:
result of human detection.
The detector even perform worse on other pics
May I ask where is the problem, is it from classifier or my detector?
Sorry that I don not have the privilege to post a image here..

You are on the right track here. You can improve the results further. Before I get into that, You may get accuracy of around 90 and if you really push hard like 93-94 %(again depending on number of images you have for training and how similar they are with actual use case)
Okay, back to the answer. You have to use Hard negative mining to reduce false positives(i.e detecting a person when there is none). You take all the false positives and add them to the negative class and retrain the classifier. This will help you improve the results.
Hope this helps.

Related

Need advice for object detection and motion classification on real time video

I'm in research for my final project, i want to make object detection and motion classification like amazon go, i have read lot of research like object detection with SSD or YOLO and video classification using CNN+LSTM, i want to propose training algorithm like this:
Real time detection for multiple object (in my case: person) with SSD/YOLO
Get the boundary object and crop the frame
Feed cropped frame info to CNN+LSTM algo to make motion prediction (if the person's walking/taking items)
is it possible to make it in real-time environment?
or is there any better method for real-time detection and motion classification
If you want to use it in real-time application, several other things must be considered which are not appeared before implementation of algorithm in real environment.
About your 3-step proposed method, it already could be result in a good method, but the first step would be very accurate. I think it is better to combine the 3 steps in one step. Because the motion type of person is a good feature of a person. Because of that, I think all steps could be gathered in one step.
My idea is as follows:
1. a video classification dataset which just tag the movement of person or object
2. cnn-lstm based video classification method
This would solve your project properly.
This answer need to more details, if u interested in, I can answer u in more details.
Had pretty much the same problem. Motion prediction does not work that well in complex real-life situations. Here is a simple one:
(See in action)
I'm building a 4K video processing tool (some examples). Current approach looks like the following:
do rough but super fast segmentation
extract bounding box and shape
apply some "meta vision magic"
do precise segmentation within identified area
(See in action)
As of now the approach looks way more flexible comparing to motion tracking.
"Meta vision" intended to properly track shape evolution:
(See in action)
Let's compare:
Meta vision disabled
Meta vision enabled

Random Perturbation of Data to get Training Data for Neural Networks

I am working on Soil Spectral Classification using neural networks and I have data from my Professor obtained from his lab which consists of spectral reflectance from wavelength 1200 nm to 2400 nm. He only has 270 samples.
I have been unable to train the network for accuracy more than 74% since the training data is very less (only 270 samples). I was concerned that my Matlab code is not correct, but when I used the Neural Net Toolbox in Matlab, I got the same results...nothing more than 75% accuracy.
When I talked to my Professor about it, he said that he does not have any more data, but asked me to do random perturbation on this data to obtain more data. I have research online about random perturbation of data, but have come up short.
Can someone point me in the right direction for performing random perturbation on 270 samples of data so that I can get more data?
Also, since by doing this, I will be constructing 'fake' data, I don't see how the neural network would be any better cos isn't the point of neural nets using actual real valid data to train the network?
Thanks,
Faisal.
I think trying to fabricate more data is a bad idea: you can't create anything with higher information content than you already have, unless you know the true distribution of the data to sample from. If you did, however, you'd be able to classify with the Bayes optimal error rate, which would be impossible to beat.
What I'd be looking at instead is whether you can alter the parameters of your neural net to improve performance. The thing that immediately springs to mind with small amounts of training data is your weight regulariser (are you even using regularised weights), which can be seen as a prior on the weights if you're that way inclined. I'd also look at altering the activation functions if you're using simple linear activations, and the number of hidden nodes in addition (with so few examples, I'd use very few, or even bypass the hidden layer entirely since it's hard to learn nonlinear interactions with limited data).
While I'd not normally recommend it, you should probably use cross-validation to set these hyper-parameters given the limited size, as you're going to get unhelpful insight from a 10-20% test set size. You might hold out 10-20% for final testing, however, so as to not bias the results in your favour.
First, some general advice:
Normalize each input and output variable to [0.0, 1.0]
When using a feedforward MLP, try to use 2 or more hidden layers
Make sure your number of neurons per hidden layer is big enough, so the network is able to tackle the complexity of your data
It should always be possible to get to 100% accuracy on a training set if the complexity of your model is sufficient. But be careful, 100% training set accuracy does not necessarily mean that your model does perform well on unseen data (generalization performance).
Random perturbation of your data can improve generalization performance, if the perturbation you are adding occurs in practice (or at least similar perturbation). This works because this means teaching your network on how the data could look different but still belong to the given labels.
In the case of image classification, you could rotate, scale, noise, etc. the input image (the output stays the same, naturally). You will need to figure out what kind of perturbation could apply to your data. For some problems this is difficult or does not yield any improvement, so you need to try it out. If this does not work, it does not necessarily mean your implementation or data are broken.
The easiest way to add random noise to your data would be to apply gaussian noise.
I suppose your measures have errors associated with them (a measure without errors has almost no meaning). For each measured value M+-DeltaM you can generate a new number with N(M,DeltaM), where n is the normal distribution.
This will add new points as experimental noise from previous ones, and will add help take into account exprimental errors in the measures for the classification. I'm not sure however if it's possible to know in advance how helpful this will be !

How to verify what's noise what's real data?

I am wondering how can I claim that I correctly catch the "noise" in my data ?
To be more specific, take Principle Component Analysis as example, we know that in PCA, after doing SVD, we can zeros out the small singular values and reconstruct the original matrix using low-rank approximation.
Then can I claim what's been ignored is indeed noise in the data ?
Is there any evaluation metric for this ?
The only method I can come up with is simply subtract the original data from the reconstructed data.
Then, try to fit a Gaussian over it, seeing if the fitness is good.
Is that conventional method in field like DSP ??
BTW, I think in typical machine learning tasks, the measurement would be the follow up classification performance, but since I am doing purely generative model, there are no labels attached.
The way I see it, the definition of noise would depend on the domain of the problem. Therefore the strategy for reducing it would be different on each domain.
For instance, having a noisy signal in problems like seismic formation classification or a noisy image on a face classification problem would be drastically different to the noise produced by improperly tagged data in a medical diagnostic problem or the noise because similar words with different meaning in a language classification problem for documents.
When the noise is because of a given (or a set of) data point, then the solution is as simple as ignore those data points (although identify those data points most of the time is the challenging part)
From your example I guess you are more concerning about the case when the noise is embedded into the features (like in the seismic example). Sometimes people tend to pre-process the data with a noise reduction filter like the median filter (http://en.wikipedia.org/wiki/Median_filter). In contrast, some other people tend to reduce the dimension of the data to reduce noise, and PCA is used in this scenario.
Both strategies are valid, and normally people try both and cross-validate them to see which one gave better results.
What you did is a good metric to check gaussian noise. However, for non-gaussian noise your metric can give you false negatives (bad fitness but still good noise reduction)
Personally, if you want to prove the efficacy of the noise reduction, I'd use a task-based evaluation. I assume you're doing this for some purpose, to solve some problem? If so, solve the task with the original noisy matrix and the new clean one. If the latter works better, what was discarded was noise, for the purposes of the task you're interested in. I think some objective measure of noise is pretty hard to define.
I have found this. it is very resoureful, needs good time to understand.
https://sci2s.ugr.es/noisydata#Introduction%20to%20Noise%20in%20Data%20Mining

Can one conduct training an SVM with detected false positives iteratively?

I'm working on a machine learning problem in image processing. I want to get the location of an object in an image by using Histogram of Oriented Gradients (HOG) and a support vector machine (SVM). I've read a couple of articles and tutorials about training the SVM. The setup is pretty standard. I have labeled positive training images and now need to generate a set of negative training samples.
In literature, the approach to generate negative training samples by randomly choosing a position is found very often. I've also seen some approaches where in a successive step to choosing random negative samples, the false-positives of a detection are used as negative training samples once again.
However, I'm wondering if one could not use this approach generally from the start. So one would generate only one false training sample randomly, run the detection and put false-positives in the negative training set again. This seems quite an obvious strategy to me, but I wonder if I'm missing something.
The theory behind this method is laid out in Object Detection with Discriminatively Trained Part Based Models by P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan in their PAMI paper. In essence, your starting negative set does not matter, you will always converge to the same classifier if you iteratively add hard samples (with an SVM margin > -1). Starting with a single negative would simply make this convergence slower.
To me it sounds like you want to train the SVM classifier online/incrementally, i.e. updating the classifier with new samples. Such methods are generally only used if new data comes available over time. In your case it seems that you can generate a whole set of negative training samples, so there would be no need to train it incrementally. I'm inclined to say that training the classifier in one run will be better than doing this incrementally (as hinted at by larsmans).
(Again, I'm not an image processing specialist, so take this with a grain of salt.)
I'm wondering if one could not use this approach generally from the start.
You'd need some way to detect the false positives from a classification run. To do so, you need a ground truth, that is, you need a human in the loop. In effect, you'd be doing active learning. If that's what you want to do, you could just as well start with a bunch of hand-labeled negative examples.
Alternatively, you could set this up as a PU learning problem. I have no idea whether that works well with images, but for text classification, it sometimes works.

How to approach machine learning problems with high dimensional input space?

How should I approach a situtation when I try to apply some ML algorithm (classification, to be more specific, SVM in particular) over some high dimensional input, and the results I get are not quite satisfactory?
1, 2 or 3 dimensional data can be visualized, along with the algorithm's results, so you can get the hang of what's going on, and have some idea how to aproach the problem. Once the data is over 3 dimensions, other than intuitively playing around with the parameters I am not really sure how to attack it?
What do you do to the data? My answer: nothing. SVMs are designed to handle high-dimensional data. I'm working on a research problem right now that involves supervised classification using SVMs. Along with finding sources on the Internet, I did my own experiments on the impact of dimensionality reduction prior to classification. Preprocessing the features using PCA/LDA did not significantly increase classification accuracy of the SVM.
To me, this totally makes sense from the way SVMs work. Let x be an m-dimensional feature vector. Let y = Ax where y is in R^n and x is in R^m for n < m, i.e., y is x projected onto a space of lower dimension. If the classes Y1 and Y2 are linearly separable in R^n, then the corresponding classes X1 and X2 are linearly separable in R^m. Therefore, the original subspaces should be "at least" as separable as their projections onto lower dimensions, i.e., PCA should not help, in theory.
Here is one discussion that debates the use of PCA before SVM: link
What you can do is change your SVM parameters. For example, with libsvm link, the parameters C and gamma are crucially important to classification success. The libsvm faq, particularly this entry link, contains more helpful tips. Among them:
Scale your features before classification.
Try to obtain balanced classes. If impossible, then penalize one class more than the other. See more references on SVM imbalance.
Check the SVM parameters. Try many combinations to arrive at the best one.
Use the RBF kernel first. It almost always works best (computationally speaking).
Almost forgot... before testing, cross validate!
EDIT: Let me just add this "data point." I recently did another large-scale experiment using the SVM with PCA preprocessing on four exclusive data sets. PCA did not improve the classification results for any choice of reduced dimensionality. The original data with simple diagonal scaling (for each feature, subtract mean and divide by standard deviation) performed better. I'm not making any broad conclusion -- just sharing this one experiment. Maybe on different data, PCA can help.
Some suggestions:
Project data (just for visualization) to a lower-dimensional space (using PCA or MDS or whatever makes sense for your data)
Try to understand why learning fails. Do you think it overfits? Do you think you have enough data? Is it possible there isn't enough information in your features to solve the task you are trying to solve? There are ways to answer each of these questions without visualizing the data.
Also, if you tell us what the task is and what your SVM output is, there may be more specific suggestions people could make.
You can try reducing the dimensionality of the problem by PCA or the similar technique. Beware that PCA has two important points. (1) It assumes that the data it is applied to is normally distributed and (2) the resulting data looses its natural meaning (resulting in a blackbox). If you can live with that, try it.
Another option is to try several parameter selection algorithms. Since SVM's were already mentioned here, you might try the approach of Chang and Li (Feature Ranking Using Linear SVM) in which they used linear SVM to pre-select "interesting features" and then used RBF - based SVM on the selected features. If you are familiar with Orange, a python data mining library, you will be able to code this method in less than an hour. Note that this is a greedy approach which, due to its "greediness" might fail in cases where the input variables are highly correlated. In that case, and if you cannot solve this problem with PCA (see above), you might want to go to heuristic methods, which try to select best possible combinations of predictors. The main pitfall of this kind of approaches is the high potential of overfitting. Make sure you have a bunch "virgin" data that was not seen during the entire process of model building. Test your model on that data only once, after you are sure that the model is ready. If you fail, don't use this data once more to validate another model, you will have to find a new data set. Otherwise you won't be sure that you didn't overfit once more.
List of selected papers on parameter selection:
Feature selection for high-dimensional genomic microarray data
Oh, and one more thing about SVM. SVM is a black box. You better figure out what is the mechanism that generate the data and model the mechanism and not the data. On the other hand, if this would be possible, most probably you wouldn't be here asking this question (and I wouldn't be so bitter about overfitting).
List of selected papers on parameter selection
Feature selection for high-dimensional genomic microarray data
Wrappers for feature subset selection
Parameter selection in particle swarm optimization
I worked in the laboratory that developed this Stochastic method to determine, in silico, the drug like character of molecules
I would approach the problem as follows:
What do you mean by "the results I get are not quite satisfactory"?
If the classification rate on the training data is unsatisfactory, it implies that either
You have outliers in your training data (data that is misclassified). In this case you can try algorithms such as RANSAC to deal with it.
Your model(SVM in this case) is not well suited for this problem. This can be diagnozed by trying other models (adaboost etc.) or adding more parameters to your current model.
The representation of the data is not well suited for your classification task. In this case preprocessing the data with feature selection or dimensionality reduction techniques would help
If the classification rate on the test data is unsatisfactory, it implies that your model overfits the data:
Either your model is too complex(too many parameters) and it needs to be constrained further,
Or you trained it on a training set which is too small and you need more data
Of course it may be a mixture of the above elements. These are all "blind" methods to attack the problem. In order to gain more insight into the problem you may use visualization methods by projecting the data into lower dimensions or look for models which are suited better to the problem domain as you understand it (for example if you know the data is normally distributed you can use GMMs to model the data ...)
If I'm not wrong, you are trying to see which parameters to the SVM gives you the best result. Your problem is model/curve fitting.
I worked on a similar problem couple of years ago. There are tons of libraries and algos to do the same. I used Newton-Raphson's algorithm and a variation of genetic algorithm to fit the curve.
Generate/guess/get the result you are hoping for, through real world experiment (or if you are doing simple classification, just do it yourself). Compare this with the output of your SVM. The algos I mentioned earlier reiterates this process till the result of your model(SVM in this case) somewhat matches the expected values (note that this process would take some time based your problem/data size.. it took about 2 months for me on a 140 node beowulf cluster).
If you choose to go with Newton-Raphson's, this might be a good place to start.

Resources