Improving the accuracy of Haar training with the OpenCV binary

I've been using Haar cascades and LBP cascades trained with the opencv_traincascade tool, which is brilliant.
I'd like to hear some proposals on how to generate a bigger database that actually improves accuracy. What I mean is: let's imagine we've got 2,000 positive images and 10,000 negative images. For CNNs (convolutional neural networks) I've rotated, translated and scaled pictures in order to turn those 2,000 into 8,000 positive samples, which really improves the results, but it's not clear to me what I could do for cascade training.
My ideas are:
Generate part of the positive set with added noise.
Generate part of the positive set with highlights or blurring.
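Something like the following is what I have in mind: a rough sketch of both ideas using OpenCV's Python bindings (the file names are placeholders, and sigma/strength would need tuning for real data):

```python
import cv2
import numpy as np

def add_gaussian_noise(img, sigma=10.0):
    # Zero-mean Gaussian noise, simulating sensor noise.
    noise = np.random.normal(0.0, sigma, img.shape).astype(np.float32)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def add_highlight(img, strength=60):
    # Brighten a random elliptical region, simulating a specular highlight.
    h, w = img.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    center = (int(np.random.randint(0, w)), int(np.random.randint(0, h)))
    axes = (int(np.random.randint(w // 8, w // 3)),
            int(np.random.randint(h // 8, h // 3)))
    cv2.ellipse(mask, center, axes, int(np.random.randint(0, 180)),
                0, 360, 255, -1)
    bright = cv2.add(img, np.full_like(img, strength))  # saturating add
    out = img.copy()
    out[mask > 0] = bright[mask > 0]
    return out

img = cv2.imread("positive_0001.png")  # placeholder file name
cv2.imwrite("positive_0001_noise.png", add_gaussian_noise(img))
cv2.imwrite("positive_0001_highlight.png", add_highlight(img))
```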
Have you used anything else or tried something which could improve the accuracy?
Thank you in advance.
Rafael.

Related

Reducing pixels in large data set (sklearn)

I'm currently working on a classification project, but I'm in doubt about how I should start off.
Goal
Accurately classifying pictures of size 80*80 (so 6400 pixels) into the correct class (binary).
Setting
5260 training samples, 600 test samples
Question
As there are more pixels than samples, it seems logical to me to 'drop' most of the pixels and only look at the important ones before I even start working out a classification method (like SVM, KNN, etc.).
Say the training data consists of X_train (predictors) and Y_train (outcomes). So far, I've tried looking at the SelectKBest() method from sklearn for feature selection, but what would be the best way to use this method, and how do I know how many features k to select?
It could also be the case that I'm completely on the wrong track here, so correct me if I'm wrong or suggest another approach if possible.
You are suggesting reducing the dimension of your feature space, which is a method of regularization to reduce overfitting. You haven't mentioned that overfitting is an issue, so I would test for that first. Here are some things I would try:
Use transfer learning. Take a network pretrained on image recognition tasks and fine-tune it on your dataset. Search for transfer learning and you'll find many resources.
Train a convolutional neural network on your dataset. CNNs are the go-to method for machine learning on images. Check for overfitting.
If you want to reduce the dimensionality of your dataset, resize the images. Going from 80x80 to 40x40 reduces the number of pixels by 4x; assuming your task doesn't depend on fine details of the image, you should maintain classification performance.
There are other things you may want to consider but I would need to know more about your problem and its requirements.
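As for the asker's direct question about SelectKBest: one common approach is to treat k as just another hyperparameter and let cross-validation choose it. A minimal sketch with scikit-learn (random data stands in for the real 80x80 images, and the k grid is just an assumption):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Stand-in for the question's X_train / Y_train (80*80 = 6400 pixels).
rng = np.random.default_rng(0)
X_train = rng.random((500, 6400))
Y_train = rng.integers(0, 2, 500)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),  # univariate feature selection
    ("clf", SVC(kernel="rbf")),
])
# Treat k as a hyperparameter and cross-validate it.
param_grid = {"select__k": [100, 400, 800, 1600, 3200]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, Y_train)
print(search.best_params_, search.best_score_)
```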

Caffe accuracy increases too fast

I'm fine-tuning AlexNet for face detection following this: link
The only difference from the link is that I am using a different dataset (FaceScrub, plus some images from ImageNet as negative examples).
I noticed the accuracy increasing too fast: in 50 iterations it goes from 0.308 to 0.967, and when it reaches about 0.999 I stop the training and use the model with the same Python script as in the link above.
I test with an image from the dataset and the result is nowhere near good (test image result). As you can see, the box around the faces is too big (and the dataset images are tightly cropped), not to mention the box not containing a face.
My solver and train_val files are exactly the same; the only differences are the batch sizes and max_iter.
The reason was that my dataset had way more face examples than non-face examples. I tried the same setup with equal numbers of positive and negative examples, and now the accuracy increases more slowly.
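In case it helps anyone else, balancing the classes can be as simple as undersampling the majority class in the image list before training. A rough sketch, assuming the usual Caffe ImageData-style "path label" list (file names are placeholders):

```python
import random

# Read an image list with one "path label" entry per line.
with open("train.txt") as f:
    entries = [line.split() for line in f if line.strip()]

faces = [e for e in entries if e[1] == "1"]
non_faces = [e for e in entries if e[1] == "0"]

# Undersample the majority class so both classes have equal counts.
n = min(len(faces), len(non_faces))
balanced = random.sample(faces, n) + random.sample(non_faces, n)
random.shuffle(balanced)

with open("train_balanced.txt", "w") as f:
    for path, label in balanced:
        f.write(f"{path} {label}\n")
```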

Classification with Convolutional Neural Networks

I am trying to classify samples into two classes, positive and negative.
I have 500 positive samples and 100 negative. Examples:
http://imgur.com/a/3XDAP
(I don't have enough reputation to post images or multiple links.)
I am using a convolutional neural network to do the classification. For negative samples that have an obvious distortion, the network works fine. The problem is with negative samples that have only a small distortion: they are classified as positive.
Does anyone have any suggestions? Maybe pre-processing steps, or a completely different method rather than a CNN?
Thank you

One-class Support Vector Machine sensitivity drops when the number of training samples increases

I am using a One-Class SVM for outlier detection. It appears that as the number of training samples increases, the sensitivity TP/(TP+FN) of the One-Class SVM's detection results drops, while the classification rate and specificity both increase.
What's the best way of explaining this relationship in terms of hyperplane and support vectors?
Thanks
The more training examples you have, the less your classifier is able to detect true positives correctly.
This means that the new data does not fit correctly with the model you are training.
Here is a simple example.
Below you have two classes, and we can easily separate them using a linear kernel.
The sensitivity of the blue class is 1.
As I add more yellow training data near the decision boundary, the generated hyperplane can't fit the data as well as before.
As a consequence, we now see that there are two misclassified blue data points.
The sensitivity of the blue class is now 0.92.
As the number of training samples increases, the support vectors generate a somewhat less optimal hyperplane; the extra data may turn a linearly separable data set into a non-linearly separable one. In such a case, trying a different kernel, such as the RBF kernel, can help.
EDIT: more information about the RBF kernel:
In this video you can see what happens with an RBF kernel.
The same logic applies: if the training data is not easily separable in n dimensions, you will get worse results.
You should try to select a better C using cross-validation.
In this paper, figure 3 illustrates that the results can be worse if C is not properly selected: "More training data could hurt if we did not pick a proper C. We need to cross-validate on the correct C to produce good results."
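To make the cross-validation point concrete, here is a minimal sketch with scikit-learn's SVC on toy two-class data like the blue/yellow example above (the C grid is just an illustration). Note that scoring="recall" is exactly the sensitivity TP/(TP+FN) of the positive class:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy two-class data standing in for the blue/yellow example.
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

param_grid = {"C": [0.01, 0.1, 1, 10, 100],
              "kernel": ["linear", "rbf"]}
# "recall" scores the sensitivity TP / (TP + FN) of the positive class.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="recall")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```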

HOG descriptor training using SVMs

I am trying to classify road signs. For this reason I want to train HOG descriptors using SVMs. I have extracted the HOG descriptors for training data of size 64x64. The positive training data are 60% and the negative 40% of the whole sample. When I train using OpenCV's SVM (with a linear kernel) everything seems fine, but when I try to predict, the results fail and show only one class (the result is always 1). I have tried feeding my data into SVMlight as well, and all the negatives are misclassified. Any ideas what could possibly be wrong? Maybe the small number of training samples? (I am just trying to implement the code and check that everything is fine, without using all the training data.)
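For reference, my setup is essentially the following (a simplified sketch with OpenCV's Python ml module; random arrays stand in for the real 64x64 patches):

```python
import cv2
import numpy as np

# HOG parameters for 64x64 patches: 16x16 blocks, 8x8 stride and cells, 9 bins.
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def hog_features(patch):
    # patch: 64x64 grayscale uint8 image; returns a 1-D float32 descriptor.
    return hog.compute(patch).flatten()

# Placeholders for the real positive/negative patches (60% / 40% split).
pos_patches = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(60)]
neg_patches = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(40)]

samples = np.array([hog_features(p) for p in pos_patches + neg_patches],
                   dtype=np.float32)
labels = np.array([1] * len(pos_patches) + [-1] * len(neg_patches),
                  dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setC(1.0)
svm.train(samples, cv2.ml.ROW_SAMPLE, labels)

_, preds = svm.predict(samples)
print("training accuracy:", (preds.flatten() == labels).mean())
```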
