Negative sample image dimensions for training a cascade classifier in OpenCV?

So following up from here, I now need to collect negative samples for cascade classifier training with OpenCV.
With positive samples, I know that all samples should have the same aspect ratio.
What about negative samples?
Should they all be larger than the positive samples (since OpenCV is going to paste positives on top of negatives to create test images)?
Should they all be the same size?
Can they be arbitrary sizes?
Should they too have the same aspect ratio among themselves?

From the OpenCV documentation on Cascade Classifier Training:
Negative samples are taken from arbitrary images. These images must not contain the objects you want to detect. [...] These images may be of different sizes, but each should be (though not necessarily) larger than the training window size, because they are used to subsample negative samples down to the training window size.
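To make that advice concrete, here is a minimal sketch (the folder name and the 24x24 window size are assumptions; adjust them to your own -w/-h) that writes a background description file listing only negatives at least as large as the training window, which opencv_traincascade can then subsample:

import os
import cv2

WIN_W, WIN_H = 24, 24                    # assumed training window size

with open("bg.txt", "w") as bg:
    for name in sorted(os.listdir("negatives")):
        path = os.path.join("negatives", name)
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue                     # skip non-image files
        h, w = img.shape[:2]
        if w >= WIN_W and h >= WIN_H:    # keep only images the trainer can subsample from
            bg.write(path + "\n")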

Related

Haar Classifier positive image set clarification

Could you please help me understand several points related to Haar classifier training:
1) Should the positive images contain only the training object, or can they contain other objects as well? For example, if I want to recognize a traffic sign, should the positive image contain only the traffic sign, or can it also include the highway?
2) There are two ways of creating the samples vector file: one uses an info file containing the object coordinates in each positive image, the other just takes a list of positives and negatives. Which one is better?
3) How do you usually create the info file containing the object coordinates in each positive image? Can imageclipper generate object coordinates?
Also, does dlib's histogram of oriented gradients (HOG) detector give better results than a Haar classifier?
My target is traffic sign detection on a Raspberry Pi.
Thanks
The positive sample (not necessarily the whole image) should contain only the object. Sometimes it is not possible to get the right aspect ratio for each positive sample; then you would either add some background or crop part of the object boundary. The final detector will detect regions with your positive samples' aspect ratio, so if you use a lot of background around all of your positive samples, your final detector will probably not detect a region containing just your traffic sign, but a region with a lot of background around it.
As far as I know, the positive samples must be provided as a .vec file, which is created with opencv_createsamples, and you'll need a description file telling it where in the images your positive samples are. I typically preprocess my labeled training samples and crop away all the background, so that I end up with intermediate images in which the positive sample fills the whole image and the image already has the right aspect ratio. I then fill a text file with basically "folder/filename.png 1 0 0 width height" for each of those intermediate images and create a .vec file from them. But the other way, using real ROI information from full-size images, should give the same quality.
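For illustration, a minimal sketch of that description-file step, assuming the pre-cropped positives live in a hypothetical "cropped/" folder with one object filling each image:

import os
import cv2

with open("positives.info", "w") as info:
    for name in sorted(os.listdir("cropped")):
        path = os.path.join("cropped", name)
        img = cv2.imread(path)
        if img is None:
            continue
        h, w = img.shape[:2]
        # one object per image, with an ROI covering the whole image
        info.write("%s 1 0 0 %d %d\n" % (path, w, h))

The resulting file can then be passed to opencv_createsamples (-info positives.info -vec positives.vec -w ... -h ...) to build the .vec file.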
Be aware that if you don't fix the same aspect ratio for each positive sample, you'll stretch your objects, which might or might not be a problem in your task.
And keep in mind that you can create additional positive samples by warping/transforming your images. opencv_createsamples can do that for you, but I have never really used it, so I'm not sure whether training benefits from such samples.

Accuracy of landmark prediction using dlib

I'm trying to find facial landmarks using dlib. I'm fitting my model with the HELEN dataset (2000 items downloaded from here), but accuracy is very low. When I use shape_predictor_68_face_landmarks.dat, accuracy is high. I've read the Kazemi paper and set nu to 0.1, tree depth to 4 and oversampling_amount to 20, but it still works badly. What's wrong?
The dlib landmark trainer's default parameters work well if the dataset is large enough, but the bounding box for the detected face should be square. Since faces are nearly symmetric, augmenting the dataset by mirroring the images left-to-right may increase the accuracy.
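If it helps, here is a rough sketch of setting those parameters through dlib's Python API, assuming the HELEN annotations have already been converted to dlib's XML format (the file names are hypothetical):

import dlib

options = dlib.shape_predictor_training_options()
options.nu = 0.1                  # regularization, as discussed above
options.tree_depth = 4
options.oversampling_amount = 20
options.be_verbose = True

# trains on the XML annotation file and writes the resulting model to disk
dlib.train_shape_predictor("helen_train.xml", "helen_predictor.dat", options)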

Poor performance on digit recognition with CNN trained on MNIST dataset

I trained a CNN (on tensorflow) for digit recognition using MNIST dataset.
Accuracy on test set was close to 98%.
I wanted to predict the digits using data which I created myself and the results were bad.
What did I do to the images I wrote myself?
I segmented out each digit, converted it to grayscale, resized it to 28x28 and fed it to the model.
How come I get such low accuracy on my own data set, but such high accuracy on the test set?
Are there other modifications that I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess would be that your problem is that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data, you should try to mimic as closely as possible the process used to create the MNIST dataset.
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
If your data is processed differently at training and prediction time, then your model is not able to generalize from the training data to your data.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST dataset (a sketch of such preprocessing follows below);
Add some of your own examples to the training data, so that your model trains on images similar to the ones you are classifying.
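A minimal sketch of that first suggestion, assuming the input is a grayscale image with a light digit on a dark background (as in MNIST); this only approximates the original NIST pipeline:

import cv2
import numpy as np

def mnist_style(digit):
    # crop to the digit's bounding box
    ys, xs = np.nonzero(digit)
    digit = digit[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # scale so the longer side is 20 px, preserving the aspect ratio
    h, w = digit.shape
    scale = 20.0 / max(h, w)
    digit = cv2.resize(digit, (max(1, int(round(w * scale))),
                               max(1, int(round(h * scale)))))

    # paste into a 28x28 canvas, then shift the center of mass to (14, 14)
    canvas = np.zeros((28, 28), dtype=np.float32)
    h, w = digit.shape
    y0, x0 = (28 - h) // 2, (28 - w) // 2
    canvas[y0:y0 + h, x0:x0 + w] = digit
    m = cv2.moments(canvas)
    if m["m00"] > 0:
        shift = np.float32([[1, 0, 14 - m["m10"] / m["m00"]],
                            [0, 1, 14 - m["m01"] / m["m00"]]])
        canvas = cv2.warpAffine(canvas, shift, (28, 28))
    return canvas / 255.0   # scale to [0, 1], as most MNIST loaders do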
For those who still have a hard time with the poor real-world quality of CNN-based models trained on MNIST:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.

Training a HOG + linear SVM based on the TUD-Brussels dataset

Has anyone ever tried to train a HOG + linear SVM pedestrian detector based on the TUD-Brussels dataset (introduced on this website)?
https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/people-detection-pose-estimation-and-tracking/multi-cue-onboard-pedestrian-detection/
I tried to implement it in OpenCV with Visual Studio 2012. I cropped positive samples from the original positive images based on their annotations (about 1777 samples in total). Negative samples were cropped randomly from the original negative images, 20 samples per image (about 3840 samples in total).
I also applied two rounds of bootstrapping (collecting hard examples and retraining) to improve its performance. However, the test result for this detector on TUD-Brussels was awful: about a 97% miss rate at one false positive per image (FPPI). I found another paper which achieved a reasonable result when training on TUD-Brussels with HOG (see Figure 3(a)):
https://www1.ethz.ch/igp/photogrammetry/publications/pdf_folder/walk10cvpr.pdf.
Does anybody have any idea about training HOG + linear SVM on TUD-Brussels?
I faced a similar situation recently. I developed an image classifier with HOG and a linear SVM in Python using PyCharm. The problem I faced was that it took a lot of time to train.
Solution:
Simple: I resized each image to 250x250. It really increased performance in my situation.
Resize each image
Convert it to grayscale
Compute PCA
Flatten it and append it to the training list
Append the label to the training labels
import os
import cv2
import numpy as np

path1 = "train_images/"          # folder with the training images for this class (hypothetical path)
listing1 = sorted(os.listdir(path1))
training_set = []
training_labels = []

for file in listing1:
    img = cv2.imread(path1 + file)
    res = cv2.resize(img, (250, 250))
    gray_image = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)
    xarr = np.squeeze(np.array(gray_image).astype(np.float32))
    m, v = cv2.PCACompute(xarr)  # mean and eigenvectors (OpenCV 2.4 signature)
    arr = np.array(v)
    flat_arr = arr.ravel()
    training_set.append(flat_arr)
    training_labels.append(1)    # label for this class
Now the training (this uses the old OpenCV 2.4 cv2.SVM API; in OpenCV 3+ the SVM lives under cv2.ml.SVM_create):
svm_params = dict(kernel_type=cv2.SVM_LINEAR, svm_type=cv2.SVM_C_SVC, C=2.67)  # linear SVM configuration (assumed; tune C to your data)
trainData = np.float32(training_set)
responses = np.float32(training_labels)
svm = cv2.SVM()
svm.train(trainData, responses, params=svm_params)
svm.save('svm_data.dat')

More questions about OpenCV's HoG and CvSVM

I've managed to extract HoG features from positive and negative images (from INRIA's person dataset ) using OpenCV's HOGDescriptor::compute function.
I've also managed to pack the data correctly and feed it into CvSVM for training purposes.
I have several questions:
While extracting features, I used positive images with dimensions of 96 x 128, while the negative images are on average 320 x 240. I have been using a window size of 64 x 128 for HoG extraction; should I use a different window size?
The extracted feature vector for a positive image has around 28800 values, while for a negative image it is around 500000+. I have been truncating the negative features to 28800; I think this is wrong, since I believe I'm losing too much information when feeding these features to the SVM. How should I tackle this? (It seems I can only feed feature vectors of the same size for negatives and positives.)
When doing prediction on images bigger than 64 x 128 (or 96 x 160), should I use a sliding window? Large negative images still give me more than 500000 features, but I can't feed them into the SVM because of the size mismatch.
Why can't you just resize all your patches to the same size? The HOG descriptor depends on the window size and on the block and cell sizes. You should try different combinations: with small cells you can capture small details, but you lose generality, and vice versa.
1.) I don't understand the question.
2.) Make all descriptors the same size by extracting HOG from resized images (see the sketch below).
3.) I don't understand the question.
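To illustrate the second point: once every patch is resized to the descriptor's window size, the HOG feature vector has the same length for positives and negatives. A sketch with OpenCV's default parameters (the file name is hypothetical):

import cv2

# Default HOGDescriptor: 64x128 window, 16x16 blocks, 8x8 stride, 8x8 cells, 9 bins
hog = cv2.HOGDescriptor()

img = cv2.imread("sample_patch.png", cv2.IMREAD_GRAYSCALE)
patch = cv2.resize(img, (64, 128))   # force every sample to the window size

descriptor = hog.compute(patch)
print(descriptor.size)               # 3780 features for every resized patch, regardless of original size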
