Training a HOG + linear SVM detector on the TUD-Brussels dataset - opencv

Has anyone ever tried to train a HOG + linear SVM pedestrian detector on the TUD-Brussels dataset (introduced on this website):
https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/people-detection-pose-estimation-and-tracking/multi-cue-onboard-pedestrian-detection/
I tried to implement it in OpenCV through Visual Studio 2012. I cropped positive samples from the original positive images based on their annotations (about 1777 samples in total). Negative samples were cropped randomly from the original negative images, 20 samples per image (about 3840 samples in total).
I also applied two rounds of bootstrapping (collecting hard examples and retraining) to improve its performance. However, the test result for this detector on TUD-Brussels was awful: about a 97% miss rate at one false positive per image (FPPI). I found another paper which achieved a reasonable result when training on TUD-Brussels with HOG (see Figure 3(a)):
https://www1.ethz.ch/igp/photogrammetry/publications/pdf_folder/walk10cvpr.pdf.
Does anybody have any idea on training HOG + linear SVM on TUD-Brussels?
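For reference, a minimal sketch of the pipeline described above (HOG features on fixed-size crops plus a linear SVM), assuming OpenCV 3+ and placeholder folder names 'pos_dir' / 'neg_dir'; hard examples collected in each bootstrapping round would simply be appended to the negative features before retraining:

import os
import cv2
import numpy as np

hog = cv2.HOGDescriptor()                      # default 64x128 window, 9 bins, 8x8 cells

def add_samples(folder, label, feats, labels):
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name), cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        img = cv2.resize(img, (64, 128))       # match the HOG window size
        feats.append(hog.compute(img).ravel())
        labels.append(label)

feats, labels = [], []
add_samples('pos_dir', 1, feats, labels)       # cropped pedestrian samples
add_samples('neg_dir', -1, feats, labels)      # random background crops

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setC(0.01)                                 # a commonly used value; worth tuning
svm.train(np.float32(feats), cv2.ml.ROW_SAMPLE, np.int32(labels))
svm.save('hog_pedestrian_svm.xml')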

I faced a similar situation recently. I developed an image classifier with HOG and a linear SVM in Python using PyCharm. The problem I faced was that it took a lot of time to train.
Solution:
Simple: I resized each image to 250x250. It really increased performance in my situation.
Resize each image
Convert to grayscale
Compute PCA
Flatten the result and append it to the training list
Append the label to the training labels
# Assumes OpenCV 2.4.x (for the cv2.SVM API used below).
import os
import cv2
import numpy as np

path1 = 'path/to/positive/images/'   # placeholder folder of training images
listing1 = os.listdir(path1)
training_set, training_labels = [], []

for file in listing1:
    img = cv2.imread(path1 + file)
    res = cv2.resize(img, (250, 250))                   # resize each image
    gray_image = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)  # convert to grayscale
    xarr = np.squeeze(np.array(gray_image).astype(np.float32))
    m, v = cv2.PCACompute(xarr)                         # PCA mean and eigenvectors
    arr = np.array(v)
    flat_arr = arr.ravel()                              # flatten to a feature vector
    training_set.append(flat_arr)
    training_labels.append(1)                           # label for this class
Now the training:
# svm_params was not defined in the original answer; a minimal linear-SVM setup:
svm_params = dict(kernel_type=cv2.SVM_LINEAR, svm_type=cv2.SVM_C_SVC, C=1)
trainData = np.float32(training_set)
responses = np.float32(training_labels)
svm = cv2.SVM()                                # OpenCV 2.4 API
svm.train(trainData, responses, params=svm_params)
svm.save('svm_data.dat')
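To use the saved model later, a hedged usage sketch with the same OpenCV 2.4 API as above ('test_image.jpg' is a placeholder, and the preprocessing must match training exactly):

svm = cv2.SVM()
svm.load('svm_data.dat')

img = cv2.imread('test_image.jpg')
res = cv2.resize(img, (250, 250))
gray = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)
xarr = np.squeeze(np.array(gray).astype(np.float32))
m, v = cv2.PCACompute(xarr)
sample = np.float32(np.array(v).ravel()).reshape(1, -1)
print(svm.predict_all(sample))                 # predicted label(s), e.g. [[ 1.]]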

Related

Accuracy in landmarks prediction using dlib

I'm trying to find landmarks using dlib. I'm fitting my model with the HELEN dataset; there are 2000 items downloaded from here. But accuracy is very, very low, whereas when I use shape_predictor_68_face_landmarks.dat accuracy is high. I've read the Kazemi paper and set nu to 0.1, depth to 4 and oversampling_amount to 20, but it still works badly. What's wrong?
The dlib landmark trainer works well with its default parameters if the dataset is large enough, but the bounding box for the detected face should be square. Since faces are almost left-right symmetric, augmenting the dataset by mirroring the images horizontally may increase the accuracy.
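For reference, a minimal sketch of how those parameters map onto dlib's Python training API; 'training_with_face_landmarks.xml' is a placeholder dataset file in imglab XML format (mirrored images would be added to that file as extra annotated examples):

import dlib

options = dlib.shape_predictor_training_options()
options.nu = 0.1                     # regularization strength from the Kazemi paper
options.tree_depth = 4
options.oversampling_amount = 20
options.be_verbose = True

dlib.train_shape_predictor('training_with_face_landmarks.xml',
                           'my_predictor.dat', options)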

Caffe accuracy increases too fast

I'm doing AlexNet fine-tuning for face detection following this: link
The only difference from the link is that I am using another dataset (FaceScrub plus some images from ImageNet as negative examples).
I noticed the accuracy increasing too fast: in 50 iterations it goes from 0.308 to 0.967, and when it is about 0.999 I stop the training and use the model with the same Python script as in the link above.
For testing I use an image from the dataset, and the result is nowhere near good (test image result). As you can see, the boxes around the faces are too big (and the dataset images are tightly cropped), not to mention boxes that do not contain a face at all.
My solver and train_val files are exactly the same; the only differences are the batch sizes and the max iteration count.
The reason was that my dataset had far more face examples than non-face examples. I tried the same setup with the same number of positive and negative examples, and now the accuracy increases more slowly.
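Not from the original answer, but a small sketch of one way to balance a Caffe-style "path label" image list by subsampling the majority class; the file names and the 1/0 labels are assumptions:

import random

with open('train.txt') as f:                       # assumed "image_path label" lines
    lines = [l.strip() for l in f if l.strip()]

pos = [l for l in lines if l.split()[-1] == '1']   # face examples (assumed label 1)
neg = [l for l in lines if l.split()[-1] == '0']   # non-face examples (assumed label 0)

n = min(len(pos), len(neg))
balanced = random.sample(pos, n) + random.sample(neg, n)
random.shuffle(balanced)

with open('train_balanced.txt', 'w') as f:
    f.write('\n'.join(balanced) + '\n')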

Poor performance on digit recognition with CNN trained on MNIST dataset

I trained a CNN (in TensorFlow) for digit recognition using the MNIST dataset.
Accuracy on the test set was close to 98%.
I wanted to predict digits on data I created myself, and the results were bad.
What did I do to the images I wrote myself?
I segmented out each digit, converted it to grayscale, resized it to 28x28 and fed it to the model.
How come I get such low accuracy on my own dataset and such high accuracy on the test set?
Are there other modifications that I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess is that your problem is that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data you should try to mimic as closely as possible the process used to create the MNIST dataset.
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
If your data is processed differently at training and test time, then your model will not be able to generalize from the training data to your data.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST dataset (see the preprocessing sketch after this list);
Add some of your own examples to your training data so that your model can train on images similar to the ones you are classifying.
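Here is a minimal sketch of the MNIST-style preprocessing quoted above (fit the digit into a 20x20 box preserving aspect ratio, then center it by its center of mass in a 28x28 image); it assumes a bright digit on a dark background, so invert your image first if necessary:

import cv2
import numpy as np

def mnist_like(gray):                              # gray: uint8, digit bright on dark
    ys, xs = np.nonzero(gray)
    gray = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]   # tight crop
    h, w = gray.shape
    scale = 20.0 / max(h, w)                       # fit into 20x20, keep aspect ratio
    gray = cv2.resize(gray, (max(1, int(round(w * scale))),
                             max(1, int(round(h * scale)))),
                      interpolation=cv2.INTER_AREA)             # anti-aliased resize
    out = np.zeros((28, 28), np.uint8)
    h, w = gray.shape
    out[(28 - h) // 2:(28 - h) // 2 + h, (28 - w) // 2:(28 - w) // 2 + w] = gray
    # shift so that the center of mass sits at the center of the 28x28 field
    m = cv2.moments(out)
    shift = np.float32([[1, 0, 14 - m['m10'] / m['m00']],
                        [0, 1, 14 - m['m01'] / m['m00']]])
    return cv2.warpAffine(out, shift, (28, 28))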
For those who still have a hard time with the poor quality of CNN-based models on MNIST:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.

Digit Recognition on CNN

I am testing printed digits (0-9) on a convolutional neural network. It gives 99+% accuracy on the MNIST dataset, but when I tried it with fonts installed on my computer (Arial, Calibri, Cambria, Cambria Math, Times New Roman) and trained on the images generated from those fonts (104 images per font; 25 fonts in total, 4 images per font with little difference), the training error rate does not go below 80%, i.e. 20% accuracy. Why?
Here is "2" number Images sample -
I resized every image 28 x 28.
Here are more details:
Training data size = 28 x 28 images.
Network parameters - as in LeNet-5
Architecture of the network -
Input Layer - 28x28
| Convolutional Layer - (ReLU activation)
| Pooling Layer - (Tanh activation)
| Convolutional Layer - (ReLU activation)
| Local Layer (120 neurons) - (ReLU)
| Fully Connected (Softmax activation, 10 outputs)
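(For orientation only, a rough tf.keras approximation of the layer list above; the filter counts of 6 and 16 and the 5x5 kernels are assumptions borrowed from LeNet-5, since they are not given here, and the pooling "Tanh activation" is approximated by a tanh applied after pooling.)

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, padding='same', activation='relu',
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Activation('tanh'),              # the "Tanh" after pooling
    tf.keras.layers.Conv2D(16, 5, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='relu'),   # the 120-neuron "local layer"
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])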
This works, giving 99+% accuracy on MNIST. Why is it so bad with computer-generated fonts? A CNN can handle a lot of variance in the data.
I see two likely problems:
Preprocessing: MNIST is not only 28px x 28px, but also:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
Source: MNIST website
Overfitting:
MNIST has 60,000 training examples and 10,000 test examples. How many do you have?
Did you try dropout (see paper)?
Did you try dataset augmentation techniques? (e.g. slightly shifting the image, perhaps changing the aspect ratio a bit; you could also add noise - however, I don't think those will help)
Did you try smaller networks? (And how big are your filters / how many filters do you have?)
Remarks
Interesting idea! Did you try simply applying the trained MNIST network on your data? What are the results?
It may be an overfitting problem. That can happen when your network is too complex for the problem it has to solve.
Check this article: http://es.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html
It definitely looks like an issue of overfitting. I see that you have two convolution layers, two max-pooling layers and two fully connected layers. But how many weights in total? You only have 96 examples per class, which is certainly smaller than the number of weights in your CNN. Remember that you want at least 5 times more instances in your training set than weights in your CNN.
You have two solutions to improve your CNN:
Shake each instance in the training set: shift each number by about 1 pixel in every direction. That alone multiplies your training set by 9 (a sketch follows below).
Use a transformer layer. It will add an elastic deformation to each number at each epoch, which strengthens the learning a lot by artificially increasing your training set. Moreover, it will make the model much more effective at predicting other fonts.
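A minimal sketch of the one-pixel "shake" suggested in the first point; each 28x28 image yields nine copies (including the unshifted original):

import numpy as np

def one_pixel_shifts(img):                         # img: 28x28 numpy array
    shifted = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out = np.zeros_like(img)
            src = img[max(0, -dy):img.shape[0] - max(0, dy),
                      max(0, -dx):img.shape[1] - max(0, dx)]
            out[max(0, dy):max(0, dy) + src.shape[0],
                max(0, dx):max(0, dx) + src.shape[1]] = src
            shifted.append(out)
    return shifted                                 # 9 images per input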

HOG descriptor training using SVMs

I am trying to classify road signs. For this I want to train HOG descriptors using SVMs. I have extracted the HOG descriptors for training data of size 64x64. The positive training data make up 60% and the negatives 40% of the whole sample. When I train using OpenCV's SVM (with a linear kernel) everything seems fine, but when I try to predict, the results fail and show only one class (the result is always 1). I have tried to feed my data into SVMlight as well, and all the negatives are misclassified. Any ideas what could possibly be wrong? Maybe the small number of training data? (I am just trying to implement the code and check that everything is fine without using all the training data.)
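For comparison, a hedged sketch of that setup with OpenCV 3+ (HOG configured for a 64x64 window plus a linear SVM); 'signs' and 'backgrounds' are placeholder lists of 64x64 grayscale images, and keeping explicit +1/-1 labels makes it easy to verify that both classes really reach the training call:

import cv2
import numpy as np

hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)   # window, block, stride, cell, bins

feats = np.float32([hog.compute(img).ravel() for img in signs + backgrounds])
labels = np.int32([1] * len(signs) + [-1] * len(backgrounds))

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setC(1.0)                                  # worth tuning along with more data
svm.train(feats, cv2.ml.ROW_SAMPLE, labels)

# Sanity check: if even the training accuracy collapses to one class, inspect the labels.
_, pred = svm.predict(feats)
print('training accuracy:', np.mean(pred.ravel() == labels))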
