Training Haar classifier to detect letters/digits - opencv

I have a test answer sheet with circled answers, and I am trying to detect digits/letters using OpenCV. I used a 10x10 image of '2' as the positive image and 44 other parts of the answer sheet as negatives to train a Haar classifier myself.
Apparently I am doing something wrong, as my classifier fails to detect the original '2'.
$opencv_createsamples -vec a_desc.bin -info positive.txt -bg negative.txt -num 1 -w 10 -h 10
$opencv_traincascade -data classifiers -vec a_desc.bin -bg negative.txt -numStages 20 -numPos 1 -numNeg 46 -w 10 -h 10
....
===== TRAINING 4-stage =====
<BEGIN
POS count : consumed 1 : 1
NEG count : acceptanceRatio 0 : 0
Required leaf false alarm rate achieved. Branch training terminated.
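The log above can be read with a simplified model of the per-stage stopping checks. It assumes that traincascade compares the measured acceptance ratio of the negatives (the fraction of sampled negative windows that the current cascade still accepts) against a target of roughly maxFalseAlarmRate ** numStages, and also against -acceptanceRatioBreakValue when that value is non-negative. Exact behavior varies between OpenCV versions, and the function name is hypothetical. An acceptance ratio of 0 means no negative window survives the earlier stages, so the target is met at once and training stops early.

```python
def stop_reason(acceptance_ratio, max_false_alarm=0.5, num_stages=20,
                break_value=-1.0, max_depth=1):
    """Simplified model of traincascade's stopping checks (assumption, not OpenCV code)."""
    # Target false-alarm rate for the whole cascade (max_depth is the weak-tree depth).
    required = max_false_alarm ** num_stages / max_depth
    if acceptance_ratio <= required:
        return "required leaf false alarm rate achieved"
    # A negative break value (the default, -1) disables this check.
    if break_value >= 0 and acceptance_ratio <= break_value:
        return "acceptanceRatio break value reached"
    return None  # keep training


# Question 1: "NEG count : acceptanceRatio 0 : 0", so no negatives are left.
print(stop_reason(0.0, num_stages=20))
```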
What am I doing wrong? My guesses:
too few negatives
too few positives
the sizes of the negatives and positives must match
all images must be in the same format (png | jpg | gif)
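On the size point: the negatives do not have to match the positives, but each background image has to be at least as large as the training window (-w x -h), because traincascade cuts negative windows out of the background images. A background exactly the size of the window yields very few distinct windows, so larger regions that do not contain the target character are likely to work better as negatives. The sketch below builds the one-path-per-line list used for -bg while skipping images that are too small; the (path, width, height) tuples and file names are hypothetical stand-ins for sizes read from the real images.

```python
def bg_lines(images, win_w, win_h):
    """Keep only negatives at least as large as the -w x -h training window."""
    return [path for path, w, h in images if w >= win_w and h >= win_h]


def write_bg_file(paths, out_path="bg.txt"):
    """Write one image path per line, the format expected by -bg."""
    with open(out_path, "w") as f:
        for p in paths:
            f.write(p + "\n")


# Hypothetical sizes; read the real ones with an image library of your choice.
negatives = [("crop_01.png", 10, 10), ("crop_02.png", 8, 12), ("scan_full.png", 1240, 1754)]
print(bg_lines(negatives, 10, 10))
```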
Basically, what should I expect from the following approach?
1) A page from a random book is selected.
2) We create a 10x10 image of the letter 'e'.
3) We create 20 negative images of other letters/digits.
4) We create a classifier using these two datasets.
5) Now we try to classify the original page.
Are we supposed to find ALL occurrences of 'e', plus some false positives?
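On the last question: a cascade generalizes only as far as its positives vary, so a cascade trained on a single 10x10 crop is likely to match little beyond that crop and near copies of it. Finding every 'e' realistically needs many annotated examples. The -info file for opencv_createsamples lists one image per line, followed by the object count and an x y w h box for each object, so one page can contribute every 'e' it contains. The helper name, file name and coordinates below are hypothetical.

```python
def info_line(path, boxes):
    """Format one -info line: path, object count, then x y w h for each box."""
    parts = [path, str(len(boxes))]
    for x, y, w, h in boxes:
        parts += [str(x), str(y), str(w), str(h)]
    return " ".join(parts)


# Two annotated 'e' glyphs on the same page.
print(info_line("page1.png", [(12, 40, 10, 10), (87, 40, 10, 10)]))
# page1.png 2 12 40 10 10 87 40 10 10
```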

Related

OpenCV Haar Classifier: training stops prematurely

I have been trying to train a Haar cascade on image databases to detect faces. I have made two attempts:
1) I used the following database for positive images:
http://robotics.csie.ncku.edu.tw/Databases/FaceDetect_PoseEstimate.htm#Our_Database_ (6660 images)
For negative images I used this database:
https://github.com/sonots/tutorial-haartraining/tree/master/data/negatives (3300 images)
I used this command to create the samples:
opencv_createsamples -info info.dat -vec samples2.vec -w 32 -h 24 -num 6660
and this command to train the cascade:
opencv_traincascade -data ./classifier3 -vec samples2.vec -bg bg.txt -numPos 6000 -numNeg 12000 -numStages 30 -precalcValBufSize 5120 -precalcIdxBufSize 5120 -numThreads 12 -acceptanceRatioBreakValue 10e-5 -w 32 -h 24 -minHitRate 0.99 -maxFalseAlarmRate 0.5 -mode ALL
Training proceeds up to stage 9, where the acceptanceRatio break value is crossed ("The required acceptanceRatio for the model has been reached to avoid over-fitting of training data. Branch training terminated.").
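Note that -acceptanceRatioBreakValue 10e-5 means 10 x 10^-5, i.e. 1e-4, not 1e-5. If every stage only just met -maxFalseAlarmRate 0.5, the overall false-alarm ratio would be 0.5 ** n after n stages; the sketch below (a hypothetical helper, idealized arithmetic rather than OpenCV code) counts how many such stages it takes to reach the break value. In practice a stage often rejects noticeably more than half of the negatives, which would plausibly explain reaching the break value earlier, at stage 9.

```python
def stages_until(target, false_alarm_per_stage=0.5):
    """Stages until false_alarm_per_stage ** n drops to target or below (idealized)."""
    n, rate = 0, 1.0
    while rate > target:
        rate *= false_alarm_per_stage
        n += 1
    return n


print(10e-5 == 1e-4)          # True: 10e-5 is 10 * 10**-5
print(stages_until(10e-5))    # 14, since 0.5**13 > 1e-4 >= 0.5**14
```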
I don't understand the issue here. I have only used the recommended values for the parameters. I also tried changing minHitRate to 0.95, yet the result is the same. I can think of some potential reasons:
i) I used the positive images directly without cropping. But I don't think that should be an issue, as the background is completely plain.
ii) The image database contains faces in different poses. That could lead to complications while training. Is it a good idea to train faces under different poses using the same cascade classifier? Or should I use a different classifier for each pose?
iii) My negative images might be too different from the positive images. Is that the case? If so, what kind of negative images should I be looking for?
I tried testing the cascade.xml file on a few sample images, but nothing is detected at all.
2) With potential reason i) in mind, I used this already-cropped database for the positive images: http://conradsanderson.id.au/lfwcrop/ (around 13000 images)
But the problem persists. This time it trains up to stage 11. In this case I used -numPos 8000 and -numNeg 20000 (increasing the ratio to give the training more negative samples), with -w 24 and -h 24.
Can anyone please guide me here?
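A widely quoted rule of thumb from the OpenCV Q&A forum says the .vec file should hold at least numPos + (numStages - 1) * (1 - minHitRate) * numPos + S samples, where S is the number of vec samples that the cascade immediately rejects as background. The sketch below (hypothetical helper; S defaults to 0, which is an assumption) applies it to the first attempt: -numPos 6000, -numStages 30 and -minHitRate 0.99 would call for about 7740 samples, while samples2.vec was created with 6660, so a full 30-stage run would eventually run short of positives.

```python
def min_vec_samples(num_pos, num_stages, min_hit_rate, s=0):
    """Forum rule of thumb for the minimum number of samples in the .vec file."""
    # Each stage after the first may reject up to (1 - min_hit_rate) of num_pos positives.
    return num_pos + (num_stages - 1) * (1 - min_hit_rate) * num_pos + s


need = min_vec_samples(6000, 30, 0.99)
print(round(need), "samples needed vs 6660 in samples2.vec")
```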

Haar Cascade Training - Positive Images Size

I'm trying to train my own Haar cascade classifier for apples, following this article. I collected 1000 positive images and 500 negative images from the internet. The images have different sizes, and I cropped them to create the "info.txt" file. When creating samples like this:
createsamples.exe -info positive/info.txt -vec vector/applevector.vec -num 1000 -w 24 -h 24
there are the parameters -w and -h. What do they mean? Should I resize all my positive and negative images? I tried to train my classifier with the default parameters (-w 24 and -h 24), but the accuracy of my classifier is very weak. Could it be related to these parameters? Thank you for any advice.
UPDATE
Here are some examples of my positive images, which I collected from the internet.
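-w and -h are the width and height, in pixels, of the training samples: opencv_createsamples scales each annotated object to that size when writing the .vec file, and opencv_traincascade must be given the same -w and -h. The source images therefore do not need to be resized by hand; negatives only need to be at least that large. A window whose aspect ratio is close to that of the objects avoids distorting them, while larger windows tend to make training considerably slower. The heuristic below is a hypothetical helper, not an OpenCV function: it fixes the shorter side and derives the other from the mean aspect ratio of the annotated (x, y, w, h) boxes.

```python
def window_size(boxes, short_side=24):
    """Suggest (-w, -h) from the mean aspect ratio of (x, y, w, h) boxes (heuristic)."""
    ratios = [w / h for _, _, w, h in boxes]
    mean = sum(ratios) / len(ratios)
    if mean >= 1:  # objects wider than tall: fix the height
        return round(short_side * mean), short_side
    return short_side, round(short_side / mean)  # taller than wide: fix the width


print(window_size([(0, 0, 100, 100), (0, 0, 80, 80)]))  # (24, 24)
```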

opencv train cascade in less than 2 hours

I'm new to OpenCV. I want to detect fire, and I'm training a fire classifier using opencv_traincascade. In the various tutorials I read, everyone said that training takes days or even weeks.
I have 700 positives and 3k negatives. I read that I should not use all the positives for training and should use a 1:2 ratio of positives to negatives, so this is what I entered:
opencv_traincascade -data classifier -vec positive.vec -bg negatives.txt -numPos 500 -numNeg 1000 -numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -precalcValBufSize 1024 -precalcIdxBufSize 1024 -mode ALL
PAUSE
The training took only 2 hours. Do I need to worry about that? Is there something wrong with my samples?
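Training time is not by itself a sign of a problem; what the parameters fix is the quality target. Assuming every stage only just meets -minHitRate and -maxFalseAlarmRate, the whole cascade would keep about minHitRate ** numStages of the positives and accept about maxFalseAlarmRate ** numStages of the negative windows. The sketch below (hypothetical helper, idealized arithmetic) computes these two numbers for the command above; evaluating the finished cascade on images held out from training is a more direct check than the time it took.

```python
def cascade_targets(min_hit_rate, max_false_alarm, num_stages):
    """Idealized whole-cascade hit rate and false-alarm rate (each stage at its limit)."""
    return min_hit_rate ** num_stages, max_false_alarm ** num_stages


hit, fa = cascade_targets(0.999, 0.5, 20)
print(f"hit rate >= {hit:.4f}, false alarm <= {fa:.2e}")
```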

Infinite loop: Haar, LBP, HOG traincascade of opencv stuck

I am trying to build a classifier to detect faces in thermal images, so I tried training Haar, LBP and HOG classifiers. I am working with OpenCV 2.4.8 on Windows.
opencv_traincascade.exe -data haarcascades -vec pos.vec -bg neg.txt -numPos 250 -numStages 24 -numNeg 900 -w 24 -h 24
I have 307 positive samples in total. The negative samples are 75x75. In all three cases, training gets stuck at a particular stage: earlier for Haar (stage 12) and later for LBP (stage 14/15). Reducing the number of negatives (down to 200) only means the training gets stuck at a later stage. The training hasn't progressed in 2 days. No negatives are being consumed, and the command window looks like this:
===== TRAINING 14-stage =====
<BEGIN
POS count : consumed 255 : 262
Also:
What do the POS and NEG "consumed" counts signify?
When I reduce minHitRate to, say, 0.7, why does the number of POS consumed increase?
Please let me know what I am doing wrong.
Thanks.
I had a similar problem myself. The classifier at each stage trains on the negative examples that the previous stages still classify as positive. What happens is that none of the negative samples are classified as positive anymore, and the code goes into an infinite loop trying to find one. I solved this by changing the source code so that the algorithm terminates when it can't find any negative example and just uses the previous stages for the classifier.
If you don't want to change the code, try adding more negative examples or reducing the number of stages.
The consumed counts are the numbers of positive and negative images that are used in each stage. You also need more positive and negative images, around 1000 positives and 2000 negatives, to get a good result.
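The first answer's fix can be illustrated with a simplified model of hard-negative harvesting: keep drawing candidate windows and keep only those the current cascade wrongly accepts, but give up after a fixed number of tries instead of looping forever. This is a sketch, not OpenCV's code; `harvest_negatives`, `cascade_accepts` and `max_tries` are hypothetical names, and the returned ratio (negatives found per window tried) corresponds only loosely to the acceptanceRatio printed in the log.

```python
import random


def harvest_negatives(windows, cascade_accepts, needed, max_tries=100000, seed=0):
    """Collect windows the current cascade wrongly accepts, giving up after max_tries."""
    if not windows:
        return [], 0.0
    rng = random.Random(seed)
    found, tries = [], 0
    while len(found) < needed and tries < max_tries:
        w = rng.choice(windows)
        tries += 1
        if cascade_accepts(w):  # a false positive is a useful "hard" negative
            found.append(w)
    return found, len(found) / tries


# A cascade that rejects everything: returns empty instead of hanging.
print(harvest_negatives(list(range(100)), lambda w: False, needed=10, max_tries=1000))
```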

Unable to create positive samples from a limited set of samples for Haar training

I am building a Haar classifier. I have a set of 109 positive samples and 3000 negative samples. To increase my number of positive samples (to, say, 600), I tried the following command:
opencv_createsamples -vec out.vec -w 24 -h 24 -bg bg.txt -num 600 -info positives.dat
But I get the following error message:
positives.dat(109) : parse error
Done. Created 108 samples
How can I "force" OpenCV to produce 600 samples from the 109 I have?
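"positives.dat(109) : parse error" followed by "Created 108 samples" suggests that line 109 of the info file is malformed, for example an empty line or a box count that does not match the numbers after it. The checker below is a hypothetical helper, not part of OpenCV. It assumes the documented line layout (path, object count, then x y w h per object) and paths without spaces, and reports lines that break that layout.

```python
def check_info_file(lines):
    """Return (line_number, reason) for lines not matching 'path count x y w h ...'."""
    problems = []
    for n, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            problems.append((n, "empty line"))
            continue
        if len(parts) < 2 or not parts[1].isdigit():
            problems.append((n, "missing object count"))
            continue
        count = int(parts[1])
        numbers = parts[2:]
        if len(numbers) != 4 * count or not all(p.isdigit() for p in numbers):
            problems.append((n, f"expected {4 * count} integers after the count"))
    return problems


sample = ["img/a.png 1 0 0 24 24", "img/b.png 2 0 0 10 10 5 5 10 10", "img/c.png 1 0 0 24", ""]
print(check_info_file(sample))
```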
OpenCV's default function for creating samples can only create as many samples as you have in your info file. Also, you should know that if you use -info as opposed to -img, it only resizes the images to the -w and -h you specified and converts them to grayscale. It doesn't actually apply any transforms to them or superimpose them on background images. I'm not entirely sure what the point of the function is really...
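Following the answer's point that -img mode does apply distortions and pastes the object onto backgrounds from -bg, one way to reach 600 samples is to run opencv_createsamples once per positive image with a share of the total. The sketch below only generates those command lines; the pos_*.png and part_*.vec names are hypothetical, and merging the resulting .vec files into one is not covered here and needs a separate tool.

```python
def split_counts(total, n_images):
    """Spread `total` synthetic samples as evenly as possible over n_images."""
    base, extra = divmod(total, n_images)
    return [base + (1 if i < extra else 0) for i in range(n_images)]


def createsamples_commands(images, total, w=24, h=24, bg="bg.txt"):
    """One opencv_createsamples -img call per positive image."""
    cmds = []
    for i, (img, num) in enumerate(zip(images, split_counts(total, len(images)))):
        cmds.append(f"opencv_createsamples -img {img} -bg {bg} "
                    f"-vec part_{i:03d}.vec -num {num} -w {w} -h {h}")
    return cmds


images = [f"pos_{i}.png" for i in range(109)]  # hypothetical file names
for cmd in createsamples_commands(images, 600)[:2]:
    print(cmd)
```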
