I have been trying to train a Haar cascade classifier to detect faces, using public image databases. I have made two attempts:
1) I have used the following database for positive images:
http://robotics.csie.ncku.edu.tw/Databases/FaceDetect_PoseEstimate.htm#Our_Database_ (6660 images)
For negative images I have used this database:
https://github.com/sonots/tutorial-haartraining/tree/master/data/negatives (3300 images)
I have used this command to create the samples (pack the positives into a .vec file):
opencv_createsamples -info info.dat -vec samples2.vec -w 32 -h 24 -num 6660
I have used this command to train the cascade:
opencv_traincascade -data ./classifier3 -vec samples2.vec -bg bg.txt -numPos 6000 -numNeg 12000 -numStages 30 -precalcValBufSize 5120 -precalcIdxBufSize 5120 -numThreads 12 -acceptanceRatioBreakValue 10e-5 -w 32 -h 24 -minHitRate 0.99 -maxFalseAlarmRate 0.5 -mode ALL
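As a rough sanity check on those numbers (using the rule of thumb I have seen quoted, that the vec file should hold at least numPos + (numStages - 1) * (1 - minHitRate) * numPos samples, plus some slack for positives rejected along the way; I may be misapplying it):
awk 'BEGIN { numPos = 6000; numStages = 30; minHitRate = 0.99;
             print numPos + (numStages - 1) * (1 - minHitRate) * numPos }'
# prints 7740, while samples2.vec only holds 6660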
The training runs up to stage 9, and then the acceptanceRatioBreakValue is crossed ("The required acceptanceRatio for the model has been reached to avoid over-fitting of training data. Branch training terminated.").
I don't understand the issue here. I have only used the recommended values for the parameters. I tried changing minHitRate to 0.95, yet the result is the same. I can think of some potential reasons:
i) I used the positive images directly, without cropping. But I don't think that should be an issue, as the background is completely plain.
ii) The image database contains faces in different poses. That could lead to complications while training. Is it a good idea to train faces in different poses with the same cascade classifier, or should I use a separate classifier for each pose?
iii) My negative images might be too different from the positive images. Is that the case? If so, what kind of negative images should I be looking for?
I tried testing the cascade.xml file on a few sample images, but nothing is detected at all.
2) Keeping potential reason (i) in mind, I used this database, already cropped, for the positive images: http://conradsanderson.id.au/lfwcrop/ (around 13000 images)
But the problem still persists. This time the training runs up to stage 11. In this case I used -numPos 8000 and -numNeg 20000 (I increased the ratio to give the training more negative samples), with -w 24 and -h 24.
Can anyone please guide me here?
Related
I'm trying to train my own Haar cascade classifier for apples, following this article. I collected 1000 positive images and 500 negative images from the internet. The images all have different sizes, and I cropped them to create the "info.txt" file. When I create samples like this,
createsamples.exe -info positive/info.txt -vec vector/applevector.vec -num 1000 -w 24 -h 24
there are the parameters -w and -h. What do they mean? Should I resize all my positive and negative images? I tried to train my classifier with the default parameters (-w 24 and -h 24), but the accuracy of my classifier is very weak. Could it be related to these parameters? Thank you for any advice.
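My current understanding (which may well be wrong) is that -w and -h define the training window: createsamples scales every positive region down to that size, and traincascade then has to be run with exactly the same values. Something like the following, where the paths and sample counts are only placeholders for my setup:
createsamples.exe -info positive/info.txt -vec vector/applevector.vec -num 1000 -w 32 -h 32
opencv_traincascade -data classifier -vec vector/applevector.vec -bg negative/bg.txt -numPos 900 -numNeg 500 -numStages 15 -w 32 -h 32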
UPDATE
Here are some examples of my positive images; I collected them from the internet.
I have been fiddling around with OpenCV's cascade trainer in an attempt to train my own classifier. The problem is that it has been training for 25+ hours now and has yet to even pass stage 1.
Initially, I ran it with the following command
nohup opencv_traincascade -data data -vec board.vec -bg bg.txt -numPos 580 -numNeg 1160 -numStages 2 -w 115 -h 153 -featureType LBP &
After about 24 hours, it wasn't able to get through even stage 1. Looking into the nohup.out file, I realized that the default precalcValBufSize is set to 1024 MB. I figured maybe increasing this to 4096 MB would help with the processing, so I went ahead and restarted the training with the following command
nohup opencv_traincascade -data data -vec board.vec -bg bg.txt -numPos 580 -numNeg 1160 -numStages 2 -w 115 -h 153 -featureType LBP -precalcIdxBufSize 4096 -precalcValBufSize 4096 &
The training has been running for almost 25 hours now, and it still hasn't even produced the XML file for stage 0.
Looking at the process itself, it is using 8284M of virtual memory but only 930M of physical memory, and I can see all the files currently in use by the process. It is doing a great job of burning through my cores, but not of producing any results or even letting me know how far it has got.
My question(s) is/are: is there any way of making it use more of my actual physical memory to speed it up? If not, are there any adjustments I need to make to my training dataset?
Side note: I know the general standard for sample size is 24x24, but I already tried that and the results were really horrible even after 10 stages.
At that size, my object's outline no longer retains its features correctly. At 24x24, or even 48x48, it looks like a giant horizontally distorted blob of black pixels, without even some of its unique features being visible.
I suspect the problem is the sample size.
A bigger size requires much bigger memory buffers and much more time; computing the features is an expensive operation.
You have to shrink your samples (and don't forget to rerun opencv_createsamples afterwards). Samples don't have to be square; they can be, say, 25x15 (make sure the aspect ratio is preserved and the longest side is about 30 px).
You are already using -featureType LBP, which is itself faster than Haar.
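For instance, just as a sketch that keeps roughly the 115:153 aspect ratio from the question while bringing the longest side down to about 30 px (the annotation file name is a placeholder, use whatever yours is called):
opencv_createsamples -info positives.txt -vec board.vec -num 580 -w 23 -h 30
nohup opencv_traincascade -data data -vec board.vec -bg bg.txt -numPos 560 -numNeg 1160 -numStages 2 -w 23 -h 30 -featureType LBP &
# -numPos is set a bit below the 580 samples in the vec so later stages can
# replace the few positives that get rejected along the way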
I used OpenCV's usual Haar cascade training.
I set the number of stages to 5 in the training process, but in the XML/cascade folder only 3 stages were found.
Why did I get fewer stages than expected?
Any solutions?
Take this example training command:
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt \
-numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1000 \
-numNeg 600 -w 80 -h 40 -mode ALL -precalcValBufSize 1024 \
-precalcIdxBufSize 1024
This has a maxFalseAlarmRate of 0.5. Training finishes early once the cascade's overall false alarm rate falls below the target derived from this value (roughly maxFalseAlarmRate raised to the power of numStages).
For your problem, I imagine you set numStages to 5, but after 3 stages the cascade had already reached that required false alarm rate, so the training completed.
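To put rough numbers on that (assuming that interpretation of the stop criterion is right):
# overall false-alarm target for 5 stages vs. the cap reached after 3 stages at 0.5 each
awk 'BEGIN { printf "target (0.5^5) = %g\n", 0.5^5; printf "after 3 stages (0.5^3 at most) = %g\n", 0.5^3 }'
# prints 0.03125 and 0.125; real per-stage rates are usually well below 0.5,
# so three stages can already be under the 0.03125 target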
In order to confirm/dispel this, you would need to provide:
Your training command (as above)
The output from your last training stage.
You most likely have not provided traincascade enough information to learn from. This is probably because we humans are incredibly lazy and hate to work. It would have kept going if it thought it could learn more from the data you specified.
Take more positives. Remember that you can take multiple images of your object at slightly tilted angles (10°-20° or so). And be sure to provide at least hundreds of images of your object, especially if there is quite a bit of variation between instances, as there is with faces.
If you're still stuck, see this tutorial I wrote that can hopefully help you and others: http://johnallen.github.io/opencv-object-detection-tutorial/
I'm working with OpenCV 2.4.7 on Windows. I'm using traincascade to train a new Haar cascade for eyeglasses, using the following command:
opencv_traincascade -data trainCascade20 -vec vector3.vec -bg infofile3.txt -numStages 40 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 170 -numNeg 1000 -w 20 -h 20 -mode ALL -precalcValBufSize 1024 -precalcIdxBufSize 1024
It is stuck (or progressing very slowly) at stage 24, in the phase of getting new negatives. The negative images file "infofile3.txt" contains about 12K negative images.
Can someone please explain why it is progressing so slowly, and what I can do to make it progress (a lot) faster?
Thanks in advance,
Gil.
Around 24 hours sounds normal to me. Haar training can actually take days, depending on the size and number of samples, and of course on the machine as well. The longest my training ever took was approximately a week, for hand detection.
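Part of why the "getting new negatives" phase in particular crawls at deep stages, as I understand it (the assumption being that every candidate negative window has to pass all previously trained stages before it is accepted):
# upper bound on the fraction of negative windows that survive 24 stages at 0.5 each
awk 'BEGIN { printf "%g\n", 0.5^24 }'
# prints 5.96046e-08, i.e. tens of millions of candidate windows scanned per accepted
# negative, and the real per-stage rates are usually even lower than 0.5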
If you are really worried and want to check whether the Haar training is still ongoing, you can try to generate an intermediate cascade XML file from the data available so far. If you are able to generate that XML file, it shows the training is still running (albeit slowly) and not stuck.
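One way to do that, as far as I know (treat this as a sketch): point opencv_traincascade back at the same -data folder, but with -numStages set to the number of stage files already written; it should then load those stages and just assemble cascade.xml instead of training further. With the command from the question, and assuming stages 0-23 are already on disk, that would be something like:
opencv_traincascade -data trainCascade20 -vec vector3.vec -bg infofile3.txt -numStages 24 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 170 -numNeg 1000 -w 20 -h 20 -mode ALL
It is probably safest to run this on a copy of the trainCascade20 folder while the original training is still going, so the two runs don't touch the same files.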
As for how to improve the Haar training speed, the only solution I know of, or have used before, is parallelization.
A quick search on Google about that leads to a few links; here's one of them: http://www.computer-vision-software.com/blog/2009/06/parallel-world-of-opencv/
I have used such methods, and they are pretty effective at cutting the time taken to train a Haar cascade, so I hope this approach suits you well. Do try my suggestion of generating an intermediate XML file from the currently available data first, though. If you need anything else, do comment and I'll try to get back to you soon. Cheers.
I am building a Haar classifier. I have a set of 109 positive samples and 3000 negative samples. To increase my number of positive samples (to, say, 600), I try using the following command:
opencv_createsamples -vec out.vec -w 24 -h 24 -bg bg.txt -num 600 -info positives.dat
But I get the following error message:
positives.dat(109) : parse error
Done. Created 108 samples
How can I "force" opencv to produce the 600 samples from those 109 I have?
OpenCV's default sample-creation function can only create as many samples as you have entries in your info file.
Also, you should know that if you use -info, as opposed to -img, it only resizes the samples to the width and height you specified and grayscales all the images. It doesn't actually apply any transforms to them or superimpose them on background images. I'm not entirely sure what the point of the function is, really...
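A common workaround (not something built into a single OpenCV command, so treat this as a sketch): call opencv_createsamples once per positive with -img, which does apply random distortions and superimposes the object on your backgrounds, and then combine the per-image vec files with one of the third-party mergevec scripts floating around on GitHub:
# assumes the first field of each positives.dat line is an image already cropped to the object
mkdir -p vecs
n=0
while read -r img rest; do
    opencv_createsamples -img "$img" -bg bg.txt -vec vecs/$n.vec -num 6 \
        -maxxangle 0.3 -maxyangle 0.3 -maxzangle 0.2 -w 24 -h 24
    n=$((n+1))
done < positives.dat
# 109 images x 6 distortions each gives roughly the 600 samples wanted;
# the resulting vecs/*.vec files then have to be merged into one vec (mergevec is not shipped with OpenCV)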