Haar-cascade training took very little time and no xml was produced - opencv

I'm trying to train a new haar-cascade for faces.
I have a positive dataset of 2000 cropped face images (just the face) and 3321 negative random images.
I created the positive samples vector file using the following command:
opencv_createsamples.exe -info info.txt -vec vector.vec -num 2000 -w 10 -h 10
Where the file info.txt contains the following lines:
AJ_Cook_0001.ppm 1 0 0 64 64
AJ_Lamas_0001.ppm 1 0 0 64 64
Aaron_Eckhart_0001.ppm 1 0 0 64 64
Aaron_Guiel_0001.ppm 1 0 0 64 64
Aaron_Patterson_0001.ppm 1 0 0 64 64
Aaron_Peirsol_0001.ppm 1 0 0 64 64
Afterwards, I ran haartraining using the following command:
opencv_haartraining.exe -data harrcascade -vec vector.vec -bg infofile.txt -nstages 20 -minhitrate 0.9999 -maxfalsealarm 0.5 -npos 2000 -nneg 3321 -w 10 -h 10 -nonsym -mem 1024
Where the file infofile.txt contains the names of the background images:
Training took only about two hours and no xml file was generated. The folder harrcascade contains 20 folders, each with a txt file named 'AdaBoostCARTHaarClassifier.txt', but no xml file.
I have two questions:
Why did training take so little time?
Why was no xml file generated?
What am I missing here?

See my answer to your other question. If no xml file was produced, it is very likely that you have run out of positive samples. Try using 1500 instead of 2000.
Better yet, check out trainCascadeObjectDetector, a function in the Computer Vision System Toolbox for Matlab, which lets you generate an xml file compatible with OpenCV.


The Difference between One Hot Encoding and LabelEncoder?

I am working on an ML problem to predict house prices, and Zip Code is one feature that will be useful. I am also trying to use a Random Forest Regressor to predict the log of the price.
However, should I use One Hot Encoding or Label Encoder for Zip Code? I have about 2000 Zip Codes in my dataset, and performing One Hot Encoding will expand the number of columns significantly.
To rephrase: does it make sense to use LabelEncoder instead of One Hot Encoding on Zip Codes?
Like the link says:
LabelEncoder can turn [dog,cat,dog,mouse,cat] into [1,2,1,3,2], but
then the imposed ordinality means that the average of dog and mouse is
cat. Still there are algorithms like decision trees and random forests
that can work with categorical variables just fine and LabelEncoder
can be used to store values using less disk space.
And yes, you are right: with 2000 categories for zip codes, one-hot encoding may blow up your feature set massively. In many cases when I had such problems, I opted for binary encoding and it worked out fine most of the time, so it is perhaps worth a shot for you.
Imagine you have 9 categories, you number them from 1 to 9, and you binary encode those numbers. You will get:
cat 1 - 0 0 0 1
cat 2 - 0 0 1 0
cat 3 - 0 0 1 1
cat 4 - 0 1 0 0
cat 5 - 0 1 0 1
cat 6 - 0 1 1 0
cat 7 - 0 1 1 1
cat 8 - 1 0 0 0
cat 9 - 1 0 0 1
There you go: you overcome the LabelEncoder problem, and you also get 4 feature columns instead of the 8 that one-hot encoding would need. This is the basic intuition behind a binary encoder.
**PS:** Given that 2 to the power 11 is 2048 and you have 2000 categories for zip codes, you can reduce your feature columns to 11 instead of the 1999 needed for one-hot encoding!
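
To make the arithmetic concrete, here is a minimal sketch of the idea in plain Python. The function names `binary_encode` and `columns_needed` are made up for illustration and are not taken from any library; the sketch assumes categories are numbered from 1, as in the table above.

```python
def columns_needed(n_categories):
    """Number of bits needed to write the largest category number (1..n) in binary."""
    return n_categories.bit_length()


def binary_encode(index, n_bits):
    """Return the bits of a category number, most significant bit first."""
    return [(index >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


if __name__ == "__main__":
    n_bits = columns_needed(9)
    for cat in range(1, 10):
        # Each row is one category spread over n_bits feature columns.
        print(f"cat {cat} -", *binary_encode(cat, n_bits))
    print("columns for 2000 zip codes:", columns_needed(2000))
```

Keep in mind that this only illustrates the column count; whether binary encoding helps a given model is something you would still have to check on your own data.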

OpenCV Error: Bad argument (Can not get new positive sample

I am trying to train my own OpenCV Haar Classifier for cup detection.
I have 100 images that contain a cup and 400 images that do not, so:
No. of Positive Images = 100
No. of Negative Images = 400
First I created .dat files for both of them with
find ./Negative_Images -name '*.jpg' >negatives.dat
find ./Positive_Images -name '*.jpg' >positives.dat
Next, I ran the following command to generate samples (I set the sample count to 100 because I have 100 positive images. Is that right?)
perl createtrainsamples.pl positives.dat negatives.dat samples 100 "opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 maxzangle 0.5 -maxidev 40 -w 80 -h 60"
Now 100 samples (*.jpg.vec) are created in the samples folder. Next, I ran the following command to generate samples.vec
python ./tools/mergevec.py -v samples/ -o samples.vec
mergevec.py comes from the tutorial by mrnugget.
The next command is the training step, which I ran with opencv_traincascade:
opencv_traincascade -data classifier -vec samples.vec -bg negatives.dat -precalcValBufSize 2500 -precalcIdxBufSize 2500 -numPos 100 -numNeg 400 -numStages 15 -minhitrate 0.99 -maxfalsealarm 0.5 -w 80 -h 60
I receive this error: Can not get new positive sample
Someone solved it by setting numPos = noOfPositiveImages * 0.9, but that did not work for me.
From different sources, I found a formula to calculate the value for numPos:
vec-file has to contain >= (numPos + (numStages-1) * (1 - minHitRate) * numPos) + S
As far as I understand, in my case:
vec-file contains 100 samples (I had 100 positive images, and 100 samples were created from them)
numStages = 4 (or any other value I choose)
minHitRate = 0.99
S = count of samples from the vec-file (another source says it is the count of all samples skipped from the vec-file, across all stages)
I do not understand what value I should use for S.
Can anyone explain this formula with an example? What values should I put in the command to fix this error?
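
For reference, the arithmetic in the quoted formula can be sketched as below. This is only an illustration of the inequality as written above, not a statement of how opencv_traincascade works internally. The function names are made up, and S is passed in as a plain input (`skipped`) because its value is exactly what the question leaves open; S = 0 is an optimistic assumption.

```python
import math


def required_vec_samples(num_pos, num_stages, min_hit_rate, skipped=0):
    """Right-hand side of the formula: samples the vec-file has to contain."""
    return num_pos + (num_stages - 1) * (1 - min_hit_rate) * num_pos + skipped


def max_num_pos(vec_count, num_stages, min_hit_rate, skipped=0):
    """Largest numPos that still satisfies the formula for a given vec-file size."""
    per_sample_cost = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor((vec_count - skipped) / per_sample_cost)


if __name__ == "__main__":
    # Values from the training command above: 100 samples, 15 stages, 0.99 hit rate.
    print(max_num_pos(100, 15, 0.99))           # 87 when S is assumed to be 0
    print(required_vec_samples(87, 15, 0.99))   # stays just under 100
```

Any non-zero S lowers the usable numPos further, so a value somewhat below the computed bound leaves room for skipped samples.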

OpenCV haartraining didn't finish

I use 1292 positive images of faces
and 5712 negative images,
and this is my command for haartraining:
C:\opencv\build\x64\vc11\bin\opencv_haartraining.exe -data cascades -vec vector/facevector.vec -bg negative/bg.txt -npos 1033 -nneg 5712 -nstages 20 -nsplits 2 -nonsym -minhitrate 0.999 -maxfalsealarm 0.5 -mem 1024 -mode ALL -w 24 -h 24 PAUSE
It is set to 20 stages. It finished stage 6, but on stage 7 it has stayed as shown in the image below
for 1 day.
I think it will stay like this forever. What is the problem?
The faces in the positive images are clear,
and this is a snip of the positive images I used:
(source: imagetitan.com)
![haartraining still like this ][1]
(source: imagetitan.com)
![snip of positivae image i used it][1]
Change your min hit rate and max false alarm rate.
I would suggest using something like 0.95 for minhitrate and 0.4 for maxfalsealarm to get going.
The reason is that it may take forever for training to reach 0.999 and 0.5, if it ever finishes at all.
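
To see what the per-stage targets imply for the whole cascade, here is a small sketch. It assumes the usual approximation that the overall rates are roughly the per-stage rates raised to the number of stages; `overall_rates` is a made-up helper name, not part of OpenCV.

```python
def overall_rates(min_hit_rate, max_false_alarm, n_stages):
    """Approximate whole-cascade hit rate and false alarm rate from per-stage targets."""
    return min_hit_rate ** n_stages, max_false_alarm ** n_stages


if __name__ == "__main__":
    for hit, fa in ((0.999, 0.5), (0.95, 0.4)):
        total_hit, total_fa = overall_rates(hit, fa, 20)
        print(f"per stage {hit}/{fa}: overall hit {total_hit:.4f}, false alarm {total_fa:.2e}")
```

Looser per-stage targets are easier for each stage to meet, at the cost of a lower overall hit rate, so fewer stages may be appropriate when relaxing them.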

Opencv: train cascade image reader

I am trying to train a classifier. I created a .vec file with createsamples and that worked fine:
Info file name: C:\OpenCV\positive.txt
Img file name: (NULL)
Vec file name: C:\OpenCV\sample.vec
BG file name: (NULL)
Num: 20
BG color: 0
BG threshold: 80
Invert: FALSE
Max intensity deviation: 40
Max x angle: 1.1
Max y angle: 1.1
Max z angle: 0.5
Show samples: FALSE
Width: 50
Height: 50
Create training samples from images collection...
Done. Created 20 samples
Now I use training.bat, which contains:
C:\OpenCV\opencv-2_4\build\x86\vc10\bin\opencv_traincascade.exe -data classifier -vec "C:\OpenCV\samples.vec" -bg "C:\OpenCV\negative.txt" -npos 20 -nneg 16 -numStages 4 -minHitRate 0.999 -maxFalseAllarmRate 0.5 -w 74 -h 100 -mode ALL -precalcvalBuffSize 256 -precalcdxBufSize 256
But when I run training.bat from the DOS prompt, it gives me this error:
Image reader can not be created from -vec C:\OpenCV\samples.vec and -bg C:\OpenCV\negative.txt.
Can someone help?
This error generally appears when the files do not exist at the paths you pass in. Make sure you wrote the file names and paths correctly, and make sure the vector file you pass has the ".vec" extension.
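
A quick way to rule this out before launching a long training run is to check the two paths yourself. The sketch below is a hypothetical helper (`missing_inputs` is not an OpenCV function) that only tests for existence and the ".vec" extension, which are the two points mentioned above.

```python
from pathlib import Path


def missing_inputs(vec_path, bg_path):
    """Return a list of human-readable problems with the -vec and -bg arguments."""
    problems = []
    for flag, raw in (("-vec", vec_path), ("-bg", bg_path)):
        if not Path(raw).is_file():
            problems.append(f"{flag}: file not found: {raw}")
    if Path(vec_path).suffix.lower() != ".vec":
        problems.append(f"-vec: expected a .vec extension: {vec_path}")
    return problems


if __name__ == "__main__":
    # Paths copied from the training.bat in the question.
    for problem in missing_inputs(r"C:\OpenCV\samples.vec", r"C:\OpenCV\negative.txt"):
        print(problem)
```

An empty result only means the files are present; it says nothing about whether their contents are valid.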

Cascade classifier can't be trained. Check the used training parameters

I need to detect a special image (something like the symbol +) in a scanned document. I'm going to train a cascade using the opencv_traincascade program (OpenCV 3.0).
This is my file structure:
This is my Bad.dat:
This is my Good.dat (every good file fully contains the special image and nothing more):
Good\1.bmp 1 0 0 60 59
Good\100.bmp 1 0 0 27 28
I've successfully created the vec file.
-info C:\imgs\learn1\Good.dat
-vec samples.vec
-w 10 -h 10
Info file name: C:\imgs\learn1\Good.dat
Img file name: (NULL)
Vec file name: samples.vec
BG file name: (NULL)
Num: 1000
BG color: 0
BG threshold: 80
Invert: FALSE
Max intensity deviation: 40
Max x angle: 1.1
Max y angle: 1.1
Max z angle: 0.5
Show samples: FALSE
Width: 10
Height: 10
Create training samples from images collection...
C:\imgs\learn1\Good.dat(101) : parse errorDone. Created 100 samples
This is the call and the output of opencv_traincascade:
-data haarcascade
-vec C:\opencv\build\x64\vc12\bin\samples.vec
-bg C:\imgs\learn1\Bad.dat
-numStages 16
-minhiteate 0.99
-maxFalseAlarmRate 0.5
-numPos 80
-numNeg 199
-w 10
-h 10
-mode ALL
-precalcValBufSize 1024
-precalcIdxBufSize 1024
cascadeDirName: haarcascade
vecFileName: C:\opencv\build\x64\vc12\bin\samples.vec
bgFileName: C:\imgs\learn1\Bad.dat
numPos: 80
numNeg: 199
numStages: 16
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 10
sampleHeight: 10
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: ALL
===== TRAINING 0-stage =====
POS count : consumed 80 : 80
Train dataset for temp stage can not be filled. Branch training terminated.
Cascade classifier can't be trained. Check the used training parameters.
As you can see, there is an error. Can you help me find out what exactly is wrong? "Check the used training parameters" is a very general phrase.
(The folder C:\opencv\build\x64\vc12\bin\haarcascade exists)
I don't know exactly what was wrong, but I eventually got it working.
1) I increased the number of positive examples to 400
2) I increased the number of negative examples to 398
3) I found that if an image is 61 x 60, I should write in Good.dat
Good\1.bmp 1 0 0 60 59
(Image coordinates begin at 0 and end at width-1 and height-1)
4) I found a typo: minhiteate -> minHitRate
and none of that helped...
5) I tried training with OpenCV 2.4 and got my cascade.xml file
But now I can't use it because of another error, which is off-topic here. (I'm googling it now.)
