I am trying to train my own OpenCV Haar classifier for cup detection.
I have 100 images that contain a cup and 400 images that do not, so:
Number of positive images = 100
Number of negative images = 400
First, I created .dat list files for both sets:
find ./Negative_Images -name '*.jpg' >negatives.dat
find ./Positive_Images -name '*.jpg' >positives.dat
Next, I ran the following command to generate samples (I set the sample count to 100 because I have 100 positive images. Is that right?):
perl createtrainsamples.pl positives.dat negatives.dat samples 100 "opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 80 -h 60"
Now 100 samples (*.jpg.vec) have been created in the samples folder. Next, I ran the following command to generate samples.vec:
python ./tools/mergevec.py -v samples/ -o samples.vec
mergevec.py comes from the tutorial by mrnugget.
The next command is opencv_traincascade:
opencv_traincascade -data classifier -vec samples.vec -bg negatives.dat -precalcValBufSize 2500 -precalcIdxBufSize 2500 -numPos 100 -numNeg 400 -numStages 15 -minhitrate 0.99 -maxfalsealarm 0.5 -w 80 -h 60
I receive the error: Error: Can not get new positive sample
Someone solved it by setting numPos = noOfPositiveImages * 0.9, but that did not work for me.
From different sources, I found a formula for calculating the value of numPos:
vec-file has to contain >= (numPos + (numStages-1) * (1 - minHitRate) * numPos) + S
As far as I understand, in my case:
number of samples in the vec-file = 100 (as I had 100 positive images, and 100 samples were created from them)
numStages = 4 (or any other value I choose)
minHitRate = 0.99
S = count of samples from the vec-file (another source says it is the count of all samples skipped from the vec-file, across all stages)
I do not understand what value I should use for S.
Can anyone explain this formula with an example? What values should I put in the command to fix this error?
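For a rough feel of the numbers, the formula above can be rearranged to give the largest numPos that a vec-file of a given size can support. This is only a sketch: the function name `max_num_pos` is made up for illustration, and S (the number of skipped samples) is not known before training, so it is passed in as a guess (0 by default).

```python
import math

def max_num_pos(vec_count, num_stages, min_hit_rate, skipped=0):
    """Largest numPos satisfying
    vec_count >= numPos + (numStages - 1) * (1 - minHitRate) * numPos + S.

    `skipped` stands in for S, which is only known after training,
    so treat it as an estimate rather than an exact input.
    """
    per_pos = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor((vec_count - skipped) / per_pos)

# Values from the question: 100 samples in the vec-file, minHitRate 0.99.
print(max_num_pos(100, 15, 0.99))  # 15 stages, as in the training command
print(max_num_pos(100, 4, 0.99))   # 4 stages, as in the worked values
```

With 100 samples and 15 stages this gives 87 when S is assumed to be 0, so any samples that actually get skipped would push the usable numPos lower still.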
I trained a rank 40 model on the movielens data, but cannot retrieve the weights from the trained model with gd_mf_weights. I'm following the syntax from the VW matrix factorization example but it is giving me errors. Please advise.
Model training call:
vw --rank 40 -q ui --l2 0.1 --learning_rate 0.015 --decay_learning_rate 0.97 --power_t 0 --passes 50 --cache_file movielens.cache -f movielens.reg -d train.vw
Weights generating call:
library/gd_mf_weights -I train.vw -O '/data/home/mlteam/notebooks/Recommenders-master/notebooks/Outputs/movielens' --vwparams '-q ui --rank 40 -i movielens.reg'
Error:
WARNING: model file has set of {-q, --cubic, --interactions} settings stored, but they'll be OVERRIDEN by set of {-q, --cubic, --interactions} settings from command line.
creating quadratic features for pairs: ui
finished run
number of examples = 0
weighted example sum = 0
weighted label sum = 0
average loss = -nan
total feature number = 0
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::program_options::multiple_occurrences> >'
what(): option '--rank' cannot be specified more than once
Aborted (core dumped)
If I just run it without specifying rank and interaction variables, it doesn't return the same trained model, since the parameters displayed are different from before.
library/gd_mf_weights -I train.vw -O '/data/home/mlteam/notebooks/Recommenders-master/notebooks/Outputs/movielens' --vwparams '-i movielens.reg'
creating quadratic features for pairs: ui
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
using no cache
Reading datafile =
num sources = 0
Segmentation fault (core dumped)
If I run weights generation with the entire set of model training parameters, it ignores my extra parameters (it finishes much faster than 50 passes would take) and returns the same weights as a randomly initialized rank 40 model.
library/gd_mf_weights -I train.vw -0 '/data/home/mlteam/notebooks/Recommenders-master/notebooks/Outputs/movielens' --vwparams '--rank 40 -q ui --l2 0.1 --learning_rate 0.015 --decay_learning_rate 0.97 --power_t 0 --passes 50 --cache_file movielens.cache -f movielens.reg -d train.vw'
I have made a classifier before and didn't have any issues with opencv_traincascade. I set numStages to 10, so I expected training to stop at stage 9. However, it went past 10 and got killed while training stage 16.
I looked at my parameters and noticed that numStages was 20 instead of 10, as shown below.
Can someone explain what I am doing wrong? Why do the parameters say numStages is 20 when I only wanted 10?
/workspace$ opencv_traincascade -data data -vec p.vec -bg bg2.txt -numPos 250 -numNeg 800 numstages 10 -w 50 -h 150
Training parameters are pre-loaded from the parameter file in data folder!
Please empty this folder if you want to use a NEW set of training parameters.
PARAMETERS:
cascadeDirName: data
vecFileName: p.vec
bgFileName: bg2.txt
numPos: 250
numNeg: 800
numStages: 20 <-- *******THIS ONE!********
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 50
sampleHeight: 150
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: BASIC
Stages 0-15 are loaded
===== TRAINING 16-stage =====
<BEGIN
POS count : consumed 250 : 260
You missed the "-" before numstages (and maybe the capital letters, not sure), so the application uses the default value of 20.
Please try:
opencv_traincascade -data data -vec p.vec -bg bg2.txt -numPos 250 -numNeg 800 -numStages 10 -w 50 -h 150
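A slip like this is easy to miss by eye, so here is a small sketch of a sanity check you could run over the command string before training. It is a hypothetical helper, not part of OpenCV: it assumes every option is a "-name value" pair, and that -baseFormatSave is the only option without a value.

```python
# Options assumed to take no value (an assumption; extend if your build has others).
BARE_FLAGS = {"-baseFormatSave"}

def find_undashed_options(command):
    """Return tokens sitting in an option position that lack a leading '-'."""
    tokens = command.split()[1:]  # drop the program name
    suspects = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in BARE_FLAGS:
            i += 1  # a bare flag has no value to skip
            continue
        if not token.startswith("-"):
            suspects.append(token)
        i += 2  # skip the option name and its value
    return suspects

cmd = ("opencv_traincascade -data data -vec p.vec -bg bg2.txt "
       "-numPos 250 -numNeg 800 numstages 10 -w 50 -h 150")
print(find_undashed_options(cmd))
```

It only checks the position of each token, so it will not catch a wrongly capitalised option name; comparing against the PARAMETERS block the tool prints is still worthwhile.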
I use 1292 positive images (faces)
and 5712 negative images,
and this is my command for haartraining:
C:\opencv\build\x64\vc11\bin\opencv_haartraining.exe -data cascades -vec vector/facevector.vec -bg negative/bg.txt -npos 1033 -nneg 5712 -nstages 20 -nsplits 2 -nonsym -minhitrate 0.999 -maxfalsealarm 0.5 -mem 1024 -mode ALL -w 24 -h 24
PAUSE
It is set for 20 stages. It finished stage 6, but on stage 7 it has stayed as shown in the image
for 1 day.
I think it will stay like this forever. What is the problem?
The faces in the positive images are clear,
and this is a snippet of the positive images I used:
(source: imagetitan.com)
![haartraining still like this ][1]
(source: imagetitan.com)
![snippet of the positive images I used][1]
Change your min hit rate and max false alarm rate.
I would suggest using something like a min hit rate of 0.95 and a max false alarm rate of 0.4 to get going.
The reason is that training may take forever to reach 0.999 and 0.5, if it finishes at all.
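To put numbers on what these thresholds mean for the whole cascade, here is a back-of-the-envelope sketch. It assumes every stage just meets its per-stage target and that stages act independently, which is a simplification; the function name `overall_rates` is made up for illustration, and 0.95 is read as the hit rate and 0.4 as the false alarm rate.

```python
def overall_rates(min_hit_rate, max_false_alarm, num_stages):
    """Approximate whole-cascade hit rate and false alarm rate.

    Assumes each stage exactly meets its target and stages are independent,
    so the per-stage rates simply multiply.
    """
    return min_hit_rate ** num_stages, max_false_alarm ** num_stages

# Compare the settings from the question with the looser suggested ones, both at 20 stages.
for hit, fa in [(0.999, 0.5), (0.95, 0.4)]:
    total_hit, total_fa = overall_rates(hit, fa, 20)
    print(f"minhitrate={hit} maxfalsealarm={fa} -> "
          f"hit={total_hit:.3f} false_alarm={total_fa:.2e}")
```

Under that assumption, 0.999 per stage keeps roughly 98% of positives after 20 stages, while 0.95 per stage keeps only about 36%, so the looser settings trade detection quality for a training run that actually finishes.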
I need to detect a special image (something like a + symbol) in scanned documents. I'm going to train a cascade using the opencv_traincascade program (OpenCV 3.0).
This is my file structure:
C:\imgs\learn1
    Bad
        1.bmp
        ....
    Good
        1.bmp
        ....
    Bad.dat
    Good.dat
This is my Bad.dat:
Bad\1.bmp
...
Bad\53.bmp
Bad\img001.jpg
...
Bad\img146.jpg
This is my Good.dat (every good file fully contains the special image and nothing more)
Good\1.bmp 1 0 0 60 59
...
Good\100.bmp 1 0 0 27 28
I've successfully created the vec file.
C:\opencv\build\x64\vc12\bin>opencv_createsamples.exe
-info C:\imgs\learn1\Good.dat
-vec samples.vec
-w 10 -h 10
Info file name: C:\imgs\learn1\Good.dat
Img file name: (NULL)
Vec file name: samples.vec
BG file name: (NULL)
Num: 1000
BG color: 0
BG threshold: 80
Invert: FALSE
Max intensity deviation: 40
Max x angle: 1.1
Max y angle: 1.1
Max z angle: 0.5
Show samples: FALSE
Width: 10
Height: 10
Create training samples from images collection...
C:\imgs\learn1\Good.dat(101) : parse errorDone. Created 100 samples
This is the call to opencv_traincascade and its result:
C:\opencv\build\x64\vc12\bin>opencv_traincascade.exe
-data haarcascade
-vec C:\opencv\build\x64\vc12\bin\samples.vec
-bg C:\imgs\learn1\Bad.dat
-numStages 16
-minhiteate 0.99
-maxFalseAlarmRate 0.5
-numPos 80
-numNeg 199
-w 10
-h 10
-mode ALL
-precalcValBufSize 1024
-precalcIdxBufSize 1024
PARAMETERS:
cascadeDirName: haarcascade
vecFileName: C:\opencv\build\x64\vc12\bin\samples.vec
bgFileName: C:\imgs\learn1\Bad.dat
numPos: 80
numNeg: 199
numStages: 16
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 10
sampleHeight: 10
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: ALL
===== TRAINING 0-stage =====
<BEGIN
POS count : consumed 80 : 80
Train dataset for temp stage can not be filled. Branch training terminated.
Cascade classifier can't be trained. Check the used training parameters.
As you can see, there is an error. Can you help me figure out what exactly is wrong? "Check the used training parameters" is a very general phrase.
(The folder C:\opencv\build\x64\vc12\bin\haarcascade exists)
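One thing that may be worth ruling out with this message is that the tool cannot open the images listed in Bad.dat, since the entries are relative paths and the command is run from the bin folder. The sketch below is a hypothetical helper (`missing_entries` is not part of OpenCV) that reports list entries that do not exist under a given base directory. Which directory the tool resolves relative entries against is an assumption you would need to verify for your OpenCV version, so the base directory is left as a parameter.

```python
import os

def missing_entries(list_file, base_dir):
    """Return the entries of list_file that are not files under base_dir.

    Only the first whitespace-separated field of each line is used as the path,
    so Good.dat-style lines with trailing box coordinates also work.
    Paths containing spaces are not handled by this sketch.
    """
    missing = []
    with open(list_file) as fh:
        for line in fh:
            parts = line.split()
            if not parts:
                continue
            # Normalise Windows separators so the check also runs elsewhere.
            rel = parts[0].replace("\\", os.sep)
            if not os.path.isfile(os.path.join(base_dir, rel)):
                missing.append(parts[0])
    return missing

# Example call (paths taken from the question; uncomment to run on that machine):
# print(missing_entries(r"C:\imgs\learn1\Bad.dat", r"C:\imgs\learn1"))
# print(missing_entries(r"C:\imgs\learn1\Bad.dat", os.getcwd()))
```

If the list is complete relative to one directory but not the other, running the training command from the directory that works (or switching the list to absolute paths) would be the next thing to try.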
I don't know exactly what was wrong, but I got it working.
1) I increased the number of positive examples to 400.
2) I increased the number of negative examples to 398.
3) I found that if an image is 61 x 60, I should write in Good.dat:
Good\1.bmp 1 0 0 60 59
(Image coordinates begin at 0 and end at width-1 and height-1.)
4) I found a typo: minhiteate -> minHitRate
None of that helped...
5) I then trained with OpenCV 2.4 and got my cascade.xml file.
But now I can't use it because of another error, which is off-topic here (I'm googling it now).
I'm trying to train a new haar-cascade for faces.
I have a positive dataset of 2000 cropped face images (just the face) and 3321 negative random images.
I created the positives list using the following command:
opencv_createsamples.exe -info info.txt -vec vector.vec -num 2000 -w 10 -h 10
Where the file info.txt contains the following lines:
AJ_Cook_0001.ppm 1 0 0 64 64
AJ_Lamas_0001.ppm 1 0 0 64 64
Aaron_Eckhart_0001.ppm 1 0 0 64 64
Aaron_Guiel_0001.ppm 1 0 0 64 64
Aaron_Patterson_0001.ppm 1 0 0 64 64
Aaron_Peirsol_0001.ppm 1 0 0 64 64
Afterwards, I ran haartraining using the following command:
opencv_haartraining.exe -data harrcascade -vec vector.vec -bg infofile.txt -nstages 20 -minhitrate 0.9999 -maxfalsealarm 0.5 -npos 2000 -nneg 3321 -w 10 -h 10 -nonsym -mem 1024
Where the file infofile.txt contains the names of the background images:
Bing_000527adc064a067a7f7986f00b140fe.jpg
Bing_002744f85b0bee37f489f43fad5f613f.jpg
Bing_0048e7e5e487203dedba9feb03696b1e.jpg
Bing_00513e8879f4f544717df2c8ea0494b1.jpg
Bing_00543a6cf117f559a05f0fb7e10bd361.jpg
Training took only about two hours and no xml file was generated. The folder harrcascade contains 20 folders, each with a txt file named 'AdaBoostCARTHaarClassifier.txt', but no xml was generated.
I have two questions:
Why did training take so little time?
Why was no xml file generated?
What am I missing here?
Thanks,
Gil.
See my answer to your other question. If no xml file was produced, it is very likely that you have run out of positive samples. Try using 1500 instead of 2000.
Better yet, check out trainCascadeObjectDetector, a function in the Computer Vision System Toolbox for Matlab, which lets you generate an xml file compatible with OpenCV.