I am trying to train a cascade to detect an area with specifically structured text (MRZ).
I've gathered 200 positive samples and 572 negative samples.
Training went as follows:
opencv_traincascade.exe -data cascades -vec vector/vector.vec -bg bg.txt -numPos 200 -numNeg 572 -numStages 3 -precalcValBufSize 2048 -precalcIdxBufSize 2048 -featureType LBP -mode ALL -w 400 -h 45 -maxFalseAlarmRate 0.8 -minHitRate 0.9988
PARAMETERS:
cascadeDirName: cascades
vecFileName: vector/vector.vec
bgFileName: bg.txt
numPos: 199
numNeg: 572
numStages: 3
precalcValBufSize[Mb] : 2048
precalcIdxBufSize[Mb] : 2048
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: LBP
sampleWidth: 400
sampleHeight: 45
boostType: GAB
minHitRate: 0.9988
maxFalseAlarmRate: 0.8
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
Number of unique features given windowSize [400,45] : 8778000
===== TRAINING 0-stage =====
<BEGIN
POS count : consumed 199 : 199
NEG count : acceptanceRatio 572 : 1
Precalculation time: 26.994
+----+---------+---------+
| N | HR | FA |
+----+---------+---------+
| 1| 1| 1|
+----+---------+---------+
| 2| 1|0.0244755|
+----+---------+---------+
END>
Training until now has taken 0 days 0 hours 36 minutes 35 seconds.
===== TRAINING 1-stage =====
<BEGIN
POS count : consumed 199 : 199
NEG count : acceptanceRatio 0 : 0
Required leaf false alarm rate achieved. Branch training terminated.
The process ran for about 35 minutes and produced a 2 kB file with only 45 lines, which seems too small for a good cascade.
Needless to say, it doesn't detect the needed area.
I tried to tune the arguments but to no avail.
I know that a larger set of samples would be better, but I would expect this number of samples to give at least a somewhat reasonable result, even if not a very accurate one.
Is a Haar cascade a good approach for detecting areas with specific text (MRZ)?
If so, how can better accuracy be achieved?
Thanks in advance.
You want to produce 3 stages with a maximum false alarm rate of 0.8 per stage. This means that after 3 stages the classifier will have a false alarm rate of at most 0.8^3 = 0.512. But after the first stage your classifier already reaches a false alarm rate of 0.0244755, which is much better than your final target (0.512), so the classifier is already considered good enough and no further stages are trained.
If that's not fine for you, increase numStages or decrease maxFalseAlarmRate far enough that the "final quality" is not reached within the first stage.
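The arithmetic above can be checked with a few lines of Python. This is only a sketch of the reasoning in this answer, using the values from the question's command line and log; the variable names are made up for illustration and are not part of OpenCV.

```python
# Values taken from the command line and the stage-0 table in the question.
max_false_alarm_rate = 0.8       # -maxFalseAlarmRate
num_stages = 3                   # -numStages
stage0_false_alarm = 0.0244755   # FA column, last row of stage 0

# Target for the whole cascade: the per-stage limit multiplied over all stages.
required_overall = max_false_alarm_rate ** num_stages

print(f"required overall false alarm rate: {required_overall:.3f}")
print(f"reached after stage 0:             {stage0_false_alarm}")
print("further stages needed:", stage0_false_alarm > required_overall)
```

Plugging in a larger numStages or a smaller maxFalseAlarmRate shows how far the target has to move before stage 0 alone no longer satisfies it.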
You will probably have to collect more samples, and samples that represent the environment better. Reaching such a low false alarm rate this quickly is typically a sign of bad training data (too simple or too similar?).
I can't tell you whether Haar cascades are appropriate for solving your task.
I have made a classifier before and didn't have any issues with opencv_traincascade. This time I set the number of stages to 10, so I expected training to end at stage 9. However, it went past 10 and got killed while training stage 16.
I looked at my parameters and noticed that numStages was 20 instead of 10, as shown below.
Could someone explain what I am doing wrong? Why do the parameters say numStages is 20 when I only wanted 10?
/workspace$ opencv_traincascade -data data -vec p.vec -bg bg2.txt -numPos 250 -numNeg 800 numstages 10 -w 50 -h 150
Training parameters are pre-loaded from the parameter file in data folder!
Please empty this folder if you want to use a NEW set of training parameters.
PARAMETERS:
cascadeDirName: data
vecFileName: p.vec
bgFileName: bg2.txt
numPos: 250
numNeg: 800
numStages: 20 <-- *******THIS ONE!********
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 50
sampleHeight: 150
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: BASIC
Stages 0-15 are loaded
===== TRAINING 16-stage =====
<BEGIN
POS count : consumed 250 : 260
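You missed the "-" before numstages (and possibly the capital letter, I'm not sure), so the application uses the default value of 20. Your log also shows that the parameters were pre-loaded from the data folder and that stages 0-15 were loaded, so empty that folder if you want to start over with new settings.
Please try: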
opencv_traincascade -data data -vec p.vec -bg bg2.txt -numPos 250 -numNeg 800 -numStages 10 -w 50 -h 150
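A mistake like this is easy to miss by eye. As a rough illustration, here is a small Python check that flags tokens which resemble a known option but are not spelled exactly like one. The function name is hypothetical, and the option list only contains the options used in this thread (each followed by one value); it is not a complete list of what opencv_traincascade accepts.

```python
import shlex

# Options seen in this thread only; an assumption, not the tool's full option set.
KNOWN_OPTIONS = {"-data", "-vec", "-bg", "-numPos", "-numNeg", "-numStages", "-w", "-h"}

def suspicious_tokens(command):
    """Return (token, expected) pairs for tokens that look like a misspelled option."""
    lowered = {opt.lstrip("-").lower(): opt for opt in KNOWN_OPTIONS}
    tokens = shlex.split(command)[1:]   # drop the program name
    problems = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in KNOWN_OPTIONS:
            i += 2                      # skip the option and its value
            continue
        expected = lowered.get(token.lstrip("-").lower())
        if expected is not None:
            problems.append((token, expected))
        i += 1
    return problems

cmd = ("opencv_traincascade -data data -vec p.vec -bg bg2.txt "
       "-numPos 250 -numNeg 800 numstages 10 -w 50 -h 150")
for found, expected in suspicious_tokens(cmd):
    print(f"'{found}' looks like a misspelling of '{expected}'")
```

Running the same check on the corrected command should report nothing.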
Background: I am trying to train my own OpenCV Haar classifier for face detection. I am working on a VM with Ubuntu 16.04. My working directory has 2 sub-directories: face, containing 2429 positive images, and non-face, containing 4548 negative images. All images are grayscale PNGs, 19 pixels wide and 19 pixels high. I have generated a file positives.info that contains the absolute path to every positive image followed by " 1 0 0 18 18", like so:
/home/user/ML-Trainer/face/face1.png 1 0 0 18 18
/home/user/ML-Trainer/face/face2.png 1 0 0 18 18
/home/user/ML-Trainer/face/face3.png 1 0 0 18 18
and another file negatives.txt that contains the absolute path to every negative image:
/home/user/ML-Trainer/non-face/other1.png
/home/user/ML-Trainer/non-face/other2.png
/home/user/ML-Trainer/non-face/other3.png
First I ran the following command:
opencv_createsamples -info positives.info -vec positives.vec -num 2429 -w 19 -h 19
and I got positives.vec as expected. I then created an empty directory data and ran the following:
opencv_traincascade -data data -vec positives.vec -bg negatives.txt -numPos 2429 -numNeg 4548 -numStages 10 -w 19 -h 19 &
It seems to run smoothly:
PARAMETERS:
cascadeDirName: data
vecFileName: positives.vec
bgFileName: negatives.txt
numPos: 2429
numNeg: 4548
numStages: 10
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 19
sampleHeight: 19
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: BASIC
Number of unique features given windowSize [19,19] : 63960
===== TRAINING 0-stage =====
<BEGIN
POS count : consumed 2429 : 2429
NEG count : acceptanceRatio 4548 : 1
Precalculation time: 13
+----+---------+---------+
| N | HR | FA |
+----+---------+---------+
| 1| 1| 1|
+----+---------+---------+
| 2| 1| 1|
+----+---------+---------+
| 3| 0.998765| 0.396218|
+----+---------+---------+
END>
Training until now has taken 0 days 0 hours 1 minutes 7 seconds.
But then I get the following error:
===== TRAINING 1-stage =====
<BEGIN
POS current samplOpenCV Error: Bad argument (Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.
) in get, file /home/user/opencv-3.4.0/apps/traincascade/imagestorage.cpp, line 158
terminate called after throwing an instance of 'cv::Exception'
what(): /home/user/opencv-3.4.0/apps/traincascade/imagestorage.cpp:158: error: (-5) Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.
in function get
How do I solve this:
samplOpenCV Error: Bad argument
Any help would be greatly appreciated.
EDIT:
I have reduced -numPos to 2186 (0.9 * 2429) after reading this answer, and it got me to
===== TRAINING 3-stage =====
and it gives me the same error. How should I tune the parameters for the opencv_createsamples command?
I eventually managed to make it work by respecting this formula:
number of samples in vec-file >= (numPos + (numStages-1) * (1-minHitRate) * numPos + S)
numPos - the number of positive samples used to train each stage
numStages - the number of stages the cascade classifier will have after training
minHitRate - the minimum hit rate required of each stage
S - the count of all the samples skipped from the vec-file (over all stages)
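To apply the formula to concrete numbers, it can be turned around to give the largest numPos for a vec-file of a given size. The Python sketch below does that for the values from the question; the function names are made up for illustration. S is not known before training, so S = 0 is an assumption here and the result should be read as an upper bound.

```python
import math

def required_vec_samples(num_pos, num_stages, min_hit_rate, skipped=0):
    """Right-hand side of the formula above."""
    return num_pos + (num_stages - 1) * (1 - min_hit_rate) * num_pos + skipped

def max_num_pos(vec_samples, num_stages, min_hit_rate, skipped=0):
    """Largest numPos that still satisfies the formula for a given vec-file size."""
    per_pos = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor((vec_samples - skipped) / per_pos)

# Values from the question: 2429 samples in the vec-file, 10 stages,
# default minHitRate of 0.995, and S assumed to be 0.
print(required_vec_samples(2429, 10, 0.995))  # what -numPos 2429 would require
print(max_num_pos(2429, 10, 0.995))           # upper bound for -numPos
```

If training still stops with the same error, S was larger than assumed, and numPos has to be lowered further.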
I've searched for this online and it seems to be a pretty common error, but I've tried the suggested solutions to no avail. My cmd log is:
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata>opencv_traincascade -data my_trained -vec positives.vec -bg negativedata.txt -numPos 30 -numNeg 76 -numStages 15 -minHitRate 0.995 -w 197 -h 197 -featureType LBP -precalcValBufSize 1024 -precalcIdxBufSize 1024
PARAMETERS:
cascadeDirName: my_trained
vecFileName: positives.vec
bgFileName: negativedata.txt
numPos: 30
numNeg: 76
numStages: 15
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: LBP
sampleWidth: 197
sampleHeight: 197
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
Number of unique features given windowSize [197,197] : 41409225
===== TRAINING 0-stage =====
<BEGIN
POS count : consumed 30 : 30
Train dataset for temp stage can not be filled. Branch training terminated.
Cascade classifier can't be trained. Check the used training parameters.
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata>
and my negativedata.txt file has 76 lines of info in the form:
negatives/1411814567410.jpg 1 2 2 199 199
negatives/20131225_192702.jpg 1 2 2 199 199
negatives/20131225_193214.jpg 1 2 2 199 199
negatives/20131225_193325.jpg 1 2 2 199 199
negatives/20131225_193327.jpg 1 2 2 199 199
negatives/20131225_193328.jpg 1 2 2 199 199
Please can someone help me pinpoint the issue because I'm still not sure why I'm getting this error. I'm doing this on a windows system. Thank you.
Found the issue: apparently the bg file shouldn't contain the annotations (object count and coordinates), only the image paths, so now my file is in the form:
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/ff.JPG
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/fifa.JPG
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/fred.JPG
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG-20140718-WA0008-1.jpg
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG-20150102-WA0013.jpg
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG-20150120-WA0005.jpg
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG_20140109_012313.jpg
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG_20140405_205621.jpg
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG_20140405_214225.jpg
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG_20140405_214225_transparent.png
C:\Users\kosyn_000\Dropbox\OpenCVtrainingdata\negatives/IMG_20140405_214225_transparent_small.png
and it produced my xml file fine, although it took a while. I can't believe it was something so simple holding me back.
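For a longer list, the extra fields can be stripped with a few lines of Python instead of by hand. This is a minimal sketch that assumes the image paths contain no spaces, so the path is always the first whitespace-separated field; the function name is made up for illustration.

```python
def strip_annotations(lines):
    """Keep only the image path (first whitespace-separated field) of each line."""
    paths = []
    for line in lines:
        fields = line.split()
        if fields:               # ignore blank lines
            paths.append(fields[0])
    return paths

# Two lines in the format of the original negativedata.txt.
annotated = [
    "negatives/1411814567410.jpg 1 2 2 199 199",
    "negatives/20131225_192702.jpg 1 2 2 199 199",
]
for path in strip_annotations(annotated):
    print(path)
```

The same function can be fed the lines of the existing file, and the result written back out as the new bg file.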
I need to detect a special image (something like the symbol +) in a scanned document. I'm going to train a cascade using the opencv_traincascade program (OpenCV 3.0).
This is my file structure:
C:\imgs\learn1
    Bad
        1.bmp
        ....
    Good
        1.bmp
        ....
    Bad.dat
    Good.dat
This is my Bad.dat:
Bad\1.bmp
...
Bad\53.bmp
Bad\img001.jpg
...
Bad\img146.jpg
This is my Good.dat (every good file fully contains the special image and nothing more)
Good\1.bmp 1 0 0 60 59
...
Good\100.bmp 1 0 0 27 28
I've successfully created the vec file.
C:\opencv\build\x64\vc12\bin>opencv_createsamples.exe
-info C:\imgs\learn1\Good.dat
-vec samples.vec
-w 10 -h 10
Info file name: C:\imgs\learn1\Good.dat
Img file name: (NULL)
Vec file name: samples.vec
BG file name: (NULL)
Num: 1000
BG color: 0
BG threshold: 80
Invert: FALSE
Max intensity deviation: 40
Max x angle: 1.1
Max y angle: 1.1
Max z angle: 0.5
Show samples: FALSE
Width: 10
Height: 10
Create training samples from images collection...
C:\imgs\learn1\Good.dat(101) : parse errorDone. Created 100 samples
This is the call and the result of opencv_traincascade:
C:\opencv\build\x64\vc12\bin>
-opencv_traincascade.exe
-data haarcascade
-vec C:\opencv\build\x64\vc12\bin\samples.vec
-bg C:\imgs\learn1\Bad.dat
-numStages 16
-minhiteate 0.99
-maxFalseAlarmRate 0.5
-numPos 80
-numNeg 199
-w 10
-h 10
-mode ALL
-precalcValBufSize 1024
-precalcIdxBufSize 1024
PARAMETERS:
cascadeDirName: haarcascade
vecFileName: C:\opencv\build\x64\vc12\bin\samples.vec
bgFileName: C:\imgs\learn1\Bad.dat
numPos: 80
numNeg: 199
numStages: 16
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 10
sampleHeight: 10
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: ALL
===== TRAINING 0-stage =====
<BEGIN
POS count : consumed 80 : 80
Train dataset for temp stage can not be filled. Branch training terminated.
Cascade classifier can't be trained. Check the used training parameters.
As you can see, there is an error. Can you help me find out what exactly is wrong? "Check the used training parameters" is a very general phrase.
(The folder C:\opencv\build\x64\vc12\bin\haarcascade exists)
I don't know what was wrong, but I got it working in the end.
1) I increased the number of positive examples to 400.
2) I increased the number of negative examples to 398.
3) I found that if an image is 61 x 60, I should write in Good.dat:
Good\1.bmp 1 0 0 60 59
(Image coordinates begin at 0 and end at width-1 and height-1.)
4) I found a typo: minhiteate -> minHitRate
None of that helped...
5) I tried training with OpenCV 2.4 and got my cascade.xml file.
But now I can't use it because of another error; that's off-topic, though. (I'm googling it now.)
I am using OpenCV 2.4.3 on Ubuntu 12.10 64-bit, and when I run opencv_haartraining I get the error message shown below. The training continues, so I don't think it is a critical error, but nonetheless it blatantly says 'Error'. I can't seem to find any solutions for this. What does it mean (what is AdaBoost), why is it complaining about a 'misclass', and how can I fix it? Everything I found on Google referred to this as simply a 'warning' and basically said to forget about it. Thanks!
cd dots ; nice -20 opencv_haartraining -data dots_haarcascade -vec samples.vec -bg negatives.dat -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 13 -nneg 10 -w 10 -h 10 -nonsym -mem 4000 -mode ALL
Data dir name: dots_w10_h10_haarcascade
Vec file name: samples.vec
BG file name: negatives.dat, is a vecfile: no
Num pos: 13
Num neg: 10
Num stages: 20
Num splits: 2 (tree as weak classifier)
Mem: 4000 MB
Symmetric: FALSE
Min hit rate: 0.999000
Max false alarm rate: 0.500000
Weight trimming: 0.950000
Equal weights: FALSE
Mode: ALL
Width: 10
Height: 10
Applied boosting algorithm: GAB
Error (valid only for Discrete and Real AdaBoost): misclass
Max number of splits in tree cascade: 0
Min number of positive samples per cluster: 500
Required leaf false alarm rate: 9.53674e-07
Stage 0 loaded
Stage 1 loaded
Stage 2 loaded
Stage 3 loaded
Stage 4 loaded
Stage 5 loaded
Stage 6 loaded
Stage 7 loaded
Tree Classifier
Stage
+---+---+---+---+---+---+---+---+
| 0| 1| 2| 3| 4| 5| 6| 7|
+---+---+---+---+---+---+---+---+
0---1---2---3---4---5---6---7
Number of features used : 7544
Parent node: 7
*** 1 cluster ***
POS: 13 96 0.135417
I don't think this is an error message; rather, it is a printout describing how the algorithm will measure its internal error rate. In this case it uses misclassification of the examples. Real and Discrete AdaBoost map input samples onto the output range [0,1], so there is a meaningful way of measuring the inaccuracy of the algorithm. If a different variant of AdaBoost is being used, this error measure might cease to be meaningful. Your log says the applied boosting algorithm is GAB, and the line itself says it is valid only for Discrete and Real AdaBoost, so it most likely does not apply to your run.