Could you please help me understand several points related to Haar classifier training:
1) Should a positive image contain only the training object, or can it contain other things as well? For example, if I want to recognize a traffic sign, should the positive image contain only the traffic sign, or can it also include the highway?
2) There are two ways of creating the samples vector file: one uses an info file that contains the object coordinates within each positive image, the other just takes a list of positives and negatives. Which one is better?
3) How do you usually create the info file containing the object coordinates in the positive images? Can an image clipper tool generate the object coordinates?
Also, does dlib's histogram of oriented gradients (HOG) detector provide better results than a Haar classifier?
My target is traffic sign detection on a Raspberry Pi.
Thanks
The positive sample (not necessarily the whole image) should contain only the object. Sometimes it is not possible to get the right aspect ratio for each positive sample; then you either add some background or crop away part of the object boundary. The final detector will detect regions with your positive samples' aspect ratio, so if you use a lot of background around all of your positive samples, your final detector will probably not detect a region containing just your traffic sign, but a region with a lot of background around your traffic sign.
Afaik, the positive samples must be provided as a .vec file, which is created with opencv_createsamples.exe, and you'll need a description file (where in the images are your positive samples?). I typically preprocess my labeled training samples and crop away all the background, so that I end up with intermediate images in which the positive sample fills the whole image and the image already has the right aspect ratio. I then fill a text file with one line per intermediate image, basically "folder/filename.png 1 0 0 width height", and create the .vec file from those intermediate images. But the other way, using real ROI information from full-size images, should give the same quality.
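As a rough sketch of that intermediate step (folder, file names, and extension are placeholders, not from any particular tutorial), generating such a description file could look roughly like this in Python:

    # Sketch: build a positives description file from pre-cropped images,
    # assuming every image in "positives/" already contains exactly one object
    # that fills the whole frame. Folder name and extension are placeholders.
    import os
    import cv2

    with open("positives.txt", "w") as f:
        for name in sorted(os.listdir("positives")):
            if not name.lower().endswith(".png"):
                continue
            path = os.path.join("positives", name)
            img = cv2.imread(path)
            if img is None:
                continue
            h, w = img.shape[:2]
            # format: <path> <number of objects> <x> <y> <width> <height>
            f.write("%s 1 0 0 %d %d\n" % (path, w, h))

The resulting positives.txt can then be passed to opencv_createsamples (together with the target window size) to produce the .vec file used by opencv_traincascade.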
Be aware that if you don't fix the same aspect ratio for each positive sample, you'll stretch your objects, which might or might not be a problem in your task.
And keep in mind that you can create additional positive samples by warping/transforming your images. opencv_createsamples can do that for you, but I have never really used it, so I'm not sure whether training benefits from such samples.
Related
I have 150x150 image patches from DDSM Breast Mammography. I would like to augment my dataset by taking 2 random 120x120 crops from each image. So if my dataset contains 6500 images, augmenting it with random crops should get me to 13000 images. The thing is, I do NOT want to lose potential information in the image or possibly change the ground truth label.
What would be the best way to do this? Should I crop them randomly from 150x150 to 120x120 and hope for the best, or maybe pad them first and then perform the cropping? What is the standard way to approach this problem?
If your ground truth contains the exact location of what you are trying to classify, use the ground truth to crop your images in an informed way, i.e., adjust the ground truth if the crop removes part of what you are trying to classify.
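As a minimal sketch of such an informed crop, assuming the ground truth is a bounding box (x, y, width, height) inside the 150x150 patch (the function and variable names are made up for illustration):

    # Sketch: take a 120x120 crop from a larger patch that keeps the
    # ground-truth box fully inside the crop, and shift the box accordingly.
    # All names here are illustrative placeholders.
    import random

    def informed_crop(img, box, crop=120):
        x, y, w, h = box
        H, W = img.shape[:2]
        # legal top-left corners of the crop so that the box stays inside
        x0_min = max(0, x + w - crop)
        x0_max = min(x, W - crop)
        y0_min = max(0, y + h - crop)
        y0_max = min(y, H - crop)
        if x0_min > x0_max or y0_min > y0_max:
            return None, None  # box larger than the crop; handle separately
        x0 = random.randint(x0_min, x0_max)
        y0 = random.randint(y0_min, y0_max)
        cropped = img[y0:y0 + crop, x0:x0 + crop]
        new_box = (x - x0, y - y0, w, h)  # adjusted ground truth
        return cropped, new_box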
If you don't know the location of what you are classifying, you could
attempt to train a classifier on your un-augmented dataset,
find out which regions of the images your classifier reacts to,
make note of these locations,
crop your images in an informed way, and
train a new classifier.
But how do you "find out which regions your classifier reacts to"?
Multiple ways are described in Visualizing and Understanding Convolutional Networks by Zeiler and Fergus:
Imagine your classifier classifies breast cancer vs. no breast cancer. Now simply take an image that contains positive evidence of breast cancer, occlude part of the image with some blank color (see the gray square in the image above, image by Zeiler et al.), and predict cancer or not. Now move the occluding square around. In the end you'll get rough prediction scores for all parts of your original image (see (d) in the image above), because when you cover up the important part that is responsible for a positive prediction, you (should) get a negative prediction.
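A rough sketch of that occlusion experiment, assuming you have some predict(image) function returning the positive-class probability; the function name, occluder size, stride, and fill value are illustrative assumptions, not code from the paper:

    # Sketch of an occlusion-sensitivity map in the spirit of Zeiler & Fergus.
    # "predict" is assumed to return the positive-class probability for one
    # image; the fill value assumes intensities in [0..1].
    import numpy as np

    def occlusion_map(img, predict, patch=20, stride=10, fill=0.5):
        H, W = img.shape[:2]
        heat = np.zeros(((H - patch) // stride + 1, (W - patch) // stride + 1))
        for i, y in enumerate(range(0, H - patch + 1, stride)):
            for j, x in enumerate(range(0, W - patch + 1, stride)):
                occluded = img.copy()
                occluded[y:y + patch, x:x + patch] = fill  # gray square
                heat[i, j] = predict(occluded)
        # low scores mark regions whose occlusion destroys the positive
        # prediction, i.e. the regions the classifier reacts to
        return heat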
If you have someone who can actually recognize cancer in an image, this is also a good way to check for and guard against confounding factors.
BTW: You might want to crop on-the-fly and randomize how you crop even more to generate way more samples.
If the 150x150 patch is already the region of interest (ROI), you could try the following data augmentations (see the sketch after this list):
use a larger patch, e.g. 170x170, that always contains your 150x150 patch
use a larger patch, e.g. 200x200, and scale it down to 150x150
add some Gaussian noise to the image
rotate the image slightly (by random amounts)
change image contrast slightly
artificially emulate whatever other (image-)effects you see in the original dataset
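A small sketch of a few of these augmentations using NumPy/OpenCV; the parameter ranges are arbitrary examples, and the input is assumed to be an 8-bit grayscale patch:

    # Sketch of on-the-fly augmentation for a grayscale patch (values 0..255).
    # The rotation angle, noise level, and contrast range are illustrative only.
    import random
    import numpy as np
    import cv2

    def augment(img):
        out = img.astype(np.float32)
        # slight random rotation around the centre
        angle = random.uniform(-10, 10)
        h, w = out.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
        out = cv2.warpAffine(out, M, (w, h), borderMode=cv2.BORDER_REFLECT)
        # mild Gaussian noise
        out += np.random.normal(0, 2.0, out.shape)
        # slight contrast change around the mean intensity
        mean = out.mean()
        out = (out - mean) * random.uniform(0.9, 1.1) + mean
        return np.clip(out, 0, 255).astype(np.uint8)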
I am preparing to classify my own object using an OpenCV Haar cascade. I understand that negative images are photos without your object, and positive images include your object. The part that confuses me is how my positive images need to be set up. I have read numerous explanations, and it's still a bit confusing to me. I've read about 3 different methods of preparing samples.
1) Positive images contain only the object (it takes up the full image) and are converted to a .vec file.
2) The object is part of a larger background image, its bounding-box dimensions are noted in an info file, and that is then converted to a .vec file.
3) A single positive image is distorted and superimposed onto negative backgrounds.
Here are some links to articles I've read:
https://www.academia.edu/9149928/A_complete_guide_to_train_a_cascade_classifier_filter
https://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
http://note.sonots.com/SciSoftware/haartraining.html#w0a08ab4
Do I crop my positive images for training, or do I keep them as they are and include the rectangle dimensions of the object within the image?
I am working on "Controlling a Raspberry Pi's GPIO pins according to changes in traffic lights (red, green, yellow)". Right now, I am focusing only on the traffic light detection part. For that, I am using a cascade classifier with Haar features.
I have 2000 negative sample images, which I have converted to grayscale and resized to 120 x 120. I also have ONE positive image of a traffic signal (40 x 120), from which I am generating 2000 positive samples. Finally, I am training my classifier on the 2000 positive samples and 1000 negative samples with 10 stages.
My output for some test images looks like the following: [Output 1] [Output 2] [Output 3]
Image from which I created the positive samples: [positive image]
I have some questions/doubts and need some suggestions to improve or modify my classifier.
1) Do I need to use more than one image as the positive image to create samples?
2) Why am I not able to detect all the traffic signals in the above images?
3) Am I doing something wrong with the image shape, or anything else?
4) Please correct me on this point if I am wrong: to draw a rectangle over a traffic signal, I am using the cv2.rectangle function with constant height/width parameters, and that's the ONLY reason it draws a big rectangle regardless of how near or far the traffic signal is in an image! Any suggestions for changing this size dynamically?
Thank you.
To me, it looks like your classifier has not learned enough.
1) I strongly suggest taking 20-50 samples of traffic lights instead of one sample. You can still generate thousands of samples from them for training.
2) Most likely because of inadequate training, but you should also check the parameters of the detection stage. What minimum and maximum sizes have you set for detection?
3) You don't have to reshape or resize the image, so that should not be a problem.
The detector returns the position (x, y) and the size (width, height) of every object that was detected, so you should be able to set the rectangle size dynamically instead of using a constant width and height. Please refer to the OpenCV Haar face detection example in the language of your choice.
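A minimal sketch along the lines of the standard OpenCV Python face-detection example; the cascade file name, image name, and detectMultiScale parameters below are placeholders you would tune for your traffic lights:

    # Sketch: use the sizes returned by detectMultiScale instead of a constant
    # rectangle. Cascade path, scaleFactor, minNeighbors, minSize are placeholders.
    import cv2

    cascade = cv2.CascadeClassifier("traffic_light_cascade.xml")
    img = cv2.imread("test.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    detections = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5, minSize=(20, 60))
    for (x, y, w, h) in detections:
        # w and h already reflect how near or far the detected object is
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite("result.jpg", img)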
I have a question regarding the preprocessing step "Image mean subtraction".
I use the UCSD Dataset for my training.
One popular preprocessing step is the mean subtraction. Now I wonder if I am doing it right.
What I am doing is the following:
I have 200 grayscale training images
I put all images in a list and compute the mean with numpy:
np.mean(ImageList, axis=0)
This returns a mean image
Now I subtract the mean image from all training images
When I now visualize my preprocessed training images, they are mostly black and also contain negative values.
Is this correct? Or is my understanding of subtracting the mean image incorrect?
Here is one of my training images:
And this is the "mean image":
It seems like you are doing it right.
As for the negative values: they are to be expected. Your original images had intensity values in the range [0..1]; once you subtract the mean (which should be around ~0.5), you should have values roughly in the range [-0.5..0.5].
Please note that you should save the "mean image" for test time as well: once you wish to predict using the trained net, you need to subtract the same mean image from the test image.
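A small sketch of that whole pipeline, reusing the asker's ImageList (everything else, such as the file name, is a placeholder):

    # Sketch: compute the mean image on the training set, subtract it,
    # save it, and reuse the SAME mean at test time.
    import numpy as np

    # ImageList: the list of 200 grayscale training images, values in [0..1]
    train = np.stack([np.asarray(im, dtype=np.float32) for im in ImageList])
    mean_image = train.mean(axis=0)          # same as np.mean(ImageList, axis=0)
    train_centered = train - mean_image      # values roughly in [-0.5 .. 0.5]

    np.save("mean_image.npy", mean_image)    # keep it for test time

    # at test time, subtract the same mean image:
    # test_centered = test_image.astype(np.float32) - np.load("mean_image.npy")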
Update:
In your case (static camera), the mean subtraction removes the "common" background. This setting seems to be in your favor, as it focuses the net on the temporal changes in the frame. The method will work well for you as long as you test on the same kind of images (i.e., frames from the same static camera).
I read this post on haartraining to create a custom XML cascade file.
Do positive and negative images have to have the same dimensions (width x height)?
Do the positive images have to be the same size as the negative ones?
What should I put in the -w and -h parameters of opencv_createsamples and opencv_traincascade?
Negative samples can be any size; the training will select subwindows and scaled versions of those subwindows automatically (because very many negative samples are needed). For each negative image you'll need a line with its path in a .txt file. If you want to use "hard samples" which exactly fit the detector size, you'll have to crop the negative images to the relevant region and resize them manually to fit the target size.
The positive samples can be any size, but you have to prepare a .txt file with information about where inside each image the object is located.
For example, if the image is C:\haartraining\positives\image00001.png and a single object is at ROI position (x,y,width,height), then your .txt file must contain the line C:\haartraining\positives\image00001.png 1 x y width height. This width/height can be any size (it needn't be the same for each image); it will be scaled later by opencv_createsamples to the target size.
For Haar training you'll need a .vec file instead of that .txt file, and you can create the .vec file with the opencv_createsamples binary. This binary has at least 3 different modes for creating samples:
1. Just transform your positive images into a .vec file by scaling and formatting them.
2. Create a lot of positive images by providing some background images, a single positive image, and constraints on how the sample may be transformed/distorted.
3. Create test images, which can be used to evaluate the training result, by placing positive images inside bigger scene background images.
I only have experience with mode 1, so I can't help with the other ones. You just provide the positives text file and the target window size together with the number of positive images you have, and opencv_createsamples creates the .vec file which you'll need for training with opencv_traincascade.
In opencv_traincascade you must take care to provide good numbers for numPos and numNeg, since in each stage you'll "lose" samples, which can be a bit confusing without experience. For example, if your .vec file contains 8000 positive images, you can't tell opencv_traincascade to use 8000 positive images in each stage, because with a minHitRate of, for example, 0.995 there might only be 7960 positive samples left after the first stage (worst case). The same goes for negative samples, but there it's not so easy to tell how many samples you effectively HAVE...
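A rough back-of-the-envelope estimate of a safe numPos, using a commonly cited approximation that ignores the extra samples consumed by false negatives (so it is only an approximation of opencv_traincascade's internal bookkeeping, not the exact formula):

    # Rough estimate only: in the worst case each stage can reject about
    # (1 - minHitRate) of the positives it consumes, so the .vec file needs
    # some spare samples beyond numPos itself.
    vec_samples = 8000      # positives in the .vec file (example from above)
    min_hit_rate = 0.995
    num_stages = 20         # example stage count

    # after one stage, at worst only this many usable positives remain:
    after_first_stage = int(vec_samples * min_hit_rate)   # 7960, as in the text

    # a commonly used approximation for a safe -numPos value:
    num_pos = int(vec_samples / (1.0 + (num_stages - 1) * (1.0 - min_hit_rate)))
    print(after_first_stage, num_pos)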
Afaik, width and height must be the same in createsamples and traincascade! It should be the minimum size at which you want to detect objects after training, but it must be big enough to hold the relevant information. You may have to experiment a bit until you find the optimal size. Unfortunately the size is quite limited because of memory problems during training; for example, I couldn't train bigger than approx. 30x30. Afaik, face detection was scientifically evaluated to produce good results at either 24x24 without or 20x20 with some tree splits.
Please tell me whether this answers your question.