I have around 2000 images, and I need to find the ones that contain the given "L"-shaped object, as shown below.
I used the images below as positive images,
and the ones below, which do not contain the L-shaped object, as negatives.
I then created the XML file (classifier) using OpenCV's opencv_traincascade tool.
However, along with the correct detections it also gives some false detections. Is there any way to increase the accuracy?
Edit: some of the false cases:
I am preparing to classify my own object using an OpenCV Haar cascade. I understand that negative images are photos without the object, and positive images are photos with the object included. The part that confuses me is how my positive images need to be set up. I have read numerous explanations, and it's still a bit confusing to me. I've read about 3 different methods of preparing samples:
1) Positive images show only the object (it takes up the full size of the image) and are converted to a .vec file.
2) The object is a part of a larger background image; the object's box dimensions are noted in a file, which is then converted to a .vec file.
3) A positive image is distorted and pasted onto negative backgrounds.
Here are some links of articles I've read
https://www.academia.edu/9149928/A_complete_guide_to_train_a_cascade_classifier_filter
https://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
http://note.sonots.com/SciSoftware/haartraining.html#w0a08ab4
Do I crop my positive images for training, or do I keep them as is and include the rectangle dimensions of the object within the image?
Could you please help me understand several points related to Haar classifier training:
1) Should a positive image contain only the training object, or can it contain other objects as well? For example, if I want to recognize a traffic sign, should the positive image contain only the traffic sign, or can it also contain the highway?
2) There are 2 ways of creating the samples vector file: one uses an info file, which contains the coordinates of the object in each positive image; the other just takes a list of positives and negatives. Which one is better?
3) How do you usually create the info file that contains the object coordinates in the positive images? Can image clipper generate the object coordinates?
And does dlib's histogram of oriented gradients (HOG) detector provide better results than a Haar classifier?
My target is traffic sign detection on a Raspberry Pi.
Thanks
The positive sample (not necessarily the image) should contain only the object. Sometimes it is not possible to get the right aspect ratio for each positive sample; in that case you would either add some background or crop away some of the object boundary. The final detector will detect regions with the aspect ratio of your positive samples, so if you include a lot of background around all of your positive samples, your final detector will probably not detect the region of your traffic sign, but a region with a lot of background around your traffic sign.
AFAIK, the positive samples must be provided as a .vec file, which is created with opencv_createsamples.exe, and you'll need a description file (where in the images are your positive samples?). I typically preprocess my labeled training samples and crop away all the background, so that I end up with intermediate images where the positive sample fills the whole image and the image already has the right aspect ratio. I fill a text file with basically "folder/filename.png 1 0 0 width height" for each of those intermediate images (the 1 is the number of objects in the image) and then create a .vec file from those intermediate images. But the other way, using real ROI information from full-size images, should give the same quality.
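As a minimal sketch of the description file mentioned above, assuming the positives have already been cropped so that the object fills each image: every line holds the image path, the number of objects, and then x, y, width, height for each object. The file names and sizes here are hypothetical, and reading the real image dimensions (for example with OpenCV) is left out.

```python
def annotation_line(image_path, width, height):
    """One description-file line for an image the object fills completely:
    path, object count (1), then the box as x y width height."""
    return f"{image_path} 1 0 0 {width} {height}"


def build_annotation_file(crops):
    """crops: iterable of (path, width, height) for pre-cropped positives."""
    return "\n".join(annotation_line(p, w, h) for p, w, h in crops) + "\n"


# Hypothetical file names and sizes, for illustration only.
crops = [("pos/sign_001.png", 48, 48), ("pos/sign_002.png", 64, 64)]
print(build_annotation_file(crops), end="")
```

The resulting text would be saved to a file and passed to opencv_createsamples through its -info option, together with -vec, -num, -w and -h.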
Be aware that if you don't fix the same aspect ratio for each positive sample, you'll stretch your objects, which might or might not be a problem in your task.
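One way to avoid the stretching is to grow each labeled box until it has the target aspect ratio before cropping, so that only background is added. The helper below is a sketch of that idea; the function name is made up for illustration, and clamping the result to the image borders is deliberately left out.

```python
def fit_box_to_aspect(x, y, w, h, target_aspect):
    """Grow a box (x, y, w, h) around its centre until w / h == target_aspect.

    Only background is added; the object itself is never stretched.
    The result may extend past the image borders (no clamping here).
    """
    if w / h < target_aspect:
        # Box is too narrow for the target: widen it.
        new_w, new_h = h * target_aspect, h
    else:
        # Box is too wide (or already right): make it taller.
        new_w, new_h = w, w / target_aspect
    cx, cy = x + w / 2, y + h / 2
    return (round(cx - new_w / 2), round(cy - new_h / 2),
            round(new_w), round(new_h))


# A tall 20x40 box made square: 10 pixels of background on each side.
print(fit_box_to_aspect(10, 10, 20, 40, target_aspect=1.0))
```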
And keep in mind that you can create additional positive samples by warping/transforming your images. opencv_createsamples can do that for you, but I have never really used it, so I'm not sure whether training will benefit from such samples.
I am working on a project which requires detection of people in a scene.
Initially after running the HOG detector on the original frames a particular background object was being detected as a person on all the frames, giving me 3021 false positive detections.
So I took the logical step of removing the static background by applying a background subtractor (BackgroundSubtractorMOG2) to all the frames.
The resulting frames looked like this:
These mask images were then combined with the original images (using bitwise_and), so that the white pixels are replaced by the pixels that make up the person.
Sample:
Then I ran the HOG detector on these images which gave the results like this:
As you can see, there are a lot of false positive detections for some reason. I thought background subtraction would give me better results than running HOG on the original images.
Can someone please tell me why there are so many false positives in this method? And what can be done to improve the detection on background subtracted images?
The problem is that you changed the nature of your images by removing the background. The HOG detector was trained on normal images, without artificial black pixels, and now you are feeding it artificially altered images, so it is normal that it behaves in a weird way (I still don't understand that detection at the top of the image, though).
If you want to use HOG detector on top of the background subtraction, you should train the HOG classifier with features taken from the background subtracted images.
One thing you can try (if this doesn't kill the performance of your application) is to run the HOG detector on both images, with and without background, and accept only detections that overlap significantly in both. This may remove some false positives from both images.
PS: HOG was specially designed to work on raw images by detecting strong edges and testing them against an SVM model. By removing the background, we create artificial edges that kinda defeat the purpose of using HOG. But I think you can use it to remove false detections by doing what I suggested in the previous paragraph.
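The overlap check suggested above could look roughly like this. It is only a sketch: boxes are assumed to be (x, y, w, h) tuples, which is the layout OpenCV detectors return, and the 0.5 intersection-over-union threshold is an arbitrary starting point, not a recommended value.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def confirmed(dets_raw, dets_masked, min_iou=0.5):
    """Keep detections from the raw frame that are also found, with enough
    overlap, in the background-subtracted frame."""
    return [d for d in dets_raw
            if any(iou(d, m) >= min_iou for m in dets_masked)]


# Made-up boxes: only the first raw detection has a matching partner.
raw = [(0, 0, 10, 20), (100, 100, 10, 20)]
masked = [(1, 1, 10, 20), (50, 50, 10, 20)]
print(confirmed(raw, masked))
```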
I am planning on making a cascade detector for a white cup, a red ball, and a blue puck. With how simple these objects are in their shape, I was wondering if there are any parameter differences I will have to have in the training vs finding complex objects such as cars / faces? Also, within the training pos images I have the objects in different lighting conditions and instances where the objects are under shadow.
For training negative images I noticed the image sizes may vary. However, for positive images they MUST be a fixed size.
I plan on using 100x100 pos images to help detect the objects from 20-30 feet, and 200x200 pos images to detect the objects when I am within 5 ft or directly overhead of the object (approx. 3 ft off the ground). Does this mean that I will have to train 6 different XMLs, 2 for each object, since each is trained for 100x100 and 200x200?
Short answer: Yes
Long Answer: Probably:
You have to think about it like this: the classifier is going to build up a set of features from the positive images and then use these to decide whether a region of your detection image is the same object or not. If you drastically change the angle from which you view the object, then you are going to need a different classifier.
Let me explain with pictures:
If at 20ft away your cup looks like this:
with associated background/lighting etc., then it is going to be a very different classifier if your cup looks like this (maybe 5ft away but from a different angle):
Now, with all that being said, if you only have larger and smaller versions of your cup, then you may only need one. However, you will need a different classifier for each object (cup/ball/puck).
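The reason one classifier can cover larger and smaller versions of the same view is that the detector repeats its fixed-size window search at many scales. The sketch below only illustrates that idea; it is not OpenCV's implementation, and scale_factor here merely mirrors the role of the scaleFactor argument of detectMultiScale. The window and image sizes are made-up numbers.

```python
def detectable_sizes(window, image_size, scale_factor=1.1):
    """Object sizes (in pixels) that one fixed training window covers when
    the search is repeated at progressively larger scales."""
    sizes = []
    scale = 1.0
    while window * scale <= image_size:
        sizes.append(round(window * scale))
        scale *= scale_factor
    return sizes


# A 24-pixel training window searched over a 480-pixel-high frame.
sizes = detectable_sizes(24, 480)
print(len(sizes), sizes[0], sizes[-1])
```

Under this view, an object that is merely closer shows up at one of the larger scales, while a different viewing angle changes its appearance and is not covered by rescaling.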
Images not mine - Taken from Google
I am looking for parabolas in some radar data. I am using the OpenCV Haar cascaded classifier. My positive images are 20x20 PNGs where all of the pixels are black, except for those that trace a parabolic shape--one parabola per positive image.
My question is this: will these positives train a classifier to look for black boxes with parabolas in them, or will they train a classifier to look for parabolic shapes?
Should I add a layer of medium value noise to my positive images, or should they be unrealistically crisp and high contrast?
Here is an example of the original data.
Here is an example of my data after I have performed simple edge detection using GIMP. The parabolic shapes are highlighted in the white boxes.
Here is one of my positive images.
I figured out a way to detect parabolas, initially using the MatchTemplate method from OpenCV. At first I was using the Python cv, and later cv2, libraries, but I had to make sure that my input images were 8-bit unsigned integer arrays. I eventually obtained a similar effect with less fuss using scipy.signal.correlate2d(image, template, mode='same'). The mode='same' option makes the output the same size as image. When I was done, I performed thresholding using the numpy.where() function, followed by opening and closing with the scipy.ndimage module to eliminate salt-and-pepper noise.
Here's the output, before thresholding.
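A rough sketch of the pipeline described above (correlate, threshold, then open and close) is given here. It assumes NumPy and SciPy are installed; the toy template, the image and the threshold value are made up for illustration and are not the radar data from the question.

```python
import numpy as np
from scipy import ndimage, signal


def match_template(image, template, threshold):
    """Cross-correlate, threshold, then clean the mask by opening/closing."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    # mode="same" keeps the response the same shape as `image`.
    response = signal.correlate2d(image, template, mode="same")
    mask = np.where(response >= threshold, True, False)
    # Opening drops isolated "salt" pixels; closing fills "pepper" holes.
    mask = ndimage.binary_closing(ndimage.binary_opening(mask))
    return response, mask


# Toy data: a small V-shaped template pasted into an empty image.
template = np.array([[1, 0, 1],
                     [0, 1, 0],
                     [0, 0, 0]])
image = np.zeros((12, 12))
image[4:7, 4:7] = template

response, mask = match_template(image, template, threshold=2.5)
# The strongest response sits at the centre of the pasted template.
peak = np.unravel_index(response.argmax(), response.shape)
print(peak)
```

In this toy case the match is a single pixel, so the opening step removes it from the cleaned mask; on real data a match usually produces a small blob that survives the cleanup.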