OpenCV: size of images to create an XML training file with haartraining? - opencv

I read this post about creating a custom haartraining XML file.
Do positive and negative images have to have the same dimensions (width x
height)?
Do positive images have to be the same size as the negative ones?
What should I put in the -w and -h parameters of createsamples and
opencv_traincascade?

Negative samples can be any size; the training will select subwindows and scaled versions of those subwindows automatically (because many, many negative samples are needed). For each negative image you'll need a line with its path in a .txt file. If you want to use "hard samples" which exactly fit the detector size, you'll have to crop the negative images to the relevant region and resize them manually to fit the target size.
The positive samples can be any size, but you have to prepare a .txt file with information about where inside each image the object is located.
For example, if the image is C:\haartraining\positives\image00001.png and a single object is at ROI position (x,y,width,height), then your .txt file must contain the line C:\haartraining\positives\image00001.png 1 x y width height. This width/height can be any size (it needn't be the same for each image); it will be scaled later by opencv_createsamples to the target size.
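As a minimal sketch (the paths and box coordinates here are made-up placeholders), such an info file can be written with a few lines of Python:

    # Sketch: write an opencv_createsamples info file from a list of
    # (path, x, y, w, h) annotations. All paths/boxes below are hypothetical.
    annotations = [
        (r"C:\haartraining\positives\image00001.png", 34, 20, 160, 120),
        (r"C:\haartraining\positives\image00002.png", 10, 44, 200, 150),
    ]

    with open("positives.txt", "w") as info:
        for path, x, y, w, h in annotations:
            # Format per line: <path> <object count> <x> <y> <width> <height>
            info.write(f"{path} 1 {x} {y} {w} {h}\n")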
For Haar training you'll need a .vec file instead of that .txt file, and you can create the .vec file with the opencv_createsamples binary. This binary has at least 3 different modes to create samples:
1. Just transform your positive images to a .vec file by scaling and formatting.
2. Create a lot of positive images by providing some background images, a single positive image and constraint about how the sample may be transformed/distorted.
3. Create test images which can be used to test the training result, by placing positive images inside some bigger scene background images.
I only have experience with 1., so I can't help with the other ones. You just pass in the positives text file and the target window size together with the number of positive images you have, and opencv_createsamples creates the .vec file which you'll need for training with opencv_traincascade.
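A hedged sketch of that invocation (mode 1), assuming the positives.txt info file from above, 8000 samples, and a 24x24 target window:

    import subprocess

    # Sketch: pack the annotated positives into a .vec file (mode 1).
    # The file names, sample count and 24x24 window are assumptions.
    subprocess.run([
        "opencv_createsamples",
        "-info", "positives.txt",  # annotation file: path count x y w h
        "-vec", "positives.vec",   # output archive consumed by training
        "-num", "8000",            # number of samples to pack
        "-w", "24",                # target window width
        "-h", "24",                # target window height
    ], check=True)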
In opencv_traincascade you must take care to provide good numbers for numPos and numNeg, since in each stage you'll "lose" samples, which can be a bit confusing without experience. For example, if your .vec file contains 8000 positive images, you can't tell opencv_traincascade to use 8000 positive images in each stage, because with a minHitRate of, for example, 0.995 there might be only 7960 positive samples left after the first stage (worst case). The same goes for negative samples, but there it's not so easy to tell how many samples you effectively HAVE...
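To make that concrete, here is a small worked sketch. The formula is the commonly cited rule of thumb vec_count >= numPos + (numStages - 1) * (1 - minHitRate) * numPos + S, where S (positives rejected as background early on) can't be known in advance, so a safety factor is assumed:

    vec_count = 8000      # samples in the .vec file
    num_stages = 20       # planned -numStages
    min_hit_rate = 0.995  # planned -minHitRate
    safety = 0.9          # assumed headroom for the unknown S

    num_pos = int(safety * vec_count / (1 + (num_stages - 1) * (1 - min_hit_rate)))
    print(num_pos)  # 6575: a safer -numPos than the raw vec count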
Afaik, width and height must be the same in createsamples and traincascade! It must be the minimum size at which you want to detect objects after training, but it must be big enough to hold the relevant information. Maybe you have to test a bit until you find the optimum size. Unfortunately the size is quite limited because of memory problems during training; for example, I couldn't train bigger than approx. 30x30. Afaik, face detection was scientifically evaluated to produce good results at either 24x24 without tree splits or 20x20 with some tree splits.
Please tell me whether this answers your question.

Related

What is the optimal image resolution to train an image classifier with CreateML?

I want to create an image classifier model using CreateML. I have images available in very high resolution but that comes at a cost in terms of data traffic and processing time, so I prefer to use images as small as possible.
The docs say that:
The images (...) don’t have to be a particular size, nor do they need to be the same size as each other. However, it’s best to use images that are at least 299 x 299 pixels.
I trained a test model with images of various sizes > 299x299px, and the model parameters in Xcode show the dimension 299x299px, which I understand is the normalized image size.
This dimension seems to be determined by the CreateML Image Classifier algorithm and is not configurable.
Does it make any sense to train the model with images that are larger than 299x299px?
If the image dimensions are not a square (same height as width) will the training image be center cropped to 299x299px during the process of normalization, or will the parts of the image that are outside the square influence the model?
From reading and from experience training image classification models (but no direct inside-Apple knowledge), it appears that Create ML scales incoming images to fit a 299 x 299 square. You would be wasting disk space and preprocessing time by providing larger images.
The best documentation I can find is to look at the .mlmodel file created by Create ML for an image classifier template. The input is explicitly defined as a 299 x 299 color image. There is no option to change that setting in the stand-alone app.
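As a hedged sketch (the file name is a placeholder, and coremltools must be installed), you can confirm the fixed input size by inspecting the model spec in Python:

    import coremltools as ct

    # Sketch: print the image input dimensions of a Create ML classifier.
    # "ImageClassifier.mlmodel" is a hypothetical file name.
    model = ct.models.MLModel("ImageClassifier.mlmodel")
    spec = model.get_spec()

    for inp in spec.description.input:
        # For an image input this prints the fixed size, e.g. 299 x 299.
        print(inp.name, inp.type.imageType.width, inp.type.imageType.height)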
Here is some documentation (it applies to the Classifier template, which uses ScenePrint by default):
https://developer.apple.com/documentation/createml/mlimageclassifier/featureextractortype/sceneprint_revision
There may be a Center/Crop option in the Playground workspace, but I never found it in the standalone app version of Create ML.

How to prepare images for Haar Cascade? Positive vs Training samples

I am preparing to classify my own object using an OpenCV Haar Cascade. I understand that negative images are photos without your object, while positive images include your object. The part that confuses me is how my positive images need to be set up. I have read numerous explanations, but it's still a bit confusing to me. I've read 3 different methods of preparing samples:
1) Positive images contain only the object (it takes up the full image) and are converted to a .vec file.
2) The object is part of a larger background image, its bounding-box dimensions are noted in a file, and that file is then converted to a .vec file.
3) A positive image is distorted and superimposed onto negative backgrounds.
Here are some links of articles I've read
https://www.academia.edu/9149928/A_complete_guide_to_train_a_cascade_classifier_filter
https://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
http://note.sonots.com/SciSoftware/haartraining.html#w0a08ab4
Do I crop my positive images for training, or do I keep them as-is and include the rectangle dimensions of the object within the image?

keras - flow_from_directory function - target_size parameter

Keras has this function called flow_from_directory and one of the parameters is called target_size. Here is the explanation for it:
target_size: Tuple of integers (height, width), default: (256, 256).
The dimensions to which all images found will be resized.
The thing that is unclear to me is whether it is cropping the original image into a 256x256 matrix (in which case we do not take the entire image) or just reducing the resolution of the image (while still showing us the entire image).
If it is -let's say - just reducing the resolution:
Assume that I have some x-ray images with the size 1024x1024 each (for breast cancer detection). If I want to apply transfer learning to a pretrained convolutional neural network which only takes 224x224 input images, won't I be losing important data/information when I reduce the size of the image (and resolution) from 1024x1024 down to 224x224? Isn't there such a risk?
Thank you in advance!
It is reducing the resolution (resizing).
Yes, you are losing data.
The best way for you is to rebuild your CNN to work with your original image size, i.e. 1024x1024.
It is reducing the resolution of the image (while still showing us the entire image)
It is true that you are losing data, but you can work with an image size a bit larger than 224x224, like 512x512, as it will keep most of the information and will train in comparatively less time and with fewer resources than the original size (1024x1024).
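A minimal usage sketch (the directory path, batch size, and class mode are assumptions); target_size is what triggers the resize of every loaded image:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Sketch: every image under data/train/<class_name>/ is resized
    # (not cropped) to 512x512 before being fed to the model.
    datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_flow = datagen.flow_from_directory(
        "data/train",            # hypothetical directory, one subfolder per class
        target_size=(512, 512),  # (height, width) all images are resized to
        batch_size=32,
        class_mode="binary",
    )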

Haar Classifier positive image set clarification

Could you please help understand several points related to Haar Classifier training:
1) Should positive images contain only the training object, or can they contain other objects too? E.g. if I want to recognize a traffic sign, should the positive image contain only the traffic sign, or can it also contain the highway?
2) There are 2 ways of creating the samples vector file: one uses an info file, which contains the object coordinates in each positive image; the other just takes a list of positives and negatives. Which one is better?
3) How do you usually create the info file containing the object coordinates in each positive image? Can image clipper generate object coordinates?
Also, does dlib's histogram of oriented gradients (HOG) provide better results than a Haar classifier?
My target is traffic sign detection on a Raspberry Pi.
Thanks
The positive sample (not necessarily the image) should contain only the object. Sometimes it is not possible to get the right aspect ratio for each positive sample; then you would either add some background or crop some of the object boundary. The final detector will detect regions of your positive samples' aspect ratio, so if you use a lot of background around all of your positive samples, your final detector will probably not detect a region of your traffic sign, but a region with a lot of background around your traffic sign.
Afaik, the positive samples must be provided by a .vec file, which is created with opencv_createsamples.exe, and you'll need a file with the description (where in the images are your positive samples?). I typically preprocess my labeled training samples, cropping away all the background, so that there are only intermediate images in which the positive sample fills the whole image and the image already has the right aspect ratio. I fill a text file with basically "folder/filename.png 1 0 0 width height" for each of those intermediate images and then create a .vec file from those intermediate images. But the other way, using real ROI information from full-size images, should be of the same quality.
Be aware that if you don't fix the same aspect ratio for each positive sample, you'll stretch your objects, which might or might not be a problem in your task.
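As a hedged sketch (the target ratio, border color, and file names are assumptions), you can pad each crop to a fixed aspect ratio instead of stretching it:

    import cv2

    def pad_to_aspect(image, target_ratio=1.0):
        # Pad with black borders so that width / height == target_ratio.
        h, w = image.shape[:2]
        if w / h < target_ratio:
            # Too narrow: pad left and right.
            pad = int(h * target_ratio) - w
            return cv2.copyMakeBorder(image, 0, 0, pad // 2, pad - pad // 2,
                                      cv2.BORDER_CONSTANT, value=(0, 0, 0))
        # Too wide (or exact fit): pad top and bottom.
        pad = int(w / target_ratio) - h
        return cv2.copyMakeBorder(image, pad // 2, pad - pad // 2, 0, 0,
                                  cv2.BORDER_CONSTANT, value=(0, 0, 0))

    sample = cv2.imread("sign_crop.png")  # hypothetical cropped positive
    cv2.imwrite("sign_padded.png", pad_to_aspect(sample, 1.0))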
And keep in mind that you can create additional positive samples by warping/transforming your images. opencv_createsamples can do that for you, but I never really used it, so I'm not sure whether training will benefit from such samples.

OpenCV positive samples dimensions?

So I've come across lots of tutorials about OpenCV's haartraining and cascade training tools. In particular I'm interested in training a car classifier using the createsamples tool, but there seem to be conflicting statements all over the place regarding the -w and -h parameters, so I'm confused.
I'm referring to the command:
$ createsamples -info samples.dat -vec samples.vec -w 20 -h 20
I have the following three questions:
I understand that the aspect ratio of the positive samples should be the same as the aspect ratio implied by the -w and -h parameters above. But do ALL of the positive samples have to be the same size as well? E.g. I have close to 1000 images. Do all of them have to be the same size after cropping?
If it is not the size but the aspect ratio that matters, then how precisely must the aspect ratio of the positive samples match the -w and -h parameters given to the OpenCV tools? I mean, is the classifier so sensitive that even a few pixels off here and there would affect its performance? Or would you say that it's safe to work with images as long as they're all approximately the same ratio by eye?
I have already cropped several images to the same size. But in trying to make them all the same size, some of them have a bit more background included in the bounding boxes than others, and some have slightly different margins. (For example, see the two images below. The bigger car takes up more of the image, but there's a wider margin around the smaller car.) I'm just wondering if having a collection of images like this is fine, or if it will lower the accuracy of the classifier, meaning I should ensure tighter bounding boxes around all objects of interest (in this case, cars)?
First question: Yes, all the images to be used for training have to be the same size (at least the last time I did face detection sample training; it should be the same here. If I am not wrong, there will be an error if the images are not of the same size, but you can try it out and see if time permits).
Second question: Not really sure what you are asking here, but the classifier is not as sensitive as you think. A few pixels off the object of interest, let's say a hand for instance: if the little finger is missing a few pixels (due to cropping) in some images and a few pixels of the thumb are missing in others, the classifier will still be able to detect the hand. So a few pixels missing here and there, or a few background pixels added in, will not affect the classifier much at the end of the day.
Third question: You should crop the images to contain only the car for maximum effect; try to eliminate as much background as possible. I did a research comparison based on samples with noisy backgrounds, black backgrounds, and cropped samples with minimum background. Cropped samples with minimum background showed the best results in terms of false positives and false negatives, from what I remember.
You can use an object marker tool to do it: http://achuwilson.wordpress.com/2011/02/13/object-detection-using-opencv-using-haartraining/
The tedious way would be to use Paint to resize all the images to the same pixel dimensions after cropping.
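A less tedious sketch (the folder names and the 48x24 target size are assumptions) that batch-resizes the cropped images with OpenCV's Python bindings:

    import cv2
    import glob
    import os

    # Sketch: resize every cropped positive to the same fixed size.
    # "cropped/", "resized/" and 48x24 are hypothetical choices.
    os.makedirs("resized", exist_ok=True)
    for path in glob.glob("cropped/*.png"):
        img = cv2.imread(path)
        small = cv2.resize(img, (48, 24))  # (width, height)
        cv2.imwrite(os.path.join("resized", os.path.basename(path)), small)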
This link should also answer your question: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
I also agree with GilLevi that there are much better detection methods than Haar, HoG, or LBP cascades. Training can take days (depending on the number of images trained). If you really have to use the cascade methods and you are looking to minimise training time, note that training with Haar-like features takes much longer than with HoG or LBP. Results-wise, though, I am not really sure which will ensure better performance and robustness.
Hope my answer helped you. Should there be more questions, do comment.
