How to prepare images for Haar Cascade? Positive vs Training samples - opencv

I am preparing to classify my own object using OpenCV Haar Cascade. I understand that negative images are photos without your object and positive images are photos with your object included. The part that confuses me is how my positive images need to be set up. I have read numerous explanations, but it's still a bit confusing to me. I've read about 3 different methods for preparing samples:
1) Positive images contain only the object (it takes up the full image) and are converted to a .vec file.
2) The object is part of a larger background image; its bounding-box dimensions are noted in an info file, which is then converted to a .vec file.
3) A positive image is distorted and superimposed onto negative backgrounds.
Here are some links to articles I've read:
https://www.academia.edu/9149928/A_complete_guide_to_train_a_cascade_classifier_filter
https://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
http://note.sonots.com/SciSoftware/haartraining.html#w0a08ab4
Do I crop my positive images for training, or do I keep them as is and include the rectangle dimensions of the object within the image?

Related

What is the optimal image resolution to train an image classifier with CreateML?

I want to create an image classifier model using CreateML. I have images available in very high resolution but that comes at a cost in terms of data traffic and processing time, so I prefer to use images as small as possible.
The docs say that:
The images (...) don’t have to be a particular size, nor do they need to be the same size as each other. However, it’s best to use images that are at least 299 x 299 pixels.
I trained a test model with images of various sizes > 299x299px, and the model parameters in Xcode show the dimension 299x299px, which I understand is the normalized image size.
This dimension seems to be determined by the CreateML Image Classifier algorithm and is not configurable.
Does it make any sense to train the model with images that are larger than 299x299px?
If the image dimensions are not square (same height as width), will the training image be center-cropped to 299x299px during normalization, or will the parts of the image outside the square influence the model?
From reading and from experience training image classification models (but with no inside knowledge of Apple's implementation), it appears that Create ML scales incoming images to fit a 299 x 299 square. You would be wasting disk space and preprocessing time by providing larger images.
The best documentation I can find is the mlmodel file created by Create ML for an image classifier template. The input is explicitly defined as a 299 x 299 color image, and there is no option to change that setting in the stand-alone app.
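If you want to double-check this outside Xcode, one option is to inspect the model spec with coremltools. A minimal sketch, assuming coremltools is installed and you point it at your exported model (the file name below is a placeholder):

import coremltools as ct

# Load an image classifier exported by Create ML (path is a placeholder).
model = ct.models.MLModel("MyImageClassifier.mlmodel")

# The spec lists each input with its type; for a Create ML image classifier
# this should report an image input with width 299 and height 299.
for inp in model.get_spec().description.input:
    print(inp)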
Here is some documentation (it applies to the Classifier template, which uses ScenePrint by default):
https://developer.apple.com/documentation/createml/mlimageclassifier/featureextractortype/sceneprint_revision
There may be a Center/Crop option in the Playground workspace, but I never found it in the standalone app version of Create ML.

How do we transfer a particular structure(defined by mask) in an image to another image?

Image showing required content transfer between source and target images
Essentially is there a way to grow an image patch defined by mask and keep it realistic?
For your first question, you are talking about image style transfer. In that case, CNNs may help you.
For the second, if I understand correctly, by growing you mean introducing variations in the image patch while keeping it realistic. If that's the goal, you may use GANs for generating images, provided you have a reasonably sized dataset to train with:
Image Synthesis with GANs
Intuitively, conditional GANs model the joint distribution of the input dataset (which in your case, are images you want to imitate) and can draw new samples (images) from the learned distribution, thereby allowing you to create more images having similar content.
Pix2Pix is the open-source code of a well-known paper that you can play around with to generate images. Specifically, let X be your input image and Y be a target image. You can train the network and feed X to observe the output O of the generator. Thereafter, by tweaking the architecture a bit or by changing the skip connections (read the paper) and training again, you can generate variety in the output images O.
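As a rough illustration of that train-then-generate idea (this is not the Pix2Pix repository's actual code; the tiny networks, tensor sizes, and the L1 weight of 100 are placeholder choices), a single conditional-GAN training step in PyTorch could look like this:

import torch
import torch.nn as nn

# Toy generator and discriminator; the real Pix2Pix uses a U-Net and a PatchGAN.
G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(16, 1, 3, padding=1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

x = torch.randn(1, 3, 64, 64)   # input image X (stand-in data)
y = torch.randn(1, 3, 64, 64)   # target image Y (stand-in data)

# Discriminator step: real pair (X, Y) vs. fake pair (X, G(X)).
fake = G(x).detach()
d_real = D(torch.cat([x, y], dim=1))
d_fake = D(torch.cat([x, fake], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool D and stay close to Y (the L1 term from the paper).
out = G(x)                       # this is the output O for input X
d_out = D(torch.cat([x, out], dim=1))
loss_g = bce(d_out, torch.ones_like(d_out)) + 100.0 * l1(out, y)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()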
Font Style Transfer is an interesting experiment with text on images (rather than image on image, as in your case).

Haar Classifier positive image set clarification

Could you please help understand several points related to Haar Classifier training:
1) Should the positive images contain only the training object, or can they contain other objects as well? For example, if I want to recognize a traffic sign, should the positive image contain only the traffic sign, or can it also contain the highway?
2) There are 2 ways of creating the samples vector file: one uses an info file, which contains the object coordinates in the positive images, and the other just takes a list of positives and negatives. Which one is better?
3) How do you usually create the info file that contains the object coordinates in the positive images? Can image clipper generate the object coordinates?
And does dlib's histogram of oriented gradients (HOG) provide better results than a Haar classifier?
My target is traffic sign detection on a Raspberry Pi.
Thanks
The positive sample (not necessarily the image) should contain only the object. Sometimes it is not possible to get the right aspect ratio for each positive sample; then you would either add some background or crop some of the object boundary. The final detector will detect regions with your positive samples' aspect ratio, so if you use a lot of background around all of your positive samples, your final detector will probably not detect a region containing just your traffic sign, but a region with a lot of background around your traffic sign.
Afaik, the positive samples must be provided as a .vec file, which is created with opencv_createsamples.exe, and you'll need a description file stating where in the images your positive samples are. I typically preprocess my labeled training samples by cropping away all the background, so that I end up with intermediate images where the positive sample fills the whole image and the image already has the right aspect ratio. I fill a text file with basically "folder/filename.png 1 0 0 width height" for each of those intermediate images (see the sketch below) and then create a .vec file from them. But the other way, using real ROI information from full-size images, should be of the same quality.
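A minimal sketch of that intermediate step, assuming pre-cropped positives in a positives/ folder (the folder name, file pattern, and the use of OpenCV to read each image's size are my own assumptions, not part of the original answer):

import glob
import cv2  # only used to read each image's width/height

lines = []
for path in sorted(glob.glob("positives/*.png")):
    h, w = cv2.imread(path).shape[:2]
    # Info-file format: <path> <object count> <x> <y> <width> <height>
    # The object fills the whole cropped image, so the ROI is 0 0 w h.
    lines.append(f"{path} 1 0 0 {w} {h}")

with open("positives.txt", "w") as f:
    f.write("\n".join(lines) + "\n")

opencv_createsamples can then read positives.txt to build the .vec file.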
Be aware that if you don't fix the same aspect ratio for each positive sample, you'll stretch your objects, which might or might not be a problem in your task.
And keep in mind, that you can create additional positive samples from warping/transforming your images. opencv_createsamples can do that for you, but I never really used it, so I'm not sure whether training will benefit from using such samples.

Image preprocessing mean image subtraction

I have a question regarding the preprocessing step "Image mean subtraction".
I use the UCSD Dataset for my training.
One popular preprocessing step is the mean subtraction. Now I wonder if I am doing it right.
What I am doing is the following:
I have 200 gray-scale training images
I put all images in a list and compute the mean with numpy:
np.mean(ImageList, axis=0)
This returns me a mean image
Now I subtract the mean image from all training images
When I now visualize my preprocessed training images, they are mostly black and also contain negative values.
Is this correct? Or is my understanding of subtracting the mean image incorrect?
Here is one of my training images:
And this is the "mean image":
It seems like you are doing it right.
As for the negative values: they are to be expected. Your original images had intensity values in the range [0..1]; once you subtract the mean (which should be around ~0.5), you get values roughly in the range [-0.5..0.5].
Please note that you should save the "mean image" you got for test time as well: once you wish to predict using the trained net you need to subtract the same mean image from the test image.
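A minimal numpy sketch of that workflow, with stand-in random frames in place of the real UCSD images (shapes and file names are arbitrary placeholders):

import numpy as np

# Stand-in for the 200 gray-scale training frames, scaled to [0..1] as in the question.
ImageList = [np.random.rand(120, 160).astype(np.float32) for _ in range(200)]

train = np.stack(ImageList)
mean_image = np.mean(ImageList, axis=0)   # the "mean image" from the question
train_centered = train - mean_image       # values now roughly in [-0.5, 0.5]

# Save the mean image so the exact same subtraction can be applied at test time.
np.save("mean_image.npy", mean_image)

# Later, at prediction time (test_image is a stand-in here as well):
test_image = np.random.rand(120, 160).astype(np.float32)
test_centered = test_image - np.load("mean_image.npy")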
Update:
In your case (a static camera) the mean subtraction removes the "common" background. This setting seems to be in your favor, as it focuses the net on the temporal changes in the frame. This method will work well for you as long as you test on the same kind of images (i.e., frames from the same static camera).

OpenCv: size of images to create an xml training haartraining?

I read this post about creating a custom haartraining xml file.
Must positive and negative images have the same dimensions (width x height)?
Must positive images have the same size as the negative ones?
In createsamples and opencv_traincascade, what should I put in the parameters -h and -w?
Negative samples can be any size; the training will select subwindows and scaled versions of the subwindows automatically (because many negative samples are needed). For each negative image you'll need a line with its path in a .txt file. If you want to use "hard samples" which exactly fit the detector size, you'll have to crop the negative images to the relevant region and resize them manually to fit the target size.
The positive samples can be any size, but you have to prepare a .txt file with information about where inside each image the object is located.
For example, if the image is C:\haartraining\positives\image00001.png and a single object is at ROI position (x,y,width,height), then your .txt file must contain the line C:\haartraining\positives\image00001.png 1 x y width height. This width/height can be any size (it needn't be the same for each image); it will be scaled later by opencv_createsamples to the target size.
For Haar training you'll need a .vec file instead of that .txt file, and you can create the .vec file with the opencv_createsamples binary. This binary has at least 3 different modes to create samples:
1. Just transform your positive images to a .vec file by scaling and formatting.
2. Create a lot of positive images by providing some background images, a single positive image and constraint about how the sample may be transformed/distorted.
3. Create test images which can be used to test the training result, by placing positive images inside some bigger scene background images.
I only have experience with 1., so I can't help with the other ones. You just enter the positives text file and the target window size together with the number of positive images you have, and opencv_createsamples creates the .vec file which you'll need for training with opencv_traincascade (a sketch of both commands follows below).
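A rough sketch of mode 1 followed by the training call, driven from Python via subprocess (the window size, sample counts, and file names are illustrative assumptions, not recommendations from this answer):

import os
import subprocess

os.makedirs("cascade_out", exist_ok=True)   # traincascade expects the output dir to exist

# Mode 1: pack the positives listed in positives.txt ("path count x y w h" lines)
# into a .vec file, scaling each ROI to the target window size.
subprocess.run([
    "opencv_createsamples",
    "-info", "positives.txt",
    "-vec", "positives.vec",
    "-num", "8000",
    "-w", "24", "-h", "24",
], check=True)

# Train the cascade; bg.txt lists one negative image path per line.
subprocess.run([
    "opencv_traincascade",
    "-data", "cascade_out",
    "-vec", "positives.vec",
    "-bg", "bg.txt",
    "-numPos", "7000",          # deliberately below the 8000 packed into the .vec file
    "-numNeg", "3500",
    "-numStages", "20",
    "-w", "24", "-h", "24",
    "-minHitRate", "0.995",
], check=True)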
In opencv_traincascade you must take care to provide good numbers for numPos and numNeg, since in each stage you'll "lose" samples, which can be a bit confusing without experience. For example, if your vec file contains 8000 positive images, you can't tell opencv_traincascade to use 8000 positive images in each stage, because with a minHitRate of, for example, 0.995 there might be only 7960 positive samples left after the first stage (worst case). The same goes for negative samples, but there it's not so easy to tell how many samples you effectively HAVE...
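A small back-of-the-envelope sketch of that worst-case bookkeeping (it reproduces the 8000 to 7960 example above; it is not OpenCV's exact internal formula):

vec_count = 8000        # positives packed into the .vec file
min_hit_rate = 0.995
num_stages = 20

available = vec_count
for stage in range(1, num_stages + 1):
    # Worst case: each stage rejects (1 - minHitRate) of the remaining positives,
    # and rejected samples cannot be reused in later stages.
    available = int(available * min_hit_rate)
    print(f"after stage {stage}: at most {available} usable positives")

# After stage 1 this prints 7960, so numPos should be chosen comfortably below
# the .vec count to leave room for these losses across all stages.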
Afaik, width and height must be the same in createsamples and traincascade! It should be the minimum size at which you want to detect objects after training, but it must be big enough to hold the relevant information. Maybe you have to test a bit until you find the optimum size. Unfortunately the size is quite limited because of memory problems during training; for example, I couldn't train bigger than approx. 30x30. Afaik, face detection was scientifically evaluated to produce good results at either 24x24 without tree splits or 20x20 with some tree splits.
Please tell me whether this answers your question.
