Data Augmentation - Is shifting needed? - machine-learning

I understand the power of data augmentation and its different forms, such as rotating, flipping, normalization, etc.
Is shifting the object around the image really needed? Will there be a difference in the results of the convolution?

If you are sure that the images in the test data are always centered, then shifting might not be needed. But in the real world, that is not the case.
For example, you can't expect a cat to always stay at the center of the image. In test data it might appear at any position. Your model will learn better if you cover such cases in your training data.
Note: The image above is just for easy understanding. The data has been augmented using rotation as well, not just shift, but it serves the purpose, so I included it.
As for the difference in results, you can't be sure how significant the change in performance will be until you try it out. But people find that shifting helps improve performance; it is usually used along with flip, rotate, scale, etc.
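To make the idea of shifting concrete, here is a minimal NumPy sketch of a random shift augmentation. The function names, the zero fill value and the 10% default are assumptions made for illustration, not part of any library mentioned here.

```python
import numpy as np

def shift_image(img, dx, dy, fill=0):
    """Shift an (H, W) or (H, W, C) image dx pixels right and dy pixels down.

    Pixels that move in from outside the frame are set to `fill`.
    """
    h, w = img.shape[:2]
    out = np.full_like(img, fill)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # the whole image was shifted out of the frame
    # Matching source and destination slices for the overlapping region.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def random_shift(img, max_fraction=0.1, rng=None):
    """Shift by a random offset of at most max_fraction of each dimension."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(img, dx, dy)
```

Calling random_shift on each training image every epoch gives the model the same object at slightly different positions.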

Centered objects do not require shifting, but real-world test data may contain objects that are not centered, and in that case it becomes pretty important.
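# The import path may differ between Keras/TensorFlow versions.
from keras.preprocessing.image import ImageDataGenerator

# Random augmentations applied on the fly.
# Shift ranges below 1 are fractions of the image width/height.
gen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.15,
    zoom_range=0.1,
    channel_shift_range=10.,
    horizontal_flip=True)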

Yes, shifting the object is needed.
If the object you are trying to detect or classify is near the center of the image most of the time, your model will probably adjust its weights so that it focuses on the center of the image.
You can force your model to search all regions of an image by shifting the target object around the image. You can also improve training by changing the object's apparent size or shape (e.g. by zooming in on the image).
This repository is a good introduction to commonly used data augmentations in object detection:
https://github.com/kochlisGit/random-data-augmentations

Related

Recognize objects while falling - viewpoint variation

My problem is to recognize 10 classes of the same object (a bottle cap), which vary in color and size, while the object is falling, taking into account that the camera sees it from different viewpoints. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat surface of the object; this attempt was successful.
Flat Faces of sample 2 class
2) Instead of taking the fall into account, trained a model for possible perspective changes; this was not successful.
Perception changes of sample 2 class
What approaches are there to recognize the object even under perspective changes? I am not constrained to a single-camera solution and am open to ideas for approaching this problem of variable perspectives.
Any help would be really appreciated. Thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, which introduces some shortcomings of CNNs and how the authors tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem successfully when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the riffled edges, which can be seen from all perspectives.
If you're not limited to one camera, you could train a "normal" classifier, feed it multiple images in production, and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You have to try for yourself what works best.
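Averaging the predictions of several views could look roughly like this. It is a sketch that assumes the classifier outputs one row of class probabilities per camera; the function name and the example numbers are made up for illustration.

```python
import numpy as np

def average_view_predictions(view_probs):
    """Average per-view class probabilities.

    view_probs: array-like of shape (n_views, n_classes), one row per camera.
    Returns (predicted_class_index, mean_probabilities).
    """
    probs = np.asarray(view_probs, dtype=float)
    mean_probs = probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs

# Hypothetical softmax outputs for three cameras and three classes.
views = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
]
label, mean_probs = average_view_predictions(views)
```

Note that the averaged result can differ from a majority vote: in this example two of the three views favor class 0, but the averaged probabilities favor class 1.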
Also, never underestimate the power of old-school image preprocessing. If you have 3 different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as using the image with the largest colored area, i.e. the one where img.sum() is the highest.
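A sketch of that selection step, assuming the background is dark (close to zero) so that a larger pixel sum really does mean a larger visible colored area. The function name is hypothetical.

```python
import numpy as np

def pick_flattest_view(images):
    """Return the index of the image with the largest pixel sum.

    Assumes a dark background, so the sum grows with the visible object area.
    """
    sums = [float(np.asarray(img, dtype=float).sum()) for img in images]
    return int(np.argmax(sums))
```

With a bright background the comparison would have to be inverted, or the background subtracted first.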
Another idea is to determine the color through explicit programming, which should be fairly easy, and then feed the network a grayscale image. Maybe your network is confused by the strong correlation with color and ignores the shape altogether.

Data Augmentation for Object Detection using Deep Learning

I have a question regarding data augmentation for training the deep neural network for object detection.
I have quite a limited data set (nearly 300 images). I augmented the data by rotating each image from 0 to 360 degrees with a step size of 15 degrees. Consequently, I got 24 rotated images out of just one, so in total around 7200 images. Then I drew a bounding box around the object of interest in each augmented image.
Does it seem to be a reasonable approach to enhance the data?
Best Regards
To train a good model you need lots of representative data. Your augmentation is representative only of rotations, so yes, it is a good method if you are concerned about not having enough object rotations. However, it will not help at all with generalization to other objects or transformations.
It seems like you are on the right track; rotation is usually a very useful transformation for augmenting training data. I would suggest trying other transformations such as shift (you most probably want to detect partially visible objects), zoom (makes your model more robust to scale), shear, flip, etc. By combining different transformations you can introduce additional diversity into your training data. A training set of 300 images is very small, so you will definitely need more than one transformation to augment such a tiny training set.
This is a good approach as long as you don't implicitly change the labels when you rotate. E.g. an image containing the digit 6 will become the digit 9 after a rotation of 180 degrees, so you have to pay attention in such scenarios.
You could also apply other geometric transformations, such as scaling and translation.
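For object detection, the bounding boxes are part of the label too, so they have to follow any geometric transformation. Instead of redrawing boxes by hand for every rotated copy, they can be computed. The sketch below rotates the four corners about a center and takes the enclosing axis-aligned box. The function name is hypothetical, and the rotation direction and center must match whatever routine rotates the image itself, which is an assumption you need to check for your library.

```python
import numpy as np

def rotate_box(box, angle_deg, center):
    """Rotate an axis-aligned box about `center`; return the enclosing axis-aligned box.

    box: (x_min, y_min, x_max, y_max); center: (cx, cy).
    The rotation uses the standard 2D rotation matrix on (x, y) coordinates.
    """
    x_min, y_min, x_max, y_max = box
    corners = np.array([[x_min, y_min], [x_max, y_min],
                        [x_max, y_max], [x_min, y_max]], dtype=float)
    center = np.asarray(center, dtype=float)
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    rotated = (corners - center) @ rot.T + center
    x0, y0 = rotated.min(axis=0)
    x1, y1 = rotated.max(axis=0)
    return float(x0), float(y0), float(x1), float(y1)

# The 24 rotations from the question: 0, 15, ..., 345 degrees.
angles = range(0, 360, 15)
boxes = [rotate_box((40, 20, 60, 30), a, center=(50, 50)) for a in angles]
```

For angles that are not multiples of 90 degrees, the enclosing box is looser than a hand-drawn one, so it may be worth checking a few samples visually.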
Another option to consider, although it is transfer learning rather than augmentation, is to use a model pre-trained on a dataset such as ImageNet, if your problem domain has some resemblance to the ImageNet data. This will allow you to train deeper models even in your data-scarce situation.
Even though rotation increases the variety of your images, it might not be enough. You probably need to add other types of augmentation as well.
Color augmentations are useful if they still represent the real distribution of your data.
Spatial augmentations work very well. Keep in mind that most modern systems use a lot of cropping, so that might help.
Actually, I have a few scripts that I am trying to turn into a library that might work for you. Check them out at https://github.com/lozuwa/impy if you would like to.

Principal Component Analysis and Rotation

I have implemented a PCA in order to assign rotation information to connected 2D points extracted from images (edge fragments, see data points in image below for examples). I want the information to be robustly reproducible under rotation of the data so that I can use it for recognition purposes (comparable to 1). For this purpose, I want the principal components (eigenvectors) to rotate with the points (±180 deg).
My implementation includes mean centring of the data. I have also tested the implementation in OpenCV and one in Python, which yield the same results. This is why I assume that my implementation is correct and that the problem is the method itself. I had quite good results for other 2D distributions. Nonetheless, for these specific data points, it does not seem to work.
I have done all the tests with and without normalization to the standard deviation (i.e., dividing the x and y values by their standard deviations).
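For reference, the kind of PCA step described above can be sketched in NumPy as follows. This is not the asker's implementation; the function names are hypothetical, and the angle is reduced modulo 180 degrees because the sign of an eigenvector is arbitrary.

```python
import numpy as np

def principal_axis_angle(points, standardize=False):
    """Angle in degrees, in [0, 180), of the first principal axis of 2D points."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)             # mean centring
    if standardize:
        pts = pts / pts.std(axis=0)          # optional division by the standard deviations
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    vx, vy = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    return float(np.degrees(np.arctan2(vy, vx)) % 180.0)

def rotate_points(points, angle_deg):
    """Rotate 2D points about the origin by angle_deg."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return np.asarray(points, dtype=float) @ rot.T

# An elongated, slightly curved fragment, then rotated copies of it.
t = np.linspace(-1, 1, 51)
fragment = np.column_stack([t, 0.1 * t**2])
angle_30 = principal_axis_angle(rotate_points(fragment, 30))
angle_70 = principal_axis_angle(rotate_points(fragment, 70))
```

For a clearly elongated point set the angle follows the rotation. Two caveats apply. If the two eigenvalues are nearly equal, the axis direction becomes unstable. And per-axis standardization makes both variances equal, in which case the eigenvectors of a 2D covariance matrix always lie at 45 and 135 degrees, so the angle no longer carries rotation information.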
Here are my results for different rotations of the data (extracted from images):
PCA Results
As can be seen, the method does not find a reproducible rotation. The data is affected by quantization (because it is extracted from images), which is why I suspected that this is the origin of the problem. Therefore I repeated the experiment with added random noise (4th column). As can be seen, this does not seem to be the problem.
I have no precise idea how to explain the displayed effects. I note that the general orientation of the principal axes seems to be similar in the first and second row, respectively. I think that this means something, but what exactly? Can I somehow solve the problem or are there possibly better methods for such a problem? Due to some preprocessing it can be assumed that there are no outliers.
Thanks for your help!
For symmetrical shapes like the ones you have shown, you can try a symmetry detector like this one: https://github.com/subokita/Sandbox/tree/master/FSD
On examples it gives results like this:

find mosquitos' head in the image

I have images of mosquitos similar to these ones and I would like to automatically draw a circle around the head of each mosquito in the images. They are obviously in different orientations, and there is a random number of them in each image. Some error is fine. Any ideas for algorithms to do this?
This problem resembles a face detection problem, so you could try a naïve approach first and refine it if necessary.
First you would need to create your training set. For this you would extract small images with examples of what is a mosquito head and what is not.
Then you can use those images to train a classification algorithm. Be careful to have a balanced training set, since data skewed towards one class would hurt the performance of the algorithm. Since images are 2D and algorithms usually take 1D arrays as input, you will need to rearrange your images into that format as well (for instance: http://en.wikipedia.org/wiki/Row-major_order).
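Flattening the patches in row-major order is a single reshape in NumPy. A small sketch, assuming all patches have the same size and are stacked in one array (the function name is hypothetical):

```python
import numpy as np

def patches_to_rows(patches):
    """Flatten a stack of patches (N, H, W) or (N, H, W, C) into an (N, features) matrix.

    NumPy's default memory order is row-major ('C'), so each patch is read
    row by row, left to right.
    """
    patches = np.asarray(patches)
    return patches.reshape(len(patches), -1)

# Two hypothetical 3x4 patches become two rows of 12 features each.
patches = np.arange(2 * 3 * 4).reshape(2, 3, 4)
X = patches_to_rows(patches)
```

Each row of X is then one training example for the classifier.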
I normally use support vector machines, but other algorithms such as logistic regression could do the trick too. If you decide to use support vector machines, I strongly recommend you check out libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), since it's a very mature library with bindings to several programming languages. They also have a very easy-to-follow guide targeted at beginners (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
If you have enough data, the classifier should become tolerant to orientation on its own. If you don't have enough data, you could create more training rows with some samples rotated, so that you have a more representative training set.
As for the prediction: given an image, cut it using a grid where each cell has the same dimensions as the images in your training set. Then pass each of these cells to the classifier and mark the squares where the classifier gave a positive output. If you really need circles, take the center of the given square, with a radius of half the square's side length (sorry for stating the obvious).
After you do this you might have problems with sizes (some mosquitos might appear closer to the camera than others), since we have not trained the algorithm to be tolerant to scale. Moreover, even with all mosquitos at the same scale, we still might miss some of them just because they didn't fit in our grid perfectly. To address this, we need to repeat this procedure (grid cut and predict) after rescaling the given image to different sizes. How many sizes? You would have to determine that through experimentation.
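The grid-cut-and-predict procedure can be sketched as below. Several things are assumptions for illustration: instead of rescaling the image, the sketch scans windows of several sizes (which covers the same range of apparent head sizes), it uses a stride of half a window to reduce misses at cell borders, and is_head is a hypothetical callback that would resize the crop to the training size and run the trained classifier.

```python
import numpy as np

def grid_windows(height, width, window, stride=None):
    """Yield (top, left) corners of square windows on a regular grid."""
    stride = window if stride is None else stride
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left

def multiscale_detections(image, base_window, scales, is_head):
    """Return (top, left, size) for every window that is_head marks as positive."""
    height, width = image.shape[:2]
    hits = []
    for scale in scales:
        size = int(round(base_window * scale))
        if size < 1:
            continue
        for top, left in grid_windows(height, width, size, stride=max(1, size // 2)):
            crop = image[top:top + size, left:left + size]
            if is_head(crop):
                hits.append((top, left, size))
    return hits

def window_to_circle(top, left, size):
    """Circle (cx, cy, radius) inscribed in a square window."""
    return (left + size / 2, top + size / 2, size / 2)
```

Overlapping windows will often fire on the same head, so in practice nearby detections usually need to be merged afterwards.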
This approach is sensitive to the size of the "window" that you are using, so that is also something I would recommend you experiment with.
Some research that may be useful:
A Multistep Approach for Shape Similarity Search in Image Databases
Representation and Detection of Shapes in Images
From the pictures you provided this seems to be an extremely hard image recognition problem, and I doubt you will get anywhere near acceptable recognition rates.
I would recommend a simpler approach:
First, if you have any control over the images, separate the mosquitoes before taking the picture, and use a white, unmarked background, perhaps even something illuminated from below. This will make separating the mosquitoes much easier.
Then threshold the image. For example, here I did a quick try taking the red channel, then subtracting the blue channel*5, then applying a threshold of 80:
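In NumPy, that thresholding step might look like the sketch below. Two things are assumptions: the channel order is taken to be RGB (OpenCV, for instance, loads images as BGR), and pixels above the threshold are treated as foreground, since the answer does not say which side of the threshold was kept.

```python
import numpy as np

def threshold_red_minus_blue(rgb, blue_weight=5, threshold=80):
    """Boolean mask of pixels where red - blue_weight * blue exceeds threshold."""
    rgb = np.asarray(rgb)
    red = rgb[..., 0].astype(np.int32)    # widen first to avoid uint8 wrap-around
    blue = rgb[..., 2].astype(np.int32)
    return (red - blue_weight * blue) > threshold
```

The weight and threshold are the values from the quick try above and would need tuning for other lighting conditions.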
Use morphological dilation and erosion to get rid of the small leg structures.
Identify blobs of the right size to be mosquitoes by Connected Component Labeling. If a blob is large enough to be two mosquitoes, cut it out and apply some more dilation/erosion to it.
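Connected component labeling is available in most image libraries; to show what it does, here is a small self-contained sketch using a breadth-first flood fill with 4-connectivity. The function names and the size limits are placeholders.

```python
from collections import deque
import numpy as np

def label_blobs(mask):
    """Label 4-connected True regions of a 2D mask.

    Returns (labels, sizes): labels is an int array with 0 for background,
    sizes maps each label to its pixel count.
    """
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    sizes = {}
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                      # already part of an earlier blob
        current += 1
        labels[start] = current
        queue, count = deque([start]), 0
        while queue:
            r, c = queue.popleft()
            count += 1
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = current
                    queue.append((nr, nc))
        sizes[current] = count
    return labels, sizes

def blobs_in_size_range(sizes, min_size, max_size):
    """Labels of blobs whose pixel count lies within [min_size, max_size]."""
    return [label for label, n in sizes.items() if min_size <= n <= max_size]
```

Blobs above max_size are the candidates for "two mosquitoes touching" that would get further dilation/erosion.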
Once you have a single blob like this
you can find the direction of the body using Principal Component Analysis. The head should be the part of the body where the cross-section is the thickest.

Using flipped images for machine learning dataset

I've got a binary classification problem. I'm trying to train a neural network to recognize objects from images. Currently I have about 1500 50x50 images.
The question is whether extending my current training set with the same images flipped horizontally is a good idea or not. (The images are not symmetric.)
Thanks
I think you can do this to a much larger extent: not just flipping the images horizontally, but also rotating each image in 1-degree steps. This will result in 360 samples for every instance that you have in your training set. Depending on how fast your algorithm is, this may be a pretty good way to ensure that the algorithm isn't only trained to recognize images and their mirrors.
It's possible that it's a good idea, but then again, I don't know the goal or the domain of the image recognition. Let's say the images contain characters and you're asking the image recognition software to determine whether an image contains a forward slash / or a backslash \; then flipping the images will make your training data useless. If your domain doesn't suffer from such issues, then I think it's a good idea to flip them, and even to rotate them by varying degrees.
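Both points can be illustrated with a few lines of NumPy. This is a sketch with hypothetical function names, assuming the images are stacked in one array of shape (N, H, W) or (N, H, W, C):

```python
import numpy as np

def flip_horizontal(images):
    """Mirror a stack of images (N, H, W[, C]) left to right."""
    return np.asarray(images)[:, :, ::-1]

def add_flipped(images, labels):
    """Return the data set extended with mirrored copies that keep their labels."""
    images, labels = np.asarray(images), np.asarray(labels)
    return (np.concatenate([images, flip_horizontal(images)]),
            np.concatenate([labels, labels]))

# The slash example: a main diagonal looks like "\"; its mirror looks like "/".
backslash = np.eye(5, dtype=np.uint8)[None]   # stack containing one 5x5 image
slash = flip_horizontal(backslash)
```

add_flipped keeps the original label for each mirrored copy, which is only valid when mirroring does not change the class, unlike in the slash case.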
I have used flipped images in AdaBoost with great success in the course: http://www.csc.kth.se/utbildning/kth/kurser/DD2427/bik12/Schedule.php
from the zip "TrainingImages.tar.gz".
I know there is some information on the pros and cons of using flipped images somewhere in the slides (on the homepage), but I can't find it. Another great resource is http://www.csc.kth.se/utbildning/kth/kurser/DD2427/bik12/DownloadMaterial/FaceLab/Manual.pdf (together with the slides), which goes through things like finding objects at different scales and orientations.
If the image patches are not symmetric, I don't think it's a good idea to flip. A better idea is to apply some similarity transforms to the training set, within some limits. Another way to increase the dataset is to add Gaussian-smoothed templates to it. Make sure that the numbers of positive and negative samples are balanced. Too many positive and too few negative samples might skew the classifier and give bad performance on the test set.
It depends on what your NN is based on. If you are extracting rotation-invariant features, or features that do not depend on the spatial position within the image (like histograms or whatever), and train your NN with these features, then rotating will not be a good idea.
If you are training directly on pixel values, then it might be a good idea.
Some more details might be useful.

Resources