Shifting as data augmentation in machine learning

I have a few images for which I want to do data augmentation for CNN (convolutional neural network) training.
As far as I know, some of the operations used for data augmentation are:
rotation, vertical and horizontal flipping, shifting (the position of the object), and many more.
But my question is whether shifting the object within the image really matters for a CNN, and if it does, how it matters.

If all the objects are centered, then there is no problem. But if the objects can appear in different parts of the image, then shifting can be a relevant augmentation.
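For illustration, here is a minimal sketch of what a shift augmentation can look like, assuming images are loaded as NumPy arrays of shape (height, width, channels); it uses scipy.ndimage.shift and illustrative shift ranges:

```python
import numpy as np
from scipy.ndimage import shift

def random_shift(image, max_fraction=0.2, rng=np.random.default_rng()):
    """Translate an image by a random offset of up to max_fraction of its size.

    image: NumPy array of shape (height, width, channels).
    The vacated border is filled with the nearest edge pixels.
    """
    h, w = image.shape[:2]
    dy = rng.uniform(-max_fraction, max_fraction) * h
    dx = rng.uniform(-max_fraction, max_fraction) * w
    # Shift rows by dy and columns by dx; leave the channel axis untouched.
    return shift(image, shift=(dy, dx, 0), mode='nearest', order=1)

# Example: generate 5 shifted variants of one training image.
image = np.random.rand(64, 64, 3)          # placeholder for a real training image
augmented = [random_shift(image) for _ in range(5)]
```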

Related

Does deep learning require image registration?

I have a general question regarding biomedical image analysis. Biomedical images usually require registration to align the images in the same space and to allow better feature extraction. My question is: does deep-learning-based classification also require image registration of the images in the training dataset?
Since in deep learning the architecture finds the best features by itself, is registration required for classifying abdominal CT scans with deep neural networks?
And since we perform data augmentation for better training, is image registration still required in that case?
Deep learning approaches for image data generally use convolutional neural networks (CNNs), which are at least shift invariant. By using image pyramids or specially constructed network layouts, they can also be made scale invariant. They are generally not rotation invariant.
This does not mean that they cannot work with differently rotated input images, but you may need much bigger models and more training data to get them to work well. The network will learn the differently rotated features of whatever you're trying to detect. If the range of rotation is small, this is probably not a big issue.
In summary, you don't necessarily need registration, but it can improve your final results.
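If one goes the augmentation route instead of registration, a minimal sketch could look like the following; it assumes 2D slices stored as NumPy arrays and uses scipy.ndimage.rotate with an illustrative range of angles:

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_augment(image, angles=(-15, -10, -5, 5, 10, 15)):
    """Return rotated copies of a 2D slice so the network sees a range of
    orientations instead of relying on prior registration.
    reshape=False keeps the original array shape; borders are edge-padded."""
    return [rotate(image, angle, reshape=False, mode='nearest', order=1)
            for angle in angles]

slice_2d = np.random.rand(256, 256)        # placeholder for a CT slice
augmented_slices = rotation_augment(slice_2d)
```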

semantic segmentation for large images

I am working with a limited number of large images, each of which can be 3072x3072 pixels. To train a semantic segmentation model using FCN or U-Net, I construct a large set of training samples, where each training image is 128x128.
In the prediction stage, what I do is cut a large image into small pieces of the same size as the training set (128x128), feed these small pieces into the trained model, and get the predicted mask. Afterwards, I stitch these small patches together to get the mask for the whole image. Is this the right way to perform semantic segmentation on large images?
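For reference, a minimal sketch of the tile-and-stitch inference described above, assuming the image dimensions are exact multiples of the patch size (3072 = 24 x 128) and a Keras-style `model.predict` interface:

```python
import numpy as np

PATCH = 128

def predict_large_image(model, image):
    """Tile a (H, W, C) image into PATCH x PATCH pieces, predict each piece,
    and stitch the predicted masks back into a full-size mask.
    Assumes H and W are exact multiples of PATCH (e.g. 3072 = 24 * 128)."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h, PATCH):
        for x in range(0, w, PATCH):
            tile = image[y:y + PATCH, x:x + PATCH]
            # model.predict expects a batch axis; output assumed (1, PATCH, PATCH, 1)
            pred = model.predict(tile[np.newaxis, ...])[0, ..., 0]
            mask[y:y + PATCH, x:x + PATCH] = pred
    return mask
```

In practice, overlapping tiles whose predictions are averaged (or center-cropped) tend to reduce seam artifacts at the patch borders.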
Your solution is often used for this kind of problem. However, I would argue that whether it truly makes sense depends on the data. Let me give you two examples you can still find on Kaggle.
If you wanted to mask certain parts of satellite images, you would probably get away with this approach without a drop in accuracy. These images are highly repetitive, and there's likely no correlation between the segmented area and where in the original image it was taken from.
If you wanted to segment a car from its background, it wouldn't be desirable to break the image into patches. Over several layers the network learns the global distribution of a car in the frame: it's very likely that the mask is positive in the middle and negative in the corners of the image.
Since you didn't give any specifics about what you're trying to solve, I can only give a general recommendation: try to keep the input images as large as your hardware allows. In many situations I would rather downsample the original images than break them into patches.
Concerning the recommendation of curio1729, I can only advise against training on small patches and testing on the original images. While it's technically possible thanks to fully convolutional networks, you're changing the data to an extent that might very likely hurt performance. CNNs are known for their extraction of local features, but there's a large amount of global information that is learned over the abstraction of multiple layers.
Input image data:
I would not advise feeding the big image (3072x3072) directly into Caffe.
Batches of small images will fit better into memory, and parallel processing also comes into play.
Data augmentation will also be feasible.
Output for the big image:
As for the output for the big image, you had better recast the input size of the FCN to 3072x3072 during the test phase, because the layers of an FCN can accept inputs of any size.
You will then get a 3072x3072 segmented image as output.
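As a hedged sketch of that point with a Keras-style fully convolutional model (the layer choices are illustrative, not prescribed by the answer): because there is no Dense layer, the model can be built with a (None, None, C) input and run on 128x128 patches during training and on the full 3072x3072 image at test time.

```python
import tensorflow as tf

def build_fcn(channels=3):
    """A toy fully convolutional segmentation net. With input shape
    (None, None, channels) it accepts any spatial size, so it can be trained
    on 128x128 patches and evaluated on a full 3072x3072 image."""
    inputs = tf.keras.Input(shape=(None, None, channels))
    x = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.MaxPooling2D(2)(x)
    x = tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = tf.keras.layers.UpSampling2D(2)(x)
    outputs = tf.keras.layers.Conv2D(1, 1, activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)

model = build_fcn()
# Train on 128x128 patches, then predict directly on the full image:
# full_mask = model.predict(full_image[None, ...])   # full_image: (3072, 3072, 3)
```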

Data Augmentation for Object Detection using Deep Learning

I have a question regarding data augmentation for training a deep neural network for object detection.
I have a quite limited data set (nearly 300 images). I augmented the data by rotating each image from 0 to 360 degrees with a step size of 15 degrees, so I got 24 rotated images out of each one. In total, I got around 7200 images. Then I drew a bounding box around the object of interest in each augmented image.
Does this seem to be a reasonable approach to enhancing the data?
Best Regards
In order to train a good model you need lots of representative data. Your augmentation is representative only of rotations, so yes, it is a good method if you are concerned about not having enough object rotations. However, it will not help in any sense with generalization to other objects/transformations.
It seems like you are on the right track; rotation is usually a very useful transformation for augmenting training data. I would suggest trying other transformations like shifts (you most probably want to detect partially visible objects), zoom (which makes your model invariant to scale), shear, flips, etc., as in the sketch below. By combining different transformations you can introduce additional diversity into your training data. A training set of 300 images is very small, so you will definitely need more than one transformation to augment such a tiny training set.
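A minimal sketch of combining several such transformations with Keras' ImageDataGenerator (the ranges are illustrative; note that this augments the images only, so for object detection the bounding-box coordinates would have to be transformed with the same parameters, which this class does not do for you):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Combine rotation, shift, zoom, shear and flips in one generator.
augmenter = ImageDataGenerator(
    rotation_range=15,        # degrees
    width_shift_range=0.1,    # fraction of image width
    height_shift_range=0.1,   # fraction of image height
    zoom_range=0.1,
    shear_range=5,            # shear angle in degrees
    horizontal_flip=True,
    fill_mode='nearest',
)

images = np.random.rand(300, 224, 224, 3)  # placeholder for the ~300 training images
batches = augmenter.flow(images, batch_size=32)
augmented_batch = next(batches)            # a fresh randomly transformed batch each call
```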
This is a good approach as long as you don't implicitly change the labels when you rotate. E.g. an image containing the digit 6 will become the digit 9 after a rotation of 180 degrees, so you have to pay attention in such scenarios.
But you could also do other geometric transformations like scaling and translation.
Another option you can consider is using a model pre-trained on a dataset such as ImageNet, if your problem domain has some resemblance to the ImageNet data. This will allow you to train deeper models even in your data-scarce situation.
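As a hedged sketch of that idea with tf.keras (the backbone, input size, and head are illustrative choices, not prescribed by the answer):

```python
import tensorflow as tf

NUM_CLASSES = 5  # illustrative; set to your own number of classes

# ImageNet-pretrained backbone with its classification head removed.
backbone = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                          input_shape=(224, 224, 3))
backbone.trainable = False  # freeze pretrained weights while data is scarce

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```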
Even though rotation increases the variety of your images, it might not be enough; you probably need to add other types of augmentation as well.
Color augmentations are useful if they still represent the real distribution of your data.
Spatial augmentations work very well. Keep in mind that most modern systems use a lot of cropping, so that might help.
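A minimal sketch of color plus crop augmentation with tf.image (the jitter ranges and crop size are illustrative and should stay within what is realistic for your data):

```python
import tensorflow as tf

def color_and_crop(image, crop_size=(200, 200, 3), seed=None):
    """Random color jitter followed by a random crop.
    image: float tensor in [0, 1] of shape (H, W, 3) with H, W >= crop size."""
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
    image = tf.image.random_crop(image, size=crop_size, seed=seed)
    return tf.clip_by_value(image, 0.0, 1.0)
```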
Actually, I have a few scripts that I am trying to turn into a library that might work for you. Check them out at https://github.com/lozuwa/impy if you would like to.

image augmentation algorithms for preparing deep learning training set

To prepare large amounts of data for training deep-learning-based image classification models, we usually have to rely on image augmentation methods. I would like to know what the usual image augmentation algorithms are, and whether there are any considerations when choosing them.
The literature on data augmentation is very large and depends heavily on your kind of application.
The first things that come to mind are the galaxy competition's rotations and Jasper Snoek's data augmentation.
But really, all papers have their own tricks to get good scores on particular datasets, for example stretching the image to a specific size before cropping it, and doing so in a very specific order.
More practically, to train models on the likes of CIFAR or ImageNet, use random crops and random contrast and luminosity perturbations, in addition to the obvious flips and noise addition.
Look at the CIFAR-10 tutorial on the TF website; it is a good start. Plus, TF now has random_crop_and_resize(), which is quite useful.
EDIT: The papers I am referencing here and there.
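Here is a hedged sketch of the CIFAR-style recipe described above, using tf.data and tf.image (the pad-then-crop size and noise level are illustrative assumptions):

```python
import tensorflow as tf

def augment_cifar(image, label):
    """CIFAR-style augmentation: pad, random crop back to 32x32,
    random flip, and random luminosity/contrast perturbations plus noise."""
    image = tf.image.resize_with_crop_or_pad(image, 40, 40)   # zero-pad to 40x40
    image = tf.image.random_crop(image, size=(32, 32, 3))
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = image + tf.random.normal(tf.shape(image), stddev=0.02)  # mild noise
    return image, label

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = (x_train / 255.0).astype('float32')

train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
            .shuffle(10_000)
            .map(augment_cifar, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(128)
            .prefetch(tf.data.AUTOTUNE))
```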
It depends on the problem you have to address, but most of the time you can do:
Rotate the images
Flip the image (X or Y symmetry)
Add noise
All of the previous at the same time (see the sketch below).
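A minimal NumPy-only sketch combining those three operations (the rotation choices and noise level are illustrative):

```python
import numpy as np

def augment(image, rng=np.random.default_rng()):
    """Randomly rotate (by a multiple of 90 degrees), flip, and add noise."""
    image = np.rot90(image, k=rng.integers(0, 4))              # rotate 0/90/180/270 degrees
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)                          # horizontal flip (Y symmetry)
    if rng.random() < 0.5:
        image = np.flip(image, axis=0)                          # vertical flip (X symmetry)
    image = image + rng.normal(scale=0.01, size=image.shape)    # mild Gaussian noise
    return np.clip(image, 0.0, 1.0)

image = np.random.rand(64, 64, 3)   # placeholder for a real image in [0, 1]
augmented = augment(image)
```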

Why use sliding windows with convolutional neural nets in object detection?

I read that CNNs (with both convolution and max-pooling layers) are shift-invariant, yet most object detection methods use a sliding-window detector with non-maximum suppression. Is it necessary to use sliding windows with CNNs when doing object detection?
Basically, instead of training the network on small 50x50 patches of images containing the desired object, why not train on entire images where the object is present somewhere? All I can think of is practical/performance reasons (doing a forward pass on smaller patches instead of whole images), but is there also a theoretical explanation I'm overlooking?
Internally, a CNN is doing a sliding window. Convolution on a 2D image is nothing more than a linear filter applied in a sliding-window manner; the convolution is simply a nice mathematical expression of that very operation, which enables neat optimizations. Max pooling, on the other hand, helps us be robust to small shifts/noise. So, effectively, feeding an image to the network means applying (many!) sliding windows to it. Can we pass big images instead of small ones? Sure, but you will get extremely big tensors (just compute how many numbers you will need; it is huge), and you will get a really complex optimization problem. Nowadays we optimize in million-dimensional spaces; working with whole images might lead to billions (or even more) of dimensions. Optimization complexity grows exponentially with the dimension, so you will end up with an extremely slow method (not in terms of the computation itself, but convergence).
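A minimal NumPy sketch of that equivalence: the output of a 2D cross-correlation at position (i, j) is exactly the score a sliding-window detector would compute on the patch anchored at (i, j).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2D cross-correlation with 'valid' padding, written as an explicit
    sliding window: each output pixel is the filter response on one patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # same score a window detector would give
    return out

image = np.random.rand(8, 8)
kernel = np.random.rand(3, 3)
response_map = conv2d_valid(image, kernel)   # shape (6, 6): one score per window position
```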
