Is data augmentation really needed in machine learning? [closed] - machine-learning

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am interested in knowing how important data augmentation (rotation at various angles, flipping the images) is when providing a dataset for a machine learning problem.
Is it really needed? Or will the CNN handle that anyway, no matter how the data are transformed?
To draw some conclusions, I set up a classification task with 2 classes:
Arrow shapes
Circle shapes
The idea is to train on the shapes in only one orientation (I used arrows pointing right) and to test the model on a different orientation (I used arrows pointing downwards) that is never shown during training.
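A minimal sketch of this experimental setup, assuming a NumPy-only toy version of the data (the `make_right_arrow` and `point_down` helpers and the 15x15 image size are hypothetical; the real dataset is the one linked below): the training image points right, and the test image is the same arrow rotated 90 degrees clockwise so that it points down.

```python
import numpy as np

def make_right_arrow(size=15):
    """Hypothetical toy arrow: a binary image (1 = stroke) pointing to the right."""
    img = np.zeros((size, size), dtype=np.uint8)
    mid = size // 2
    img[mid, 1:size - 1] = 1          # horizontal shaft ending at column size - 2
    tip = size - 2
    for k in range(1, 4):             # two short diagonals forming the arrow head
        img[mid - k, tip - k] = 1
        img[mid + k, tip - k] = 1
    return img

def point_down(img):
    """Rotate 90 degrees clockwise (np.rot90 turns counterclockwise for positive k)."""
    return np.rot90(img, k=-1)

# Training data: right-pointing arrows only.
train_arrow = make_right_arrow()
# Test data: the same shape pointing down, an orientation never seen in training.
test_arrow = point_down(train_arrow)
```

Adding `np.rot90(img, k)` for the other values of `k` to the training set would be exactly the rotation augmentation the question asks about.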
Some of the samples used in Training
Some of the samples used in Testing
This is the entire dataset I am using to create a TensorFlow model:
https://bitbucket.org/akhileshmalviya/samples/src/bab50b85d826?at=master
I am puzzled by the results I got:
(i) Apart from a few downward arrows, all arrows are correctly predicted as arrows. Does this mean data augmentation is not needed at all?
(ii) Or is the use case I have chosen not a suitable one for understanding the importance of data augmentation?
Kindly share your thoughts; any help would be really appreciated!

Data augmentation is a data-dependent process.
In general, you need it when your training data is complex and you have only a few samples.
A neural network can easily learn to extract simple patterns like arcs or straight lines, and these patterns are enough to classify your data.
In your case, data augmentation can barely help: the features the network will learn to extract are simple and very different from each other.
When you instead have to deal with complex structures (cats, dogs, airplanes, ...), you can't rely on simple features like edges, arcs, etc.
Instead, you have to show your network that the instances you're trying to classify have high variance and that the extracted features can be combined in many different ways for the same subject.
Think about a cat: it can be any color, the picture can be taken in different lighting conditions, its body can be in any position, the picture could be taken at any orientation...
To correctly classify such different instances, the network must learn to extract robust features, which can only be learned after seeing many different inputs.
In your case, by contrast, simple features can completely discriminate your inputs, so any sort of data augmentation would help only a little.
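To illustrate the point that simple features can separate these two classes, here is a hand-crafted, rotation-invariant feature. It is a swapped-in illustration, not something the answer proposes: the spread of the stroke pixels' distances from their centroid. The synthetic shapes and the 0.2 threshold are assumptions chosen for this toy example only.

```python
import numpy as np

def radial_spread(img):
    """Coefficient of variation of the distances from stroke pixels to their centroid.

    A thin circle keeps all its pixels at roughly the same distance from the centre
    (value near 0); an arrow mixes pixels near and far from it (clearly larger value).
    """
    rows, cols = np.nonzero(img)
    pts = np.stack([rows, cols], axis=1).astype(float)
    dist = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    return dist.std() / dist.mean()

def classify(img, threshold=0.2):
    """Hypothetical rule; the threshold fits these synthetic shapes, not real data."""
    return "circle" if radial_spread(img) < threshold else "arrow"

def make_circle(size=21, radius=8):
    """Toy circle outline: pixels within half a pixel of the given radius."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    return (np.abs(np.hypot(yy - c, xx - c) - radius) < 0.5).astype(np.uint8)

def make_arrow(size=21):
    """Toy right-pointing arrow: a shaft plus two diagonal head strokes."""
    img = np.zeros((size, size), dtype=np.uint8)
    mid = size // 2
    img[mid, 2:size - 2] = 1
    for k in range(1, 5):
        img[mid - k, size - 3 - k] = 1
        img[mid + k, size - 3 - k] = 1
    return img
```

Because the feature depends only on distances, a downward arrow (for example `np.rot90(make_arrow(), k=-1)`) gets the same value as the right-pointing one, which is one way simple shapes can be told apart in an unseen orientation without augmentation.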

The task you are solving can easily be solved without any NN, and even without machine learning.
Because the problem is so simple, it does not really matter whether you do data augmentation or not. The need for data augmentation is task-specific and depends on many things:
how easy it is to augment the data while preserving the correct label. For images and sounds, which we are used to seeing and hearing, this is not a problem (we know that adding a little noise to a sound does not change its meaning, and a rotated lizard is still a lizard). For other things, augmenting while preserving the class/value is hard (for example, in Go, randomly adding a stone can change the value of the position dramatically)
whether the augmented data is drawn from the same distribution you care about. Adding random stones in Go does not work, but rotating or flipping the board works and preserves the distribution. In Racing Kings (a chess variant), however, it will not help: if you flip the position (left <-> right), the evaluation stays the same, but such a position will never occur in a real game, so it is drawn from a different distribution and is useless
how much data you have and how expressive your model is. The more parameters your model has, the bigger the chance of overfitting and the more data you need. If you train a linear regression in n dimensions, you will have n + 1 parameters; you do not really need to augment for that. Also, if you already have 10 billion data points, augmentation will probably not be helpful.
how expensive the augmentation procedure is. Rotating or scaling an image is very cheap, but other augmentations can be computationally expensive
something else that I forgot.
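For the Go example in the list above, the label-preserving augmentations are the 8 symmetries of the square board (4 rotations, each with and without a mirror flip). A minimal NumPy sketch, assuming the board is stored as a square array; the encoding 0 = empty, 1 = black, -1 = white is hypothetical:

```python
import numpy as np

def dihedral_variants(board):
    """Return the 8 symmetries of a square board: 4 rotations, each also mirrored.

    In Go these transformations keep the value of the position, so every variant
    can reuse the original label.
    """
    variants = []
    for k in range(4):
        rotated = np.rot90(board, k)           # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))    # mirrored copy of that rotation
    return variants
```

Following the second point, the same function would be the wrong tool for Racing Kings: the flipped positions keep their evaluation but never occur in real games.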

Related

Neural Network for Learning Cut VS Uncut Grass

I've got a script that takes pictures like the one provided, with colored loops encircling either uncut grass, cut grass, or other background details (for rejecting non-grass regions), and generates training data in the form of many small images taken from inside the colored loops. I'm struggling to find which type of neural network would work best for learning from this training data and telling me in real time, from a video feed mounted on a lawn mower, which sections of the image are uncut grass or cut grass as it mows through a field. Is anyone here experienced with neural networks who can suggest some I could use, or just point me in the right direction?
Try a segmentation network. There are many types of segmentation (for example, semantic and instance segmentation).
Mind that neural networks need training data. Your case (detecting cut and uncut grass) is fairly specialized, which means existing models may not fit your purpose. If so, you'll need a dataset of images and annotations. There are also tools for labeling segmentation images.
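Whatever segmentation architecture you pick, it typically outputs one score per class for every pixel. A minimal sketch of turning such output into a cut/uncut mask, assuming a (classes, height, width) score array and a hypothetical class order; the network itself is not shown:

```python
import numpy as np

CLASSES = ("background", "cut_grass", "uncut_grass")   # hypothetical label order

def scores_to_mask(scores):
    """Turn per-pixel class scores of shape (C, H, W) into an (H, W) label mask."""
    return scores.argmax(axis=0)

def class_fractions(mask, num_classes=len(CLASSES)):
    """Fraction of pixels assigned to each class, e.g. how much of the frame is still uncut."""
    counts = np.bincount(mask.ravel(), minlength=num_classes)
    return counts / mask.size
```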
Hope it helps.

Recognize objects while falling - viewpoint variation

I need to recognize 10 classes, which are variations (in color and size) of the same object (a bottle cap), while the object is falling, taking into account that the camera sees it from different viewpoints. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat face of the object; this attempt was successful.
Flat Faces of sample 2 class
2) Instead of modeling the fall itself, trained a model on possible perspective changes; this was not successful.
Perception changes of sample 2 class
What approaches can recognize the object despite perspective changes? I am not constrained to a single-camera solution and am open to ideas for approaching this problem of varying perspectives.
Any help would be really appreciated. Thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, where you will be introduced to some shortcomings of CNNs and how the authors tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem successfully when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the riffled edges, which can be seen from all perspectives.
If you're not limited to one camera, you could train a "normal" classifier, feed it multiple images in production, and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You will have to try for yourself what works best.
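The multi-camera idea can be sketched as averaging the class probabilities from each view. This assumes the classifier returns raw scores (logits) of shape (views, classes); `predict_multiview` is a hypothetical helper, not part of any library:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the max keeps exp() numerically stable."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_multiview(logits_per_view):
    """Average class probabilities over camera views and return (best class, mean probs)."""
    probs = softmax(np.asarray(logits_per_view, dtype=float))
    mean_probs = probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs
```

Averaging probabilities rather than taking a majority vote lets a confident view outweigh uncertain ones.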
Also, never underestimate the power of old school image preprocessing. If you have 3 different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as using the image with the largest colored area, where img.sum() is the highest.
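The largest-colored-area heuristic can be sketched as picking the view with the highest `img.sum()`. This assumes the background has already been removed or is dark (near 0); otherwise the sum also counts background pixels:

```python
import numpy as np

def pick_flattest_view(views):
    """Index of the view with the highest total intensity (the largest coloured area).

    Each view is cast to float64 first so summing uint8 images cannot overflow.
    """
    return int(np.argmax([np.asarray(v, dtype=np.float64).sum() for v in views]))
```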
Another idea is to determine the color through explicit programming, which should be fairly easy, and then feed the network a grayscale image. Maybe your network is confused by the strong correlation with color and ignores the shape altogether.
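A minimal sketch of splitting color from shape: estimate the cap's color explicitly and give the network a grayscale image. The ITU-R BT.601 luma weights are one common grayscale convention, and the optional foreground mask is an assumption (how you obtain it is not shown):

```python
import numpy as np

# ITU-R BT.601 luma weights, a common RGB-to-grayscale convention
LUMA = np.array([0.299, 0.587, 0.114])

def split_color_and_shape(rgb, mask=None):
    """Return (mean colour, grayscale image) for an H x W x 3 RGB array.

    The mean colour would go to a hand-written colour rule; the grayscale image
    would go to the network so it has to rely on shape. `mask` is an optional
    H x W boolean foreground mask; without it the whole image is averaged.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = rgb @ LUMA                               # (H, W, 3) @ (3,) -> (H, W)
    pixels = rgb[mask] if mask is not None else rgb.reshape(-1, 3)
    return pixels.mean(axis=0), gray
```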

Data Augmentation for Object Detection using Deep Learning

I have a question regarding data augmentation for training the deep neural network for object detection.
I have quite a limited data set (nearly 300 images). I augmented the data by rotating each image from 0 to 360 degrees with a step size of 15 degrees, so I got 24 rotated images out of each one, around 7200 images in total. Then I drew a bounding box around the object of interest in each augmented image.
Does it seem to be a reasonable approach to enhance the data?
Best Regards
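The 24 angles follow from the 15-degree step (360 / 15 = 24, so 300 images give 7200). Instead of redrawing every box by hand, each box could be derived from the original one by rotating its corners and taking their axis-aligned envelope. A minimal sketch, assuming boxes are (x_min, y_min, x_max, y_max) with y pointing down, and that your image-rotation routine uses the same angle convention (worth checking):

```python
import numpy as np

ANGLES = np.arange(0, 360, 15)          # 0, 15, ..., 345 degrees -> 24 rotations

def rotate_bbox(bbox, angle_deg, center):
    """Axis-aligned box enclosing `bbox` after rotating it by `angle_deg` about `center`.

    Positive angles rotate counterclockwise as seen on screen (y axis pointing down).
    """
    x0, y0, x1, y1 = bbox
    corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=float)
    t = np.deg2rad(angle_deg)
    cx, cy = center
    dx, dy = corners[:, 0] - cx, corners[:, 1] - cy
    # Counterclockwise on-screen rotation in image coordinates (y down)
    xr = cx + dx * np.cos(t) + dy * np.sin(t)
    yr = cy - dx * np.sin(t) + dy * np.cos(t)
    return xr.min(), yr.min(), xr.max(), yr.max()
```

For angles that are not multiples of 90 degrees the envelope is looser than a hand-drawn box, and parts of the object may be rotated out of the frame unless the image is padded first.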
In order to train a good model, you need lots of representative data. Your augmentation only covers rotations, so yes, it is a good method if you are concerned about not having enough object rotations. However, it will not help at all with generalization to other objects or transformations.
It seems like you are on the right track; rotation is usually a very useful transformation for augmenting the training data. I would suggest trying other transformations like shift (you most probably want to detect partially visible objects), zoom (makes your model invariant to scale), shear, flip, etc. By combining different transformations, you can introduce additional diversity into your training data. A training set of 300 images is very small, so you would definitely need more than one transformation to augment such a tiny training set.
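Combining transformations can be as simple as randomly chaining a flip and a shift for each sample. A minimal NumPy sketch covering only those two (zoom and shear would need an interpolation routine, omitted here); `random_augment` also returns the sampled parameters because, for object detection, the bounding boxes must be flipped and shifted the same way:

```python
import numpy as np

def shift(img, dy, dx):
    """Translate an H x W (x C) image by (dy, dx) pixels, filling uncovered areas with zeros.

    Parts of the object can leave the frame, mimicking partially visible objects.
    """
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def random_augment(img, rng, max_shift=4):
    """Randomly flip horizontally, then shift; returns (image, flipped, dy, dx)."""
    flipped = rng.random() < 0.5
    if flipped:
        img = img[:, ::-1]                  # mirror along the width axis
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return shift(img, dy, dx), flipped, dy, dx
```

Usage: `rng = np.random.default_rng(0)` once, then call `random_augment(img, rng)` for each training sample in every epoch.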
This is a good approach as long as you don't implicitly change the labels when you rotate. For example, an image containing the digit 6 becomes the digit 9 when rotated by 180 degrees, so you have to pay some attention in such scenarios.
You could also apply other geometric transformations such as scaling and translation.
Another option to consider is using a model pre-trained on ImageNet, if your problem domain resembles the ImageNet data. This will allow you to train deeper models even when data is scarce.
Even though rotation increases the representational complexity of your images, it might not be enough. You probably need to add other types of augmentation as well.
Color augmentations are useful if they still represent the real distribution of your data.
Spatial augmentations work very well. Keep in mind that most modern systems use a lot of cropping, so that might help.
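A minimal sketch of both kinds of augmentation from this answer: a mild brightness/contrast jitter (kept small so the colors stay realistic) and a random crop. It assumes float images scaled to [0, 1]; the jitter ranges are illustrative, not tuned values:

```python
import numpy as np

def color_jitter(img, rng, max_brightness=0.2, max_contrast=0.2):
    """Randomly scale contrast around the mean and shift brightness, clipping to [0, 1]."""
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    mean = img.mean()
    return np.clip((img - mean) * contrast + mean + brightness, 0.0, 1.0)

def random_crop(img, rng, crop_h, crop_w):
    """Cut a random crop_h x crop_w window out of an H x W (x C) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]
```

For detection, the boxes also have to be shifted by the crop offset and clipped to the crop.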
Actually, I have a few scripts that I am trying to turn into a library that might work for you. Check them out at https://github.com/lozuwa/impy if you like.

Recognition of images with additional data

Good morning everyone. First, I would like to make it clear that I took my first steps in machine learning yesterday.
I've read most of the basic material and attended some presentations.
In a few months I will take part in a project here where this technology will be applied.
As a beginner, I would like to ask a question that I think is silly, but I could not find an answer to it.
In presentations and articles, I have seen classifiers built for either images or data sets, but never both at the same time.
For example, take the Iris flower data set, which is often used as an example. In this data set we have the characteristics of the flowers, such as petal width, but we do not have a visual representation of them. Is it possible to combine both and, for example, estimate the petal width from a given image?
I imagine this is a very basic question, but I could not find something suitable for a beginner.
I would be very grateful.
Machine learning models always work on some abstract data items like vectors, points in multidimensional spaces, etc. For simplicity, let us assume for a moment that ML algorithms work on vectors. Classification would therefore be the task of assigning a label Y to an n-dimensional vector X.
With a tabular data set, converting the values in a row into a vector is relatively easy: you have to convert text into numbers somehow, but that is a standard procedure.
With images it is different. You now have to build an ML-suitable representation of the image. In other words, you need to create features (e.g. numerical ones) describing the image that you can later use as inputs to your ML model.
Examples of such features are color histograms, average brightness, number of edges, various convolutions, etc. There can also be more complicated, semantic features, such as the presence of a human in the picture; calculating these, however, is much more difficult.
So, summing up: you can build a classifier on both the image and the data set, but it basically means transforming both into a set of features.
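A minimal sketch of this feature route: turn an image into a fixed-length vector (per-channel color histograms plus average brightness, two of the features listed above) and concatenate it with numeric tabular columns. The 8-bin histogram is an arbitrary choice, and the helper names are hypothetical:

```python
import numpy as np

def image_features(rgb, bins=8):
    """Normalised per-channel colour histograms plus average brightness of an H x W x 3 uint8 image."""
    rgb = np.asarray(rgb)
    hists = [np.histogram(rgb[..., ch], bins=bins, range=(0, 256))[0] for ch in range(3)]
    hist = np.concatenate(hists).astype(float) / (rgb.shape[0] * rgb.shape[1])
    brightness = rgb.mean() / 255.0
    return np.concatenate([hist, [brightness]])

def combined_vector(rgb, tabular_row):
    """Concatenate image features with numeric tabular columns into one input vector."""
    return np.concatenate([image_features(rgb), np.asarray(tabular_row, dtype=float)])
```

For the petal-width question, `image_features` alone would be the regression input and the measured petal width the target, which still requires training photos paired with measurements.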

Histogram of Oriented Gradients object detection [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
HOG is popular for human detection. Can it be used to detect other objects, for example a cup in an image?
I am sorry for not asking a programming question, but I want to know whether I can use HOG to extract object features.
Based on the research I have done over the past few days, I think yes, but I am not sure.
Yes, HOG (Histogram of Oriented Gradients) can be used to detect any kind of objects, as to a computer, an image is a bunch of pixels and you may extract features regardless of their contents. Another question, though, is its effectiveness in doing so.
HOG, SIFT, and other such feature extractors are methods used to extract relevant information from an image to describe it in a more meaningful way. When you want to detect an object or person in an image with thousands (or even millions) of pixels, it is inefficient to simply feed a vector with millions of numbers to a machine learning algorithm, because:
It will take a large amount of time to complete
There will be a lot of noisy information (background, blur, lighting and rotation changes) that we do not wish to regard as important
The HOG algorithm, specifically, creates histograms of edge orientations from patches of the image. A patch may come from an object, a person, meaningless background, or anything else; it is merely a way to describe an area using edge information. As mentioned previously, this information can then be fed to a machine learning algorithm such as a classical support vector machine (SVM) to train a classifier able to distinguish one type of object from another.
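A simplified sketch of the core HOG step described above: per-cell histograms of unsigned edge orientations, weighted by gradient magnitude. It omits the vote interpolation and block normalization of the full method, so treat it as an illustration; scikit-image ships a complete implementation as `skimage.feature.hog`, whose output can be fed to a classifier such as an SVM.

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Magnitude-weighted histograms of unsigned gradient orientations per cell.

    Returns an array of shape (H // cell, W // cell, bins); leftover border pixels are ignored.
    """
    img = np.asarray(img, dtype=np.float64)
    gy, gx = np.gradient(img)                        # central differences along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation in [0, 180)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_r, n_c = img.shape[0] // cell, img.shape[1] // cell
    out = np.zeros((n_r, n_c, bins))
    for i in range(n_r):
        for j in range(n_c):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            out[i, j] = np.bincount(idx[sl].ravel(), weights=mag[sl].ravel(), minlength=bins)
    return out
```

A vertical edge puts its votes into the 0-degree bin and a horizontal edge into the 90-degree bin, which is the edge information the answer refers to.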
The reason HOG has had so much success with pedestrian detection is that a person can vary greatly in color, clothing, and other factors, but the general edges of a pedestrian remain relatively constant, especially around the leg area. This does not mean that it cannot be used to detect other types of objects, but its success can vary depending on your particular application. The HOG paper shows in detail how these descriptors can be used for classification.
It is worthwhile to note that for several applications, the results obtained by HOG can be greatly improved using a pyramidal scheme. This works as follows: Instead of extracting a single HOG vector from an image, you can successively divide the image (or patch) into several sub-images, extracting from each of these smaller divisions an individual HOG vector. The process can then be repeated. In the end, you can obtain a final descriptor by concatenating all of the HOG vectors into a single vector, as shown in the following image.
This has the advantage that at larger scales the HOG features provide more global information, while at smaller scales (that is, in smaller subdivisions) they provide more fine-grained detail. The disadvantage is that the final descriptor vector grows larger, so it takes more time to extract and to train a given classifier on.
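The pyramidal scheme can be sketched as follows, with each region summarized by a single magnitude-weighted orientation histogram (a simplification; you could instead extract a full HOG vector per region). Gradients are computed once for the whole image and then pooled per region; the per-region L2 normalization is an assumption, not part of the answer.

```python
import numpy as np

def gradient_bins(img, bins):
    """Orientation bin index and gradient magnitude for every pixel."""
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation in [0, 180)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    return idx, mag

def pyramid_descriptor(img, levels=3, bins=9):
    """Concatenate orientation histograms from a 1x1, 2x2, 4x4, ... grid of regions."""
    idx, mag = gradient_bins(img, bins)
    h, w = idx.shape
    parts = []
    for level in range(levels):
        n = 2 ** level                               # level 0: whole image, level 1: 2x2, ...
        for i in range(n):
            for j in range(n):
                rs = slice(i * h // n, (i + 1) * h // n)
                cs = slice(j * w // n, (j + 1) * w // n)
                hist = np.bincount(idx[rs, cs].ravel(), weights=mag[rs, cs].ravel(),
                                   minlength=bins)
                parts.append(hist / (np.linalg.norm(hist) + 1e-12))
    return np.concatenate(parts)
```

With `levels=3` and 9 bins the descriptor has 9 x (1 + 4 + 16) = 189 values; each extra level has four times as many regions as the previous one, which is the growth in descriptor size the answer warns about.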
In short: Yes, you can use them.
