Let's say I have a dataset of about 350 positive and more than 400 negative images. They aren't all the same size, and they are larger than 640x320.
What should I do to create a better dataset? Do I need the images to be smaller? If yes, why?
Should I apply some normalization to the dataset? What should it be (contrast, noise reduction)?
Can I create a bigger dataset using the existing one? If yes, how?
Thanks in advance!
The optimal image size is one at which you can still easily classify the object yourself.
Yes, classifiers work better after normalization, and there are several options. The most popular is to center the dataset (subtract the mean) and scale the values into a range such as [-1, 1]. Another popular approach is similar but also normalizes the standard deviation (preferable in most cases); a quick NumPy sketch is shown below, after the last point.
Yes, you can create a bigger dataset from the existing one by adding distortions and noise to the images you already have.
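A minimal NumPy sketch of the two normalization schemes mentioned above (the random array is only a stand-in for your own images):

```python
import numpy as np

# Placeholder data: 100 random 64x64 RGB images standing in for the real dataset.
images = np.random.rand(100, 64, 64, 3).astype(np.float32)

# Option 1: center the data (subtract the mean) and scale into roughly [-1, 1].
mean = images.mean(axis=0)                    # per-pixel mean over the dataset
centered = images - mean
scaled = centered / (np.abs(centered).max() + 1e-8)

# Option 2 (preferable in most cases): also normalize the standard deviation.
std = images.std(axis=0) + 1e-8               # epsilon avoids division by zero
standardized = (images - mean) / std          # zero mean, unit variance per pixel
```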
Have a look at the INRIA dataset and their comments on how they "normalized" their input images for HoG person-detection training.
http://pascal.inrialpes.fr/data/human/
One thing that wasn't mentioned yet is that for most detection techniques it isn't enough to collect a set of n images with the desired object "somewhere" within the image. Instead, you should crop each image around the object (with some border).
E.g. for person detection they start from full input images, but they crop and rescale (and transform) the regions around each person before training.
There are probably some good hints about training in the thesis, too:
http://lear.inrialpes.fr/people/dalal/NavneetDalalThesis.pdf
Related
I am currently trying to make a program to differentiate rotten oranges from edible oranges solely based on their external appearance. To do this, I am planning on using a Convolutional Neural Network trained on rotten oranges and normal oranges. After some searching I could only find one database of approx. 150 rotten oranges and 150 normal oranges on a black background (http://www.cofilab.com/downloads/). Obviously, a machine learning model will need at least a few thousand oranges to achieve an accuracy above 90 percent or so. However, can I alter these 150 oranges in some way to produce more photos of oranges? By alter, I mean adding different shades of orange to the citrus fruit to make a "different orange." Would this be an effective method of training a neural network?
It is a very good way to increase the amount of data you have. What you do depends on your data. For example, if you are training on data obtained from a sensor, you may want to add some noise to the training data to enlarge your dataset; after all, you can expect some noise from the sensor later on.
Assuming that you will train on images, here is a very good GitHub repository that provides these techniques. This Python library helps you augment images for your machine learning projects: it converts a set of input images into a new, much larger set of slightly altered images.
Link: https://github.com/aleju/imgaug
Features:
Most standard augmentation techniques available.
Techniques can be applied to both images and keypoints/landmarks on images.
Define your augmentation sequence once at the start of the experiment, then apply it many times.
Define flexible stochastic ranges for each augmentation, e.g. "rotate each image by a value between -45 and 45 degrees" or "rotate each image by a value sampled from the normal distribution N(0, 5.0)".
Easily convert all stochastic ranges to deterministic values to augment different batches of images in exactly the same way (e.g. images and their heatmaps).
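A minimal sketch of what such a sequence might look like with imgaug (augmenter names follow the repository's README; the exact call syntax can differ slightly between versions):

```python
import numpy as np
import imgaug.augmenters as iaa

# Placeholder batch: 16 random 64x64 RGB images standing in for real data.
images = np.random.randint(0, 255, size=(16, 64, 64, 3), dtype=np.uint8)

# Define the stochastic augmentation sequence once at the start of the experiment...
seq = iaa.Sequential([
    iaa.Fliplr(0.5),                     # horizontally flip 50% of the images
    iaa.Affine(rotate=(-45, 45)),        # rotate by a value between -45 and 45 degrees
    iaa.GaussianBlur(sigma=(0.0, 3.0)),  # blur with a randomly chosen sigma
])

# ...then apply it as many times as needed to grow the dataset.
augmented = seq(images=images)
```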
Data augmentation is what you are looking for. In your case you can do different things:
Apply filters to get slightly different images; as has been said, you can use Gaussian blur.
Cut out the orange and place it on different backgrounds.
Scale the oranges with different scale factors.
Rotate the images.
Create synthetic rotten oranges.
Mix all the different combinations of the above. With this kind of augmentation you can easily create thousands of different oranges.
I did something like that with a dataset of 12,000 images and could create 630,000 samples.
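As a rough illustration of how the combinations multiply (the particular factors below are made up for the example, not the ones used for that dataset):

```python
from PIL import Image, ImageOps

def variants(path):
    """Yield simple combined variants of one image: rotations x flips x scales."""
    img = Image.open(path)
    for angle in (0, 90, 180, 270):                          # 4 rotations
        rotated = img.rotate(angle, expand=True)
        for flipped in (rotated, ImageOps.mirror(rotated)):  # x2: original + mirrored
            for factor in (0.75, 1.0, 1.25):                 # x3 scale factors
                w, h = flipped.size
                yield flipped.resize((int(w * factor), int(h * factor)))

# 4 rotations * 2 flips * 3 scales = 24 variants per image, so even this tiny set of
# transforms turns 12,000 images into 288,000 samples; adding backgrounds, noise,
# synthetic rot, etc. multiplies the count further.
```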
That is indeed a good way to increase your dataset. You can, for example, apply Gaussian blur to the images; they will become blurred, but different from the originals. You can invert the images too. Or, as a last resort, look for new images and apply the techniques mentioned above.
Data augmentation is a really good way to boost the training set, but it is still not enough to train a deep network end to end on its own, given the risk of overfitting. You should look at domain adaptation, where you take a pretrained model like Inception, trained on the ImageNet dataset, and fine-tune it for your problem. Since you only have to learn the parameters required to classify your use case, it is possible to achieve good accuracy with relatively little training data. I have hosted a demo of classification with this technique here. Try it out with your dataset and see if it helps. The demo takes care of the pretrained model as well as data augmentation for the dataset you upload.
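I can't speak for exactly what the hosted demo does, but a minimal Keras sketch of the general fine-tuning idea looks roughly like this (the dataset objects, input size, and two-class head are assumptions):

```python
import tensorflow as tf

# Load InceptionV3 pretrained on ImageNet, without its classification head.
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                         input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # only the new head is trained on the small dataset

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # rotten vs. edible
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds are assumed tf.data.Dataset objects of (image, label) pairs.
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```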
I trained a CNN (on tensorflow) for digit recognition using MNIST dataset.
Accuracy on test set was close to 98%.
I wanted to predict the digits using data which I created myself and the results were bad.
What did I do to my own handwritten images?
I segmented out each digit, converted it to grayscale, resized it to 28x28 and fed it to the model.
How come I get such low accuracy on my own data, whereas accuracy on the test set is so high?
Are there other modifications that I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess would be that the problem is that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data you should try to mimic as much as possible the process used to create the MNIST dataset:
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
If the data is processed differently in the training and test phases, then your model will not be able to generalize from the training data to the test data.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST data (a rough preprocessing sketch is shown after this list);
Add some of your own examples to your training data so that your model can train on images similar to the ones you are classifying.
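A rough sketch of that kind of preprocessing (fit the digit into a 20x20 box, then place it in a 28x28 field centered on its center of mass); the intensity threshold and helper name here are assumptions, not the exact NIST pipeline:

```python
import numpy as np
from PIL import Image
from scipy import ndimage

def mnistify(gray):
    """gray: 2D uint8 array, white digit on a black background (invert first if needed)."""
    # Crop to the bounding box of the digit (assumed intensity threshold of 30).
    ys, xs = np.nonzero(gray > 30)
    digit = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Resize so the longer side is 20 px, preserving the aspect ratio.
    h, w = digit.shape
    scale = 20.0 / max(h, w)
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    digit = np.array(Image.fromarray(digit).resize((new_w, new_h)))

    # Paste into a 28x28 field so the center of mass sits (roughly) at the center.
    canvas = np.zeros((28, 28), dtype=np.uint8)
    cy, cx = ndimage.center_of_mass(digit)
    top = int(np.clip(round(14 - cy), 0, 28 - digit.shape[0]))
    left = int(np.clip(round(14 - cx), 0, 28 - digit.shape[1]))
    canvas[top:top + digit.shape[0], left:left + digit.shape[1]] = digit
    return canvas
```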
For those who still have a hard time with the poor quality of CNN-based models for MNIST:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.
To prepare large datasets for training deep-learning-based image classification models, we usually have to rely on image augmentation methods. I would like to know what the usual image augmentation algorithms are, and whether there are any considerations when choosing them.
The literature on data augmentation is very large and very dependent on your kind of application.
The first things that come to my mind are the galaxy competition's rotations and Jasper Snoek's data augmentation.
But really, all papers have their own tricks for getting good scores on particular datasets, for example stretching the image to a specific size before cropping it, and doing so in a very specific order.
More practically, to train models on the likes of CIFAR or ImageNet, use random crops and random contrast and luminosity perturbations, in addition to the obvious flips and noise addition.
Look at the CIFAR-10 tutorial on the TF website; it is a good start. Plus, TF now has random_crop_and_resize(), which is quite useful.
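A hedged sketch of those perturbations using current tf.image ops (crop sizes and perturbation strengths are arbitrary choices for CIFAR-sized images, not the tutorial's exact values):

```python
import tensorflow as tf

def augment(image):
    """image: float32 tensor in [0, 1], e.g. shape (32, 32, 3) for CIFAR-10."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.resize_with_crop_or_pad(image, 36, 36)   # pad, then...
    image = tf.image.random_crop(image, size=(32, 32, 3))     # ...take a random crop
    image = tf.image.random_brightness(image, max_delta=0.2)  # luminosity perturbation
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = image + tf.random.normal(tf.shape(image), stddev=0.02)  # mild noise
    return tf.clip_by_value(image, 0.0, 1.0)
```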
EDIT: The papers I am referencing here and there.
It depends on the problem you have to address, but most of the time you can do:
Rotate the images
Flip the image (X or Y symmetry)
Add noise
Combine all of the above at the same time.
What is the general consensus on rescaling images that have different sizes? I have read that one approach is to rescale the largest dimension of an image to a fixed size. It's not clear to me how rescaling only one of the dimensions would lead to uniform image shapes across the dataset.
Are there other approaches, e.g. would it work to take the average size of the two dimensions and then rescale the dimensions of each image to the mean of each dimension across the dataset?
Is it important which interpolation method is used in the rescaling?
Would it make sense to simply take an nxm part of each image and cut off the rest of each image?
Is there a list of approaches people have used, and how they perform in different scenarios?
It depends on the target application of the CNN. For object detection/classification, usually a sliding-window approach or cropping is used. With the first option, a sliding window is moved around the image and a prediction is made for every patch (with different overlap criteria). These predictions are then filtered with pooling or other filtering strategies.
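A minimal sketch of that sliding-window patch extraction (window size and stride here are arbitrary):

```python
import numpy as np

def sliding_windows(image, win=64, stride=32):
    """Yield (y, x, patch) for every overlapping win x win patch of an H x W x C image."""
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield y, x, image[y:y + win, x:x + win]

# Each patch would then be fed to the classifier, and the per-patch predictions
# pooled/filtered into detections for the whole image.
```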
For image segmentation (aka semantic segmentation), similar approaches are used: 1) scale the image, segment it, and scale the result back to its original size; 2) segment different image patches; or 3) sliding-window segmentation + max pooling. With option (3), each pixel receives up to N = HxW votes (where HxW is the size of the sliding window). These N predictions are then aggregated by a maximum-voting classifier (similar to ensemble models such as Random Forests and other classifiers).
So, in short, I believe there is no short or unique answer to this question. The decision you take will depend on the goal you are trying to achieve with the CNN, and of course the quality of your approach will have an impact on the performance of the CNN. I don't know of any study of this kind, though.
The OpenCV Haar cascade classifier seems to use 24x24 images of faces as its positive training data. I have two questions regarding this:
What are the considerations that go into selecting the training image size, besides the fact that larger training images require more processing?
For non-square images, some people have chosen to keep one dimension at 24px, and expand the other dimension as necessary (to, say 100-200px). Is this the correct strategy?
How does one go about deciding the size of the training images (this is a variant of question 1)?
I honestly believe that there are far better parameters to tweak than the image size. Even so, it's a question of fine-to-coarse detection: at finer levels you gain detail, at coarser levels you gain structure. There is also a trade-off: with 24x24 detection regions there are roughly 160,000 possible rectangular (Haar-like) features, so increasing or decreasing the size also changes this number for both training and testing (this is why boosting is used to select a small subset of discriminative features).
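For concreteness, that figure can be reproduced by counting all translations and scales of the five basic upright Haar-like feature types in a 24x24 window (a small counting sketch, not OpenCV's actual enumeration code):

```python
def count_features(window_w=24, window_h=24):
    """Count placements of the 5 basic upright Haar-like feature types."""
    # (base_w, base_h) of each type: two-rectangle (horizontal/vertical),
    # three-rectangle (horizontal/vertical), and the four-rectangle checkerboard.
    base_shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for bw, bh in base_shapes:
        for w in range(bw, window_w + 1, bw):        # scaled widths
            for h in range(bh, window_h + 1, bh):    # scaled heights
                # number of positions for a w x h rectangle in the window
                total += (window_w - w + 1) * (window_h - h + 1)
    return total

print(count_features())  # 162336 -- the "~160,000" ballpark for a 24x24 window
```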
As you said, this is because his target was different (i.e. a pen). I think it is sensible to introduce a priori aspect ratio information to the cascade training, otherwise you would be getting detections that have square bounding boxes for a pen detector and probably suffer in performance because the training stage is picking up a larger background region around the pen.
See my first answer. I think this is largely empirical. There are also techniques for feature scaling or for building image pyramids (e.g. see this work) that reduce the need to tightly control the choice of training image size.