How does decorrelating images affect image classification? - image-processing

How does decorrelating images affect image classification?
On many websites, whitening (decorrelating the images) is used as an image preprocessing step.
What is the benefit?
And in machine learning, is decorrelated data easier to compute with?

Whitening can be seen as the process of removing the mean values and the correlations between pixels, keeping only the informative variation. That way, the ML algorithm sees only interesting data and can train faster and more efficiently.
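As a concrete (if simplified) illustration, here is a minimal NumPy sketch of ZCA whitening; the function name and the toy data are made up for the example:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten a data matrix X of shape (n_samples, n_features):
    subtract the per-feature mean, then rotate and rescale so that the
    features of the result are (approximately) uncorrelated with unit
    variance, while staying as close as possible to the original pixels."""
    X = X - X.mean(axis=0)                           # remove mean values
    cov = np.cov(X, rowvar=False)                    # feature covariance
    U, S, _ = np.linalg.svd(cov)                     # eigendecomposition
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T    # ZCA whitening matrix
    return X @ W

# Toy "images": 200 samples of 4 strongly correlated pixel values
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(4)])

Xw = zca_whiten(X)
# After whitening, the covariance is close to the identity matrix
print(np.round(np.cov(Xw, rowvar=False), 2))
```

The `eps` term keeps the division stable when some eigenvalues are near zero, which is common for natural images.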
Here are some useful links to understand why whitening can be useful:
Stanford - PCA whitening
Statistical whitening
Exploring ZCA and color image whitening

Related

Image enhancement before CNN helpful?

I have a deep learning model (transfer-learning based, in Keras) for a regression problem on medical images. Does it help, or is there any logical basis for, doing some image enhancements such as strengthening the edges or applying histogram equalization before feeding the inputs to the CNN?
It is possible to train a model accurately using the techniques you mention.
When training CNN models, image augmentation is almost always used in the pre-processing phase.
Here is a list of operations usually used in augmentation:
color noise
transform
rotate
whitening
affine
crop
flip
etc...
You can refer to here
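A minimal NumPy sketch of a few of the operations listed above (flip, rotation, crop, noise); the `augment` function and its parameters are illustrative, not taken from any particular library:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img):
    """Apply a random combination of simple augmentations (flip,
    rotation, crop, additive noise) to one H x W image in [0, 1].
    Libraries such as Keras' ImageDataGenerator offer the same
    operations with many more options."""
    if rng.random() < 0.5:                       # horizontal flip
        img = img[:, ::-1]
    img = np.rot90(img, k=rng.integers(0, 4))    # rotate 0/90/180/270 deg
    h, w = img.shape
    top, left = rng.integers(0, 3), rng.integers(0, 3)
    img = img[top:h - (2 - top), left:w - (2 - left)]   # random 2-px crop
    img = img + rng.normal(0.0, 0.05, img.shape)        # additive noise
    return np.clip(img, 0.0, 1.0)

img = rng.random((28, 28))
out = augment(img)
print(out.shape)  # cropped by 2 pixels per axis -> (26, 26)
```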

Poor performance on digit recognition with CNN trained on MNIST dataset

I trained a CNN (on tensorflow) for digit recognition using MNIST dataset.
Accuracy on test set was close to 98%.
I wanted to predict the digits using data which I created myself and the results were bad.
What did I do to the images I wrote?
I segmented out each digit and converted to grayscale and resized the image into 28x28 and fed to the model.
How come I get such low accuracy on my own data set, whereas accuracy on the test set is so high?
Are there other modifications that I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess would be that your problem is that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data you should try to mimic as much as possible the process used to create the MNIST dataset:
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
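The center-of-mass step quoted above can be sketched in NumPy as follows (the function is a hypothetical illustration; it assumes the digit crop still fits in the canvas after shifting, and the 20x20 size-normalization would be done beforehand with a proper resampling routine):

```python
import numpy as np

def center_by_mass(digit, size=28):
    """Place a small grayscale digit crop inside a size x size canvas so
    that its center of mass lands at the canvas center, mimicking the
    MNIST preprocessing. Assumes the shifted crop stays inside the canvas."""
    canvas = np.zeros((size, size))
    ys, xs = np.nonzero(digit)
    total = digit.sum()
    cy = (ys * digit[ys, xs]).sum() / total    # center of mass (rows)
    cx = (xs * digit[ys, xs]).sum() / total    # center of mass (cols)
    top = int(round(size / 2 - cy))            # shift so CoM -> center
    left = int(round(size / 2 - cx))
    h, w = digit.shape
    canvas[top:top + h, left:left + w] = digit
    return canvas

# A tiny off-center blob as a stand-in for a segmented digit
digit = np.zeros((8, 8))
digit[1:4, 1:4] = 1.0
centered = center_by_mass(digit)
```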
If your data is processed differently in the training and test phases, then your model is not able to generalize from the training data to the test data.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST dataset;
Add some of your examples to your training data to allow your model to train on images similar to the ones you are classifying;
For those who still have a hard time with the poor quality of CNN-based models on MNIST:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.

image augmentation algorithms for preparing deep learning training set

To prepare large amounts of data for training deep-learning-based image classification models, we usually have to rely on image augmentation methods. I would like to know what the usual image augmentation algorithms are, and whether there are any considerations when choosing them.
The literature on data augmentation is very large and very dependent on your kind of application.
The first things that come to my mind are the galaxy competition's rotations and Jasper Snoeke's data augmentation.
But really, all papers have their own tricks to get good scores on particular datasets, for example stretching the image to a specific size before cropping it, and all this in a very specific order.
More practically, to train models on the likes of CIFAR or ImageNet, use random crops and random contrast and luminosity perturbations in addition to the obvious flips and noise addition.
Look at the CIFAR-10 tutorial on the TF website; it is a good start. Plus, TF now has random_crop_and_resize(), which is quite useful.
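As a rough illustration of random crops plus contrast and luminosity perturbations, here is a NumPy sketch (the function names and parameter ranges are made up for the example, not the actual TF ops):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, crop_h, crop_w):
    """Take a random crop_h x crop_w patch from a larger image, as done
    in CIFAR-style training pipelines."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def random_contrast_brightness(img, c_range=0.2, b_range=0.1):
    """Scale contrast around the image mean and shift brightness by
    random amounts, then clip back to [0, 1]."""
    c = 1.0 + rng.uniform(-c_range, c_range)   # contrast factor
    b = rng.uniform(-b_range, b_range)         # brightness offset
    return np.clip((img - img.mean()) * c + img.mean() + b, 0.0, 1.0)

img = rng.random((32, 32))
patch = random_contrast_brightness(random_crop(img, 24, 24))
print(patch.shape)  # (24, 24)
```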
EDIT: The papers I am referencing here and there.
It depends on the problem you have to address, but most of the time you can do:
Rotate the images
Flip the image (X or Y symmetry)
Add noise
All of the previous at the same time.

Data augmentation techniques for small image datasets?

Currently I am training on small logo datasets similar to Flickrlogos-32 with deep CNNs. For training larger networks I need more data, so I am using augmentation. The best I'm doing right now is affine transformations (featurewise normalization, featurewise centering, rotation, width/height shift, horizontal/vertical flip). But for bigger networks I need more augmentation. I tried searching on Kaggle's National Data Science Bowl forum but couldn't get much help. There's code for some methods given here, but I'm not sure which could be useful. What are some other (or better) image data augmentation techniques that could be applied to this type of dataset (or any image dataset in general), other than affine transformations?
A good recap can be found here, in section 1 on data augmentation: namely flips, random crops, color jittering, and also lighting noise:
Krizhevsky et al. proposed fancy PCA when training the famous Alex-Net in 2012. Fancy PCA alters the intensities of the RGB channels in training images.
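A minimal NumPy sketch of fancy PCA (here the PCA is computed per image for simplicity, whereas Krizhevsky et al. compute it once over the whole training set; the `alpha_std` of 0.1 follows their paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def fancy_pca(img, alpha_std=0.1):
    """'Fancy PCA' color augmentation: compute PCA on the RGB values of
    the pixels, then add a random multiple of each principal component,
    scaled by its eigenvalue, to every pixel. img is H x W x 3 in [0, 1]."""
    flat = img.reshape(-1, 3)                    # pixels as RGB rows
    flat = flat - flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False)             # 3x3 RGB covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    alphas = rng.normal(0.0, alpha_std, 3)       # random per-component draw
    shift = eigvecs @ (alphas * eigvals)         # RGB offset to add
    return np.clip(img + shift, 0.0, 1.0)

img = rng.random((8, 8, 3))
out = fancy_pca(img)
print(out.shape)  # (8, 8, 3)
```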
Alternatively you can also have a look at the Kaggle Galaxy Zoo challenge: the winners wrote a very detailed blog post. It covers the same kind of techniques:
rotation,
translation,
zoom,
flips,
color perturbation.
As stated they also do it "in realtime, i.e. during training".
For example here is a practical Torch implementation by Facebook (for ResNet training).
I've collected a couple of augmentation techniques in my master's thesis, page 80. It includes:
Zoom
Crop
Flip (horizontal / vertical)
Rotation
Scaling
Shearing
Channel shifts (RGB, HSV)
Contrast
Noise
Vignetting

Image Similarity - Deep Learning vs hand-crafted features

I am doing research in the field of computer vision, and am working on a problem related to finding visually similar images to a query image. For example, finding t-shirts of similar colour with similar patterns (Striped/ Checkered), or shoes of similar colour and shape, and so on.
I have explored hand-crafted image features such as Color Histograms, Texture features, Shape features (Histogram of Oriented Gradients), SIFT and so on. I have also read up literature about Deep Neural Networks (Convolutional Neural Networks), which have been trained on massive amounts of data and are currently state of the art in Image Classification.
I was wondering if the same features (extracted from the CNN's) can also be used for my project - finding fine-grained similarities between images. From what I understand, the CNNs have learnt good representative features that can help classify images - for example, be it a red shirt or a blue shirt or an orange shirt, it is able to identify that the image is a shirt. However it doesn't understand that an orange shirt looks more similar to a red shirt than a blue shirt does, and hence it is not able to capture these similarities.
Please correct me if I am wrong. I would like to know if there are any Deep Neural Networks that capture these similarities, and have proven to be superior to the hand-crafted features. Thanks in advance.
For your task, a CNN is definitely worth a try!
Many researchers have used networks pretrained for image classification and obtained state-of-the-art results on fine-grained classification, for example classifying bird species or cars.
Now, your task is not classification, but it is related. You can think of similarity as some geometric distance between features, which are basically vectors. Thus, you may carry out some experiments computing the distance between the feature vectors of all your training images (the reference set) and the feature vector extracted from the query image.
CNN features extracted from the first layers of the net should be more related to color and other low-level graphical traits, rather than more "semantic" ones.
Alternatively, there is some work on learning directly a similarity metric through CNN, see here for example.
A little bit outdated, but it can still be useful for other people. Yes, CNNs can be used for image similarity, and I have used them for that before. As Flavio pointed out, for a simple start you can use a pre-trained CNN of your choice, such as AlexNet, GoogLeNet, etc., and use it as a feature extractor. You can compare the features based on their distance: similar pictures will have a smaller distance between their feature vectors.
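The distance-based retrieval idea can be sketched as follows; the vectors here are toy stand-ins for features extracted from a pretrained CNN's penultimate layer:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (1 = identical
    direction, 0 = orthogonal)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query, reference):
    """Index of the reference feature vector closest to the query."""
    sims = [cosine_similarity(query, r) for r in reference]
    return int(np.argmax(sims))

# Toy "features": the query points mostly along reference vector 1
reference = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
query = np.array([0.1, 0.9, 0.1])
print(most_similar(query, reference))  # 1
```

Euclidean distance on L2-normalized features gives the same ranking as cosine similarity, so either works in practice.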
