PCA making my image garbage - machine-learning

So I have 42,000 images. Each image is 28x28, so there are 784 features (pixels).
I want to build a handwritten digit classification system.
So I thought I should use PCA to reduce the dimensionality of the images.
Here is the code for PCA:
pipeline = Pipeline([('scaling', StandardScaler()), ('pca', PCA(n_components=676))])
X_array = pipeline.fit_transform(X_array)
Now the problem is that PCA is turning every image into what looks like random noise: all the pixels come out with completely random values.
Here is an image of a number before PCA
Here is an image of a number after PCA
Here is another image reduced by PCA
I'm reducing the dimension of each image from 28x28 to 26x26.
Why is this happening?

Basically, what your PCA code is doing is treating your 28x28 array (you are passing one image at a time, right?) as a dataset of 28 examples with 28 numeric features each. That's why the output does not make sense. PCA is a method for reducing the dimensionality of a complete dataset, not for shrinking individual images.
For PCA to work properly, you should flatten each image into an array of 784 features and feed all of them together as a single dataset (a 42000 x 784 matrix). Then, from the output of the method, keep as many components as necessary so that most of the variance of your dataset is retained (this probably won't be more than 10 or 20 features in total).
The output dataset will still look strange when you print each row as an image, but it will have far fewer features than the original (you should end up with a matrix of roughly 42000 x 20 instead of 42000 x 784 - that's why PCA is used as a dimensionality reduction method) while retaining most of its predictive power.
After that, you could just feed the dataset to your favourite classifier in the next step of the pipeline.
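As a rough sketch of that with scikit-learn (assuming X_array holds the raw images as a 42000 x 28 x 28 array; the 0.95 variance threshold is an illustrative choice, not a value from the question):
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = X_array.reshape(42000, 784)          # one flattened image per row
pipeline = Pipeline([
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=0.95)),     # keep enough components for ~95% of the variance
])
X_reduced = pipeline.fit_transform(X)    # shape: (42000, number of kept components)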

Related

HOG vector size and dimension

I am having a problem understanding the size of the HOG feature vector...
Scene: I took a 286x286 image. Then I computed a HOG for each 8x8 patch, i.e. 8x8x2 = 128 numbers (a gradient magnitude and orientation per pixel) summarized by a 9-bin histogram for each patch. So can I call this 9-bin histogram a 9-dimensional vector? The total number of patches used to estimate the HOG over the whole image was approximately 1225 (since the image is square, I estimated the number of patches by squaring 286/8 ≈ 35). I iterated over the 1225 patches and computed a 9-bin histogram for each (I did not apply 16x16 block normalization). Concatenating all the vectors together, I obtained a HOG of size 1225x9 = 11,025 for the whole image.
Questions:
1. Is it right to say that I obtained an 11,025-dimensional HOG vector for the given image?
2. Am I going in the right direction (if I opt for classification via a neural network)?
3. Can this concatenated HOG feature be fed directly to PCA for dimension reduction, or does it need further preprocessing (in general, not in advance)?
Thank you in advance!
1. Yes.
2. Probably not. What are you trying to do? For example, if you are doing classification, you should use bag-of-words (actually, you should stop using HOG and try deep learning). If you are doing image retrieval/matching, you should compute HOG features for local patches.
3. You can almost always use PCA for dimensionality reduction of your features, even for 128-dimensional SIFT.
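For illustration, a hedged sketch (not the asker's code) that reproduces those dimensions with scikit-image's hog and then applies PCA; the variable images (a collection of 286x286 grayscale images) and n_components=100 are assumptions:
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA

features = np.array([
    hog(img,
        orientations=9,            # 9-bin histogram per cell
        pixels_per_cell=(8, 8),    # 8x8 patches -> 35x35 cells on a 286x286 image
        cells_per_block=(1, 1))    # i.e. no 16x16 block normalization
    for img in images
])                                 # shape: (n_images, 35*35*9) = (n_images, 11025)

pca = PCA(n_components=100)        # arbitrary choice; tune via the explained variance
reduced = pca.fit_transform(features)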

Bag of Words with HOG descriptors

I'm not quite sure how to implement the "Bag of Words" approach with HOG descriptors.
I've checked several sources which usually provide several steps to follow:
Compute the HOGs for the set of valid training images.
Apply a clustering algorithm to retrieve n centroids from the descriptors.
Perform some magic to create histograms with the frequency of the nearest centroids of the computed HOGs, or use OpenCV's implementation to do this.
Train a linear SVM with the histograms.
The step which involves magic (3) is not really clear. If I don't use OpenCV, how would I implement it?
The HOGs are vectors which are calculated cell-wise. So I have a vector for each cell. I could iterate over the vector and calculate the closest centroid for each element of the vector and create the histogram accordingly. Would this be a proper way to do it? But if so, I still have vectors of different sizes and no benefit from it.
The main steps can be expressed as follows:
1- Extract features from your entire training set (HOG features, for your purposes).
2- Cluster those features into a vocabulary V; you get K distinct cluster centers (K-Means or K-Medoids; your hyperparameter will be K).
3- Encode each training image as a histogram of the number of times each vocabulary element shows up in the image. Each image is then represented by a length-K vector.
For example, the first vocabulary element may occur 5 times and the second 10 times in your image. It doesn't matter; in the end you will have a vector with K elements:
K[0] = 5
K[1] = 10
...
K[K-1] = 3
4- Train the classifier using this vector. (Linear SVM)
When given a test image, extract the features. Now represent the test image as a histogram of the number of times each cluster center from V was closest to a feature in the test image. This is a length K vector again.
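A minimal sketch of these steps with scikit-image and scikit-learn, assuming train_images, train_labels and test_images already exist (grayscale images and integer labels); K = 100 and the HOG settings are arbitrary choices:
import numpy as np
from skimage.feature import hog
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

K = 100  # vocabulary size (the hyperparameter from step 2)

def local_hogs(img):
    # One 9-bin HOG descriptor per 8x8 cell, kept as separate local features.
    h = hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(1, 1))
    return h.reshape(-1, 9)

# Steps 1-2: extract features from the whole training set and cluster them.
all_features = np.vstack([local_hogs(img) for img in train_images])
kmeans = KMeans(n_clusters=K, n_init=10).fit(all_features)

# Step 3: encode each image as a K-bin histogram of nearest cluster centers.
def encode(img):
    words = kmeans.predict(local_hogs(img))
    return np.bincount(words, minlength=K)

X_train = np.array([encode(img) for img in train_images])

# Step 4: train a linear SVM on the histograms, then encode test images the same way.
clf = LinearSVC().fit(X_train, train_labels)
predictions = clf.predict(np.array([encode(img) for img in test_images]))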

Digit Recognition on CNN

I am testing printed digits (0-9) on a Convolutional Neural Network. It gives 99+% accuracy on the MNIST dataset, but when I tried it with fonts installed on my computer (Arial, Calibri, Cambria, Cambria Math, Times New Roman) and trained on the images generated from those fonts (104 images per font (25 fonts in total - 4 images per font, with little difference)), the training error rate does not go below 80%, i.e. 20% accuracy. Why?
Here is "2" number Images sample -
I resized every image 28 x 28.
Here is more detail :-
Training data size = 28 x 28 images.
Network parameters - as in LeNet-5
Architecture of the network -
Input Layer - 28x28
| Convolutional Layer - (ReLU activation)
| Pooling Layer - (Tanh activation)
| Convolutional Layer - (ReLU activation)
| Local Layer (120 neurons) - (ReLU)
| Fully Connected Layer - (Softmax activation, 10 outputs)
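For reference, a hedged Keras sketch of the architecture listed above; the filter counts and kernel sizes are assumptions (the post says the parameters follow LeNet-5, so 6 and 16 filters with 5x5 kernels are used), and the "local layer" is approximated with a dense layer:
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, 5, padding='same', activation='relu'),   # Convolutional Layer (ReLU)
    layers.MaxPooling2D(2),
    layers.Activation('tanh'),                                 # "Pooling Layer - (Tanh activation)"
    layers.Conv2D(16, 5, activation='relu'),                   # Convolutional Layer (ReLU)
    layers.Flatten(),
    layers.Dense(120, activation='relu'),                      # "Local Layer (120 neurons)"
    layers.Dense(10, activation='softmax'),                    # Fully connected, 10 outputs
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])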
This works, giving 99+% accuracy on MNIST. Why is it so bad with computer-generated fonts? A CNN can handle a lot of variance in the data.
I see two likely problems:
Preprocessing: MNIST images are not just 28px x 28px; they are also preprocessed:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
Source: MNIST website
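A rough sketch approximating that preprocessing with NumPy/SciPy (the real NIST pipeline differs in its details); digit is assumed to be a grayscale array with white ink on a black background:
import numpy as np
from scipy import ndimage

def mnist_style(digit):
    # Crop to the bounding box of the ink, then scale the longer side to 20 px,
    # preserving the aspect ratio (grey levels come from the interpolation).
    ys, xs = np.nonzero(digit)
    cropped = digit[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    resized = ndimage.zoom(cropped, 20.0 / max(cropped.shape))

    # Paste into a 28x28 canvas so the center of mass lands in the middle.
    canvas = np.zeros((28, 28))
    cy, cx = ndimage.center_of_mass(resized)
    top = min(max(int(round(14 - cy)), 0), 28 - resized.shape[0])
    left = min(max(int(round(14 - cx)), 0), 28 - resized.shape[1])
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return canvas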
Overfitting:
MNIST has 60,000 training examples and 10,000 test examples. How many do you have?
Did you try dropout (see paper)?
Did you try dataset augmentation techniques? (e.g. slightly shifting the image, probably changing the aspect ratio a bit, you could also add noise - however, I don't think those will help)
Did you try smaller networks? (And how big are your filters / how many filters do you have?)
Remarks
Interesting idea! Did you try simply applying the trained MNIST network on your data? What are the results?
It may be an overfitting problem. That can happen when your network is too complex for the problem it has to solve.
Check this article: http://es.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html
It definitely looks like an issue of overfitting. I see that you have two convolution layers, two max pooling layers and two fully connected layers. But how many weights in total? You only have 96 examples per class, which is certainly smaller than the number of weights in your CNN. Remember that you want at least 5 times more instances in your training set than weights in your CNN.
You have two solutions to improve your CNN:
Shake each instance in the training set: shift each number by about 1 pixel in every direction. That alone multiplies your training set by 9 (see the sketch after this list).
Use a transformer layer that applies an elastic deformation to each number at each epoch. This strengthens learning considerably by artificially enlarging your training set, and it will also make the network much more effective at predicting other fonts.
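A hedged sketch of the one-pixel "shake" idea in NumPy, assuming X is an array of shape (n_images, 28, 28) and y holds the matching labels:
import numpy as np

def shift_augment(X, y):
    images, labels = [], []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # np.roll shifts every image; edge pixels wrap around, which is
            # harmless here because the digit borders are background.
            images.append(np.roll(np.roll(X, dx, axis=1), dy, axis=2))
            labels.append(y)
    return np.concatenate(images), np.concatenate(labels)

X_aug, y_aug = shift_augment(X, y)   # 9x the original number of examples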

Feature Extraction and Cross-Validation of an image dataset

I have a dataset consisting of fMRI images. Each image belongs to one class. The dataset is as follows:
Class 1: 9 images
Class 2: 10 images
Class 3: 6 images
Class 4: 12 images
Each image is 4D (time series), i.e. 90x60x10x350 where 350 is the time dimension (i.e. 350 3D volumes). I want to train a classifier on this data.
Now I want to first extract features, then apply feature selection (e.g. PCA), and then do clustering, as described in the paper "Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data" (http://www.hindawi.com/journals/cmmm/2013/645921/). For feature extraction I see the following possibilities:
1. Each voxel is a feature and the average of each voxel's time series is taken. Each image has exactly one feature vector of dimension 90*60*10 = 54'000.
2. Each voxel is a feature and each time point (i.e. each 3D volume) is a data point. Each image has 350 feature vectors of dimension 90*60*10 = 54'000 each.
3. Put all voxels of the whole time series of an image into one feature vector of size 90*60*10*350 = 18'900'000. Each image has only one feature vector.
4. Take the correlation values between the voxels as feature values. But this is computationally not feasible.
I prefer option 2, but I'm not sure if it is a good idea (a small sketch of what option 2 looks like follows below).
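To make option 2 concrete, here is a hypothetical sketch of the reshaping involved, assuming img is a single 4D image loaded as a NumPy array of shape (90, 60, 10, 350):
import numpy as np

n_voxels = 90 * 60 * 10                             # 54'000 features per volume
volumes = np.moveaxis(img, -1, 0)                   # (350, 90, 60, 10): time axis first
feature_vectors = volumes.reshape(350, n_voxels)    # 350 data points for this image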
How would you do the feature extraction? And how could a correlation-based approach be made computationally feasible?
Last but not least, how would you do cross-validation on the dataset? The problem is that the different classes are imbalanced.
Thank you in advance for your answers.

Feeding HOG into SVM: the HOG has 9 bins, but the SVM takes in a 1D matrix

In OpenCV, there is a CvSVM class which takes in a matrix of samples to train the SVM. The matrix is 2D, with the samples in the rows.
I created my own method to generate a histogram of oriented gradients (HOG) from a video feed. To do this, I created a 9-channel matrix to store the HOG, where each channel corresponds to an orientation bin. So in the end I have a 40x30 matrix of type CV_32FC(9).
I also made a visualisation for the HOG, and it's working.
I don't see how I'm supposed to feed this matrix into the OpenCV SVM, because if I flatten it, I don't see how the SVM is supposed to learn a 9D hyperplane from 1D input data.
The SVM always takes in a single row of data per feature vector. The dimensionality of the feature vector is thus the length of the row. If you're dealing with 2D data, then there are 2 items per feature vector. An example of 2D data is on this webpage:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Code for an equivalent demo in OpenCV: http://sites.google.com/site/btabibian/labbook/svmusingopencv
The point is that even though you're thinking of the histogram as 2D with 9-bin cells, the feature vector is in fact the flattened version of this. So it's correct to flatten it out into a long feature vector. The result for me was a feature vector of length 2304 (16x16x9) and I get 100% prediction accuracy on a small test set (i.e. it's probably slightly less than 100% but it's working exceptionally well).
The reason this works is that the SVM operates on a system of weights, one per item of the feature vector. So it doesn't depend on the problem's spatial layout; the hyperplane is always in the same dimension as the feature vector. Another way of looking at it is to forget about the hyperplane and just view it as a bunch of weights, one for each item in the feature vector: it multiplies each item by its weight and combines the results into a single output.
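A hedged NumPy/scikit-learn sketch of that flattening (the question itself uses OpenCV's CvSVM in C++, but the idea is identical); hog_maps and labels are assumed to already exist:
import numpy as np
from sklearn.svm import LinearSVC

# hog_maps: shape (n_samples, 40, 30, 9), one 9-bin histogram per cell per sample.
X = hog_maps.reshape(len(hog_maps), -1)   # each row: 40*30*9 = 10800 features
clf = LinearSVC().fit(X, labels)          # the SVM learns one weight per feature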
