Should I be using a smaller image size for an input dataset with fewer features (like MNIST) and a larger image size for an input dataset with more features (like face recognition)?
I am trying to design a neural network where the output vector is the same size as the input vector. In essence, I have an image (my input) and I wish to perform a regression task on each of the pixels (i.e., my output is a prediction of how I should act on each pixel).
However, my (newbie) experience with ML suggests that the output vector is usually small compared to the input vector. Is there a reason why I must design my network that way? Are there any pitfalls in having an output vector as long as the input vector?
You can safely have the output of the network as big as the input. Look, for example, at U-Net for semantic segmentation: there is one output for each pixel, representing the category (class) of that pixel.
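For instance, here is a minimal sketch (a toy example of my own, not the asker's network) of a fully convolutional Keras model whose output has one regression value per input pixel:

import tensorflow as tf

inputs = tf.keras.Input(shape=(None, None, 1))               # any H x W grayscale image
x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(x)
outputs = tf.keras.layers.Conv2D(1, 1, padding='same')(x)    # one value per pixel
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')                  # per-pixel regression

Because every layer is convolutional, the output keeps the spatial size of the input, so nothing forces the output to be smaller than the input.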
So I have 42,000 images. Each image is 28x28, so there are 784 features (pixels).
I want to build a handwritten digit classification system.
So I thought I should use PCA to reduce the dimensionality of the images.
Here is the code for the PCA:
pipeline = Pipeline([('scaling', StandardScaler()), ('pca', PCA(n_components=676))])
X_array = pipeline.fit_transform(X_array)
Now the problem is that PCA is turning the images into what looks like random noise; all the pixels come out with completely random values.
Here is an image of a number before PCA
Here is an image of a number after PCA
Here is another image reduced by PCA
I'm reducing the dimension of each image from 28x28 to 26x26.
Why is this happening?
Basically, what your PCA code is doing is treating your 28x28 array (you are passing one image at a time, right?) as a dataset of 28 examples with 28 numeric features each. That's why the output does not make sense. PCA is a method for reducing the dimensionality of a complete dataset, not for shrinking individual images.
For PCA to work properly, you should flatten each image into an array of 784 features and feed all of them together as a single dataset (a 42000 x 784 matrix). Then, from the output of the method, keep as many columns as necessary so that most of the variance of your dataset is preserved (this probably won't be more than 10-20 features in total).
Each row of the output dataset will still look weird when printed as an image, but it will have far fewer features than the original (you should end up with a matrix of roughly 42000 x 20 instead of 42000 x 784; that's why PCA is called a dimensionality reduction method) and will retain most of its predictive power.
After that, you can just feed the reduced dataset to your favourite classifier in the next step of the pipeline.
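A minimal sketch of that approach (the X_images placeholder stands in for however you actually load your 42,000 images):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Stand-in for your real data: all 42,000 images stacked into one array.
X_images = np.random.rand(42000, 28, 28)

# Flatten each image into a 784-dimensional row: shape becomes (42000, 784).
X_flat = X_images.reshape(len(X_images), -1)

pipeline = Pipeline([
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=0.95)),  # keep enough components for 95% of the variance
])
X_reduced = pipeline.fit_transform(X_flat)  # shape (42000, k) with k far smaller than 784

The key difference from the code in the question is that PCA is fitted on the whole flattened dataset at once, not on one 28x28 image at a time.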
I trained a CNN (in TensorFlow) for digit recognition using the MNIST dataset.
Accuracy on the test set was close to 98%.
I wanted to predict digits on data I created myself, and the results were bad.
What did I do to the images I wrote myself?
I segmented out each digit, converted it to grayscale, resized it to 28x28 and fed it to the model.
How come I get such low accuracy on my own data but such high accuracy on the test set?
Are there other modifications I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess would be that your problem is that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data you should try to mimic as much as possible the process used to create the MNIST dataset:
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
If your data is processed differently at training time and at prediction time, your model will not be able to generalize from the training data to your own images.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST images (a preprocessing sketch follows below);
Add some of your own examples to the training data so that your model can train on images similar to the ones you are classifying;
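Here is a minimal sketch of the first point, assuming a grayscale photo with a dark digit on a light background (the function name and the threshold are my own assumptions, not part of any MNIST tooling):

import cv2
import numpy as np

def mnist_like(img_gray):
    # Invert so the digit is white on black, like MNIST.
    img = 255 - img_gray
    # Crop to the bounding box of the digit (the threshold of 30 is an assumption).
    ys, xs = np.where(img > 30)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Resize so the longer side is 20 pixels, preserving the aspect ratio.
    h, w = img.shape
    scale = 20.0 / max(h, w)
    img = cv2.resize(img, (max(1, int(round(w * scale))), max(1, int(round(h * scale)))))
    # Place the digit on a 28x28 canvas, then shift its center of mass to the center.
    canvas = np.zeros((28, 28), dtype=np.uint8)
    h, w = img.shape
    y0, x0 = (28 - h) // 2, (28 - w) // 2
    canvas[y0:y0 + h, x0:x0 + w] = img
    cy, cx = np.array(np.where(canvas > 0)).mean(axis=1)
    M = np.float32([[1, 0, 14 - cx], [0, 1, 14 - cy]])
    return cv2.warpAffine(canvas, M, (28, 28))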
For those who still have a hard time with the poor quality of CNN-based models on MNIST-style digits:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.
So, following up from here, I now need to collect negative samples for cascade classifier training using OpenCV.
With positive samples, I know that all samples should have the same aspect ratio.
What about negative samples?
Should they all be larger than the positive samples (since OpenCV is going to paste positives on top of negatives to create training images)?
Should they all be the same size?
Can they be arbitrary sizes?
Should they, too, share the same aspect ratio among themselves?
From the OpenCV documentation on Cascade Classifier Training:
Negative samples are taken from arbitrary images. These images must not contain detected objects. [...] Described images may be of different sizes. But each image should be (but not necessarily) larger than a training window size, because these images are used to subsample negative images to the training size.
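In short: negatives may have arbitrary and different sizes, as long as each one is at least as large as the training window. They are simply listed, one path per line, in the background description file passed to opencv_traincascade via -bg. A small sketch for building that file (the directory and file names are assumptions):

import glob

negatives = sorted(glob.glob('negatives/*.jpg'))
with open('bg.txt', 'w') as f:
    for path in negatives:
        f.write(path + '\n')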
I've managed to extract HoG features from positive and negative images (from the INRIA person dataset) using OpenCV's HOGDescriptor::compute function.
I've also managed to pack the data correctly and feed it into CvSVM for training purposes.
I have several questions:
While extracting features, I used positive images of 96 x 128 pixels, while the negative images are on average 320 x 240. I have been using a window size of 64 x 128 for HoG extraction; should I use a different window size?
The extracted feature vectors for positive images have around 28,800 values, while the negative ones have 500,000+. I have been truncating the negative feature vectors to 28,800 values, which I think is wrong, since I believe I'm losing too much information when feeding these features to the SVM. How should I tackle this? (It seems I can only feed feature vectors of the same length for positives and negatives.)
When predicting on images bigger than 64 x 128 (or 96 x 160), should I use a sliding window? Large negative images still give me more than 500,000 features, but I can't feed them into the SVM because of the vector length.
Why can't you just resize all your patches to the same size? The HoG descriptor length depends on the window size and on the block and cell sizes. You should try different combinations: with small cells you can capture small details, but you lose generality, and vice versa.
1.) I don't understand the question.
2.) Make all descriptors the same size by extracting HoG from resized patches.
3.) I don't understand the question.
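A minimal sketch of point 2 in Python (the asker uses the C++ API; the default HOGDescriptor parameters and the variable names here are assumptions):

import cv2
import numpy as np

hog = cv2.HOGDescriptor()  # defaults: 64x128 window, 16x16 blocks, 8x8 stride/cells, 9 bins
win_w, win_h = 64, 128

def describe(patch):
    # Resize every patch to the detection window first, so every descriptor
    # has the same length (3780 values with the default parameters).
    patch = cv2.resize(patch, (win_w, win_h))
    return hog.compute(patch).ravel()

# positives and negatives are assumed to be lists of image patches (numpy arrays);
# stacking the descriptors gives one fixed-length row per sample for the SVM.
# X = np.vstack([describe(p) for p in positives + negatives])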