How to size the image normalization in handwriting recognition? - image-processing

Handwritten number recognition problem: how can I normalize the handwritten number image? Can someone help?

Check out how the MNIST dataset is curated here:
http://yann.lecun.com/exdb/mnist/index.html
To quote the relevant section:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications.
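A rough sketch of that normalization in Python with OpenCV and NumPy (my own illustration of the description above, not the original NIST/MNIST code; cv2.INTER_AREA stands in for the anti-aliasing mentioned in the quote):

```python
import cv2
import numpy as np

def normalize_digit(binary_img):
    """Fit a binary digit into a 20x20 box (preserving aspect ratio), then
    place it on a 28x28 canvas centered by its center of mass.
    Assumes the image has a non-empty foreground (non-zero pixels)."""
    # Crop to the bounding box of the foreground pixels.
    ys, xs = np.nonzero(binary_img)
    digit = binary_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Scale the longer side to 20 pixels; interpolation introduces grey levels.
    h, w = digit.shape
    scale = 20.0 / max(h, w)
    digit = cv2.resize(digit,
                       (max(1, int(round(w * scale))), max(1, int(round(h * scale)))),
                       interpolation=cv2.INTER_AREA)

    # Compute the center of mass of the scaled digit.
    m = cv2.moments(digit)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]

    # Paste into a 28x28 canvas so the center of mass lands near (14, 14).
    canvas = np.zeros((28, 28), dtype=digit.dtype)
    h2, w2 = digit.shape
    x0 = min(max(int(round(14 - cx)), 0), 28 - w2)
    y0 = min(max(int(round(14 - cy)), 0), 28 - h2)
    canvas[y0:y0 + h2, x0:x0 + w2] = digit
    return canvas
```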

Related

Deep Learning - How to perform RANDOM CROP and not lose any information in data (change ground truth label)

I have image patches from DDSM Breast Mammography that are 150x150 in size. I would like to augment my dataset by randomly cropping each of these images twice down to 120x120, so if my dataset contains 6500 images, augmenting it with random crops should get me to 13000 images. The thing is, I do NOT want to lose potential information in the image and possibly change the ground truth label.
What would be the best way to do this? Should I crop them randomly from 150x150 to 120x120 and hope for the best, or maybe pad them first and then perform the cropping? What is the standard way to approach this problem?
If your ground truth contains the exact location of what you are trying to classify, use it to crop your images in an informed way, i.e. adjust the ground truth if you are removing what you are trying to classify.
If you don't know the location of what you are classifying, you could
attempt to train a classifier on your un-augmented dataset,
find out which regions of the images your classifier reacts to,
make note of these locations,
crop your images in an informed way,
train a new classifier.
But how do you "find out which regions your classifier reacts to"?
Multiple ways are described in Visualizing and Understanding Convolutional Networks by Zeiler and Fergus:
Imagine your classifier predicts breast cancer vs. no breast cancer. Take an image that contains positive evidence for breast cancer, occlude part of it with some blank color (the gray square in the figure from Zeiler et al.), and predict cancer or not. Then move the occluding square around the image. In the end you get rough prediction scores for all parts of your original image (panel (d) in that figure), because when you cover up the important part that is responsible for a positive prediction, you should get a negative prediction.
If you have someone who can actually recognize cancer in an image, this is also a good way to check for and guard against confounding factors.
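A minimal sketch of such an occlusion map, assuming a hypothetical predict_proba(image) function that returns the positive-class probability (patch size, stride and fill value are arbitrary placeholders):

```python
import numpy as np

def occlusion_map(image, predict_proba, patch=20, stride=10, fill=0.5):
    """Slide a blank square over the image and record how the positive-class
    probability changes; low scores mark regions the classifier relies on."""
    h, w = image.shape[:2]
    heatmap = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill  # blank out one square
            heatmap[i, j] = predict_proba(occluded)    # e.g. P(cancer)
    return heatmap
```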
BTW: You might want to crop on-the-fly and randomize how you crop even more to generate way more samples.
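For example, an on-the-fly random crop from 150x150 to 120x120 can be as simple as the following sketch (my own illustration):

```python
import numpy as np

def random_crop(img, out_size=120):
    """Randomly crop a square out_size x out_size patch from a larger image."""
    h, w = img.shape[:2]
    y = np.random.randint(0, h - out_size + 1)
    x = np.random.randint(0, w - out_size + 1)
    return img[y:y + out_size, x:x + out_size]
```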
If the 150x150 patch is already the region of interest (ROI), you could try the following data augmentations (a rough sketch follows the list):
use a larger patch, e.g. 170x170, that always contains your 150x150 patch
use a larger patch, e.g. 200x200, and scale it down to 150x150
add some Gaussian noise to the image
rotate the image slightly (by random amounts)
change image contrast slightly
artificially emulate whatever other (image-)effects you see in the original dataset
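A rough sketch of some of these augmentations with NumPy and OpenCV (the parameter ranges are arbitrary illustrations, not recommendations):

```python
import cv2
import numpy as np

def augment(patch):
    """Apply a few random, label-preserving perturbations to an image patch."""
    out = patch.astype(np.float32)

    # Additive Gaussian noise.
    out += np.random.normal(0, 5, out.shape)

    # Small random rotation about the center.
    angle = np.random.uniform(-10, 10)
    h, w = out.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(out, m, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Slight contrast change around the mean intensity.
    alpha = np.random.uniform(0.9, 1.1)
    out = (out - out.mean()) * alpha + out.mean()

    return np.clip(out, 0, 255).astype(np.uint8)
```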

Key Point classification in an image

I am trying to compare two images of drawings using the corner features in the images.
I used the SIFT algorithm to compare the images, but it did not work: SIFT extracts features from a 16x16 pixel window around each point of interest, and in this case (drawing objects) the only feature points we get are corners. The SIFT descriptor gives very similar features for all corner points, and hence the feature matching step rejects the corners because of their close similarity scores.
So I am using the approach below to compare the images.
I am using the Shi-Tomasi-based function in OpenCV, i.e. cv2.goodFeaturesToTrack(), to find the corners (feature points) in an image. After finding the corners I want to classify them into 4 categories and compare them across the two images. Below are the corner categories as defined so far, which may vary because of the huge variation in corner types (angle, number of lines crossing at the corner, irregular pixel variation at the corner point):
Corner categories:
Type-1: L-shaped
Type-2: Line intersection
Type-3: Line-curve intersection
Type-4: Curve-curve intersection
I am trying to solve this using the approach below:
=> Take a patch of fixed window size surrounding the corner pixel, say a 32x32 window.
=> Find the gradient information, i.e. gradient magnitude and direction, in this window and use it to classify the corner into the above 4 classes. After reading up on image classification, I learned that the HOG algorithm can convert image gradient information into feature vectors.
=> The HOG feature vectors calculated in the step above can be used to train an SVM to get a model.
=> This model can then be used to classify new feature points.
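For concreteness, a minimal sketch of this pipeline using OpenCV's HOGDescriptor and scikit-learn (assuming you already have labelled 32x32 corner patches; train_patches and train_labels are placeholders, not anything from the question):

```python
import cv2
import numpy as np
from sklearn.svm import LinearSVC

# HOG configured for 32x32 patches: 16x16 blocks, 8x8 stride, 8x8 cells, 9 bins.
hog = cv2.HOGDescriptor((32, 32), (16, 16), (8, 8), (8, 8), 9)

def corner_patches(gray, patch=32):
    """Detect Shi-Tomasi corners and cut a 32x32 patch around each one."""
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=10)
    if corners is None:
        return
    half = patch // 2
    for x, y in corners.reshape(-1, 2).astype(int):
        if half <= x < gray.shape[1] - half and half <= y < gray.shape[0] - half:
            yield gray[y - half:y + half, x - half:x + half]

def hog_features(patches):
    """Compute one flattened HOG feature vector per patch."""
    return np.array([hog.compute(p).ravel() for p in patches])

# train_patches: labelled 32x32 grayscale patches; train_labels: classes 0..3
# clf = LinearSVC().fit(hog_features(train_patches), train_labels)
# predictions = clf.predict(hog_features(list(corner_patches(test_gray))))
```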
After implementing the above algorithm I am getting poor accuracy.
If there is any other way to classify the corners, please suggest it.

Shape Detection using Machine Learning

I would like to detect shapes, namely circle, square, rectangle, triangle, etc., using machine learning techniques.
Following are the specifications for shape detection:
A Convolutional Neural Network (CNN) is used.
For training, the dataset contains 1000 images per category for 10 shapes.
For testing, the dataset contains 100 images per category for 10 shapes.
All images are resized to 28x28 with one channel (grayscale).
All images in the dataset are edge-detected images.
Questions
Is it possible for the machine learning algorithm to differentiate between a square and a rectangle? Between a square and a rhombus?
How can I improve the dataset for shape detection?
Thanks in advance!
Yes, and it is not a very hard task for a CNN to do.
One way to improve the dataset is to use image augmentation. You can do both horizontal and vertical flips, since all of these figures are still the same kind of figure under those transformations. You can think of other transformations too, as long as they don't scale the two axes independently, because if you change the size of one axis relative to the other a square becomes a rectangle, and vice versa.
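A minimal sketch of that flip-based augmentation with NumPy (my own illustration; images is assumed to be an (N, 28, 28) array with one label per image):

```python
import numpy as np

def flip_augment(images, labels):
    """Return the original images plus horizontally and vertically flipped
    copies; flips do not change which shape is in the image, so the labels
    are simply repeated."""
    flipped_h = images[:, :, ::-1]   # horizontal flip
    flipped_v = images[:, ::-1, :]   # vertical flip
    augmented = np.concatenate([images, flipped_h, flipped_v], axis=0)
    augmented_labels = np.concatenate([labels, labels, labels], axis=0)
    return augmented, augmented_labels
```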

Why are SIFT descriptors scale invariant?

My understanding: the SIFT descriptor uses a histogram of gradient orientations calculated from a 16x16 neighbourhood of pixels.
A 16x16 area in a large image can be a very small region, e.g. 1/10 of one hair on a cat's paw, whereas when you resize the image down to a small size, the 16x16 neighbourhood around the same key point can be a large part of the image, e.g. the whole paw of the cat.
It doesn't make sense to me to compare the original image with the resized image using SIFT descriptors.
Can anyone tell me what's wrong with my understanding?
This is a rough description, but it should give you an understanding of the approach.
One of the stages of SIFT is to create a pyramid of scales of the image: it repeatedly scales the image down and smooths it with a low-pass filter.
The feature detector then works by finding features that have a peak response not only in image space but in scale space too. That is, it finds the scale of the image at which the feature produces the highest response.
The descriptor is then calculated at that scale, so when you use a smaller or larger version of the image, it should still find the same scale for the feature.
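A quick way to see this in practice, assuming a recent OpenCV build where cv2.SIFT_create() is available (the filename is a placeholder):

```python
import cv2

img = cv2.imread("cat.jpg", cv2.IMREAD_GRAYSCALE)   # any test image
small = cv2.resize(img, None, fx=0.5, fy=0.5)        # half-size copy

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img, None)
kp2, des2 = sift.detectAndCompute(small, None)

# Match descriptors between the two scales with a ratio test;
# many keypoints should still match despite the resize.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} matches between original and half-size image")
```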

How to implement despeckle in OpenCV?

If histogram equalization is done on a poorly contrasted image, its features become more visible. However, there is also a large amount of grain/speckle noise. Using the blurring functions already available in OpenCV is not desirable: I'll be doing text detection on the image later on, and the letters would become unrecognizable.
So what preprocessing techniques should be applied?
Standard blur techniques that convolve the image with a kernel (e.g. Gaussian blur, box filter, etc.) act as a low-pass filter and distort the high-frequency text. If you have not done so already, try cv::bilateralFilter() or cv::medianBlur(). If neither of these algorithms works, you should look into other edge-preserving smoothing algorithms.
If you imagine the image as a surface in a three-dimensional (x, y, intensity) space, traditional filtering replaces the value of each pixel with a weighted average of the pixels in a circle centered on it. Bilateral filtering does the same, but uses a three-dimensional sphere centered at the pixel: since a well-defined edge looks like a plateau, the sphere contains only pixels from the same side of the edge and the pixel value remains essentially unchanged. You can get a more detailed explanation of the bilateral filter and some sample output here.
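A minimal sketch of trying both edge-preserving filters after equalization (the filename and parameter values are placeholders/starting points, not tuned recommendations):

```python
import cv2

gray = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)   # placeholder filename
equalized = cv2.equalizeHist(gray)

# Median blur: removes salt-and-pepper speckle while keeping stroke edges sharp.
median = cv2.medianBlur(equalized, 3)

# Bilateral filter: smooths flat regions but preserves high-contrast edges.
bilateral = cv2.bilateralFilter(equalized, d=9, sigmaColor=75, sigmaSpace=75)

cv2.imwrite("despeckled_median.png", median)
cv2.imwrite("despeckled_bilateral.png", bilateral)
```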
