I would like to detect shapes, namely circle, square, rectangle, triangle, etc., using machine learning techniques.
The specifications for the shape detection are as follows:
A Convolutional Neural Network (CNN) is used.
For training, the dataset contains 1000 images per category for 10 shapes.
For testing, the dataset contains 100 images per category for 10 shapes.
All images are resized to 28x28 with one channel (grayscale).
All the images in the dataset are edge-detected images.
Questions
Is it possible for a machine learning algorithm to differentiate between a square and a rectangle? Between a square and a rhombus?
How can I improve the dataset for shape detection?
Thanks in advance!
Yes, and it is not a very hard task for a CNN to do.
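For scale, here is a minimal Keras sketch of such a CNN for 28x28 grayscale inputs and 10 shape classes. The framework choice, layer sizes, and hyperparameters are illustrative assumptions, not tuned values:

```python
# Minimal CNN sketch for 10 shape classes on 28x28 grayscale images.
# The architecture and hyperparameters here are illustrative, not tuned.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),  # 10 shape categories
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```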
One way to improve the dataset is image augmentation. You can apply both horizontal and vertical flips, since all of these figures remain the same kind of figure under those transformations (a minimal sketch is below). You can think of other transformations as well, as long as they don't change the relative axis sizes, because if you rescale one axis a square becomes a rectangle, and vice versa.
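As a sketch, both flips are one-liners in NumPy, assuming each image is stored as a 2-D array:

```python
import numpy as np

def augment_flips(image):
    """Return the image plus its horizontal and vertical flips.

    Safe for shape data: flipping rescales neither axis, so a square
    stays a square and a rectangle stays a rectangle.
    """
    return [image, np.fliplr(image), np.flipud(image)]

# Example: triples the training set
# augmented = [a for img in train_images for a in augment_flips(img)]
```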
Related
I am trying to compare two images of drawings using corner features in the images. Here is a sample image:
Query image:
I used the SIFT algorithm to compare the images, but it did not work: SIFT extracts features from a 16x16 pixel window around each point of interest, but in this case (drawn objects) the only feature points we get are corners, and the SIFT descriptor gives very similar features for all corner points. In the feature-matching step the corners are then rejected because their similarity scores are too close.
So I am using the approach below to compare the images.
I use the Shi-Tomasi-based function in OpenCV, i.e. cv2.goodFeaturesToTrack(), to find the corners (feature points) in an image. After finding the corners I want to classify them into 4 categories and compare them across the two images. Below are the corner categories as defined so far; they may vary because of the huge variation in corner types (angle, number of lines crossing at the corner, irregular pixel variation at the corner point):
Corner categories:
Type-1: L-shaped
Type-2: Line intersection
Type-3: Line-curve intersection
Type-4: Curve-curve intersection
I am trying to solve this using the approach below:
=> Take a patch of fixed window size around the corner pixel, say a 32x32 window.
=> Compute the gradient information (gradient magnitude and direction) in this window and use it to classify the corner into the above 4 classes. While reading about image classification I learned that the HOG algorithm converts image gradient information into feature vectors.
=> The HOG feature vector calculated in the above step can be used to train an SVM to get a model.
=> This model can then be used to classify new feature points (the pipeline is sketched below).
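A minimal sketch of this pipeline, assuming scikit-image for the HOG descriptor and scikit-learn for the SVM (neither library is named in the question, and the detector parameters are illustrative placeholders):

```python
import cv2
import numpy as np
from skimage.feature import hog      # HOG descriptor (assumed library)
from sklearn.svm import SVC          # SVM classifier (assumed library)

def corner_patches(gray, patch=32, max_corners=100):
    """Detect Shi-Tomasi corners and cut a fixed-size window around each."""
    corners = cv2.goodFeaturesToTrack(gray, max_corners, 0.01, 10)
    half = patch // 2
    padded = cv2.copyMakeBorder(gray, half, half, half, half,
                                cv2.BORDER_REFLECT)
    patches = []
    for cx, cy in corners.reshape(-1, 2):
        x, y = int(cx) + half, int(cy) + half   # shift into padded coords
        patches.append(padded[y - half:y + half, x - half:x + half])
    return patches

def hog_features(patches):
    """Summarize gradient magnitude/direction as HOG feature vectors."""
    return np.array([hog(p, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for p in patches])

# Training (labels 0..3 = L-shaped, line-line, line-curve, curve-curve):
# clf = SVC(kernel='rbf').fit(hog_features(labeled_patches), labels)
# Classifying the corners of a new image:
# predictions = clf.predict(hog_features(corner_patches(new_gray)))
```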
After implementing the above algorithm I am getting poor accuracy.
If there is any other way to classify the corners, please suggest it.
I have a problem with pupil center detection. I trained a CNN to give me the pupil center location, but it is not always at the center.
How can I post-process the prediction so that an ellipse-fitting algorithm finds the center?
The process is this: I crop the face from a picture with dlib, then I run the prediction, and from the results I want to predict the center.
Here are two examples of the CNN prediction. Any guidance will be appreciated.
Project radial rays from the center you found and compute the intensity gradient along each ray. The maximal gradients will be your points on the edge of the iris. Then fit an ellipse to those points.
From those pictures, it appears that the variable occlusion of the iris is what is throwing off your center find. What may help is being more specific about the edge between iris and eye white (and not the eyelid). To do this I would proceed as follows (though there may be better ways): drop a point inside the iris blob and project a grid of radially spaced vectors outward, looking for the first dark-to-light transition above a minimum contrast. For each ray, measure the contrast of the edge. The contrast should be almost exactly the same for all iris-to-eye-white transitions and will vary along the eyelid. Perform whatever type of data clustering you prefer to isolate the chunk of only iris-to-eye-white edges, and then feed only those edge points into the ellipse center fit (a rough sketch is below).
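A rough sketch of that ray-casting step in Python/OpenCV. The ray count, maximum radius, and contrast threshold are illustrative values, and the clustering is crudely approximated by a median filter on contrast:

```python
import cv2
import numpy as np

def iris_edge_points(gray, cx, cy, n_rays=72, max_r=60, min_contrast=30):
    """Cast radial rays from a seed point inside the iris; keep the first
    strong dark-to-light transition on each ray, with its contrast."""
    points, contrasts = [], []
    for theta in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        dx, dy = np.cos(theta), np.sin(theta)
        profile = []
        for r in range(1, max_r):
            x, y = int(cx + r * dx), int(cy + r * dy)
            if not (0 <= x < gray.shape[1] and 0 <= y < gray.shape[0]):
                break
            profile.append(float(gray[y, x]))
        grad = np.diff(profile)                # dark-to-light is positive
        if len(grad) and grad.max() >= min_contrast:
            r = int(grad.argmax()) + 1
            points.append((cx + r * dx, cy + r * dy))
            contrasts.append(grad.max())
    return np.array(points, np.float32), np.array(contrasts)

# pts, con = iris_edge_points(gray, cx, cy)
# Keep edges whose contrast matches the dominant cluster (a crude
# stand-in for proper clustering), then fit the ellipse (needs >= 5 pts):
# keep = np.abs(con - np.median(con)) < 0.3 * np.median(con)
# (center, axes, angle) = cv2.fitEllipse(pts[keep])
```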
Currently I am training deep CNNs on small logo datasets similar to FlickrLogos-32. To train larger networks I need a bigger dataset, so I am using augmentation. The best I am doing right now is affine transformations (featurewise normalization, featurewise centering, rotation, width/height shift, horizontal/vertical flips). But for bigger networks I need more augmentation. I tried searching on Kaggle's National Data Science Bowl forum but couldn't get much help. There is code for some methods given here, but I'm not sure which would be useful. What are some other (or better) image data augmentation techniques that could be applied to this type of dataset (or any image dataset in general), other than affine transformations?
A good recap can be found here, in section 1 on data augmentation: namely flips, random crops, color jittering, and also lighting noise:
Krizhevsky et al. proposed fancy PCA when training the famous AlexNet in 2012. Fancy PCA alters the intensities of the RGB channels in the training images.
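A sketch of fancy PCA as described in the AlexNet paper: run PCA on the 3x3 covariance of RGB values over the training set, then add random multiples of the principal components to each image. Sigma = 0.1 follows the paper; the function names here are illustrative:

```python
import numpy as np

def fit_rgb_pca(images):
    """Eigendecomposition of the 3x3 covariance of RGB values over the
    whole training set; `images` has shape (N, H, W, 3), float pixels."""
    pixels = images.reshape(-1, 3)
    eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    return eigvals, eigvecs

def fancy_pca(image, eigvals, eigvecs, sigma=0.1):
    """Add alpha_i * lambda_i * p_i to every pixel, alpha_i ~ N(0, sigma).
    In the paper the alphas are redrawn each time an image is reused."""
    alphas = np.random.normal(0.0, sigma, 3)
    shift = eigvecs @ (alphas * eigvals)   # one RGB offset for the image
    return image + shift                   # broadcasts over H and W
```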
Alternatively you can also have a look at the Kaggle Galaxy Zoo challenge: the winners wrote a very detailed blog post. It covers the same kind of techniques:
rotation,
translation,
zoom,
flips,
color perturbation.
As stated, they also do it "in realtime, i.e. during training".
For example here is a practical Torch implementation by Facebook (for ResNet training).
I've collected a couple of augmentation techniques in my master's thesis (page 80). It includes the following; a sketch of a few of the pixel-level ones follows the list:
Zoom
Crop
Flip (horizontal / vertical)
Rotation
Scaling
Shearing
Channel shifts (RGB, HSV)
Contrast
Noise
Vignetting
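A minimal NumPy sketch of the contrast, channel-shift, and noise items; the parameter ranges are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def jitter(image, rng=np.random):
    """Contrast, channel shift and noise for a float RGB image in [0, 1].
    The ranges below are illustrative, not tuned values."""
    out = image.copy()
    out = 0.5 + rng.uniform(0.8, 1.2) * (out - 0.5)   # contrast
    out += rng.uniform(-0.1, 0.1, size=(1, 1, 3))     # RGB channel shift
    out += rng.normal(0.0, 0.02, size=out.shape)      # additive noise
    return np.clip(out, 0.0, 1.0)
```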
I am looking for parabolas in some radar data. I am using the OpenCV Haar cascade classifier. My positive images are 20x20 PNGs in which all of the pixels are black, except for those that trace a parabolic shape, one parabola per positive image.
My question is this: will these positives train a classifier to look for black boxes with parabolas in them, or will they train a classifier to look for parabolic shapes?
Should I add a layer of medium value noise to my positive images, or should they be unrealistically crisp and high contrast?
Here is an example of the original data.
Here is an example of my data after I have performed simple edge detection using GIMP. The parabolic shapes are highlighted in the white boxes.
Here is one of my positive images.
I figured out a way to detect the parabolas, initially using the MatchTemplate method from OpenCV. At first I was using the Python cv library, and later cv2, but I had to make sure that my input images were 8-bit unsigned integer arrays. I eventually obtained a similar effect with less fuss using scipy.signal.correlate2d(image, template, mode='same'); mode='same' makes the output the same size as the image. When I was done I performed thresholding using the numpy.where() function, and then opening and closing with the scipy.ndimage module to eliminate salt-and-pepper noise. The pipeline is sketched below.
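In code, that pipeline looks roughly like this; the input arrays and the threshold value are placeholders:

```python
import numpy as np
from scipy.signal import correlate2d
from scipy.ndimage import binary_opening, binary_closing

# image:    2-D array of edge-detected radar data (placeholder)
# template: 2-D array tracing one parabola, e.g. a 20x20 positive image
response = correlate2d(image.astype(float), template.astype(float),
                       mode='same')        # output matches `image` size

# Threshold the response; the cutoff (80% of the peak) is a placeholder
hits = np.where(response > 0.8 * response.max(), 1, 0)

# Opening then closing removes salt-and-pepper noise from the hit map
hits = binary_closing(binary_opening(hits))
```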
Here's the output, before thresholding.
Handwritten digit recognition problem: how can I normalize the handwritten digit image? Can someone help?
Check out how the MNIST dataset is curated here:
http://yann.lecun.com/exdb/mnist/index.html
To quote the relevant section:
The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications.
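That normalization can be reproduced roughly with OpenCV and scipy. This is a sketch assuming a white-on-black bilevel input like NIST's; the interpolation choice stands in for the (unspecified) anti-aliasing algorithm NIST used:

```python
import cv2
import numpy as np
from scipy.ndimage import center_of_mass

def normalize_digit(binary, box=20, field=28):
    """MNIST-style normalization: fit the digit into a 20x20 box while
    preserving aspect ratio, then place it in a 28x28 field so that its
    center of mass sits at the field's center. Assumes a white digit
    (nonzero pixels) on a black background."""
    ys, xs = np.nonzero(binary)
    digit = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = digit.shape
    scale = box / max(h, w)                       # preserve aspect ratio
    new_w = max(1, int(round(w * scale)))
    new_h = max(1, int(round(h * scale)))
    digit = cv2.resize(digit, (new_w, new_h),     # interpolation introduces
                       interpolation=cv2.INTER_AREA)  # the grey levels
    out = np.zeros((field, field), digit.dtype)
    com_y, com_x = center_of_mass(digit)
    top = int(round(field / 2 - com_y))
    left = int(round(field / 2 - com_x))
    top = min(max(top, 0), field - new_h)         # keep digit inside field
    left = min(max(left, 0), field - new_w)
    out[top:top + new_h, left:left + new_w] = digit
    return out
```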