I'm generating images of handwritten numbers (of more than one digit) by horizontally concatenating images of digits taken from the MNIST dataset, with the goal that the generated numbers look somewhat natural (as in, they look like they were written by a person).
For this, I sample one image for each digit from 0-9 from the dataset, and then use those images to generate an image of whatever number I want to.
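As a minimal sketch of that generation step, assuming digit_images is a dict mapping each digit 0-9 to one sampled 28x28 MNIST array (the names are just for illustration):

import numpy as np

def render_number(number, digit_images):
    # horizontally concatenate one sampled image per digit of the number
    digits = [int(d) for d in str(number)]
    return np.hstack([digit_images[d] for d in digits])

# e.g. img = render_number(2023, digit_images)  -> shape (28, 28 * 4)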
One issue that I'm facing in this is that in the MNIST dataset, the digits are of varying thickness, so the final number that I generate has some digits that are too bold (as can be seen in the image attached, where the 9 is too bold, and the 5 is the opposite).
Image of a number generated by the mentioned method
What I want to know is whether there is some image processing technique I can use to process all the digit images so that they have the same, or approximately the same, stroke thickness (ideally with the thickness controlled by a parameter)?
It turns out it was as simple as taking the skeleton of the image and then dilating it to whatever thickness I wanted.
This of course only works if the images are composed entirely of fairly simple curves, which the MNIST digits fortunately are.
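A minimal sketch of that skeletonize-then-dilate step, assuming a grayscale 28x28 MNIST digit in img and using scikit-image plus OpenCV (the kernel size that sets the final stroke thickness is an illustrative choice):

import cv2
import numpy as np
from skimage.morphology import skeletonize

def normalize_thickness(img, thickness=3):
    # reduce the strokes to a 1-pixel-wide skeleton, then dilate back to a uniform width
    binary = img > 127
    skeleton = skeletonize(binary).astype(np.uint8) * 255
    kernel = np.ones((thickness, thickness), np.uint8)
    return cv2.dilate(skeleton, kernel, iterations=1)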
I have a binary image (only 0 and 255 pixels) like the one below.
I want to extract bounding boxes around the letters such as A,B,C and D. The image is large (around 4000x4000) and the letters can be quite small (like B and D above). Moreover, the characters are broken. That is, there are gaps of black pixels within the outline of a character (such as A below).
The image also has white noise: streaks of white lines scattered around the image.
What I have tried -
Extracting contours - The issue is that, for broken characters (like "A"), multiple disconnected contours are obtained for a character. I am not able to obtain a contour for the entire character.
Dilation to join edges - This solves the disconnected contours (for large characters) to a certain extent. However, with dilation, I lose a lot of information about smaller characters which now appear like blocks of white pixels.
I thought of clustering similar pixels, but am not able to come up with a well-defined solution.
I would appreciate any ideas! Thanks.
How about this procedure?
Object detection (e.g. HOG algorithm): Gives you multiple objects
Resize obtained objects to equal size (e.g. 28x28 like MNIST dataset)
Character classification (e.g. SVM, kNN, deep learning)
The details of each step are up to you; a rough sketch of steps 2 and 3 is below.
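As a hedged sketch of steps 2 and 3, assuming you already have candidate crops from some detector, and using HOG features with an SVM purely as an example (all names and sizes are illustrative; crops are assumed to be 8-bit images):

import cv2
from sklearn.svm import SVC

# HOG descriptor sized for 28x28 crops (MNIST-like)
hog = cv2.HOGDescriptor((28, 28), (14, 14), (7, 7), (7, 7), 9)

def hog_features(crop):
    # resize a candidate region to 28x28 and compute its HOG feature vector
    resized = cv2.resize(crop, (28, 28), interpolation=cv2.INTER_AREA)
    return hog.compute(resized).ravel()

# train_crops / train_labels would come from your own labelled character data
# clf = SVC(kernel='linear').fit([hog_features(c) for c in train_crops], train_labels)
# prediction = clf.predict([hog_features(candidate_crop)])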
+) Also, search for examples of MNIST recognition. The MNIST dataset is a handwritten digit dataset and there are lots of examples for it (even for noisy MNIST).
I want to find the occurrences of a particular word in any webpage given as input.
I used a pyramid sliding-window approach, where I generated HOG (Histogram of Oriented Gradients) features for all the sliding windows.
For now, I am comparing the HOG features of each window with the HOG features of the word I want to extract.
For comparison of the two HOG feature vectors, I am just taking summation(vector1(i) - vector2(i)) for all i.
However, the results are below expectations.
My question is whether there is a better way of comparing the HOG features of each window with those of the word I want to find.
Or should I train a classifier such as an SVM to classify the HOG features of a window?
For training the classifier, I can have at most 100-200 examples of the word I want to find in my dataset. And since for an SVM it's better to have equal numbers of positive and negative examples, how do I restrict the non-word representations (negative examples) to 100-200?
For non-word data elements in the training set, I have:
1. ICDAR-2003 (this word dataset does not contain the word I want to extract)
2. CIFAR image data set
The reason I am not extracting/finding this word in the HTML code is that the word can also occur in an image.
Moreover, since the word I want to find is fixed, how many images of the word should I have in the dataset?
If you have a fixed font and are looking only for a particular word, here is a simple workaround:
https://stackoverflow.com/a/9647509/8682088
You have to extract the word box and resize it to, for example, 40x10 pixels. The grayscale pixel values can be your feature vector. Then you can train your SVM. It is primitive, but surprisingly effective.
It works perfectly fine with a fixed font and simple symbols.
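A minimal sketch of that workaround, assuming you already have cropped word boxes and binary labels (target word vs. anything else); the 40x10 size follows the suggestion above, while the linear kernel and names are illustrative choices:

import cv2
import numpy as np
from sklearn.svm import SVC

def word_features(crop):
    # resize a word crop to a fixed 40x10 and use the raw grayscale pixels as features
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY) if crop.ndim == 3 else crop
    resized = cv2.resize(gray, (40, 10), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32).ravel() / 255.0   # 400-dimensional vector

# crops and labels (1 = target word, 0 = anything else) come from your own data
# clf = SVC(kernel='linear').fit([word_features(c) for c in crops], labels)
# is_word = clf.predict([word_features(candidate_crop)])[0]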
I would like to learn image segmentation using the SLIC algorithm in Matlab. After running the algorithm on some images, I saw that some segment boundary lines are dashed. However, I can obtain continuous lines on a different image with the same parameters. The required superpixel parameter is 500 and the compactness factor is 20. What is the correct interpretation of that difference?
Every image is different, and algorithms often need to be tweaked on an image-by-image basis. Just try doing a simple binary threshold conversion
% simple binary threshold at an arbitrary gray level of 128
imshow(my_im < 128)
with different my_im images. You will see they all behave differently. One of the most difficult parts of computer vision (which includes image segmentation) is finding ways to tune the parameters automatically, without trial-and-error approaches.
I need to be able to determine whether a shape was drawn correctly or incorrectly.
I have sample data for the shape that holds the shape and the drawing order of the pixels (denoted by the color of each pixel).
For example, you can see the downsampled image and the color variation.
I'm having trouble figuring out the network I need to define that will accept this kind of input for training.
Should I convert the downsampled image to a matrix and use that as the input? Let's say my image is 64x64: I would need 64x64 input neurons (and that's if I ignore the color of the pixels, I think). Is that a feasible solution?
If you have any guidance, I could use it :)
Here is an example.
It is a binarized 4x4 image of the letter c. You can either concatenate the rows or the columns; I am concatenating by columns, as shown in the figure. Each pixel is then mapped to an input neuron (16 input neurons in total). In the output layer, I have 26 outputs, the letters a to z.
Note that, in the figure, I did not connect all nodes from layer i to layer i+1, for simplicity; in practice you should connect them all.
At the output layer, I highlight the node for c to indicate that, for this training instance, c is the target label. The expected input and output vectors are listed at the bottom of the figure.
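As a small sketch of that encoding (the 4x4 pattern for the letter c here is made up for illustration; it only shows the column-wise flattening into 16 inputs and the one-hot target over 26 letters):

import numpy as np

# a hypothetical binarized 4x4 image of the letter c (1 = stroke pixel)
image = np.array([[0, 1, 1, 1],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [0, 1, 1, 1]])

# concatenate the columns into a single 16-element input vector
input_vector = image.T.ravel()

# one-hot target over the 26 letters a-z, with c (index 2) as the label
target = np.zeros(26)
target[ord('c') - ord('a')] = 1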
If you want to keep the intensity of color, e.g., R/G/B, then you have to triple the number of inputs. Each single pixel is replaced with three neurons.
Hope this helps. For further reading, I strongly suggest the deep learning tutorial by Andrew Ng, UFLDL. It's the state of the art for this kind of image recognition problem. In the exercises that come with the tutorial, you will be intensively trained to preprocess images and to use a lot of engineering tricks for image processing, together with the interesting deep learning algorithms, end-to-end.
I am trying to extract numbers from a typical scoreboard that you would find at a high school gym. I have each number in a digital "alarm clock" font and have managed to perspective-correct, threshold, and extract a given digit from the video feed.
Here's a sample of my template input
My problem is that no one classification method will accurately determine all digits 0-9. I have tried several methods
1) Tesseract OCR - this one consistently messes up on 4 and frequently returns weird results. Just using the command line version. If I actually try to train it on an "alarm clock" font, I get unknown character every time.
2) kNearest with OpenCV - I search a database consisting of my template images (0-9) and see which one is nearest. I frequently get confusion between 3/1 and 7/1
3) cvMatchShapes - this one is fairly bad, it usually can't tell the difference between 2 of the digits for each input digit
4) Tangent Distance - This one is the closest, but the smallest tangent distance between the input and my templates ends up mapping "7" to "1" every time
I'm really at a loss to get a classification algorithm for such a simple problem. I feel I have cleaned up the input fairly well and it's a fairly simple case for classification but I can't get anything reliable enough to actually use in practice. Any ideas about where to look for classification algorithms, or how to use them correctly would be appreciated. Am I not cleaning up the input? What about a better input database? I don't know what else I'd use for input, each digit and template looks spot on at this point.
The classical digit recognition approach, which should work well in this case, is to crop the image just around the digit and resize it to 4x4 pixels.
A Discrete Cosine Transform (DCT) can be used to further slim down the search space. You could select the first 4-6 values.
With those values, train a classifier. SVM is a good one, readily available in OpenCV.
It is not as simple as emma's or martin's suggestions, but it's more elegant and, I think, more robust.
Given the width/height ratio of your input, you may choose a different resolution, like 3x4. Choose the smallest one that retains readable digits.
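A rough sketch of that crop-resize-DCT-SVM pipeline, using OpenCV's DCT and an sklearn SVM as stand-ins; the 4x4 size and the number of coefficients kept follow the suggestion above, everything else is illustrative, and the crop is assumed to be a single-channel (grayscale) image:

import cv2
import numpy as np
from sklearn.svm import SVC

def dct_features(digit_crop, n_coeffs=6):
    # resize the single-channel cropped digit to 4x4, take its DCT, keep the first few coefficients
    small = cv2.resize(digit_crop, (4, 4), interpolation=cv2.INTER_AREA)
    coeffs = cv2.dct(small.astype(np.float32))
    return coeffs.ravel()[:n_coeffs]        # first coefficients in row-major order

# templates / labels would be your cropped scoreboard digits 0-9
# clf = SVC(kernel='rbf').fit([dct_features(t) for t in templates], labels)
# digit = clf.predict([dct_features(new_crop)])[0]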
Given the highly regular nature of your input, you could define a set of 7 target areas of the image to check. Each area should encompass a significant portion of one of the 7 segments of each digit of the display, but not overlap the others.
You can then check each area and average the color / brightness of the pixels in it to generate a probability for a given binary state. If your probability is high on all areas, you can easily figure out what the digit is.
It's not as elegant as a pure ML type algorithm, but ML is far more suited to inputs which are not regular, and in this case that does not seem to apply - so you trade elegance for accuracy.
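A small sketch of that segment-sampling idea; the region coordinates (for a nominal 16x16 crop) and the on/off threshold are placeholders you would tune to your own perspective-corrected crops:

import numpy as np

# placeholder (y0, y1, x0, x1) boxes, one per segment, ordered A-G
SEGMENT_BOXES = [(0, 3, 4, 12), (2, 8, 12, 15), (8, 14, 12, 15), (13, 16, 4, 12),
                 (8, 14, 0, 3), (2, 8, 0, 3), (7, 9, 4, 12)]

# which segments (A-G) are lit for each digit 0-9 (standard seven-segment encoding)
DIGIT_PATTERNS = {
    (1, 1, 1, 1, 1, 1, 0): 0, (0, 1, 1, 0, 0, 0, 0): 1, (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3, (0, 1, 1, 0, 0, 1, 1): 4, (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6, (1, 1, 1, 0, 0, 0, 0): 7, (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

def read_digit(crop, threshold=0.5):
    # average the brightness inside each segment box and look up the on/off pattern
    pattern = tuple(int(crop[y0:y1, x0:x1].mean() / 255.0 > threshold)
                    for (y0, y1, x0, x1) in SEGMENT_BOXES)
    return DIGIT_PATTERNS.get(pattern)      # None if the pattern is not a valid digit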
Might sound silly, but have you tried simply checking for black bars vertically, and then horizontally in the top and bottom halves, left and right of the centerline?
If you are trying text recognition with Tesseract, try passing not one digit but several duplicated copies of the digit; sometimes this produces better results. Here's an example.
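An illustrative sketch of that duplication trick with pytesseract, assuming the digit crop is an 8-bit image; the tiling count and Tesseract options are just example values:

import numpy as np
import pytesseract

def ocr_duplicated_digit(digit_crop, copies=5):
    # tile the same digit horizontally and OCR the strip, which Tesseract often handles better
    # (Tesseract generally prefers dark text on a light background, so invert first if needed)
    strip = np.hstack([digit_crop] * copies)
    text = pytesseract.image_to_string(
        strip, config='--psm 7 -c tessedit_char_whitelist=0123456789')
    digits = [c for c in text if c.isdigit()]
    return digits[0] if digits else None    # all copies should read as the same digit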
However, if you're planning business software, you may want to have a look at a commercial OCR SDK, for example ABBYY FineReader Engine. It's not affordable for free-to-use applications, but when it comes to business it can add good value to your product. As far as I know, ABBYY provides the best OCR quality; for example, check out http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison
You want your scoreboard image inputs S feeding an algorithm that maps them to {0,1,2,3,4,5,6,7,8,9}.
Let V denote the set of n-tuples of integers.
Construct an algorithm α that maps each image S to an n-tuple
(k1,k2,...,kn)
that can differentiate between two different scoreboard digits.
If you can specify the range of α then you only have to collect the vectors in V that correspond to a digit in order to solve the problem.
I've applied this idea using Martin Beckett's idea, and it works. My initial attempt was a simple injection into a 2-tuple by vertical left-to-right summing, with the first integer being an image column offset and the second the length of a 'nice' vertical line.
This did not work - images of 6 and 8 would map to the same vectors. So I needed to capture one more piece of information for my digit input types (they are not scoreboard digits), and a 3-tuple does the trick.
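As a loose illustration of the tuple-lookup idea (the specific features here, based on column and row sums, are made up; the point is only that each labelled template image is reduced to a small integer tuple and prediction is a dictionary lookup):

import numpy as np

def to_tuple(img):
    # map a binary digit image to a small integer tuple of crude shape features
    cols = (img > 0).sum(axis=0)            # white-pixel count per column
    rows = (img > 0).sum(axis=1)            # white-pixel count per row
    return (int(cols.argmax()),             # index of the column with the most white pixels
            int(cols.max()),                # how many white pixels that column has
            int(rows.max()))                # same for the fullest row, to break ties like 6 vs 8

# build the lookup table from labelled template images, then classify by exact match
# lookup = {to_tuple(img): digit for digit, img in templates.items()}
# predicted = lookup.get(to_tuple(unknown_digit_image))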